A couple of times I have seen a git-annex sync -C . upload files to the remote that are not part of the remote's preferred content. In the most recent case, I had moved the file to another directory while the sync was downloading a previous file. I suspect that the file being removed causes preferred content checks to mess up. --Joey

Reproduced reliably as follows: Have a bigfile in the remote and a smallfile in the local repo. Have the remote's preferred content be "not (copies=1)". Have the local repo's preferred content be "include=*". Run git-annex sync -C . and, while that's running, git rm smallfile. (bigfile has to be big enough to give time to run that command.)

smallfile gets sent to the remote unexpectedly. If it's not deleted first, that does not happen.


Hmm, so limitCopies uses checkKey, which, for MatchingFile, uses lookupKey. With a deleted file, lookupKey falls into a case where it uses catKeyFile, but since the file has been removed from the index, that also fails. When it fails, the check assumes the file does not have 1 copy, so the "not (copies=1)" evaluates to true, and the file is treated as matching preferred content.
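
The failure mode described above can be sketched with a toy model. This is not git-annex code: the dictionaries, the key names, and the Python functions lookup_key, copies_eq and remote_wants are hypothetical stand-ins, assumed only to show how a failed key lookup turns "not (copies=1)" into a match.

```python
from typing import Optional

# Toy stand-ins (assumptions, not git-annex data structures):
# the index maps a file to its key, and a location table maps a key
# to its number of known copies.
index = {"bigfile": "KEY-big"}  # smallfile is absent: it was removed with git rm
copies = {"KEY-big": 1, "KEY-small": 1}


def lookup_key(path: str) -> Optional[str]:
    """Models a file-based key lookup; returns None once the file is gone."""
    return index.get(path)


def copies_eq(path: str, n: int) -> bool:
    """Models copies=N checked via a file: a failed lookup counts as no match."""
    key = lookup_key(path)
    if key is None:
        return False
    return copies[key] == n


def remote_wants(path: str) -> bool:
    """Models the remote's preferred content expression: not (copies=1)."""
    return not copies_eq(path, 1)
```

In this model, bigfile has one copy and is correctly not wanted, while the deleted smallfile also has one copy but is reported as wanted, because the lookup failure is negated by the "not".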

The preferred content is being checked via wantSend, which already knows the key in this case.

It knows the key already because sync uses seekFilteredKeys, and so it has already streamed the file through and looked up the key before the file is deleted. If the file got deleted before the key could be looked up, it would be skipped. It may be that recent changes to add this streaming for performance led to this bug.

So one fix might be to change it to use MatchingKey, and so avoid the later lookup? Investigating the git history and the code, I see no reason not to do this. MatchingKey did not originally include an AssociatedFile, which is probably why it was not used in this case.
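
A rough sketch of why matching on the already-known key would sidestep the problem, again as a toy model rather than the real implementation. The Python classes below only borrow the names MatchingFile and MatchingKey; their fields, the check_key helper, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MatchingFile:
    path: str  # the key has to be looked up from the file


@dataclass
class MatchingKey:
    key: str  # the key is already known
    associated_file: Optional[str]  # kept so file-based expressions can still match


# Hypothetical state after git rm smallfile: nothing is left in the index,
# but the location table still records one copy of the key.
index: dict = {}
copies = {"KEY-small": 1}


def check_key(item: Union[MatchingFile, MatchingKey]) -> Optional[str]:
    """Returns the key directly for MatchingKey, else looks it up by file."""
    if isinstance(item, MatchingKey):
        return item.key
    return index.get(item.path)


def remote_wants(item: Union[MatchingFile, MatchingKey]) -> bool:
    """Models the remote's preferred content expression: not (copies=1)."""
    key = check_key(item)
    has_one_copy = key is not None and copies.get(key) == 1
    return not has_one_copy
```

With the deleted file, remote_wants(MatchingFile("smallfile")) comes out true in this model, while remote_wants(MatchingKey("KEY-small", "smallfile")) comes out false, since no lookup through the index is needed.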