I keep a repo synced between machines over ssh. Assuming all the files are in sync, so no actual file transfer needs to take place, doing
git annex copy --fast --quiet --to blah
is quite slow, about 10 seconds, using 100% CPU on one core, just to decide nothing needs to be done. On the other hand, doing
git annex copy --fast --quiet --from blah
takes about 1 second.
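For reference, the comparison is simply the two commands above timed back to back on the already-synced repo:

time git annex copy --fast --quiet --to blah      # ~10 seconds, one core at 100% CPU
time git annex copy --fast --quiet --from blah    # ~1 second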
I'm confused: since I'm using --fast, it seems to me that both operations should use only locally available data, and so both should need about the same amount of computation. Am I missing something? Can this be fixed?
How many files are in the directory tree you're copying?
copy --fast --to
does indeed avoid the check to see if the remote already has the file before copying it. However, it still needs to look in the location log to see which files are already present on the remote. Whereas
copy --from
can do a single stat of the file on disk to see if it's present in the local repo. Location log lookups are about as fast as I can make them, but they still require querying information out of the git repository. If you have a lot of files, this otherwise minor difference in speed can stack up.

That example I gave, 10 sec vs 1 sec, is on a repository of pictures with about 6200 files on an SSD.
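To make the asymmetry concrete: a location log lookup means reading a blob out of the git-annex branch, while the local presence check is a stat of the annexed object file. Roughly like this (the key and hash directories below are placeholders, not taken from this thread):

# one location log read per file for copy --to (placeholder key and hash dirs)
git cat-file -p git-annex:abc/def/SHA256E-s12345--0123abcd.log
# one stat per file for copy --from
stat .git/annex/objects/Xx/Yy/SHA256E-s12345--0123abcd/SHA256E-s12345--0123abcd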
Oh, I think I understand the source of the asymmetry, now! So,
git annex copy --to
queries the location log file by file? I've tested a
git grep
on the git-annex branch, and it seems to be quite fast, less than a second on my test repo. Could git annex make use of this to speed up bulk queries to the location log?
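Something along these lines, for example (the exact invocation is a guess; it assumes each location log is grepped for the remote's annex UUID):

# look up the remote's annex UUID, then find every location log on the
# git-annex branch that mentions it, in a single git invocation
uuid=$(git config remote.blah.annex-uuid)
git grep -l "$uuid" git-annex -- '*.log'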