Like --all was sped up 2x-16x in d010ab04be5a8d74fe85a2fa27a853784d1f9009, commands that ls-files the worktree may be sped up by using cat-file --buffer to get location logs (and maybe other logs) more efficiently, and precache them.

The precachelog branch adds location log precaching for git annex get only. But it benchmarks 4x slower. (Even if it were faster, it would have needed more work, because limits are matched before location log precaching, so if any limit like --in is used that uses the location log, it will actually be read twice.) This is a surprising result, and I don't understand why it's slower, but backburnered this optimisation for now.

Hmm, but git annex get does not need location log access when the object is already present. So that was the wrong thing to benchmark this with. copy --to would be a better benchmark. To speed up get, it would need to check inAnnex before caching the location log.

Benchmarking copy --fast --to, precaching location logs did yield a 30% speedup, in a 10k repo where all objects were present. But, in a 10k repo where no objects were present, it was over 14x slower, again because the inAnnex check really needs to come before the location log precaching.

So, this needs some more work, but is promising.

Second try at this, results:

  • get in a full repo is not any slower. And presumably in an empty repo, get is faster, but I didn't try it and the transfers will dominate that anyway
  • sync --content 2x speedup!
  • fsck --fast 1.5x speedup
  • whereis 1.5x speedup
  • copy --to 2x speedup when remote has all files

done

Another thing that the same cat-file --buffer approach could be used with is to cat the annex links. Git.LsFiles.inRepoDetails provides the Sha of file contents, which can be fed through cat-file --buffer to get keys. A complication is that, non-symlinks could be large files that are not annexed but in git; don't want to cat those when looking for annex links. That would probably need pre-filtering through a cat-file --buffer that only gets the size of the blob, not its content.

This was a win! Nearly 2x faster git-annex get seeking.

Some calls to lookupKey remain, and the above could be used to remove them and make it faster. The ones in Annex.View and Command.Unused seem most likely to be able to be converted.

See also faster key lookup for limits