Please describe the problem.
I need to discover a potentially existing annex-key for any file in the worktree of a Git repository. I assumed that a batch-mode lookupkey would be faster than a find --json --batch, but that is not the case.
My test repository has 36k files.
❯ time git ls-files | git annex lookupkey --batch > /dev/null
git ls-files 0.01s user 0.01s system 0% cpu 2:42.37 total
git annex lookupkey --batch > /dev/null 91.00s user 73.39s system 98% cpu 2:46.25 total
❯ time git ls-files | git annex find --anything --json --batch > /dev/null
git ls-files 0.01s user 0.00s system 0% cpu 4.093 total
git annex find --anything --json --batch > /dev/null 3.20s user 2.02s system 124% cpu 4.195 total
lookupkey appears to be roughly 40 times slower than find (2:46 versus about 4.2 seconds in total), although the latter reports much more information.
This surprised me, hence I am reporting it here as a potential bug.
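Until lookupkey is faster, the find pipeline above can serve as a workaround for collecting keys. The following is a minimal sketch, not a tested integration: it assumes each JSON record carries "file" and "key" fields, and that a blank reply line means no key was reported for that input. The function names parse_find_json and annex_keys are made up for illustration.

```python
import json
import subprocess


def parse_find_json(lines):
    """Map file path -> annex key from `git annex find --json --batch` output lines."""
    keys = {}
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: a blank reply means no key was reported for this input.
            continue
        record = json.loads(line)
        if record.get("key"):
            keys[record["file"]] = record["key"]
    return keys


def annex_keys(repo="."):
    """Run the same pipeline as in the timing test and collect the keys."""
    # Note: without -z, git ls-files quotes unusual file names.
    files = subprocess.run(
        ["git", "ls-files"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    out = subprocess.run(
        ["git", "annex", "find", "--anything", "--json", "--batch"],
        cwd=repo, input=files, check=True, capture_output=True, text=True,
    ).stdout
    return parse_find_json(out.splitlines())
```

Calling annex_keys() in the repository should return a dictionary from worktree paths to keys, with files that have no key simply absent.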
What version of git-annex are you using? On what operating system?
git-annex version: 10.20230126
This is due to lookupkey passing each filename through git ls-files in order to support absolute file paths as input. See cfdfe4df6c8b3fe46bbc7afcc8274237a5b2ce2a.
I made it only do that for absolute paths, which should make it at least marginally faster than git-annex find. I would not expect it to be much faster though, because the little extra information that git-annex find displays takes negligible CPU time.
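Since the extra git ls-files pass now applies only to absolute paths, a caller holding absolute paths may want to make them relative before feeding them to lookupkey --batch. This is a hedged sketch of that idea only; the helper name relativize is hypothetical, and it assumes the command is run from the directory given as cwd.

```python
import os


def relativize(paths, cwd):
    """Rewrite absolute paths located under cwd as relative paths; leave the rest untouched."""
    cwd = os.path.abspath(cwd)
    result = []
    for p in paths:
        # Only absolute paths inside cwd can be expressed without leaving it.
        if os.path.isabs(p) and os.path.commonpath([p, cwd]) == cwd:
            result.append(os.path.relpath(p, cwd))
        else:
            result.append(p)
    return result
```

Paths outside cwd are passed through unchanged, so they would still take the slower route.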