to work around original filename on s3, i need to get the key from a file, and i'm not within the git-annex process. i know there's git annex lookupkey $FILE, but that incurs significant overhead because the whole git annex runtime needs to fire up. in my tests, this takes around 25ms on average.
could i optimise this by simply doing a readlink call on the git checkout? it sure looks like readlink | basename is all I really need, and that can probably be done below 10ms (4ms in my tests). how reliable are those links anyways, and is that what lookupkey does?
similarly, i wonder if it's safe to bypass git-annex and talk straight with git to extract location tracking? i can jump from 90ms to below 10ms for such requests if I turn git annex find <file> into the convoluted:

    git annex lookupkey $file
    printf $key | md5sum
    git cat-file -p refs/heads/git-annex:$hash/${key}.log
thanks. --anarcat
Yes, that's the same, except lookupkey only operates on files that are checked into git.
(Also, lookupkey will work in a direct mode repo, while such a repo may not have a symlink to examine.)
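For illustration, the readlink approach from the question could look like the sketch below. It assumes an indirect mode checkout where each annexed file is a symlink into .git/annex/objects; the function name key_from_symlink is made up for this example, and unlike lookupkey it does not check that the file is known to git.

```shell
#!/bin/sh
# Hypothetical helper: print the git-annex key of a file by reading its symlink.
# Assumes the link target looks like .../annex/objects/xx/yy/KEY/KEY.
key_from_symlink() {
    # readlink fails (and we return 1) when $1 is not a symlink
    target=$(readlink "$1") || return 1
    case "$target" in
        */annex/objects/*) basename "$target" ;;  # last path component is the key
        *) return 1 ;;                            # a symlink, but not an annexed file
    esac
}
```

Calling key_from_symlink on an annexed file prints its key; a regular file or an unrelated symlink makes it return non-zero, so the caller can fall back to git annex lookupkey.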
25ms doesn't seem bad for a "whole runtime" to fire up. I think most of the overhead probably involves reading the git config and running git-ls-files.
Note that lookupkey can be passed a whole set of files, so you could avoid the startup overhead that way too.
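As a sketch of that batching idea, the wrapper below starts git-annex once for a whole list of files. It assumes every file passed in is annexed and checked into git, and that lookupkey prints one key per line in the same order as its arguments; that ordering is an assumption worth checking against your git-annex version. The name print_keys is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: look up the keys of many files with one git-annex startup.
# Prints "KEY<TAB>FILE" per file, pairing output lines with arguments by position.
print_keys() {
    git annex lookupkey "$@" | {
        for f in "$@"; do
            IFS= read -r key || break   # stop if git-annex printed fewer lines
            printf '%s\t%s\n' "$key" "$f"
        done
    }
}
```

Something like print_keys *.jpg then pays the startup cost once instead of once per file.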
And yes, it's fine to bypass git-annex when querying git, or even when manipulating the git-annex branch, so long as you either delete or update .git/annex/index. git-annex is not intended to be magical, see internals.
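A read-only query of the location log along the lines of the question might be sketched as follows. It assumes the git-annex branch stores each key's log under two directories built from the first six hex digits of the md5sum of the key (three characters each); verify that layout against the internals page and your own repository before relying on it. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: path of a key's location log inside the git-annex branch.
# Assumed layout: <md5 digits 1-3>/<md5 digits 4-6>/KEY.log
location_log_path() {
    sum=$(printf '%s' "$1" | md5sum)
    d1=$(printf '%s' "$sum" | cut -c1-3)
    d2=$(printf '%s' "$sum" | cut -c4-6)
    printf '%s/%s/%s.log\n' "$d1" "$d2" "$1"
}

# Print the location log for a key straight from git, without starting git-annex.
show_location_log() {
    git cat-file -p "refs/heads/git-annex:$(location_log_path "$1")"
}
```

Since this only reads from the branch, .git/annex/index is not touched and needs no special handling.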
great, thanks for the feedback!
i agree that 25ms is quite fast to fire up a 52MB binary. i am just saying that if this is going to end up as part of building a webpage, i need something faster, or cache the results somewhere.
duly noted for the other points, thanks again. i see a great intimate relationship building between git cat-file and me.