Hey folks (and Joey), I am trying to understand the performance impact of the changes from v6 -> v7 -> v8 mode. Apologies, since I haven't kept up with the changes (I was using an older version for quite a while) and some of this might already be well documented/known.
Essentially, back in v6 and earlier, I was pretty happy with the design idea that git-annex doesn't use smudge/clean filters, since their performance is far from ideal. However, I see that in newer repository versions this has become more of a thing. I have read a few docs (https://git-annex.branchable.com/todo/git_smudge_clean_interface_suboptiomal/, https://git-annex.branchable.com/todo/only_pass_unlocked_files_through_the_clean47smudge_filter/) but there are still a few things I don't understand.
1) Are smudge/clean filters used all the time now? Does this mean that we are taking a performance hit compared to older git-annex versions? 2) Can someone explain when smudge/clean filters get used? Is it only in repos that use unlock/adjust? I don't use either of them, and would love to know if the filters are being run unnecessarily.
Thanks in advance!
There is no change at all for performance when annexed files are symlinks.
If the annexed file is unlocked, it will use smudge/clean filters.
If you have non-annexed files in the git repo, git will also run those through the smudge/clean filters.
git calls the clean filter when it sees that a file has been modified, for example during git status and git add. It caches the result until the file is modified again. The smudge filter is only called when git is checking out a file (git checkout, or a change coming in from a git pull, etc).
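As a rough way to see which files fall into which case, here is a minimal sketch that classifies a working-tree path by file type: a symlink (a locked annexed file, which the filters do not touch) or a regular file (which git may pass through the clean filter). The name `classify_path` is hypothetical, and the check only looks at the file type; it does not ask git whether a filter attribute is actually set for that path.

```shell
# classify_path PATH: hypothetical helper, not part of git or git-annex.
# Prints "symlink", "regular", or "other" depending on the file type.
classify_path() {
    # Test for a symlink first: [ -f ] follows symlinks and would
    # report a symlink to a regular file as a regular file.
    if [ -L "$1" ]; then
        echo "symlink"
    elif [ -f "$1" ]; then
        echo "regular"
    else
        echo "other"
    fi
}
```

For a path reported as regular, `git check-attr filter -- <path>` shows which filter attribute, if any, git applies to it.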
Thanks for the super quick response Joey.
That then sounds like `git add` and `git annex add` do get a performance impact (since they run smudge/clean, which is not mandatory, or is a no-op) even if I don't use unlock or adjust. Is that right? Is there a repo setting that will make it not use them?
My understanding is that the git pre/post commit hooks are the ones that are useful for the standard symlink approach, and that it doesn't need smudge/clean at all. Please correct me if I am wrong.
Also, I just want to say I appreciate all the effort that you put into this. I have tons of data managed with this and should probably write up the story of how I manage my files.
`git annex add` does not result in the smudge filter being run.
You could configure git not to run the filters ever (see git docs) but then git-annex features involving unlocked files will of course break.
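Before changing anything, it may help to check whether the filter is wired up in a given repository at all. The sketch below assumes the attribute is recorded as a `filter=annex` line in `.git/info/attributes`; that location is an assumption on my part, not something stated in this thread, and `has_annex_filter` is a hypothetical helper name.

```shell
# has_annex_filter REPO: hypothetical helper, not part of git or git-annex.
# Succeeds if REPO/.git/info/attributes contains a filter=annex entry
# (assumed here to be where the attribute is set).
has_annex_filter() {
    attrs="$1/.git/info/attributes"
    [ -f "$attrs" ] && grep -q 'filter=annex' "$attrs"
}

# Example use from the top of a repository:
# has_annex_filter . && echo "annex filter attribute present"
```

Assuming the filter driver is named `annex`, `git config --get filter.annex.clean` would then show the command git runs as the clean filter.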
Could I suggest that, in general, you wait until you have actually seen git-annex behave badly before worrying about it? (This is in the context of also having seen some other comments you made elsewhere.) That also lets you file concrete bug reports, rather than posting worries that may be ungrounded.