I have some v5 indirect annexes with important data, one of which has hundreds of thousands of files, and I'm concerned about whether upgrading to v7 will result in data loss.
What exactly does the upgrade to v7 do? What is the likelihood of errors or losing data in any way? Will upgrading my biggest annex take a long time?
In detail, this is exactly what is entailed by the upgrade process of a repository that is not in direct mode:
Adding a [filter "annex"] section to .git/config, and installing a .git/info/attributes file that contains "filter=annex" (or modifying it if the repo already has one).
Installing git hooks that run git annex smudge --update. (If you happened to have already installed something in those hooks, it will not modify them and will display a warning instead.)
Notice that this process does not touch the work tree at all, or the annex objects, so even if it somehow completely exploded, you cannot possibly lose data. It is entirely reversible by undoing the git config changes I listed.
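If you want to see those pieces for yourself before or after an upgrade, here is a rough read-only sketch. It assumes a non-bare repository with a regular .git directory; the function name show_upgrade_changes is made up for illustration, and the exact contents that git-annex writes may differ between versions.

```shell
#!/bin/sh
# Read-only look at the places the upgrade is described as touching.
# Nothing here modifies the repository.
# Assumption: "$1" is a non-bare repository with a regular .git directory.
show_upgrade_changes() {
    repo=$1

    echo "== filter.annex settings in .git/config =="
    git -C "$repo" config --local --get-regexp '^filter\.annex\.' \
        || echo "(no filter.annex settings)"

    echo "== filter=annex lines in .git/info/attributes =="
    grep 'filter=annex' "$repo/.git/info/attributes" 2>/dev/null \
        || echo "(no filter=annex line)"

    echo "== hooks that mention git annex smudge =="
    grep -l 'annex smudge' "$repo"/.git/hooks/* 2>/dev/null \
        || echo "(no such hooks)"
}

# Example (placeholder path):
#   show_upgrade_changes /path/to/annex
```

Running it before and after the upgrade on a test clone gives a simple record of what changed.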
And it does not break interoperation with other clones of the repository that still use v5. So if you have qualms, my advice would be to make a clone and try it out for yourself and see. You can prevent accidental upgrade of any repos by running
git config --global annex.autoupgraderepository false
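One cheap thing to check on such a trial clone is how many entries git ls-tree -r HEAD lists, since that listing is what the upgrade's scan for unlocked files runs. A minimal sketch, assuming placeholder paths; the helper name count_tracked_files is hypothetical:

```shell
#!/bin/sh
# Count the entries that `git ls-tree -r HEAD` lists for a repository.
# This is only a gauge of how much the scan has to walk, not the scan itself.
count_tracked_files() {
    git -C "$1" ls-tree -r HEAD | wc -l | tr -d ' '
}

# Example (placeholder paths):
#   git clone /path/to/annex /tmp/annex-trial
#   count_tracked_files /tmp/annex-trial
#   time git -C /tmp/annex-trial ls-tree -r HEAD >/dev/null
```

Timing the bare listing on the clone should give a feel for the cost on the biggest annex without touching the original.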
Just for completeness, here's what upgrading a direct mode repository to v7 entails:
git annex adjust --unlock
The only potentially expensive part is the scan for unlocked files. That involves running
git ls-tree -r HEAD
, so it will scale with the number of files in the repository. But it would need to be a huge repository indeed for it to take a long time.
annex.largefiles=nothing explicitly for each of them in .gitattributes? (Also then set annex.largefiles=(largerthan=100kb) in .gitattributes rather than in git config, since the git config setting overrides .gitattributes.) But in general it would be better if files already in git were not annexed even if they match annex.largefiles.
"it would be better if files already in git were not annexed even if they match annex.largefiles" -- actually not sure: what if a file was in git but gets modified to something much larger? If you have "a few files just a bit larger" than your annex.largefiles setting, maybe just increase that setting? You could also set annex.gitaddtoannex=false to prevent git add from annexing previously non-annexed files, and use git annex add to annex new files.
Also, if the goal is compatibility with v5, you can lock files after annexing. If auto-locking files after one edit gets implemented at some point, that could also help this case.
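For the .gitattributes approach discussed above, a small sketch of what the rules could look like and how to check which one applies to a given path. The file names are placeholders, and the two helper functions are made up for illustration. git check-attr only reports the attribute value that git resolves; how git-annex then interprets it is up to git-annex.

```shell
#!/bin/sh
# Append example largefiles rules to .gitattributes in the given work tree.
# Paths are placeholders; later lines override earlier ones for a match.
write_largefiles_rules() {
    repo=$1
    cat >> "$repo/.gitattributes" <<'EOF'
* annex.largefiles=(largerthan=100kb)
notes/slightly-big.txt annex.largefiles=nothing
EOF
}

# Show the annex.largefiles value git resolves for each path.
# Uses plain git, so it works even where git-annex is not installed.
show_largefiles_rule() {
    repo=$1
    shift
    git -C "$repo" check-attr annex.largefiles -- "$@"
}

# Example (placeholder paths):
#   write_largefiles_rules /path/to/annex
#   show_largefiles_rule /path/to/annex notes/slightly-big.txt video.mp4
#
# The git config alternative mentioned above, set per repository:
#   git -C /path/to/annex config annex.gitaddtoannex false
```

Listing each exception by name gets tedious with more than a handful of files, which is where raising the size threshold or annex.gitaddtoannex may be the simpler route.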