Joey,
I now use git-annex to manage many of the repositories on my system. I keep them on my local machine, on a very large file server, and on a backup system on the Internet.
Today I went to look at a file in one of my annexes and it wasn't there. This really surprised me. But what surprised me most is that around 90% of the files in all of my annexes on both my local system and my file server are completely missing. Only the Internet backup system has them.
How could something like this happen, when I haven't been interacting with these annexes at all during this time? Can you think of any scenario that might lead to this? This is pretty much the absolute worst case scenario for an archival data system.
I am running on Mac OS X 10.8, using GHC 7.6.3 to build git-annex, and I keep my git-annex binary updated often.
Thanks, John
Wait, I think this comes from a backend switch. I changed my .gitattributes file at one point to read:
I thought this would just affect new files, not existing annexed content. Could this do it?
If you change the backend, and then in one repository you run git annex migrate, other repositories that have the old keys will not know about the new names. For this reason, when multiple repositories have the files, it's best to run it redundantly in each repository.
TBH, migration is a bit of a PITA because of this. Best to avoid it in most cases.
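As a rough sketch of what running the migration redundantly in each repository could look like: the helper name migrate_all, the DRY_RUN variable, and the repository paths are made up for illustration, and only git annex migrate itself comes from the discussion above. By default it just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# migrate_all is a hypothetical helper, not part of git-annex.
# With DRY_RUN=1 (the default) it only prints what it would run.
migrate_all() {
    for repo in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "(cd $repo && git annex migrate .)"
        else
            # Run in a subshell so the caller's working directory is unchanged.
            (cd "$repo" && git annex migrate .) || return 1
        fi
    done
}
```

Calling migrate_all with the path of every local clone lists one migrate command per repository; setting DRY_RUN=0 runs them instead. Repositories on other machines would still need the same command run there.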
Git-annex will never perform a migration behind your back. You must have run git annex migrate at some point. You can check the git history for details.
Just to confirm, this wasn't a git-annex problem at all, but just a misstep during migration as you suggested.
I think what I'm going to do now is to just wipe the slate clean and start over again, by using unannex --fast on all the files, wiping .git, and then adding everything back in using my new default backend of SHA512E. The bigger pain is doing the same thing on all the servers where I have this data (to avoid having to upload it again), but in such a way that I'm not replicating file history. I think I should be able to just clone, mv $OLDREPO/.git/annex/objects objects, git annex add objects, git rm -r --cached objects, and then everything should be good without even needing a new commit on the remote machine, just a git-annex sync.
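Spelled out, the per-server steps described above might look like the following. This is an untested sketch: the function name rebuild_plan and its three arguments (old repository path, clone source, new repository path) are assumptions for illustration, and the paths are assumed to contain no spaces. It only prints the commands so they can be checked before anything is moved.

```shell
#!/bin/sh
# rebuild_plan is a hypothetical helper that prints the steps, it runs nothing.
# $1 = old repository, $2 = clone source, $3 = new repository (all assumed).
rebuild_plan() {
    old=$1
    src=$2
    new=$3
    cat <<EOF
git clone $src $new
mv $old/.git/annex/objects $new/objects
cd $new
git annex add objects
git rm -r --cached objects
git annex sync
EOF
}
```

For example, rebuild_plan /srv/old-annex /srv/fresh.git /srv/new-annex prints six commands in the order given in the post; whether git annex add recognizes the moved content without re-uploading is exactly the part that would need verifying on a scratch copy first.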