My collaborators and I use git-annex to track various large data files (alongside some smaller metadata files managed by ordinary git). Some of these data files need to be replaced completely; the old ones were simply wrong. So I do a `git checkout`, but don't `git annex get`, because that would just be a waste of time and bandwidth. This means that my "data files" are just broken symlinks. Now, I find that by creating the necessary directories under `.git/annex/objects/`, I can write to these files in the usual way. The symlinks are preserved, and the files they link to now exist and are full of my corrected data. This seems like a problem, because the hash has presumably changed. (I'm still a little fuzzy on how exactly git-annex works.) Also, git/git-annex doesn't seem to realize that anything has changed. Is this recoverable?
Would it have been better to just `git rm` (or something) the original version of the file, commit that, and then add the new data? And if so, how should I go about this now that I've created so many very large files this way? If not, what would be the preferred way to do this?
I think you're making this more complicated than it needs to be. You don't need to mess around with `.git/annex/objects` at all. You can simply replace the git-annex symlinks with the new files and `git annex add` the new content.
For example: