Hello,
git-annex looks very interesting and I would like to use it to version large binary test artifacts in our source code repository.
My question:
I would like to have multiple clones of the same repository on the same machine. However, as the binary files can be huge, I would like to store each file exactly ONCE per machine, and not again in the .git/annex/objects folder of every clone.
To achieve that, I first created in
/tmp/repo-clone1/.git/annex/objects
and then symlinked
ln -s /tmp/repo-clone1/.git/annex/objects /tmp/repo-clone2/.git/annex/objects
such that
/tmp/repo-clone1
/tmp/repo-clone2
share the same big files and each big file exists only once on the machine.
Is this a good idea? Is there a better way to achieve this? It looks a bit hacky. It would be nicer if one could specify a dedicated "objects" folder from the start.
Thanks and Regards, J
that is a pretty bad idea! git-annex will believe it has two copies of the files and could allow you to drop the last copy, and lose data.
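To illustrate the risk, here is a minimal sketch using plain directories only (no git-annex involved). The scratch paths and the file name bigfile are hypothetical stand-ins for the two clones from the question. Once objects is a symlink, both clones name the same directory, so removing a file through one clone removes it for the other too:

```shell
#!/bin/sh
set -e
# Hypothetical scratch directories standing in for the two clones.
work=$(mktemp -d)
mkdir -p "$work/repo-clone1/.git/annex/objects" "$work/repo-clone2/.git/annex"

# The symlink from the question: clone2's objects folder points at clone1's.
ln -s "$work/repo-clone1/.git/annex/objects" "$work/repo-clone2/.git/annex/objects"

# Store one "big file" through clone1; it is visible through clone2 as well.
echo "big binary artifact" > "$work/repo-clone1/.git/annex/objects/bigfile"
ls "$work/repo-clone2/.git/annex/objects"

# Removing it through clone2 removes the only copy on the machine.
rm "$work/repo-clone2/.git/annex/objects/bigfile"
if [ -e "$work/repo-clone1/.git/annex/objects/bigfile" ]; then
    remaining=yes
else
    remaining=no
fi
echo "still present in clone1: $remaining"
```

There is only one stored copy here, even though each clone appears to have its own.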
instead, you should clone the repo with --shared. according to the git-annex manpage, this will set the annex.hardlink setting and mark the repo as "untrusted". files will be hardlinked between the two repositories, so the space is used only once. see also the wishlist item "use hardlinks for local clones". --anarcat
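A minimal sketch of such a shared clone follows. It assumes git and git-annex are installed; the scratch directory, the repository descriptions, the file name bigfile and the example identity are all hypothetical. It only illustrates the behaviour described above and is not taken from the manpage:

```shell
#!/bin/sh
set -e
# This sketch needs git-annex; stop quietly if it is not installed.
command -v git-annex >/dev/null 2>&1 || { echo "git-annex not installed; skipping"; exit 0; }

# Hypothetical scratch location instead of /tmp/repo-clone1 and /tmp/repo-clone2.
work=$(mktemp -d)
cd "$work"

# First clone: an ordinary repository holding one annexed file.
git init -q repo-clone1
cd repo-clone1
git config user.name "Example User"
git config user.email "user@example.com"
git annex init "clone1"
echo "big binary artifact" > bigfile
git annex add bigfile
git commit -q -m "add bigfile"
cd ..

# Second clone: --shared is what lets git annex init enable hardlinking.
git clone -q --shared repo-clone1 repo-clone2
cd repo-clone2
git config user.name "Example User"
git config user.email "user@example.com"
git annex init "clone2"

# Check whether annex.hardlink was set for this clone.
hardlink=$(git config annex.hardlink || echo unset)
echo "annex.hardlink: $hardlink"

# Fetch the content from origin; with hardlinking it should not be copied.
git annex get bigfile

# List the object files; a link count above 1 means the data is shared.
find .git/annex/objects -type f -exec ls -l {} \;
```

Running git annex info in repo-clone2 afterwards is one way to check how the clone's trust level is reported.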
Hi, I'm new to git and we might be adopting it on our next project solely because of git-annex! As our files are quite big, having users clone the repository with --shared is very very nice! We use Linux, so no issue with hardlinks. I did some testing and so far so good. No duplicated space is very very nice! But I'm wondering: is there any disadvantage to this approach?
@davicastro using --shared makes git-annex not trust the shared clone, which is necessary to avoid situations that could result in data loss. The downside, though, is that the lack of trust can change git-annex's behavior in some situations.
For example, normally you can run
git annex get myfile
and then
git annex drop myfile --from someremote
will remove it from the remote, since you now have a local copy. But with the shared clone being untrusted, the drop will fail if it would leave the shared clone holding the only remaining copy of the file. In this situation, you would need to first run
git annex copy myfile --to origin
or something like that before dropping.
Of course, that copy would run fast and cheap, since it only has to make a hardlink!
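The effect of the missing trust can be tried out with a sketch like the one below. It assumes git and git-annex are installed and uses hypothetical repository names (origin-repo, shared-clone) and a hypothetical identity. To stay short it drops from origin rather than from a third remote, so it is a simplified variant of the situation described above: the untrusted shared clone would be left with the only copy, and the drop is expected to be refused.

```shell
#!/bin/sh
set -e
# This sketch needs git-annex; stop quietly if it is not installed.
command -v git-annex >/dev/null 2>&1 || { echo "git-annex not installed; skipping"; exit 0; }

# Hypothetical scratch location and repository names.
work=$(mktemp -d)
cd "$work"

# The repository that will act as origin, holding one annexed file.
git init -q origin-repo
cd origin-repo
git config user.name "Example User"
git config user.email "user@example.com"
git annex init "origin"
echo "big binary artifact" > myfile
git annex add myfile
git commit -q -m "add myfile"
cd ..

# The shared clone, which git annex init marks as untrusted.
git clone -q --shared origin-repo shared-clone
cd shared-clone
git config user.name "Example User"
git config user.email "user@example.com"
git annex init "shared clone"
git annex get myfile

# Try to drop the file from origin. The local copy lives in an untrusted
# repository, so it should not count towards the required number of copies.
if git annex drop myfile --from origin; then
    drop_result=dropped
else
    drop_result=refused
fi
echo "drop from origin: $drop_result"
```

When the drop targets a different remote, as in the reply above, running git annex copy myfile --to origin first gives git-annex a copy it can count.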