Hi,
I'm using git-annex across a number of (indirect) repositories, making heavy use of deduplication for organizing files according to various different aspects.
Now I want to keep part of the files also on a VFAT device, which doesn't let me use indirect mode. In direct mode, however, git-annex "get" or "copy" places a separate copy of each file in the repository, whereas in indirect mode, it would just keep a single copy and maintain a number of (inexpensive) symbolic links. Since space on the VFAT drive is limited, I would like to just keep one, specific copy, not caring about the others. If I "drop" an unneeded copy of the file, it also gets replaced by the ASCII "link" in all other places that contained the same file. Therefore, I can either have multiple copies of the same data or none at all.
Imagine you have a bunch of photos sorted into directories meant to make them easy to find (same file name means same file content):
./photo1.jpg
./photo2.jpg
./by-date/2014-10-27/photo1.jpg
./by-date/2014-10-28/photo2.jpg
./by-event/holiday-by-the-sea/photo1.jpg
./by-event/her-birthday/photo2.jpg
I want to keep a copy of ./photo?.jpg in the VFAT repository, but not the other (identical) files. How do I do that? Or is there really no way of doing this?
Thanks.
There is really no way to do this.
We could consider hard-linking the files, but then modifying one would modify the other, which is likely to be confusing. And, FAT doesn't support hard links anyway.
I don't want to complicate git-annex's notion of whether an object is present or not with the possibility that it might be present for some files but not for others. For example,
git annex get
would then need to make a copy of content that was already locally present, while currently it knows that if the file is locally present, it has nothing to do.

I think that the solution is to use either a better filesystem that can support the superior indirect mode, or to switch your repository to use the WORM backend, which does not do deduplication.
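
To illustrate why presence is tracked per object rather than per file, here is a toy model in Python. It is only a sketch and not git-annex's actual code: the `ToyAnnex` class and its methods are made up for this example, and the key is assumed to be a plain SHA256 of the content. Several paths point at one key, so dropping any of them removes the content for all of them.

```python
import hashlib


class ToyAnnex:
    """Toy model (hypothetical): paths point at keys, content is stored once per key."""

    def __init__(self):
        self.key_of = {}   # path -> key
        self.objects = {}  # key -> content that is present locally

    def add(self, path, content):
        # Assumption: the key depends only on the content, so identical
        # files share one key (this is the deduplication).
        key = "SHA256-" + hashlib.sha256(content).hexdigest()
        self.key_of[path] = key
        self.objects[key] = content
        return key

    def drop(self, path):
        # Dropping acts on the key, not on the path, so every path
        # that shares the key loses its content as well.
        self.objects.pop(self.key_of[path], None)

    def present(self, path):
        return self.key_of[path] in self.objects


annex = ToyAnnex()
data = b"same photo bytes"
for p in ["photo1.jpg",
          "by-date/2014-10-27/photo1.jpg",
          "by-event/holiday-by-the-sea/photo1.jpg"]:
    annex.add(p, data)

annex.drop("by-date/2014-10-27/photo1.jpg")
for p in annex.key_of:
    print(p, "present" if annex.present(p) else "dropped")
```

In this model all three paths report "dropped" after a single drop, which matches the "multiple copies or none at all" behaviour described in the question.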