If file A is annexed and dropped, and B is a relative symlink to A, then git annex get B should result in A being fetched, but currently doesn't. This would especially help if B is deep within some dir 'mydir', and you do git annex get mydir: annexed files under mydir get fetched, but not annexed files elsewhere in the repository to which symlinks under mydir point. So such symlinks under mydir remain broken.
Supporting this would require that git-annex stat every file that non-git-annex symlinks point to, which seems like it could have a performance impact.
Also, git-annex is often processing information from git in a pipe, which can include the link target. In such cases it can very efficiently see if the link target is an annex object, but to support symlinks to symlinks it would have to do an additional round trip through git, which would be much less efficient than statting a symlink.
And then there's the question of symlinks which point outside the git repository, or to another git-annex repository, or symlink loops. Now we have potentially security sensitive filename parsing.
It seems like a really big can of worms to open, and I am not eager to do this. You'd have to have some extremely compelling use cases. So far, my response to the use cases provided is: "Doctor, Doctor… It Hurts When I Do This!"
The main reason for wanting git-annex to follow symlinks is that the semantics of its commands (get/add/copy) operating on directories would be much more intuitive. I want to know that after 'git annex get subdir' all annexed files accessible under subdir/ are available; that after 'git annex add subdir && git annex move subdir --to my-remote' all large files accessible under subdir/ are at my-remote; etc. I want to be able to treat a subdirectory as a self-contained unit. But this isn't possible if relative symlinks stored under subdir/ but pointing outside subdir/ might still be broken after 'git annex get subdir', etc.
E.g. I have

    proj1/
        big_file.dat: symlink to ../.git/annex/objects/....
    proj2/
        big_file.dat: symlink to ../proj1/big_file.dat

then if I do 'git annex get proj1', I can safely work on proj1 knowing all its files have been fetched; but if I do 'git annex get proj2', I can't safely work on it because the symlink proj2/big_file.dat is still broken. I can of course make proj2/big_file.dat a direct link to the annex, but that loses the relationship between proj1 and proj2. It's pretty common to want to create a "variant" of a project by making most files in proj2/ symlinks to corresponding files in proj1/, except for a few files that differ.
With 'git annex add', it'd help a lot to know that 'git annex add /my/dir' will definitely store everything under /my/dir. In fact, if a symlink under /my/dir points outside the repository, git-annex could still store the target file in the annex and check in a symlink to that (or perhaps warn the user about the situation). Then I can 'git annex add' my local setup, even if it points to some absolute paths, and check it out on another machine, with all links working.
"Supporting this would require that git-annex stat every file that non-git-annex symlinks point to" -- only ones that point outside the subdir being worked on. If the target of such a symlink is a directory, you'd need to process that directory too. But if this all is off by default, and turned on by a flag, then the normal operation won't be affected.
p.s. you're right that symlinks to outside the repository would be a security risk. They should not be added to the annex without explicit options. But it's important to at least warn about them, so e.g. the user knows that 'git annex add' did not add everything, or that 'git annex get subdir/' did not get everything under subdir/ . It would be useful to handle relative symlinks into submodules though. Symlink loops would of course be errors.
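To make the warning idea a bit more concrete, here is a rough Python sketch of a standalone check that could be run before 'git annex get subdir'. It is not part of git-annex; the function name classify_symlinks and the category labels are made up for illustration. It assumes POSIX paths, looks only one hop ahead (it does not follow chains or detect loops), and treats any link whose target contains .git/annex/objects as an ordinary annex link.

```python
import os

# Path fragment that marks an ordinary git-annex object link (assumed layout).
ANNEX_MARKER = ".git/annex/objects"


def _is_under(path, directory):
    """True if the absolute path lies inside (or equals) directory."""
    return os.path.commonpath([path, directory]) == directory


def classify_symlinks(subdir, repo_root):
    """Map each symlink under subdir to a hypothetical category label.

    Labels: 'annex', 'inside-subdir', 'outside-subdir', 'outside-repo'.
    Targets are resolved lexically, so symlinked parent directories
    are not taken into account.
    """
    subdir = os.path.abspath(subdir)
    repo_root = os.path.abspath(repo_root)
    result = {}
    # os.walk does not descend into symlinked directories by default,
    # but it lists them, so both name lists have to be checked.
    for dirpath, dirnames, filenames in os.walk(subdir):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            if ANNEX_MARKER in target:
                kind = "annex"
            else:
                dest = os.path.normpath(os.path.join(dirpath, target))
                if not _is_under(dest, repo_root):
                    kind = "outside-repo"
                elif not _is_under(dest, subdir):
                    kind = "outside-subdir"
                else:
                    kind = "inside-subdir"
            result[os.path.relpath(link, subdir)] = kind
    return result
```

With the proj1/proj2 layout from earlier in this thread, classify_symlinks("proj2", ".") would label big_file.dat as 'outside-subdir', which is exactly the case where 'git annex get proj2' leaves a broken link behind. Anything labelled 'outside-repo' is the security-sensitive case that should only ever produce a warning.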
git annex get files where I'm not actually in a git annex repository but in a symlink farm that fans out (often via directory-level indirections) into git annices (is there a canonical plural?). There, what I use is a short script that readlinks the to-be-fetched files until it contains a .git/annex/objects part and then fetches that in its original annex. If there were a way to do that with a standard git annex get, possibly (if after some deliberation there remain security concerns) behind a --follow-symlinks switch, I'd appreciate that.
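For reference, a script along those lines might look roughly like the following Python sketch. It is a guess at the approach described, not the actual script: the names find_annex_link and fetch, the hop limit, and the loop check are assumptions added for illustration. It follows a link one hop at a time, resolving symlinked parent directories on the way, until the link target contains .git/annex/objects, and then runs 'git annex get' from the directory that holds that link.

```python
import os
import subprocess

# Path fragment that marks a git-annex object link target (POSIX paths assumed).
ANNEX_MARKER = ".git/annex/objects"


def _canonical(path):
    """Resolve symlinked parent directories but keep the last component as is."""
    parent, name = os.path.split(path)
    return os.path.join(os.path.realpath(parent), name)


def find_annex_link(path, max_hops=40):
    """Follow a symlink chain hop by hop.

    Returns the path of the first link whose target contains
    .git/annex/objects, or None if the chain ends at something else,
    loops back on itself, or is longer than max_hops.
    """
    seen = set()
    current = _canonical(os.path.abspath(path))
    for _ in range(max_hops):
        if current in seen or not os.path.islink(current):
            return None
        seen.add(current)
        target = os.readlink(current)
        if ANNEX_MARKER in target:
            return current
        # Relative targets are interpreted relative to the link's directory.
        current = _canonical(os.path.join(os.path.dirname(current), target))
    return None


def fetch(path):
    """Run 'git annex get' on the annexed file that path ultimately points to."""
    link = find_annex_link(path)
    if link is None:
        raise ValueError(f"{path} does not lead to an annexed file")
    repo_dir, name = os.path.split(link)
    # Running from the link's own directory lets git find the right repository.
    subprocess.run(["git", "annex", "get", name], cwd=repo_dir, check=True)
```

Note that this happily follows links across repository boundaries, which is the point in a symlink farm but is also the behaviour that raises the security concerns discussed above.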