todo/symlinks to symlinks to the annexgit-annexhttp://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/git-annexikiwiki2018-09-18T19:15:11Zcomment 1http://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/comment_1_54bae401b8de13c9973ef5e6d2cf7e88/Ilya S2018-09-07T18:04:51Z2018-09-07T18:04:51Z
Also, relative symlinks can point to other subdirectories in the repository, in addition to pointing to files. Basically, it would be good to add a command-line flag so that when git-annex-get or another command operates on a path in the repository, it also operates on the paths pointed to by relative symlinks under the given path.
comment 2http://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/comment_2_f5c08ecd3b0b4099186d78c85b7f1e6f/joey2018-09-11T17:30:12Z2018-09-11T17:04:29Z
<p>Supporting this would require that git-annex stat every
file that non-git-annex symlinks point to, which seems
like it could have a performance impact.</p>
<p>Also, git-annex is often processing information from git in a pipe,
which can include the link target. In such cases it can very efficiently
see if the link target is an annex object, but to support symlinks to
symlinks it would have to do an additional round trip through git, which
would be much more inefficient than statting a symlink.</p>
<p>And then there's the question of symlinks which point outside the git
repository, or to another git-annex repository, or symlink loops.
Now we have potentially security sensitive filename parsing.</p>
<p>It seems like a really big can of worms to open, and I am not eager to do this.
You'd have to have some <em>extremely</em> compelling use cases. So far, my
response to the use cases provided is:
"Doctor, Doctor… It Hurts When I Do This!"</p>
comment 3http://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/comment_3_409de39da33b7ecfe5b8c7b2e866225e/Ilya_Shlyakhter2018-09-18T18:41:01Z2018-09-18T18:41:01Z
<p>The main reason for wanting git-annex to follow symlinks is that the semantics of its commands (get/add/copy) operating on directories would be much more intuitive.
I want to know that after 'git annex get subdir' all annexed files accessible under subdir/ are available; that after 'git annex add subdir && git annex move subdir --to my-remote' all
large files accessible under subdir/ are at my-remote; etc. I want to be able to treat a subdirectory as a self-contained unit. But this isn't possible if relative symlinks stored
under subdir/ but pointing outside subdir/ might still be broken after 'git annex get subdir', etc.</p>
<p>E.g. I have</p>
<p>proj1/
big_file.dat: symlink to ../.git/annex/objects/....
proj2/
big_file.dat: symlink to ../proj1/big_file.dat
Then if I do 'git annex get proj1', I can safely work on proj1 knowing all its files have been fetched; but if I do 'git annex get proj2', I can't safely work on it because
the symlink proj2/big_file.dat is still broken. I can of course make proj2/big_file.dat a direct link to the annex, but that loses the relationship between proj1 and proj2.
It's pretty common to want to create a "variant" of a project by making most files in proj2/ symlinks to corresponding files in proj1/ , except for a few files that differ.</p>
<p>With 'git annex add', it'd help a lot to know that 'git annex add /my/dir' will definitely store <em>everything</em> under /my/dir. In fact, if a symlink under /my/dir points outside the
repository, git-annex could still store the target file in the annex and check in a symlink to that (or perhaps warn the user about the situation). Then I can 'git annex add' my local
setup, even if it points to some absolute paths, and check it out on another machine, with all links working.</p>
<p>"Supporting this would require that git-annex stat every file that non-git-annex symlinks point to" -- only ones that point outside the subdir being worked on. If the target of
such a symlink is a directory, you'd need to process that directory too. But if all of this is off by default, and turned on by a flag, then normal operation won't be affected.</p>
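To make the "only ones that point outside the subdir" idea concrete, here is a rough sketch of how the symlinks under a subdirectory could be sorted into cases. This is not git-annex code: `classify_symlinks` and its category names are made up for illustration. It looks at only one hop of each link, and it assumes the directory containing each link is not itself reached through a symlink.

```python
import os

ANNEX_MARKER = os.path.join(".git", "annex", "objects")

def _is_within(path, directory):
    """True if path is directory itself or lies somewhere below it."""
    return path == directory or path.startswith(directory + os.sep)

def classify_symlinks(repo_root, subdir):
    """Yield (link_path, category) for every symlink under repo_root/subdir.

    Hypothetical categories:
      "annex"          - points straight at an annex object
      "inside"         - points at something under subdir
      "outside_subdir" - points elsewhere in the same repository
      "outside_repo"   - points outside the repository
    """
    repo_root = os.path.realpath(repo_root)
    top = os.path.normpath(os.path.join(repo_root, subdir))
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so they are checked too.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            # One hop only: interpret the link text relative to its directory.
            target = os.path.normpath(os.path.join(dirpath, os.readlink(path)))
            if ANNEX_MARKER in target:
                yield path, "annex"
            elif _is_within(target, top):
                yield path, "inside"
            elif _is_within(target, repo_root):
                yield path, "outside_subdir"
            else:
                yield path, "outside_repo"
```

With the proj1/proj2 layout above, `classify_symlinks(repo, "proj2")` would report proj2/big_file.dat as "outside_subdir", which is the case a flag would have to follow, while "outside_repo" links are the ones that should at least produce a warning.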
comment 4http://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/comment_4_818c85d1a199d40daca7371fefb18bc3/Ilya_Shlyakhter2018-09-18T18:48:06Z2018-09-18T18:48:06Z
<p>p.s. you're right that symlinks to outside the repository would be a security risk. They should not be added to the annex without explicit options. But it's important to at least warn about them, so e.g. the user knows that 'git annex add' did not add everything, or that 'git annex get subdir/' did not get everything under subdir/ . It <em>would</em> be useful to handle relative symlinks into submodules though.
Symlink loops would of course be errors.</p>
symlinks into git-annexhttp://git-annex.branchable.com/todo/symlinks_to_symlinks_to_the_annex/comment_5_94b84cd27c36d14dc16837484d232d72/chrysn2018-09-18T19:15:11Z2018-09-18T19:15:11Z
I often work with scenarios where I want to <code>git annex get</code> files where I'm not actually in a git annex repository but in a symlink farm that fans out (often via directory-level indirections) into git annices (is there a canonical plural?). There, what I use is a short script that <code>readlink</code>s each to-be-fetched file until the link target contains a <code>.git/annex/objects</code> part, and then fetches that file in its original annex. If there were a way to do that with a standard <code>git annex get</code>, possibly (if after some deliberation there remain security concerns) behind a --follow-symlinks switch, I'd appreciate that.
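
For reference, the kind of script described here could look roughly like the sketch below. It is not the actual script, and the names `find_annex_origin` and `annex_get_command` and the `max_hops` limit are assumptions made for illustration. It follows the chain one `readlink` at a time, resolves directory-level indirections before each hop, and stops at the first link whose target contains `.git/annex/objects`.

```python
import os

ANNEX_MARKER = os.sep + os.path.join(".git", "annex", "objects") + os.sep

def find_annex_origin(path, max_hops=40):
    """Return (repo_root, path_within_repo) for the annexed file that the
    symlink chain starting at path leads to, or None if there is none."""
    current = os.path.abspath(path)
    for _ in range(max_hops):
        # Resolve symlinked parent directories, but keep the last
        # component as-is so the link itself can still be inspected.
        parent = os.path.realpath(os.path.dirname(current))
        current = os.path.join(parent, os.path.basename(current))
        if not os.path.islink(current):
            return None  # a regular file or a missing path, not an annex link
        target = os.path.normpath(os.path.join(parent, os.readlink(current)))
        if ANNEX_MARKER in target:
            # Everything before the marker is taken to be the repository root.
            repo_root = target.split(ANNEX_MARKER, 1)[0]
            return repo_root, os.path.relpath(current, repo_root)
        current = target
    return None  # hop limit reached, most likely a symlink loop

def annex_get_command(path):
    """Build (but do not run) the command that fetches path in its own annex."""
    origin = find_annex_origin(path)
    if origin is None:
        return None
    repo_root, relpath = origin
    return ["git", "-C", repo_root, "annex", "get", relpath]
```

The returned list can be handed to `subprocess.run`. The hop limit is a crude stand-in for real loop detection, and nothing here checks whether the repository it ends up in is one you trust, which is the security concern raised earlier in the thread.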