Please describe the problem.
When using git-annex-import to import from a directory remote, if the directory tree has directory symlinks pointing to directories outside the directory tree of the directory remote, the targets of these symlinks get imported. This can lead to importing much more than was intended; such symlinks should probably get imported as symlinks by default, with a command-line option to import their targets. There might even be a security issue with unexpectedly importing and sharing content outside the explicitly specified directory tree.
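Until the behavior is settled, one way to avoid surprises is to check a tree for such symlinks before importing from it. The following is only a minimal sketch in Python, not part of git-annex; the function name find_escaping_symlinks is hypothetical, and it assumes a POSIX filesystem where symlinks can be resolved with os.path.realpath.

```python
import os

def find_escaping_symlinks(root):
    """Return (link_path, resolved_target) pairs for symlinks under root
    whose targets resolve to a location outside root.

    Hypothetical helper for illustration; git-annex provides nothing by
    this name.
    """
    root_real = os.path.realpath(root)
    escaping = []
    # os.walk does not descend into symlinked directories by default
    # (followlinks=False), but it still lists them in dirnames.
    for dirpath, dirnames, filenames in os.walk(root_real):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # The target is inside the tree only if root_real is the
            # common prefix of both paths.
            if os.path.commonpath([root_real, target]) != root_real:
                escaping.append((path, target))
    return escaping
```

Calling find_escaping_symlinks on the directory of the special remote and getting an empty list back would suggest that an import cannot pull in content from outside the tree through a symlink.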
What version of git-annex are you using? On what operating system?
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ git annex version
git-annex version: 10.20220322-g7b64dea
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.0 ghc-8.10.7 http-client-0.7.9 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.1.2
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 8
$ uname -a
Linux ctchpcpx163.merck.com 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 GNU/Linux
# End of transcript or log.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
That I use it enough to run into corner-case issues shows its continued usefulness.
git-annex import $dir also follows symlinks inside $dir. So importing has been behaving this way since long before the directory special remote supported importtree.
This is not a security hole, because if an attacker wants to make you import /foo when importing /bar, and they have write access to bar, they are not limited to making a /bar/foo -> /foo symlink. They can just cp -a /foo /bar instead.
I don't really think it would make much sense for any import to import symlinks as symlinks. If the symlink points outside the imported directory, that would result in a symlink that points outside the git repository, which is not something one often wants to check into a git repository.
I don't know if I would really consider this a bug either. It at least seems plausible that there might be users who import from ~/disk which is a symlink to /media/somethinglong, and rely on it following the symlink. I often make symlink aliases for mount points like that, though I have not imported from them.
I meant "security hole" more in the sense of the user themselves inadvertently importing (and then sharing) more than they meant to. E.g. I was importing a large subtree created by others, and had no clue it included symlinks outside the subtree; I only noticed this by accident, when the import started taking too long.
In conjunction with the mv semantics, this seems risky... had I been using the original directory import, I'd have inadvertently deleted a large dataset (to which there was a symlink in the imported tree) from a place others expect to find it. The unix mv command doesn't even have a flag to follow symlinks (never mind defaulting to that).
It's certainly plausible; question is, should it be the default? It's not the default in cp (you have to use -L) or in tar (have to use -h). I think most people assume that importing from a directory remote is akin to doing a cp -r or tar cf from it.
One other scenario where the result might not be what users expect is the following: if a directory special remote is configured with both importtree=yes and exporttree=yes, and the directory contains symlinks pointing outside the tree, then an import followed by an export will replace the symlink in the original directory with a copy of the content.
Such a symlink is already not something one often has
But if one does, then the repo is likely for one's own usage, or for the usage by people with access to the shared filesystem where the link works, so adding the link to git as-is makes sense. Logically, it's likely that the out-of-tree link target represents some separate tree of files that you don't think of as part of the tree (or you'd have put them under the tree); if you did want to import them, you'd make a separate repo for them and import them as a submodule.
Also, what happens if the target tree of the out-of-tree link has a symlink back to the original tree -- could this cause infinite recursion?
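To make the recursion concern concrete: any traversal that follows directory symlinks needs some guard against loops. The sketch below shows one common guard, remembering the device and inode numbers of directories already visited. It is an illustration in Python only; the name walk_following_symlinks is hypothetical, and nothing here is taken from git-annex, whose actual handling of this case is not established in this thread.

```python
import os

def walk_following_symlinks(root):
    """Yield paths of regular files under root, following directory
    symlinks but visiting each real directory at most once.

    Hypothetical helper for illustration only.
    """
    seen = set()   # (st_dev, st_ino) of directories already visited
    stack = [root]
    while stack:
        current = stack.pop()
        st = os.stat(current)        # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue                 # a loop, or a second alias of the same directory
        seen.add(key)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir():   # follows symlinks by default
                    stack.append(entry.path)
                elif entry.is_file():
                    yield entry.path
```

With a tree a containing a symlink to an outside directory b, and b containing a symlink back to a, this walk terminates after visiting a and b once each. A traversal that follows symlinks without such a guard would keep descending through the pair of links until path-length or symlink-depth limits stop it.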
The git-annex-import man page says the command imports "a tree of files". It seems simplest if this description was always strictly true, regardless of what's in the tree. But if you decide to keep the current default, maybe clarify the web page?
Thanks again for all your work.