2a8fdfc7d (Display a warning message when asked to operate on a file
inside a directory that's a symbolic link to elsewhere, 2020-05-11)
made git annex add
error like git add
does rather than silently
doing nothing in a scenario like below.
# <repo>
# |-- a
# | `-- f0
# `-- alink -> a
cd "$(mktemp -d "${TMPDIR:-/tmp}"/gx-link-XXXXXXX)"
git init && git annex init
mkdir a
ln -sT a alink
echo 0 >alink/f0
git annex add alink/f0
Unlike git, however, git-annex's warning also applies when the symbolic link is upstream of the repository and git-annex is passed an absolute path.
# .
# |-- a
# | `-- repo
# | `-- f0
# `-- alink -> a
cd "$(mktemp -d "${TMPDIR:-/tmp}"/gx-link-XXXXXXX)"
mkdir a
ln -sT a alink
git init alink/repo
(
cd alink/repo
git annex init
echo 0 >f0
git annex add "$(pwd)"/f0
)
[...]
git-annex: /tmp/gx-link-Wxqnzo8/alink/repo/f0 is beyond a symbolic link
add f0
ok
(recording state in git...)
The file does appear to be annexed and added to the index despite the warning:
$ git diff --cached
diff --git a/f0 b/f0
new file mode 120000
index 0000000..7a725b9
--- /dev/null
+++ b/f0
@@ -0,0 +1 @@
+.git/annex/objects/Xj/V5/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa/SHA256E-s2--9a271f2a916b0b6ee6cecb2426f0b3206ef074578be55d9bc94f6f3fe3ab86aa
\ No newline at end of file
Looking at Seek.workTreeItems'
, my first thought was that the above
scenario should not error: relPathCwdToFile
should turn the absolute
path into f0
when it compares it against the current working
directory, /tmp/gx-link-Wxqnzo8/alink/repo/
. However, it looks like
System.Directory.getCurrentDirectory
returns a resolved path for the
current working directory, so relPathCwdToFile
turns the absolute
path into ../../alink/repo/f0
, which gets flagged by the
viasymlink
check.
This error message is of course easy enough to work around on the caller's side, but since it looks like an unintended consequence of 2a8fdfc7d, the change in behavior seems worth mentioning.
Really, the shell is the only thing that knows cd has been used with a symlink, the real cwd is what getCurrentDirectory returns, same as getcwd(3).
(I think this is a bad design decision on the part of shells, it often leads to various confusion and odd behavior. A good example of this is that, if you cd bar/dir, which is a symlink to foo/dir, and both bar/file and foo/file exist, what does rm ../file delete? Not what the shell prompt's
bar>
suggests!)What this code needs to do is stop traversing the path and checking for symlinks when it reaches the top of the working tree. But I'm not currently seeing a good way to do that. It would need to examine the path for symlinks and resolve them. Like canonicalizePath, but don't want to canonicalize the part of the path inside the working tree, eg a symlink "./foo" should not be canonicalized to "./bar". But how does it know where inside working tree part begins when there are symlinks involved? Probably I'm missing something simple which git does to deal with this.
(Also, canonicalizePath or anything like it does add a certian amount of overhead for every file passed to git-annex on the command line, just for this edge case. The current code does 2 stats, but this would be stats all the way up the path.)
It also looks like the nearby code
hidden currbranch relf
will not work in this same case, because relf is "../../alink/repo/f0". In that case it will display "not found" and really fail to process the file.In a way the really surprising thing is not that it rejects this file with "is beyond a symbolic link" but that the file then goes on to be processed anyway. Because this code only displays a warning, and is written with the naive hope that it will manage to replicate git's porcelain behavior. But then git ls-files goes on to list the file anyway, so it's processed anyway.
I think it should certianly, if it warns about a file, not pass it into git ls-files. So the warning really becomes an error and that surprise is avoided.
That might be enough to consider this dealt with, without using canonicalizePath. There would then be an inconsistency that
git add
etc can be used with that path, whilegit annex add
etc reject it as beyond a symbolic link. But git-annex is not obliged to copy git's behavior in what path situations it will accept and rejecting this particular one does not seem like a hardship to the user.I've made it skip processing the file in this case.
Remain unsure if I want to close it now that the behavior is at least not weird, or try to more closely replicate git's behavior of what symlinks it's ok to be behind and which not.
Doh, indeed. Thanks.
It looks like git handles this in
setup.c:abspath_part_inside_repo
. That function takes the absolute, unresolved path that was given as an argument (/tmp/gx-link-Wxqnzo8/alink/repo/f0
, to continue with the original example). Then it compares that to the working tree path (/tmp/gx-link-Wxqnzo8/a/repo
).If the path doesn't already have the working tree as a prefix, the core logic seems to be:
Take the realpath of the leftmost directory in the received path (
/tmp
).If that fully matches the path for the working tree, chop those leading components off of the received path, returning what remains.
If not, tack on the next directory component (
/tmp/gx-link-Wxqnzo8/
) and repeat.For this specific case, that'd go until
/tmp/gx-link-Wxqnzo8/alink/repo/
, whose realpath would match the working tree, and then returnf0
.That of course may not be worth the overhead.
AFAICS, git does something like cache the stat of the directory at the top of the working tree, and uses that to know which part to check for symlinks. There's a lstat cache in the relevant code anyway.
Ok, implemented it that way in git-annex, which I was able to do w/o doing any more work per file than comparing a dev and an inode.