git rename detection on file move

It's unfortunate that git-annex sorta defeats git's rename detection.

When an annexed file is moved to a different directory (specifically, a directory that is shallower or deeper than the old directory), the symlink often has to change. And so git log cannot --follow back through the rename history, since all it has to go on is that symlink, which it effectively sees as a one line file containing the symlink target.
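
To make the depth dependence concrete, here is a minimal sketch of how the link text could be derived from a file's position in the tree. The helper name `annex_link_target` and the `.git/annex/objects/KEY` layout are assumptions for illustration, not git-annex's actual code.

```shell
#!/bin/sh
# Hypothetical helper: print the relative symlink target for a file at the
# repo-relative path $1 whose content is stored under key $2.
# The ".git/annex/objects/KEY" layout is an assumed, simplified one.
annex_link_target() (
    path=$1
    key=$2
    prefix=""
    dir=$(dirname "$path")
    # Add one "../" for every directory between the file and the repo root.
    while [ "$dir" != "." ]; do
        prefix="../$prefix"
        dir=$(dirname "$dir")
    done
    printf '%s.git/annex/objects/%s\n' "$prefix" "$key"
)

annex_link_target subdir/foo KEY          # ../.git/annex/objects/KEY
annex_link_target other/foo KEY           # ../.git/annex/objects/KEY (same depth, same text)
annex_link_target subdir/deeper/foo KEY   # ../../.git/annex/objects/KEY (deeper, text changes)
```

A move between directories of equal depth leaves the blob git stores untouched, while a move to a different depth produces a different blob.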

One way to fix this might be to do the git annex fix after the rename is committed. This would mean that a commit would result in new staged changes for another commit, which is perhaps startling behavior.

The other way to fix it is to stop using symlinks, see smudge.


use mini-branches

if you go for the two-commits version, small intermediate branches (or git-commit-tree) could be used to create a tree like this:

*   commit 106eef2
|\  Merge: 436e46f 9395665
| | 
| |     the main commit
| |   
| * commit 9395665
|/  
|       intermediate move
|  
* commit 436e46f
| 
|     ...

while the first commit (436e46f) has a "/subdir/foo → ../.git-annex/where_foo_is", the intermediate (9395665) has "/subdir/deeper/foo → ../.git-annex/where_foo_is", and the final commit (106eef2) has "/subdir/deeper/foo → ../../.git-annex/where_foo_is".

--follow uses the intermediate commit to find the history, but the intermediate commit would neither show up in git log --first-parent nor affect git diff HEAD^.. & co. (there could still be confusion over git show, though).
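
A rough sketch of how such a graph could be built with git-commit-tree, in a scratch repository. The function name, the identity settings and the commit messages are placeholders. The sketch only constructs the graph; whether --follow actually picks up the rename through the merge is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: build the three-commit graph in the empty directory $1.
build_mini_branch() (
    set -e
    cd "$1"
    git init -q
    git config user.name "Example"
    git config user.email "example@example.invalid"

    # First commit: subdir/foo -> ../.git-annex/where_foo_is
    mkdir -p subdir/deeper
    ln -s ../.git-annex/where_foo_is subdir/foo
    git add subdir/foo
    git commit -q -m "initial"
    base=$(git rev-parse HEAD)

    # Intermediate commit: the link moves but its text is unchanged,
    # so git sees an exact rename.
    git mv subdir/foo subdir/deeper/foo
    mid=$(git commit-tree -p "$base" -m "intermediate move" "$(git write-tree)")

    # Final tree: link text corrected for the new depth.
    rm subdir/deeper/foo
    ln -s ../../.git-annex/where_foo_is subdir/deeper/foo
    git add subdir/deeper/foo
    final=$(git commit-tree -p "$base" -p "$mid" -m "the main commit" "$(git write-tree)")

    # Point the current branch at the merge commit.
    git update-ref HEAD "$final"
)

repo=$(mktemp -d)
build_mini_branch "$repo"
git -C "$repo" log --graph --format=%s
git -C "$repo" log --first-parent --format=%s   # "the main commit", then "initial"
```

Because the first parent of the merge is the original commit, the intermediate move stays off the first-parent line.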

Comment by chrysn — Wed Mar 9 23:47:48 2011

Use variable symlinks, relative to the repo's root?

It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.

Now, if we define the symlink's target relative to the git repo's root (eg. using the $GIT_DIR environment variable, which can be a relative or absolute path itself), this unfortunately results in an absolute symlink, which would -for obvious reasons- only be usable locally:

user@host:~$ mkdir -p tmp/{.git/annex,somefolder}
user@host:~$ export GIT_DIR=~/tmp
user@host:~$ touch $GIT_DIR/.git/annex/realfile
user@host:~$ ln -s $GIT_DIR/.git/annex/realfile $GIT_DIR/somefolder/file
user@host:~$ ls -al $GIT_DIR/somefolder/
total 12
drwxr-x--- 2 user group 4096 2011-03-10 16:54 .
drwxr-x--- 4 user group 4096 2011-03-10 16:53 ..
lrwxrwxrwx 1 user group   33 2011-03-10 16:54 file -> /home/user/tmp/.git/annex/realfile
user@host:~$

So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.

It is possible, using variable/variant symlinks, yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.

Thoughts on this?

Comment by praet — Thu Mar 10 16:50:28 2011

comment 3

Interesting, I had not heard of variable symlinks before. AFAIK linux does not have them.

Comment by joey — Wed Mar 16 03:03:19 2011

Brainfart

Haven't given these any serious thought (which will become apparent in a moment) but hoping they will give birth to some less retarded ideas:

Bait'n'switch

pre-commit: Replace all staged symlinks (when pointing to annexed files) with plaintext files containing the key of their respective annexed content, re-stage, and add their paths (relative to repo root) to .gitignore.
post-commit: Replace the plaintext files with (git annex fix'ed) symlinks.

In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.

To prevent git from reporting ALL annexed files as unstaged changes after running post-commit hook, their paths would need to be added to .gitignore.

This wouldn't cause any issues when adding files, very little when modifying files (would need some alterations to "git annex unlock"), BUT would make git totally oblivious to removals...
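
The swap itself might look roughly like the sketch below. Both function names are hypothetical. It assumes the key is the last path component of the link target, assumes a simplified `.git/annex/objects/KEY` layout, and does not check that a symlink really points into the annex. Paths are relative to the repository root, which must be the current directory.

```shell
#!/bin/sh
# Hypothetical pre-commit step: replace a symlink with a plain file that
# contains only the key, so the committed blob does not depend on depth.
link_to_keyfile() (
    f=$1
    [ -L "$f" ] || exit 0                 # not a symlink: leave it alone
    key=$(basename "$(readlink "$f")")    # assumed: key = last path component
    rm "$f"
    printf '%s\n' "$key" > "$f"
)

# Hypothetical post-commit step: turn the key file back into a symlink
# whose text matches the file's current depth.
keyfile_to_link() (
    f=$1
    key=$(cat "$f")
    prefix=""
    dir=$(dirname "$f")
    while [ "$dir" != "." ]; do
        prefix="../$prefix"
        dir=$(dirname "$dir")
    done
    rm "$f"
    ln -s "${prefix}.git/annex/objects/$key" "$f"
)

cd "$(mktemp -d)"
mkdir -p a/b
ln -s ../../.git/annex/objects/KEY a/b/foo
link_to_keyfile a/b/foo
cat a/b/foo          # KEY
mv a/b/foo a/foo     # the key file can move anywhere without changing
keyfile_to_link a/foo
readlink a/foo       # ../.git/annex/objects/KEY
```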

Manifest-based (re)population

Keep a manifest of all annexed files (key + relative path)
DON'T track the symlinks (.gitignore)
Populate/update the directory structure using a post-commit hook.

... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.
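
A minimal sketch of the (re)population step, assuming a manifest with one `KEY<TAB>path` entry per line and the same simplified `.git/annex/objects/KEY` layout. The manifest format and the function name are invented for illustration. It is meant to be run from the repository root.

```shell
#!/bin/sh
# Hypothetical helper: recreate every symlink listed in the manifest $1.
# Each manifest line is "KEY<TAB>path/relative/to/repo/root".
populate_from_manifest() (
    manifest=$1
    while IFS="$(printf '\t')" read -r key path; do
        [ -n "$key" ] || continue         # skip blank lines
        prefix=""
        dir=$(dirname "$path")
        while [ "$dir" != "." ]; do
            prefix="../$prefix"
            dir=$(dirname "$dir")
        done
        mkdir -p "$(dirname "$path")"
        rm -f "$path"                     # replace any stale link
        ln -s "${prefix}.git/annex/objects/$key" "$path"
    done < "$manifest"
)

cd "$(mktemp -d)"
printf 'KEY1\tfoo\nKEY2\tsubdir/deeper/bar\n' > manifest
populate_from_manifest manifest
readlink foo                 # .git/annex/objects/KEY1
readlink subdir/deeper/bar   # ../../.git/annex/objects/KEY2
```

With this scheme a move is only a change to the path column of one manifest line, which is also why diffstats would say little about the files themselves.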

Wide open to suggestions, criticism, mocking laughter and finger-pointing

Comment by praet — Sun Mar 20 20:11:27 2011

comment 5

In the meantime, would it be acceptable to split the pre-commit hook into two discrete parts?

This would make it possible (if preferred) to defer "git annex fix" until post-commit while still keeping the safety net for unlocked files.

Comment by praet — Mon Mar 21 19:58:34 2011

extra level of indirection

Surely this could be handled with an extra layer of indirection?

git-annex would ensure that every directory containing annexed data contains a new symlink .git-annex which points to $git_root/.git/annex. Then every symlink to an annexed object uses a relative symlink via this: .git-annex/objects/xx/yy/ZZZZZZZZZZ. Even though this symlink is relative, moving it to a different directory would not break anything: if the move destination directory already contained other annexed data, it would also already contain .git-annex so git-annex wouldn't need to do anything. And if it didn't, git-annex would simply create a new .git-annex symlink there.

These .git-annex symlinks could either be added to .gitignore, or manually/automatically checked in to the current branch - I'm not sure which would be best. There's also the option of using multiple levels of indirection:

foo/bar/baz/.git-annex -> ../.git-annex
foo/bar/.git-annex -> ../.git-annex
foo/.git-annex -> ../.git-annex
.git-annex -> .git/annex

I'm not sure whether this would bring any advantages. It might bring a performance hit due to the kernel having to traverse more symlinks, but without benchmarking it's difficult to say how much. I'd expect it only to be an issue with a large number of deep directory trees.
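
For illustration, the single-level variant could be sketched as below. The helper name `ensure_annex_link` and the flat `objects/KEY` layout are assumptions. Directory arguments are relative to the repository root, which must be the current directory.

```shell
#!/bin/sh
# Hypothetical helper: make sure directory $1 contains a .git-annex symlink
# that points at the repository's .git/annex.
ensure_annex_link() (
    dir=$1
    prefix=""
    d=$dir
    # One "../" per path component of the directory itself.
    while [ "$d" != "." ]; do
        prefix="../$prefix"
        d=$(dirname "$d")
    done
    mkdir -p "$dir"
    [ -L "$dir/.git-annex" ] || ln -s "${prefix}.git/annex" "$dir/.git-annex"
)

cd "$(mktemp -d)"
mkdir -p .git/annex/objects
echo data > .git/annex/objects/KEY

ensure_annex_link foo
ensure_annex_link foo/bar/baz
ln -s .git-annex/objects/KEY foo/file
mv foo/file foo/bar/baz/file      # the link text stays the same after the move
readlink foo/bar/baz/file         # .git-annex/objects/KEY
cat foo/bar/baz/file              # data
```

Only the per-directory .git-annex link depends on depth, so the blob git stores for each annexed file is the same wherever the file lives.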

Comment by Adam — Mon Dec 19 12:45:18 2011

comment 7

That seems an excellent idea, also eliminating the need for git annex fix after moving.

However, I think CVS and svn have taught us the pain associated with a version control system putting something in every subdirectory. Would this pain be worth avoiding the minor pain of needing git annex fix and sometimes being unable to follow renames?

Comment by joey — Mon Dec 19 18:22:25 2011

comment 8

Personally I'd rather have working rename detection but I agree it's not 100% ideal to be littering multiple directories like this, so perhaps you could make it optional, e.g. based on a git config setting?

Here are a few more considerations, some in defence of the approach, some against it:

.git-annex is hidden; CVS/ is not.
Unlike CVS/ and .svn/, it's only a symlink, not a directory containing other files.
It doesn't contain any data specific to that directory and could easily be regenerated if deleted accidentally or otherwise.
If a whole directory containing .git-annex was moved within the repository:
- git-annex would need to fix up these symlinks if and only if it's moved to a different depth within the tree.
- However, if the multi-level indirection approach is used, .git-annex in any subdirectory is always a symlink to ../.git-annex so instead you would need to check that all of the new ancestors contain this symlink too, and optionally remove any no longer needed symlinks.
- In either case, git-annex already goes to the trouble of fixing symlinks, and if anything, I think this approach would reduce the number of symlinks which need checking (right?)
find $git_root/foo -follow, diff -r etc. would traverse into $git_root/.git/annex

This last point is the only downside to this approach I can think of which gives me any noticeable cause for concern. However, people are already used to working around this from CVS and svn days, e.g. diff -r -x .svn so I don't think it's anywhere near bad enough to rule it out.

Comment by Adam — Tue Dec 20 12:00:11 2011

comment 9

Git can follow the rename fine if the file is committed before git annex fix (you can git commit -n to see this), so making git-annex pre-commit generate a fixup commit before the staged commit would be one way. Or the other two ways I originally mentioned when writing down this minor issue. I like all those approaches better than .git-annex clutter.
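
The difference can be reproduced in a scratch repository with a sketch like the following. The function name and the link layout are made up, and rewriting the link by hand with rm and ln stands in for git annex fix, so git-annex itself is not needed.

```shell
#!/bin/sh
# Hypothetical helper: in the empty directory $1, move a symlink one level
# deeper and print how git classifies the commit that introduces the new path.
#   mode "split":  commit the move first, fix the link text in a second commit
#   mode "single": fix the link text before the one and only commit
move_and_commit() (
    set -e
    cd "$1"
    mode=$2
    git init -q
    git config user.name "Example"
    git config user.email "example@example.invalid"

    mkdir -p subdir/deeper
    ln -s ../.git/annex/objects/KEY subdir/foo
    git add subdir/foo
    git commit -q -m "add foo"

    git mv subdir/foo subdir/deeper/foo
    if [ "$mode" = "split" ]; then
        git commit -q -n -m "move foo"    # -n skips the pre-commit hook
    fi

    # Stand-in for "git annex fix": rewrite the link for the new depth.
    rm subdir/deeper/foo
    ln -s ../../.git/annex/objects/KEY subdir/deeper/foo
    git add subdir/deeper/foo
    git commit -q -n -m "fix link"

    if [ "$mode" = "split" ]; then rev=HEAD~1; else rev=HEAD; fi
    git diff -M --name-status "$rev~1" "$rev"
)

move_and_commit "$(mktemp -d)" split    # R100  subdir/foo  subdir/deeper/foo
move_and_commit "$(mktemp -d)" single   # an add and a delete, no rename
```

In the split case the moved blob is byte-identical, so git reports an exact rename. In the single case the link text has changed, and git only pairs symlinks as renames when they match exactly.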

Comment by joey — Tue Dec 20 14:56:12 2011

comment 10

Won't this issue be fixed in git itself? It was in my plans to look into that, though I don't know how difficult it will be.

Comment by Rafael — Tue May 15 07:36:25 2012
