It's unfortunate that git-annex sorta defeats git's rename detection.
When an annexed file is moved to a different directory (specifically, a directory that is shallower or deeper than the old directory), the symlink often has to change. And so git log cannot --follow back through the rename history, since all it has to go on is that symlink, which it effectively sees as a one line file containing the symlink target.
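To make the depth dependence concrete, here is a minimal sketch that computes the relative target a symlink would need at two different depths. The helper name `link_target` and the object path `.git-annex/where_foo_is` are illustrative assumptions, not git-annex's actual code or layout.

```python
import posixpath

def link_target(link_path, object_path):
    """Return the relative symlink target for a link located at link_path.

    Both arguments are paths relative to the repository root.
    """
    link_dir = posixpath.dirname(link_path) or "."
    return posixpath.relpath(object_path, link_dir)

# Hypothetical annexed object location, relative to the repo root.
OBJECT = ".git-annex/where_foo_is"

before = link_target("subdir/foo", OBJECT)
after = link_target("subdir/deeper/foo", OBJECT)

# The two targets differ, so git sees the symlink's content change.
print(before)
print(after)
```

A move between two directories at the same depth leaves the target untouched, which is why only shallower or deeper moves cause the problem.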
One way to fix this might be to do the git annex fix after the rename is committed. This would mean that a commit would result in new staged changes for another commit, which is perhaps startling behavior.
The other way to fix it is to stop using symlinks; see smudge.
If you go for the two-commits version, small intermediate branches (or git-commit-tree) could be used to create a tree like this:
while the first commit (436e46f) has "/subdir/foo → ../.git-annex/where_foo_is", the intermediate (9395665) has "/subdir/deeper/foo → ../.git-annex/where_foo_is", and the final commit (106eef2) has "/subdir/deeper/foo → ../../.git-annex/where_foo_is". --follow uses the intermediate commit to find the history, but the intermediate commit would neither show up in git log --first-parent nor affect git diff HEAD^.. & co. (there could still be confusion over git show, though).
It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.
Now, if we define the symlink's target relative to the git repo's root (eg. using the $GIT_DIR environment variable, which can be a relative or absolute path itself), this unfortunately results in an absolute symlink, which would -for obvious reasons- only be usable locally:
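As a rough illustration of why an absolute link is only usable locally, the sketch below builds a toy repository with one relative and one absolute symlink, copies it elsewhere, and checks where each link resolves. The directory layout (`.git/annex/where_foo_is`, `subdir/rel`, `subdir/abs`) and all function names are assumptions made up for this example, and it assumes a POSIX system where symlinks can be created.

```python
import os
import shutil
import tempfile

def make_repo(root):
    """Create a toy repo with one object and two symlinks pointing at it."""
    obj_dir = os.path.join(root, ".git", "annex")
    os.makedirs(obj_dir)
    obj = os.path.join(obj_dir, "where_foo_is")
    with open(obj, "w") as fh:
        fh.write("payload")
    os.makedirs(os.path.join(root, "subdir"))
    # Relative link: the target is resolved from the link's own directory.
    os.symlink("../.git/annex/where_foo_is", os.path.join(root, "subdir", "rel"))
    # Absolute link: the target is anchored at this particular checkout.
    os.symlink(os.path.abspath(obj), os.path.join(root, "subdir", "abs"))

def resolves_inside(link, root):
    """Return True if link resolves to a path inside root."""
    real_root = os.path.realpath(root)
    return os.path.realpath(link).startswith(real_root + os.sep)

def demo():
    base = tempfile.mkdtemp()
    try:
        original = os.path.join(base, "original")
        copy = os.path.join(base, "copy")
        make_repo(original)
        # symlinks=True copies the links themselves rather than their targets.
        shutil.copytree(original, copy, symlinks=True)
        return {
            "rel": resolves_inside(os.path.join(copy, "subdir", "rel"), copy),
            "abs": resolves_inside(os.path.join(copy, "subdir", "abs"), copy),
        }
    finally:
        shutil.rmtree(base)

result = demo()
print(result)
```

In the copy, the relative link still resolves inside the copy, while the absolute link keeps pointing back into the original checkout.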
So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.
It is possible, using variable/variant symlinks, yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.
Thoughts on this?
Haven't given these any serious thought (which will become apparent in a moment) but hoping they will give birth to some less retarded ideas:
Bait'n'switch
In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.
To prevent git from reporting ALL annexed files as unstaged changes after running the post-commit hook, their paths would need to be added to .gitignore.
This wouldn't cause any issues when adding files, very little when modifying files (would need some alterations to "git annex unlock"), BUT would make git totally oblivious to removals...
Manifest-based (re)population
... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.
Wide open to suggestions, criticism, mocking laughter and finger-pointing
In the meantime, would it be acceptable to split the pre-commit hook into two discrete parts?
This would allow deferring "git annex fix" until post-commit (if preferred) while still keeping the safety net for unlocked files.
Surely this could be handled with an extra layer of indirection?
git-annex would ensure that every directory containing annexed data contains a new symlink .git-annex which points to $git_root/.git/annex. Then every symlink to an annexed object uses a relative symlink via this: .git-annex/objects/xx/yy/ZZZZZZZZZZ. Even though this symlink is relative, moving it to a different directory would not break anything: if the move destination directory already contained other annexed data, it would also already contain .git-annex so git-annex wouldn't need to do anything. And if it didn't, git-annex would simply create a new .git-annex symlink there.
These .git-annex symlinks could either be added to .gitignore, or manually/automatically checked in to the current branch - I'm not sure which would be best. There's also the option of using multiple levels of indirection:
I'm not sure whether this would bring any advantages. It might bring a performance hit due to the kernel having to traverse more symlinks, but without benchmarking it's difficult to say how much. I'd expect it only to be an issue with a large number of deep directory trees.
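The single-level scheme proposed in this comment could be sketched roughly as follows. This is only a toy model under stated assumptions: the helper names (`ensure_annex_link`, `add_file`, `move_file`) are hypothetical, the per-directory .git-annex link is written as a relative path, and the key path is the placeholder from the comment rather than a real git-annex key.

```python
import os
import shutil
import tempfile

# Placeholder object path taken from the proposal, not a real key.
KEY_PATH = "objects/xx/yy/ZZZZZZZZZZ"

def ensure_annex_link(root, directory):
    """Make sure directory has a .git-annex symlink leading to root/.git/annex."""
    link = os.path.join(directory, ".git-annex")
    if not os.path.lexists(link):
        target = os.path.relpath(os.path.join(root, ".git", "annex"), directory)
        os.symlink(target, link)

def add_file(root, path):
    """Create an annexed-file symlink that goes through the local .git-annex."""
    ensure_annex_link(root, os.path.dirname(path))
    os.symlink(os.path.join(".git-annex", KEY_PATH), path)

def move_file(root, src, dst):
    """Move the symlink unchanged; only the destination's .git-annex may be new."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    ensure_annex_link(root, os.path.dirname(dst))
    os.rename(src, dst)

def demo():
    root = tempfile.mkdtemp()
    try:
        obj = os.path.join(root, ".git", "annex", KEY_PATH)
        os.makedirs(os.path.dirname(obj))
        with open(obj, "w") as fh:
            fh.write("payload")
        os.makedirs(os.path.join(root, "subdir"))
        old = os.path.join(root, "subdir", "foo")
        new = os.path.join(root, "subdir", "deeper", "foo")
        add_file(root, old)
        before = os.readlink(old)
        move_file(root, old, new)
        after = os.readlink(new)
        # Reading through the moved link still reaches the object.
        with open(new) as fh:
            content = fh.read()
        return before, after, content
    finally:
        shutil.rmtree(root)

before, after, content = demo()
print(before == after, content)
```

The text stored in the file's symlink is identical before and after the move, which is what would let git treat the move as a pure rename; the cost is the extra .git-annex entry in each directory.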
That seems an excellent idea, also eliminating the need for git annex fix after moving.
However, I think CVS and svn have taught us the pain associated with a version control system putting something in every subdirectory. Would this pain be worth avoiding the minor pain of needing git annex fix and sometimes being unable to follow renames?
Personally I'd rather have working rename detection but I agree it's not 100% ideal to be littering multiple directories like this, so perhaps you could make it optional, e.g. based on a git config setting?
Here are a few more considerations, some in defence of the approach, some against it:
.git-annex is hidden; CVS/ is not.
Unlike CVS/ and .svn/, it's only a symlink, not a directory containing other files.
.git-annex was moved within the repository: .git-annex in any subdirectory is always a symlink to ../.git-annex so instead you would need to check that all of the new ancestors contain this symlink too, and optionally remove any no longer needed symlinks.
$git_root/foo -follow, diff -r etc. would traverse into $git_root/.git/annex
This last point is the only downside to this approach I can think of which gives me any noticeable cause for concern. However, people are already used to working around this from CVS and svn days, e.g. diff -r -x .svn, so I don't think it's anywhere near bad enough to rule it out.
Git can follow the rename fine if the file is committed before git annex fix (you can git commit -n to see this), so making git-annex pre-commit generate a fixup commit before the staged commit would be one way. Or the other two ways I originally mentioned when writing down this minor issue. I like all those approaches better than .git-annex clutter.