bugs/git rename detection on file move (git-annex, ikiwiki; http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/; updated 2013-11-27T22:47:37Z)
use mini-branches (chrysn, 2011-03-09T23:47:48Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_1_0531dcfa833b0321a7009526efe3df33/
<p>If you go for the two-commit version, small intermediate branches (or <code>git commit-tree</code>) could be used to create a history like this:</p>
<pre><code>* commit 106eef2
|\ Merge: 436e46f 9395665
| |
| | the main commit
| |
| * commit 9395665
|/
| intermediate move
|
* commit 436e46f
|
| ...
</code></pre>
<p>While the first commit (436e46f) has "<code>/subdir/foo → ../.git-annex/where_foo_is</code>", the intermediate commit (9395665) has "<code>/subdir/deeper/foo → ../.git-annex/where_foo_is</code>" (a pure rename, since the symlink target is unchanged), and the final commit (106eef2) has "<code>/subdir/deeper/foo → ../../.git-annex/where_foo_is</code>" (the target fixed for the new depth).</p>
<p><code>git log --follow</code> would use the intermediate commit to find the history, but the intermediate commit would neither show up in <code>git log --first-parent</code> nor affect <code>git diff HEAD^..</code> and similar commands (there could still be some confusion with <code>git show</code>, though).</p>
Use variable symlinks, relative to the repo's root? (praet, 2011-03-10T16:50:28Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_2_7101d07400ad5935f880dc00d89bf90e/
<p>It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.</p>
<p>Now, if we define the symlink's target relative to the git repo's root (e.g. using the $GIT_DIR environment variable, which can itself be a relative or absolute path), this unfortunately results in an absolute symlink: the shell expands the variable before <code>ln</code> ever sees it, so the stored target is a fixed path which, for obvious reasons, is only usable locally:</p>
<pre><code>user@host:~$ mkdir -p tmp/{.git/annex,somefolder}
user@host:~$ export GIT_DIR=~/tmp
user@host:~$ touch $GIT_DIR/.git/annex/realfile
user@host:~$ ln -s $GIT_DIR/.git/annex/realfile $GIT_DIR/somefolder/file
user@host:~$ ls -al $GIT_DIR/somefolder/
total 12
drwxr-x--- 2 user group 4096 2011-03-10 16:54 .
drwxr-x--- 4 user group 4096 2011-03-10 16:53 ..
lrwxrwxrwx 1 user group 33 2011-03-10 16:54 file -> /home/user/tmp/.git/annex/realfile
user@host:~$
</code></pre>
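The transcript shows the absolute case. The relative case fails in the other direction, because the target is resolved from the directory holding the link. A minimal Python sketch, assuming a POSIX system where <code>os.symlink</code> needs no special privileges (the directory names mirror the transcript; <code>demo</code> is a hypothetical helper), shows both behaviours:

```python
import os
import tempfile


def demo(root: str) -> tuple[bool, bool, bool]:
    """Return (relative link ok before move, after move, absolute link ok)."""
    annex = os.path.join(root, ".git", "annex")
    os.makedirs(annex)
    os.makedirs(os.path.join(root, "somefolder", "deeper"))
    realfile = os.path.join(annex, "realfile")
    open(realfile, "w").close()

    # Relative target, computed from the directory that contains the link:
    # somefolder/file -> ../.git/annex/realfile
    link = os.path.join(root, "somefolder", "file")
    os.symlink(os.path.relpath(realfile, os.path.dirname(link)), link)
    ok_before = os.path.exists(link)  # exists() follows the symlink

    # Move the link one level deeper; the stored target string is unchanged,
    # so it now resolves to somefolder/.git/annex/realfile, which is missing.
    moved = os.path.join(root, "somefolder", "deeper", "file")
    os.rename(link, moved)
    ok_after = os.path.exists(moved)

    # An absolute target (what the $GIT_DIR expansion produces) survives the
    # move, but only while the repository stays at this exact path.
    abs_link = os.path.join(root, "somefolder", "deeper", "abs")
    os.symlink(realfile, abs_link)
    ok_abs = os.path.exists(abs_link)
    return ok_before, ok_after, ok_abs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))  # (True, False, True)
```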
<p>So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.</p>
<p>It <em>is</em> possible, using <a href="http://en.wikipedia.org/wiki/Symbolic_link#Variable_symbolic_links">variable/variant symlinks</a>, but I'm unsure whether these are available on Linux systems, and even if they are, they would introduce compatibility issues in multi-OS environments.</p>
<p>Thoughts on this?</p>
comment 3 (joey, 2011-03-16T03:03:19Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_3_57010bcaca42089b451ad8659a1e018e/
Interesting, I had not heard of variable symlinks before. AFAIK Linux does not have them.
Brainfart (praet, 2011-03-20T20:11:27Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_4_79d96599f757757f34d7b784e6c0e81c/
<p>I haven't given these any serious thought (which will become apparent in a moment), but I'm hoping they will give rise to some less half-baked ideas:</p>
<hr />
<h3>Bait'n'switch</h3>
<ul>
<li>pre-commit: Replace all staged symlinks (when pointing to annexed files) with plaintext files containing the key of their respective annexed content, re-stage, and add their paths (relative to repo root) to .gitignore.</li>
<li>post-commit: Replace the plaintext files with symlinks again (regenerated with git annex fix).</li>
</ul>
<p>In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.</p>
<p>To prevent git from reporting ALL annexed files as unstaged changes after the post-commit hook runs, their paths would need to be added to .gitignore.</p>
<p>This wouldn't cause any issues when adding files and very few when modifying files (it would need some changes to "git annex unlock"), BUT it would make git totally oblivious to removals...</p>
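The pre-commit/post-commit swap above can be sketched in Python as two file transformations. This is a minimal illustration under loud assumptions: a hypothetical flat object store at <code>.git/annex/objects/&lt;key&gt;</code> (real git-annex hashes keys into subdirectories), a hypothetical key <code>KEY-1234</code>, hypothetical helpers <code>to_placeholder</code>/<code>from_placeholder</code>, and no actual git staging or .gitignore handling. It shows that the committed text is the same whatever the file's depth:

```python
import os
import tempfile

# Hypothetical flat object store; real git-annex hashes keys into subdirectories.
OBJECTS = os.path.join(".git", "annex", "objects")


def to_placeholder(path: str) -> None:
    """pre-commit half: swap an annex symlink for a text file holding its key."""
    key = os.path.basename(os.readlink(path))
    os.remove(path)
    with open(path, "w") as f:
        f.write(key)


def from_placeholder(path: str, root: str) -> None:
    """post-commit half: turn the key file back into a relative symlink."""
    with open(path) as f:
        key = f.read().strip()
    os.remove(path)
    target = os.path.join(root, OBJECTS, key)
    os.symlink(os.path.relpath(target, os.path.dirname(path)), path)


def demo(root: str) -> tuple[list[str], bool]:
    """Return the placeholder texts and whether the restored links resolve."""
    key = "KEY-1234"  # hypothetical key
    store = os.path.join(root, OBJECTS)
    os.makedirs(store)
    with open(os.path.join(store, key), "w") as f:
        f.write("content\n")

    # The same key at two depths: the symlink targets differ.
    paths = [os.path.join(root, "a", "file"), os.path.join(root, "a", "b", "c", "file")]
    for p in paths:
        os.makedirs(os.path.dirname(p), exist_ok=True)
        os.symlink(os.path.relpath(os.path.join(store, key), os.path.dirname(p)), p)

    texts = []
    for p in paths:
        to_placeholder(p)  # this text is what would be staged and committed
        with open(p) as f:
            texts.append(f.read())
    for p in paths:
        from_placeholder(p, root)
    return texts, all(os.path.exists(p) for p in paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))  # (['KEY-1234', 'KEY-1234'], True)
```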
<hr />
<h3>Manifest-based (re)population</h3>
<ul>
<li>Keep a manifest of all annexed files (key + relative path)</li>
<li>DON'T track the symlinks (.gitignore)</li>
<li>Populate/update the directory structure using a post-commit hook.</li>
</ul>
<p>... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.</p>
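The manifest idea can be sketched as a small parser plus a populate step that a post-commit hook might call. Everything here is an assumption for illustration: the one-entry-per-line <code>&lt;key&gt; &lt;path&gt;</code> manifest format, the flat <code>.git/annex/objects/&lt;key&gt;</code> layout, and the helper names <code>parse_manifest</code> and <code>populate</code>:

```python
import os
import tempfile

# Hypothetical flat object store; real git-annex hashes keys into subdirectories.
OBJECTS = os.path.join(".git", "annex", "objects")


def parse_manifest(text: str) -> list[tuple[str, str]]:
    """Parse hypothetical '<key> <path>' lines; paths may contain spaces."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, path = line.split(" ", 1)
        entries.append((key, path))
    return entries


def populate(root: str, manifest: str) -> list[str]:
    """Create (or refresh) an untracked relative symlink for each entry."""
    created = []
    for key, rel in parse_manifest(manifest):
        link = os.path.join(root, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        target = os.path.relpath(os.path.join(root, OBJECTS, key), os.path.dirname(link))
        if os.path.lexists(link):  # lexists: also true for broken links
            os.remove(link)
        os.symlink(target, link)
        created.append(rel)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, OBJECTS))
        for k in ("KEY-1", "KEY-2"):
            open(os.path.join(tmp, OBJECTS, k), "w").close()
        print(populate(tmp, "KEY-1 music/song.ogg\nKEY-2 docs/deep/report.pdf\n"))
```

Since only the manifest is tracked, moving a file is a one-line change to its path, which is also why diffstats would show little beyond the manifest itself.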
<hr />
<p><strong><em>Wide open to suggestions, criticism, mocking laughter and finger-pointing <img src="http://git-annex.branchable.com/smileys/smile.png" alt=":)" /></em></strong></p>
comment 5 (praet, 2011-03-21T19:58:34Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_5_d61f5693d947b9736b29fca1dbc7ad76/
<p>In the meantime, would it be acceptable to split the pre-commit hook
into two discrete parts?</p>
<p>This would allow deferring "git annex fix" until post-commit (if
preferred) while still keeping the safety net for unlocked files.</p>
extra level of indirection (Adam, 2011-12-19T12:45:18Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_6_f63de6fe2f7189c8c2908cc41c4bc963/
<p>Surely this could be handled with an extra layer of indirection?</p>
<p>git-annex would ensure that every directory containing annexed data contains a new symlink <code>.git-annex</code> which points to <code>$git_root/.git/annex</code>. Then every symlink to an annexed object would be a relative symlink via this one: <code>.git-annex/objects/xx/yy/ZZZZZZZZZZ</code>. Even though this symlink is relative, moving it to a different directory would not break anything: if the destination directory already contained other annexed data, it would also already contain <code>.git-annex</code>, so git-annex wouldn't need to do anything. And if it didn't, git-annex would simply create a new <code>.git-annex</code> symlink there.</p>
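A minimal Python sketch of this single-level scheme, assuming a POSIX filesystem; the helper names are hypothetical and <code>objects/xx/yy/ZZZZZZZZZZ</code> is the placeholder path from the comment. It shows that a file's own symlink target never changes when the file moves, only the destination directory may need a <code>.git-annex</code> link:

```python
import os
import tempfile


def ensure_dir_link(directory: str, root: str) -> None:
    """Give `directory` a .git-annex symlink pointing at <root>/.git/annex."""
    link = os.path.join(directory, ".git-annex")
    if not os.path.lexists(link):
        target = os.path.relpath(os.path.join(root, ".git", "annex"), directory)
        os.symlink(target, link)


def add_annexed_link(path: str, obj: str, root: str) -> None:
    """Create a file symlink whose target does not depend on its depth."""
    ensure_dir_link(os.path.dirname(path), root)
    os.symlink(os.path.join(".git-annex", obj), path)


def move_annexed_link(src: str, dst: str, root: str) -> None:
    """Move a file symlink; at most the destination's .git-annex is created."""
    ensure_dir_link(os.path.dirname(dst), root)
    os.rename(src, dst)


def demo(root: str) -> tuple[str, str, bool]:
    """Return (target before move, target after move, resolves after move)."""
    obj = os.path.join("objects", "xx", "yy", "ZZZZZZZZZZ")  # placeholder name
    os.makedirs(os.path.join(root, ".git", "annex", os.path.dirname(obj)))
    open(os.path.join(root, ".git", "annex", obj), "w").close()
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "b", "c"))

    src = os.path.join(root, "a", "file")
    dst = os.path.join(root, "b", "c", "file")
    add_annexed_link(src, obj, root)
    before = os.readlink(src)
    move_annexed_link(src, dst, root)
    return before, os.readlink(dst), os.path.exists(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

Because the blob git sees for <code>file</code> is byte-for-byte identical before and after the move, rename detection would find an exact match.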
<p>These <code>.git-annex</code> symlinks could either be added to <code>.gitignore</code>, or manually/automatically checked in to the current branch - I'm not sure which would be best. There's also the option of using multiple levels of indirection:</p>
<pre><code>foo/bar/baz/.git-annex -> ../.git-annex
foo/bar/.git-annex -> ../.git-annex
foo/.git-annex -> ../.git-annex
.git-annex -> .git/annex
</code></pre>
<p>I'm not sure whether this would bring any advantages. It might bring a performance hit due to the kernel having to traverse more symlinks, but without benchmarking it's difficult to say how much. I'd expect it only to be an issue with a large number of deep directory trees.</p>
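The multi-level chain above can be sketched in Python too (a hypothetical helper <code>ensure_chain</code>, assuming a POSIX filesystem). Every directory's <code>.git-annex</code> points one level up and only the top-level one points into <code>.git/annex</code>, so resolving a file in <code>foo/bar/baz</code> passes through four <code>.git-annex</code> links, which is where the extra kernel work would come from:

```python
import os
import tempfile


def ensure_chain(directory: str, root: str) -> list[str]:
    """Create the .git-annex chain from root down to directory.

    root/.git-annex points at .git/annex; every deeper .git-annex points at
    ../.git-annex. Returns the links created, relative to root.
    """
    rel = os.path.relpath(directory, root)
    parts = [] if rel == "." else rel.split(os.sep)
    created = []
    for depth in range(len(parts) + 1):
        d = os.path.join(root, *parts[:depth])
        link = os.path.join(d, ".git-annex")
        if depth == 0:
            target = os.path.join(".git", "annex")
        else:
            target = os.path.join("..", ".git-annex")
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(os.path.relpath(link, root))
    return created


def demo(root: str) -> tuple[list[str], bool]:
    """Build foo/bar/baz with a chain and check a file link resolves."""
    obj = os.path.join(root, ".git", "annex", "objects", "KEY")
    os.makedirs(os.path.dirname(obj))
    open(obj, "w").close()
    deep = os.path.join(root, "foo", "bar", "baz")
    os.makedirs(deep)
    created = ensure_chain(deep, root)
    f = os.path.join(deep, "file")
    os.symlink(os.path.join(".git-annex", "objects", "KEY"), f)
    # Resolving f walks baz/.git-annex -> bar/.git-annex -> foo/.git-annex
    # -> root/.git-annex -> .git/annex before reaching the object.
    return created, os.path.realpath(f) == os.path.realpath(obj)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

Moving a subtree to a different depth then only requires running <code>ensure_chain</code> for its new location; the links inside the subtree stay valid.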
comment 7 (joey, 2011-12-19T18:22:25Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_7_7f20d0b2f6ed1c34021a135438037306/
<p>That seems an excellent idea, also eliminating the need for git annex fix after moving.</p>
<p>However, I think CVS and svn have taught us the pain associated with a version control system putting something in every subdirectory. Would this pain be worth avoiding the minor pain of needing git annex fix and sometimes being unable to follow renames?</p>
comment 8 (Adam, 2011-12-20T12:00:11Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_8_6a00500b24ba53248c78e1ffc8d1a591/
<p>Personally I'd rather have working rename detection but I agree it's not 100% ideal to be littering multiple directories like this, so perhaps you could make it optional, e.g. based on a git config setting?</p>
<p>Here are a few more considerations, some in defence of the approach, some against it:</p>
<ul>
<li><code>.git-annex</code> is hidden; <code>CVS/</code> is not.</li>
<li>Unlike <code>CVS/</code> and <code>.svn/</code>, it's only a symlink, not a directory containing other files.</li>
<li>It doesn't contain any data specific to that directory and could easily be regenerated if deleted accidentally or otherwise.</li>
<li>If a whole directory containing <code>.git-annex</code> was moved within the repository:
<ul>
<li>git-annex would need to fix up these symlinks if and only if it's moved to a different depth within the tree.</li>
<li>However, if the multi-level indirection approach is used, <code>.git-annex</code> in any subdirectory is <em>always</em> a symlink to <code>../.git-annex</code> so instead you would need to check that all of the new ancestors contain this symlink too, and optionally remove any no longer needed symlinks.</li>
<li>In either case, git-annex already goes to the trouble of fixing symlinks, and if anything, I <em>think</em> this approach would reduce the number of symlinks which need checking (right?)</li>
</ul>
</li>
<li><code>find $git_root/foo -follow</code>, <code>diff -r</code>, etc. would traverse into <code>$git_root/.git/annex</code>.</li>
</ul>
<p>This last point is the only downside to this approach I can think of which gives me any noticeable cause for concern. However, people are already used to working around this from CVS and svn days, e.g. <code>diff -r -x .svn</code>, so I don't think it's anywhere near bad enough to rule it out.</p>
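The workaround has the same shape as <code>diff -r -x .svn</code> (GNU diff's <code>-x</code>/<code>--exclude</code> takes a name pattern, so <code>-x .git-annex</code> would be the analogue). For scripts, a minimal Python sketch (hypothetical helpers, assuming the per-directory <code>.git-annex</code> layout proposed above) is an <code>os.walk</code> that follows symlinks like <code>find -follow</code> but prunes the link by name:

```python
import os
import tempfile


def walk_skipping(top: str, skip: tuple[str, ...] = (".git-annex",)):
    """os.walk that follows symlinks (like find -follow) but prunes `skip`."""
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        yield dirpath, filenames


def visited(top: str, skip: tuple[str, ...]) -> list[str]:
    """Directories visited under `top`, relative to it, sorted."""
    return sorted(os.path.relpath(d, top) for d, _ in walk_skipping(top, skip))


def demo(root: str) -> tuple[list[str], list[str]]:
    """Return visited dirs without pruning and with .git-annex pruned."""
    os.makedirs(os.path.join(root, ".git", "annex", "objects"))
    open(os.path.join(root, ".git", "annex", "objects", "KEY"), "w").close()
    foo = os.path.join(root, "foo")
    os.makedirs(foo)
    os.symlink(os.path.join("..", ".git", "annex"), os.path.join(foo, ".git-annex"))
    os.symlink(os.path.join(".git-annex", "objects", "KEY"), os.path.join(foo, "file"))
    return visited(foo, ()), visited(foo, (".git-annex",))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```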
comment 9 (joey, 2011-12-20T14:56:12Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_9_75e0973f6d573df615e01005ebcea87d/
<p>Git can follow the rename fine if the file is committed before <code>git annex fix</code> runs (you can use <code>git commit -n</code>, which skips the pre-commit hook, to see this), so
making the git-annex pre-commit hook generate a fixup commit before the staged commit would be one way. The other two approaches I mentioned when originally writing down this minor issue would also work. I like all of those approaches better than <code>.git-annex</code> clutter.</p>
comment 10 (Rafael, 2012-05-15T07:36:25Z) http://git-annex.branchable.com/bugs/git_rename_detection_on_file_move/comment_10_5ec2f965c80cc5dd31ee3c4edb695664/
Won't git itself be fixed to handle this? I had planned to look into that, but I don't know how difficult it will be.