Please describe the problem.
Direct mode repositories seem to ignore hard-linked files at first and then, once the files are changed, sync them as separate files. However, a change to one file is propagated only to that file and not to any of the others that are hard-linked to it.
What steps will reproduce the problem?
Inside a direct mode repository linked to an ssh remote:
$ ls -l
total 0
$ echo "something" > foo
$ ln foo bar
$ ls -l
total 8
-rw-r--r-- 2 pedrocr pedrocr 10 May 29 12:08 bar
-rw-r--r-- 2 pedrocr pedrocr 10 May 29 12:08 foo
$ tail .git/annex/daemon.log
6c0fbd7..0bb8ef9 git-annex -> synced/git-annex
0bae1b4..bfedc45 master -> synced/master
sent 77 bytes received 31 bytes 72.00 bytes/sec
total size is 10 speedup is 0.09
[2013-05-29 12:08:03 WEST] Transferrer: Uploaded foo
Already up-to-date.
[2013-05-29 12:08:05 WEST] Pusher: Syncing with golias
To ssh://golias.git-annex/home/pedrocr/testsync
0bb8ef9..2ce5013 git-annex -> synced/git-annex
$ git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# typechange: foo
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# bar
no changes added to commit (use "git add" and/or "git commit -a")
On the remote repository:
$ ls -l
total 4
-rw-r--r-- 1 pedrocr pedrocr 10 May 29 12:08 foo
If I now just touch the linked file on the repository:
$ touch bar
$ tail .git/annex/daemon.log
(merging synced/git-annex into git-annex...)
(Recording state in git...)
add bar (checksum...) [2013-05-29 12:12:49 WEST] Committer: Committing changes to git
[2013-05-29 12:12:49 WEST] Pusher: Syncing with golias
Already up-to-date.
To ssh://golias.git-annex/home/pedrocr/testsync
2ce5013..d36166b git-annex -> synced/git-annex
bfedc45..ee3a7a1 master -> synced/master
Already up-to-date.
On the remote repository:
$ ls -l
total 8
-rw-r--r-- 1 pedrocr pedrocr 10 May 29 12:08 bar
-rw-r--r-- 1 pedrocr pedrocr 10 May 29 12:08 foo
Note that bar has now been synced as a new file and not as a hard link, as it should be (see the link count of 1 after the permissions).
The sync also isn't acting properly on the linked files. For example, first in the origin repository:
$ cat bar
something
$ cat foo
something
$ echo "someotherthing" > bar
$ cat bar
someotherthing
$ cat foo
someotherthing
The result in the destination:
$ cat bar
someotherthing
$ cat foo
something
So even if the intended behavior is for hard-linked files to be synced as two separate files, the sync isn't correct: both files changed in the origin, but only one of them changed in the destination. This probably needs to be fixed with actual hard links on real filesystems and with some copying on crippled filesystems.
What version of git-annex are you using? On what operating system?
$ git annex version
git-annex version: 4.20130516.1
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP
local repository version: 4
default repository version: 3
supported repository versions: 3 4
upgrade supported from repository versions: 0 1 2
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
Implementing hard links will probably also require handling the edge case where the user has set up two repositories and one of them spans filesystems. So:
If the user in repository A runs "ln /somefile /someotherdirectory/otherfile", you'd need to treat this like the crippled case, since repository B can't create a hard link that spans filesystems.
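A minimal sketch of that fallback, written in Python for illustration only; it is not git-annex code, and the helper name link_or_copy is made up. It assumes that a failed link attempt with EXDEV (different filesystems) or EPERM (which Linux reports when the filesystem cannot create hard links, among other causes) should be treated like the crippled case and handled with a copy:

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard link dst to src if possible, otherwise fall back to a copy.

    Hypothetical helper: returns "linked" or "copied" so the caller
    knows whether the two paths still share content afterwards.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        # EXDEV: src and dst are on different filesystems.
        # EPERM: assumed here to mean "no hard link support"; it can
        # also have other causes, so a real tool would check further.
        if e.errno not in (errno.EXDEV, errno.EPERM):
            raise
        shutil.copy2(src, dst)  # copies content and metadata
        return "copied"
```

After a "copied" result the two files are independent, so something would still have to propagate later changes from one to the other.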
Another edge case may be that OS X supports hard-linked directories for use with its Time Machine feature. The feature isn't exposed through the normal ln command, but users may sometimes hack around that [1]. Also, any Time Machine backups will naturally have hard-linked directories.
[1] http://stackoverflow.com/questions/1432540/creating-directory-hard-links-in-macos-x
It would be possible to make the assistant use the inotify CREATE event (which it currently ignores) to add a file to the repository when a hard link is created. However, when a hard linked file is modified, inotify only sends an event for the file that was changed, not for other hard links to it. So, without keeping track of all hard links that exist on my own, there's no way for the assistant to automatically handle that case. And even if it tried to, hard links to files in the repository from outside the repository would still allow modifying them without the assistant being able to detect it.
Since hard links cannot be propagated over git anyway, I don't want to get into this mess. It's best to wontfix this, I think.
I agree with your assessment for the traditional git-annex, as it's an extension of git. For the assistant (in direct mode at least) it seems like broken behavior: there will be files that exist on one side but not the other, and files that have the same content on one side but not the other.
I just had a look and unfortunately there doesn't seem to be a general way to do inode to path lookup in UNIX, so an inode cache would really be needed. It would look something like:
This does get pretty hairy with the corner cases. Right now, though, the simple case of "sync between two non-crippled filesystems" shows pretty surprising results, so I'd argue something needs to change. I think that for direct mode to really be a "transparent folder sync" kind of solution, this should be fixed.
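For illustration only, here is one way an inode cache of the kind discussed above could be sketched in Python. This is not the outline the comment originally contained and not git-annex code; the function names are made up. It assumes a full walk of the working tree is acceptable and that only regular files with a link count above 1 need tracking:

```python
import os
import stat
from collections import defaultdict


def build_hardlink_cache(root):
    """Map (device, inode) to the sorted paths under root sharing it."""
    cache = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            st = os.lstat(path)
            # Only regular files with more than one link are interesting.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                cache[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in cache.items()}


def siblings(cache, path):
    """Other cached paths that are hard links to the same file as path."""
    st = os.lstat(path)
    me = os.path.abspath(path)
    return [p for p in cache.get((st.st_dev, st.st_ino), []) if p != me]
```

When a change event arrives for one path, siblings() would name the other paths inside the tree that need to be re-added too. Links made from outside the tree never enter the cache, so the limitation raised earlier in the thread still applies.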
To fix "Problems with syncing gnucash", I have made some changes to how hard links are dealt with.
The assistant will now notice when a hard link is created, and add the same thing to git that it would add for any other file. The hard link is not propagated to other repositories.
Files remain hard linked locally. This means that a change to one will affect the contents of the other. The assistant, lacking a hard link cache, will not notice this, and so will commit the change to the file that was written to, but not commit its hard link. Running "git annex add" manually (or restarting the assistant) will make it finally notice the other file has changed.
So, the assistant still does not keep hard links in sync on an ongoing basis. This bug is still unsolved.
This bug is not specific to direct mode; in a v7 adjusted unlocked branch, the assistant behaves the same as it did in my last comment above.