Multiple issues have been reported with using git-annex on networked filesystems. Most of them concern NFS, which is covered here, but the same problems may also occur on SMB filesystems.
Locking issues
Here is the prior art:
All of those issues but the first are related to locking on NFS filesystems, which is notoriously bad. However, the problems with it are not insurmountable and git-annex can actually be used, even if unreliably, on NFS filesystems.
The problem I mainly hit with NFS filesystems is unreliable locking. If the client and server run similar platforms (both Linux, for example; NFS locking doesn't work on BSD systems), locking should work, but it sometimes fails for no apparent reason. This problem and its solution are well described in this stackoverflow answer, taken from this excellent blog. Basically, you need to restart a bunch of NFS daemons that get stuck on the server side, and then locking works again. This generally fixed it for me:
service nfs-kernel-server stop
service rpcbind stop
service nfs-common stop
service rpcbind start
service nfs-common start
service nfs-kernel-server start
This needs to be run as root on the server side. Having a simple test script to see if locking works is also useful. I use the following:
#! /usr/bin/perl -w

use Fcntl qw(LOCK_SH LOCK_EX LOCK_UN);

$child = fork();
defined($child) || die "failed to fork: $!";

# open after forking, so that each process holds its own handle on the lock file
open(TESTLCK, ">testlock") || die "failed to open testlock: $!";

if ($child == 0) {
    # in child
    print "locking exclusively\n";
    flock(TESTLCK, LOCK_EX) || die "failed to lock exclusively: $!";
    print "holding exclusive lock for 3 seconds\n";
    sleep 3;
    flock(TESTLCK, LOCK_UN) || die "failed to unlock exclusively: $!";
    print "done locking exclusively\n";
}
else {
    # in parent
    print "locking shared\n";
    flock(TESTLCK, LOCK_SH) || die "failed to lock shared: $!";
    print "holding shared lock for 3 seconds\n";
    sleep 3;
    flock(TESTLCK, LOCK_UN) || die "failed to unlock shared: $!";
    print "done locking shared, waiting for child to finish\n";
    wait;
}
Note that the NFS FAQ (currently offline, thanks to Sourceforge, see this archive) also has interesting snippets about NFS locking. In short: it's a mess, but it can be worked around! -- anarcat
Socket issues
Another thing that may fail is the "ssh caching code". Examples:
- git annex sync dies (sometimes)
- NTFS usb on linux unable to connect to ssh remote
- git-annex ignores GIT_SSH
- git-annex-shell doesn't work as expected
As you can see, this affects far more than NFS, where it often just works. But sometimes the SSH client can't create the socket for the SSH multiplexing that git-annex uses. Normally, git-annex should detect that and fall back properly, but sometimes this fails, especially with older versions of git-annex. A workaround is to disable the feature:
git config annex.sshcaching false
The tradeoff is that syncs are slower, but it works. -- anarcat
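A quick way to check by hand whether a directory can hold special files at all is to try creating a named pipe in it. A later comment on this page says git-annex itself probes for FIFO support, so the following is only a rough manual approximation of that idea, not git-annex's actual code. The function name `supports_fifo` and the choice of `.git/annex` as the directory to test are assumptions made for illustration.

```shell
#!/bin/sh
# supports_fifo DIR: succeed if a named pipe can be created inside DIR.
# Hypothetical helper, not part of git-annex.
supports_fifo() {
    probe="$1/.fifo-probe.$$"
    if mkfifo "$probe" 2>/dev/null; then
        # clean up the probe so nothing is left behind
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Possible usage, run from the top of a repository (assumed layout):
#   if ! supports_fifo .git/annex; then
#       git config annex.sshcaching false
#   fi
```

A successful probe does not guarantee that SSH multiplexing will work, so disabling annex.sshcaching remains the fallback if syncs still fail.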
Stray files issue
This is a completely different issue, but could be related to file locking: huge multiple copies of '.nfs*' and '.panfs*' being created. Basically, tons of files are left behind by git-annex when it is run on an NFS server. It is still unclear how this problem happens and how to resolve it. But it has been reproduced and could affect you, so until it is resolved, it is still an open issue here... -- anarcat
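To see whether a repository is affected, it can help to list and count the stray files. The sketch below only looks for the two name patterns mentioned above; the function names `list_stray` and `count_stray` are made up for this example. As general background, NFS clients commonly create `.nfs*` names when a file is deleted while some process still has it open, but whether that explains the files reported here is not established.

```shell
#!/bin/sh
# list_stray DIR: print every regular file under DIR whose name
# starts with .nfs or .panfs (hypothetical helper).
list_stray() {
    find "$1" -type f \( -name '.nfs*' -o -name '.panfs*' \) -print
}

# count_stray DIR: print how many such files exist under DIR.
# Counts lines, so file names containing newlines would be miscounted.
count_stray() {
    list_stray "$1" | wc -l | tr -d ' '
}
```

For example, `count_stray .git/annex` gives a quick number to track before and after changing locking settings.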
Hi Joey,
Since this issue is more than a year old, and some fixes/workarounds were made to the locking mechanisms, could you please give an update on the status of "NFS support"? Thank you in advance.
git-annex will probe to detect if the filesystem does not support FIFOs and disables annex.sshcaching in that case. It's done so since 2013. So I would be surprised if NFS had any problems with annex.sshcaching.
git config annex.pidlock true
will make git-annex avoid FCNTL locking, and so work on filesystems that don't support that. It should also avoid the ".nfs" files.
It's not enabled by default on NFS because I don't currently have a good way to probe if a given directory is on NFS.
Also, annex.pidlock makes git-annex significantly slower and less safe. But if you're using NFS, speed and safety must have already been de-prioritized.
Seriously, my main advice for using git-annex on NFS is: Don't. Make local clones of repositories and use git-annex to distribute the files around. Unless your institution forces you to use a networked filesystem to access gobs of disk space, and you need to have more files present in a repository than will fit locally.
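For those who are stuck on NFS anyway, the annex.pidlock setting can be turned on per repository after checking the filesystem type by hand. The sketch below is a user-side check only, not something git-annex does: it assumes Linux with GNU coreutils `stat`, and the function names `fs_type` and `is_nfs_type` are invented for this example. It does not solve the general probing problem described above.

```shell
#!/bin/sh
# fs_type DIR: print the filesystem type of DIR.
# Assumes GNU coreutils stat (-f reports on the filesystem, %T is its type name).
fs_type() {
    stat -f -c %T "$1"
}

# is_nfs_type TYPE: succeed if TYPE names an NFS filesystem.
# The list of names is an assumption; adjust it for your system.
is_nfs_type() {
    case "$1" in
        nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage, run from the top of a repository:
#   if is_nfs_type "$(fs_type .)"; then
#       git config annex.pidlock true
#   fi
```

Since annex.pidlock is slower and less safe, it makes sense to set it only in the clones that actually live on NFS rather than globally.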
I have many dozens of .nfs files that I cannot seem to remove. I have had IT reboot the machine I was using with git-annex, as well as the file server, in hopes of killing the processes that have the files open. The files stubbornly remain, and cannot be removed with 'rm -f .nfsXXXX', which fails with "rm: cannot remove ‘.nfsXXXX’: Permission denied", even after the reboots.
Any thoughts are appreciated, as I have a few hundred gigabytes tied up in these files.
My next step is to see about working with IT to put the file server in single-user mode, and getting root access to see if we can remove the files. But I'm hoping there are some other suggestions before taking such a drastic step.