Please describe the problem.
There appears to be some locking-related pain with git annex v10 on NFS.
Specifically, git status, git annex status, and seemingly everything that uses git-annex filter-process will hang forever.
What steps will reproduce the problem?
(approximately)
git init && git annex init --version 10
(add some large file, commit)
git annex sync
git status # hangs forever
git annex status # hangs forever
git annex info # actually works, and produces typical output
git diff path/to/normal-file.txt # also hangs
git annex version # actually works, and produces typical output
What version of git-annex are you using? On what operating system?
Git annex installed from the latest standalone amd64 tarball. I verified that both which git
and which git-annex
point to the scripts/binaries in that tarball.
$ git annex version
git-annex version: 10.20220725-g9ac72bff2
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
Running Ubuntu 20.04, with ~/ (where binaries are) and /workspaces (where the repo is) on NFS. mount -v
shows the following (clipped and IPs obscured)
srv:/path on /workspaces type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=XXXX,local_lock=none,addr=XXXX)
.git/config contains (anonymised)
[annex]
uuid = UUID_HERE
version = 10
pidlock = true
Please provide any additional information below.
As a workaround, I deleted all files from .git/annex EXCEPT for the objects directory, then did git annex init --version=9
, and reset the UUID to what it was before. I then ran git annex fsck, and it all seemed to function correctly with no hangs. But, a few commands later, I noticed we were back at version=10, and the problem recurred. I set annex.autoupgraderepository = false
, again nuked .git/annex, set version=8 this time in the config file and with git annex init --version 8 and it seems were good and stable for now.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
A great deal! We're managing tens of TB of plant genomics data with git annex, and it's a life saver, so many many thanks Joey et al. I know NFS is a bit of an edge case in git-annex land, but for the majority of git-annex's scientific users, NFS is all we have.
(many apologies, I buggered up the markdown formatting of the following section. Fixed below)
What steps will reproduce the problem?
(approximately)
If filter-process is the problem, it does not make sense that repo v9 v9 would not have the problem, because v9 enables filter-process. If v9 is ok and only v10 has the problem then it must have something to do with the locking changes in v10.
Also, you can upgrade to v9/v10 and still disable git-annex filter-process, by running this after the upgrade:
(Note that git-annex had a bug that caused v9 to upgrade to v10 without waiting the year it was supposed to. That bug is why your v9 repo upgraded. That was fixed in today's git-annex release.)
FWIW, I don't see this problem in a v10 repo on ext4. I set annex.pidlock in case it being set somehow causes a locking problem.
Try running this, in a repo where it hangs, to debug at what point in the git-annex filter-process it is hanging.
Then try disabling filter-process and confirm if it hangs or not:
So I think I have this issue completely wrong, apologies.
The problem recurred again today, on a v8 repo, and with yesterday's release of git-annex. There is definitely a problem with locking, but it can't be due to the changes in version 10.
To debug further, I removed only some files at a time from the .git/annex directory. The offending files were the keysdb (I did
rm -rf keysdb*
, so I'm not sure which of the specific files were the offenders).I'll keep you posted as I debug it further, but I thought I should let you know immediately that my original diagnosis was wrong.
Thanks for confirming my feeling that something about the bug as reported was not quite right.
It it was keysdb.cache or keysdb.lck, then it might be a locking problem in git-annex.
But it seems more likely it was
keysdb/*
.. And that is a sqlite database. And if sqlite has problems with its internal locking that don't work on your filesystem, it would be hard to fix that in git-annex. Not using sqlite or switching the database from WAL mode to non-WAL mode are about all git-annex could do and both would be major undertakings.A recently added feature to git-annex is the annex.dbdir git config. That can be set to a directory on another filesystem, and git-annex will store all its sqlite databases in that directory. I think this option is your best chance of avoiding the problem.