Please describe the problem.
Originally reported and troubleshot against datalad. With older git-annex, behavior was even worse -- the .nfs* file was left behind, preventing datalad remove from deleting that repository directory entirely. With later git-annex (e.g. current 7.20200219+git135-g34d726cba-1~ndall+1) the situation improved: no .nfs* file, but the KEY directory itself is still there:
Full protocol with details on NFS etc:
/tmp > mkdir -p /tmp/nfsmount{,_}
/tmp > cat /etc/exports
/tmp/nfsmount_ localhost(rw)
/tmp > sudo exportfs -a
exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "localhost:/tmp/nfsmount_".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
/tmp > sudo mount -t nfs localhost:/tmp/nfsmount_ /tmp/nfsmount
/tmp > mount | grep /tmp/nfsmount
localhost:/tmp/nfsmount_ on /tmp/nfsmount type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)
/tmp > cat /home/yoh/proj/datalad/trash/nfsdrop
#!/bin/bash
set -xeu
git annex version 2>&1 | head -n 1
if [ -e nfsdrop ]; then
    chmod +w -R nfsdrop
    rm -rf nfsdrop
fi
mkdir nfsdrop
cd nfsdrop
git init
# git config --add annex.pidlock true
git annex init
echo 123 > file1
git annex add file1
git commit -m dummy
tree -a .git/annex/objects
git annex drop --force file1
tree -a .git/annex/objects
/tmp > git annex version
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.0.0 ghc-8.4.4 http-client-0.5.13.1 persistent-sqlite-2.8.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
/tmp > cd nfsmount
/tmp/nfsmount > /home/yoh/proj/datalad/trash/nfsdrop
+git annex version
+head -n 1
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
+'[' -e nfsdrop ']'
+mkdir nfsdrop
+cd nfsdrop
+git init
Initialized empty Git repository in /tmp/nfsmount/nfsdrop/.git/
+git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+echo 123
+git annex add file1
add file1
ok
(recording state in git...)
+git commit -m dummy
[master (root-commit) f9539b4] dummy
1 file changed, 1 insertion(+)
create mode 120000 file1
+tree -a .git/annex/objects
.git/annex/objects
└── G6
    └── qW
        └── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
            └── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 1 file
+git annex drop --force file1
drop file1 ok
(recording state in git...)
+tree -a .git/annex/objects
.git/annex/objects
└── G6
    └── qW
        └── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 0 files
/tmp/nfsmount > cat nfsdrop/.git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[annex]
uuid = 7c5d4d5f-c2da-4616-8a91-25f9d932d7aa
version = 8
[filter "annex"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
What I noticed is that git-annex did not decide on its own to enable pidlock.
If we enable it via config, no KEY directory is left behind:
/tmp/nfsmount > cat /home/yoh/proj/datalad/trash/nfsdrop
#!/bin/bash
set -xeu
git annex version 2>&1 | head -n 1
if [ -e nfsdrop ]; then
    chmod +w -R nfsdrop
    rm -rf nfsdrop
fi
mkdir nfsdrop
cd nfsdrop
git init
git config --add annex.pidlock true
git annex init
echo 123 > file1
git annex add file1
git commit -m dummy
tree -a .git/annex/objects
git annex drop --force file1
tree -a .git/annex/objects
/tmp/nfsmount > /home/yoh/proj/datalad/trash/nfsdrop
+git annex version
+head -n 1
git-annex version: 7.20200219+git135-g34d726cba-1~ndall+1
+'[' -e nfsdrop ']'
+chmod +w -R nfsdrop
+rm -rf nfsdrop
+mkdir nfsdrop
+cd nfsdrop
+git init
Initialized empty Git repository in /tmp/nfsmount/nfsdrop/.git/
+git config --add annex.pidlock true
+git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+echo 123
+git annex add file1
add file1
ok
(recording state in git...)
+git commit -m dummy
[master (root-commit) ff91d78] dummy
1 file changed, 1 insertion(+)
create mode 120000 file1
+tree -a .git/annex/objects
.git/annex/objects
└── G6
    └── qW
        └── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
            └── SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b
3 directories, 1 file
+git annex drop --force file1
drop file1 ok
(recording state in git...)
+tree -a .git/annex/objects
.git/annex/objects
0 directories, 0 files
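The difference between the two runs can also be checked without reading tree output by eye. A minimal sketch, assuming GNU find; the function name leftover_key_dirs is made up for illustration and is not part of git-annex or datalad:

```shell
#!/bin/bash
# Hypothetical helper (not part of git-annex): print empty directories left
# under the annex object store of the repository given as $1 (default: .).
leftover_key_dirs() {
    local objects="${1:-.}/.git/annex/objects"
    # Nothing to report if the object store does not exist.
    [ -d "$objects" ] || return 0
    # -mindepth 1 skips the objects directory itself; -empty matches only
    # directories with no entries at all, so a directory that still holds
    # a lingering .nfs* file would not be listed here.
    find "$objects" -mindepth 1 -type d -empty
}
```

Called as leftover_key_dirs /tmp/nfsmount/nfsdrop, it should list the stray KEY directory for the first run and print nothing for the run with annex.pidlock enabled.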
merged with https://git-annex.branchable.com/bugs/huge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created/ done --Joey
You should always enable pidlock on NFS.
AFAIK, there has been no particular change to git-annex that prevents .nfs files from causing problems when git-annex uses posix file locking. There might have been some change in the behavior of an NFS server, e.g. it might delete the .nfs file eventually, but only after git-annex has tried, and failed, to delete the non-empty key directory. Seems like the kind of thing that network latency etc. could make nondeterministic.
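The sequence described here can be tried outside git-annex. A minimal sketch in bash; the function name and messages are illustrative only, and the open file descriptor merely stands in for whatever holds the lock. It deletes a file that is still open and then tries to remove the containing directory. On a local filesystem the rmdir is expected to succeed; on an NFS client the deleted-but-open file is typically renamed to a .nfs* file, so the rmdir would be expected to fail until the descriptor is closed.

```shell
#!/bin/bash
# Illustrative only: mimic "delete a file that is still open, then remove
# its directory". $1 is a directory on the filesystem to test (default: .).
open_delete_rmdir() {
    local parent="${1:-.}" dir
    dir=$(mktemp -d "$parent/keydir.XXXXXX") || return 2
    echo 123 > "$dir/content"
    exec 3< "$dir/content"   # hold the file open, as a lock holder would
    rm -f "$dir/content"     # on NFS this usually turns into a .nfs* rename
    if rmdir "$dir" 2>/dev/null; then
        echo "directory removed"
    else
        echo "directory left behind: $(ls -A "$dir" | tr '\n' ' ')"
    fi
    exec 3<&-                # closing the descriptor releases the file
    # Clean up in case the rmdir above failed.
    [ -d "$dir" ] && rmdir "$dir" 2>/dev/null
    return 0
}
```

Running open_delete_rmdir /tmp and then open_delete_rmdir /tmp/nfsmount is one way to compare a local filesystem against the NFS mount used in this report.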
There are already at least two other open bug reports wishing that git-annex somehow discover when a repo is on NFS and enable pidlock. https://git-annex.branchable.com/bugs/huge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created/ https://git-annex.branchable.com/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/
(The second of those is really about something else, but it's been waiting on a fix being confirmed for 3 years, so the only useful part of it is I guess the last message in the thread, which discusses the difficulty of detecting NFS.)
So I'm inclined to merge this bug report with the first of those.
In case it sheds any light: in the aforementioned datalad issue, in this specific comment, I state (although I show it only for the recent git-annex) that without any change to the NFS server/mount, and on a local NFS mount (so the network is largely out of the question), we ended up with .nfs* files with older git-annex and without them with newer. It might be some change in how git-annex deletes files/directories, but there was no change to NFS.
As for the git-annex version differences: in between, I believe there was this fix, 7.20190819-98-g94c75d2bd AKA 7.20190912~21, for "regression: fails to detect need for pidlock on an NSF mount", which might or might not be related -- I didn't check.
FWIW, that is ok with me.
Sure, some minor change in git-annex could tickle non-deterministic behavior, eg something to do with timing of when it deletes the file vs the later closing of the file that drops the lock.
That regression made git-annex init fail when NFS doesn't have posix lock support enabled. git-annex can detect that case, and enables pidlock, but it can't detect NFS when it is pretending to support posix locks.
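Until that case can be detected from inside git-annex, a wrapper can make the guess from the outside. A rough sketch, assuming GNU coreutils stat (whose -f -c %T prints the filesystem type name); both function names are hypothetical, and the list of type names is an assumption that may not cover every NFS-like filesystem:

```shell
#!/bin/bash
# Hypothetical helpers, not part of git-annex or datalad.

# Succeed if the given filesystem type name looks like NFS.
is_nfs_fstype() {
    case "$1" in
        nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}

# Set annex.pidlock in the repository at $1 (default: .) when it appears
# to live on NFS. Assumes GNU stat; other stat implementations differ.
enable_pidlock_if_nfs() {
    local repo="${1:-.}" fstype
    fstype=$(stat -f -c %T "$repo") || return 1
    if is_nfs_fstype "$fstype"; then
        git -C "$repo" config annex.pidlock true
    fi
}
```

As in the second script in the report, this would be run after git init and before git annex init, so that the setting is in place from the start.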