Please describe the problem.
I am unable to copy files from one git-annex repo to another, either using copy --from=... from the destination repo or copy --to=... from the source repo. Even with the added output from --debug I have no idea what is going wrong. The example below focuses on one file, but I think no files work. git annex unused shows partially transferred data after the failures. As far as I can tell, the rsync commands that copy the files to .git/annex/objects/tmp are working, so I guess the failure happens after that.
From the source repo, located at /home/james/w/annex-neu:
james copter annex-neu $ git remote show -n nature
* remote nature
Fetch URL: /d/nature-ext2/falsifian/w/annex-neu
Push URL: /d/nature-ext2/falsifian/w/annex-neu
HEAD branch: (not queried)
Remote branches: (status not queried)
git-annex
master
synced/git-annex
synced/master
Local ref configured for 'git push' (status not queried):
(matching) pushes to (matching)
james copter annex-neu $ git annex copy --to=nature datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz
copy datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz (to nature...)
failed to send content to remote
failed to send content to remote
failed
git-annex: copy: 1 failed
And with --debug:
james copter annex-neu $ git annex copy --debug --to=nature datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz
[2021-06-05 20:24:58.016674809] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2021-06-05 20:24:58.021106113] process done ExitSuccess
[2021-06-05 20:24:58.021381484] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2021-06-05 20:24:58.025170816] process done ExitSuccess
[2021-06-05 20:24:58.025705324] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..803d71e5e3f15ee4a2a30e63c5e0e9faccf0defe","--pretty=%H","-n1"]
[2021-06-05 20:24:58.029655191] process done ExitSuccess
[2021-06-05 20:24:58.030257913] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..4a7de0d2b9a7f4a2b9cf88c57636a83886f6b340","--pretty=%H","-n1"]
[2021-06-05 20:24:58.034210664] process done ExitSuccess
[2021-06-05 20:24:58.034470983] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..bbfde8eb1c05e46c1a4e25719c8eb04189fa7cf2","--pretty=%H","-n1"]
[2021-06-05 20:24:58.038954155] process done ExitSuccess
[2021-06-05 20:24:58.039404429] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2021-06-05 20:24:58.039881197] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2021-06-05 20:24:58.050584672] read: git ["config","--null","--list"]
[2021-06-05 20:24:58.053085725] process done ExitSuccess
[2021-06-05 20:24:58.053918467] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2021-06-05 20:24:58.056598571] process done ExitSuccess
[2021-06-05 20:24:58.056750274] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2021-06-05 20:24:58.059953296] process done ExitSuccess
[2021-06-05 20:24:58.060270357] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz"]
[2021-06-05 20:24:58.060602372] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2021-06-05 20:24:58.061149131] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2021-06-05 20:24:58.061754169] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
copy datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz (to nature...)
[2021-06-05 20:24:58.124257351] read: cp ["--reflink=always","--preserve=timestamps",".git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz","../../../../nature-ext2/falsifian/w/annex-neu/.git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
[2021-06-05 20:24:58.125931854] process done ExitFailure 1
[2021-06-05 20:24:58.12621772] read: rsync ["--progress","--inplace","--perms",".git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz","../../../../nature-ext2/falsifian/w/annex-neu/.git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
100% 288.37 MiB 330 MiB/s 0s [2021-06-05 20:24:59.003840882] process done ExitSuccess
[2021-06-05 20:24:59.004436111] read: rsync ["--progress","--inplace","--perms",".git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz","../../../../nature-ext2/falsifian/w/annex-neu/.git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
100% 288.37 MiB 463 MiB/s 0s [2021-06-05 20:24:59.629830876] process done ExitSuccess
failed to send content to remote
[2021-06-05 20:24:59.631304163] read: rsync ["--progress","--inplace","--perms",".git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz","../../../../nature-ext2/falsifian/w/annex-neu/.git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
100% 288.37 MiB 462 MiB/s 0s [2021-06-05 20:25:00.257185993] process done ExitSuccess
[2021-06-05 20:25:00.257891486] read: rsync ["--progress","--inplace","--perms",".git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz","../../../../nature-ext2/falsifian/w/annex-neu/.git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
100% 288.37 MiB 437 MiB/s 0s [2021-06-05 20:25:00.92075523] process done ExitSuccess
failed to send content to remote
failed
[2021-06-05 20:25:00.921692771] process done ExitSuccess
[2021-06-05 20:25:00.921771903] process done ExitSuccess
[2021-06-05 20:25:00.921837622] process done ExitSuccess
git-annex: copy: 1 failed
Trying in the other direction (note /home/falsifian is a symlink to james):
james copter annex-neu $ git remote show -n local
* remote local
Fetch URL: /home/falsifian/w/annex-neu
Push URL: /home/falsifian/w/annex-neu
HEAD branch: (not queried)
Remote branches: (status not queried)
git-annex
master
synced/git-annex
synced/master
Local ref configured for 'git push' (status not queried):
(matching) pushes to (matching)
james copter annex-neu $ git annex get --debug --from=local datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz
[2021-06-05 20:26:43.459693059] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2021-06-05 20:26:43.4637773] process done ExitSuccess
[2021-06-05 20:26:43.464083545] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2021-06-05 20:26:43.467747946] process done ExitSuccess
[2021-06-05 20:26:43.468371444] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..2585bc984868658a30bac238f343e14ee0ab4a40","--pretty=%H","-n1"]
[2021-06-05 20:26:43.472230994] process done ExitSuccess
[2021-06-05 20:26:43.472821833] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..4026e21e431b23f3ed8bab4a59d076a07eb38aba","--pretty=%H","-n1"]
[2021-06-05 20:26:43.476524704] process done ExitSuccess
[2021-06-05 20:26:43.476704328] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..4a7de0d2b9a7f4a2b9cf88c57636a83886f6b340","--pretty=%H","-n1"]
[2021-06-05 20:26:43.480419871] process done ExitSuccess
[2021-06-05 20:26:43.480903262] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2021-06-05 20:26:43.481438522] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2021-06-05 20:26:43.484464329] read: git ["config","--null","--list"]
[2021-06-05 20:26:43.484899963] read: git ["config","--null","--list"]
[2021-06-05 20:26:43.494876192] process done ExitSuccess
[2021-06-05 20:26:43.495606623] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2021-06-05 20:26:43.498087081] process done ExitSuccess
[2021-06-05 20:26:43.498254949] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2021-06-05 20:26:43.50112263] process done ExitSuccess
[2021-06-05 20:26:43.501446299] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz"]
[2021-06-05 20:26:43.501858278] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2021-06-05 20:26:43.503867361] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2021-06-05 20:26:43.504653623] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2021-06-05 20:26:43.50750836] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2021-06-05 20:26:43.50796496] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
get datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz (from local...)
[2021-06-05 20:26:43.568982372] read: cp ["--reflink=always","--preserve=timestamps","../../../../../home/falsifian/w/annex-neu/.git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz",".git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
[2021-06-05 20:26:43.571638865] process done ExitFailure 1
[2021-06-05 20:26:43.572002434] read: rsync ["--progress","--inplace","--perms","../../../../../home/falsifian/w/annex-neu/.git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz",".git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
100% 288.37 MiB 351 MiB/s 0s [2021-06-05 20:26:44.396269512] process done ExitSuccess
failed to retrieve content from remote
[2021-06-05 20:26:44.397675637] read: rsync ["--progress","--inplace","--perms","../../../../../home/falsifian/w/annex-neu/.git/annex/objects/vv/Km/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz",".git/annex/tmp/SHA256E-s302380112--fb63e37bf5af350af35be3400a47bc1bf3a46253646b710579b9162560cea087.gz"]
0% 32 KiB 23 MiB/s 12s [2021-06-05 20:26:45.0364163] process done ExitSuccess
failed to retrieve content from remote
failed
[2021-06-05 20:26:45.037805309] process done ExitSuccess
[2021-06-05 20:26:45.03799324] process done ExitSuccess
[2021-06-05 20:26:45.038069266] process done ExitSuccess
[2021-06-05 20:26:45.038244936] process done ExitSuccess
[2021-06-05 20:26:45.03831897] process done ExitSuccess
git-annex: get: 1 failed
What steps will reproduce the problem?
I don't know how to reproduce this on a different repo. This is the first time I've seen this. I haven't had trouble in the past.
What version of git-annex are you using? On what operating system?
james copter annex-neu $ git annex version
git-annex version: 8.20200810
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.27 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
james copter annex-neu $ uname -a
Linux copter 4.9.0-15-amd64 #1 SMP Debian 4.9.258-1 (2021-03-08) x86_64 GNU/Linux
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
git-annex is awesome! I am using it a lot.
From the --debug output you provided, the important part is that rsync is succeeding in copying the file, but then git-annex for some reason thinks it failed, or has some other problem. E.g., if the annexed object were corrupt, it could copy it with rsync and then fail to verify the copy.
Your git-annex version is a year out of date, so you should upgrade. The code that is failing has definitely been changed a lot in the meantime, though not, as far as I can remember, to fix a bug like this.
I tried again with 8.20210428 and saw a similar error.
git annex fsck
thinks the file is okay. It doesn't exist on the remote (elbow
) in this case (confirmed that the symlink is broken).copy&paste showing
version
,copy --debug
andfsck
:If it helps: I see a progress indicator go from 0% to 100% four times when I run the copy command. The "failed to send content to remote" messages appear after the second and fourth times.
Also, I was able to copy that reddit dataset file (from my original comment) to this remote (
elbow
) today. So I guess it is not a property of the particular file on the source repository?I've taken a close look at the error handling in the relevant code path. Normally when there's an exception it gets displayed. It is possible for the transfer to fail without displaying anything and without throwing an exception. When I synthesize such a failure, I see the same output you described.
One way that could happen is if git-annex thinks the file got modified while it was being sent. If the file was unlocked, and was touched by something else while the copy was running, it would be enough to make git-annex think that, even though the content is still good. I've made it display a reason for the failure in that case.
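For concreteness, a minimal sketch of that scenario (timing-dependent and not a verified reproduction; it assumes the file is unlocked and simply touches it while the copy is still running):

    # Illustration only: modify the unlocked file's timestamp while the copy is in flight.
    # Whether this actually races depends on file size and disk speed.
    git annex copy --to=nature datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz &
    touch datasets/reddit/files.pushshift.io/reddit/daily/RS_2019_10_01.gz
    wait   # the copy may then report "failed to send content to remote"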
The only other way I found was if the hash fails to verify, which seems already ruled out in your case, but it did result in the same hard to understand output, so I've fixed that too.
It would be good to know if one of these changes has fixed it for you. The autobuild for linux should be updated within an hour. While the scenario with the file having been touched seems unlikely to happen repeatedly with different files, it does otherwise match your circumstances and seems like the most likely fit.
Hm, the mtime of the target of software_releases/openbsd/68/amd64/install68.img is later than the time of my last comment:

I can't imagine what would have touched it in the meantime, but there it is. So, I suspect your theory about the mtime changing is right. (I double-checked that git annex fsck is still happy with the file today.)
Next time I see this behaviour, I'll try running your updated code to confirm the cause, and also try to figure out what's touching the file.
See also similar report here https://git-annex.branchable.com/forum/get_fails/
I'm having the same problem with the latest git-annex from Debian bullseye (8.20210223-2). I have a flaky hardware clock on this computer, which NTP regularly adjusts by a few seconds (!):

So maybe that's what's going on? I understand that having a working clock is kind of ... expected of computers nowadays, but maybe that could be the source of this problem...
Whether the timestamp is right or not doesn't matter, but if something is causing it to change, it could result in this error.
All I can suggest is you upgrade git-annex to a current release, to get a hopefully better error message.
I have a repository with a file where this happens reliably for get requests by different users to different locations on that system (but not for get requests issued from different systems).

If it helps, we could arrange for access, Joey.
@mih so the new version's output confirms my suspicion that git-annex sees a change in the stat information of the file (mtime, size, inode) from before it started the transfer to when it finished.
I could imagine eg, that somehow the filesystem is not preserving stable inode numbers, and so the inode might appear different, without anything having actually changed.
It also seems possible that git-annex might somehow fail to stat the file either before or afterwards. In either case that would result in the same message. Maybe there's some way that the file gets deleted at the same time it's being transferred. Maybe the stat call fails for some other reason.
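For reference, those are the three properties being compared; one quick way to look at them on the remote's annex object (the path here is a placeholder, not taken from the report) is:

    # mtime, size, and inode of the annex object file in the remote repository
    # (replace the path with the actual object; this is just an illustration)
    stat -c '%Y %s %i' /path/to/remote/.git/annex/objects/xx/yy/KEY/KEY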
A strace of git-annex seems like the next step, if you can reproduce this somewhat reliably. I suggest a strace -v, which will display the full results of the stat() calls. The stats of the object file done before and after are the only part of the strace that we should need.

I don't think it's a case of the file being deleted while it's being transferred. It is possible to do that, just git annex drop while a transfer is in progress. But that transfer fails in a different way:

(This stat check is not strictly necessary, at least when annex.verify is enabled, so removing it is a possibility. Or perhaps not making a failure to stat be treated as a problem. But I need to understand what's happening first.)
Also seems possible that the keys db contains an inode cache for the object that is somehow out of date. It would then only stat the object once, which would succeed, but would differ from the previous value.
I ran

PATH=/tmp/dl-build-eyhm_8vt/git-annex.linux:$PATH strace -v -f -e trace=%%stat git annex get sub-001/meg/sub-001_task-memento_proc-sss_meg-1.fif &> annex_strace_stat.txt

to produce a log using git-annex version: 8.20210715-g39d747936. -f was needed as the real action is taking place in a fork. And here is the stat evolution of the transfer file from that log (leaving the "changed" messages in to give a sense of when those happen):

So, without having a clear understanding of what is happening inside: the key properties mtime, size, and inode are stable (except that the file vanishes in some intermediate stat attempts). What changes is atime (no surprise, I guess), but also ctime, and the permissions.

You grepped for stats of a file in the tmp directory, but those stats are not relevant to this problem. Since you ran git-annex get, it should be statting the annex object file in the remote repository, which I assume is another git repo on the same computer. (Otherwise it would not be able to fail this way AFAICS.) It's the stat of that file that's relevant.

I've added some debugging now, so --debug with a fresh git-annex build should explain what's going on with the stats. Use -c annex.debugfilter=Annex.InodeSentinal,Annex.Content to limit debug output to the relevant ones.
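If useful, the invocation could look something like this (the file is the one from the strace run above; the debug filter values are exactly the ones named above):

    # re-run the failing get with debug output limited to the inode/content modules
    git annex get --debug -c annex.debugfilter=Annex.InodeSentinal,Annex.Content sub-001/meg/sub-001_task-memento_proc-sss_meg-1.fif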
I can certainly reproduce this if I manually get the keys database into a state where it has the wrong inode recorded for the object file.
I have not found a way to make that happen using git-annex, but I'm guessing it might involve annex.thin being set, and almost certainly involves unlocked files. Some sequence of events is happening in the git repo that later prevents getting content from it, even though the content is still there.
It kind of looks like Command.Lock might neglect to update the keys database when it repopulates the annex object from an unmodified associated file in the case where it was hard linked to another, unlocked file and so got modified. I have not quite managed to make that reproduce it though.
But that's just guessing and the real situation could be something different. Someone needs to figure out how to reproduce this, until then I'm probably stuck.
(Enough other things could break due to this kind of database inconsistency that simply removing the check when transferring the file would only paper over the problem.)
One way to reproduce this is to replace the git repository with a copy produced with e.g. cp -r (not -a), with the file unlocked before that point; a rough sketch follows below. I doubt this is what people have been doing to see the behavior though.
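A hypothetical sketch of that sequence (repo layout, paths, and file names invented here; not a verified recipe). The point is that cp -r gives the copied repo's annex object a new inode and timestamps, while the inode cache in its keys database still records the old ones:

    cd /tmp
    git init source
    (cd source && git annex init && echo content > file && git annex add file && git commit -m add)
    (cd source && git annex unlock file && git commit -m unlock)   # unlocked, so an inode cache is recorded
    git clone source dest
    (cd dest && git annex init)
    # Replace the source repo with a cp -r copy: new inodes, timestamps not preserved.
    mv source source.orig
    cp -r source.orig source
    (cd dest && git annex get file)   # may now fail with "failed to retrieve content from remote"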
I've made it fall back to checking the file's content when the inode cache is somehow stale. I expect this will solve the problem. But, I would still like to know how to reproduce the problem, because if something is making the inode cache go stale, this fallback check will need to hash the file, which could make git-annex significantly more expensive.
I tried with the update from yesterday and it checksummed the file first, and only then started the download -- which matches the code change, I think. After the update in the source dataset, I can now also get the file with git-annex 8.20210310.

I am afraid that I have no idea how to replicate the original state. It was a rather organic process that involved computation across a cluster. Each compute node operated on its own branch of a clone, and the branches were pushed back and merged. A final git annex fsck seemingly confirmed that all is kosher.

I tried this with two affected files, but I have a few more left. If you let me know what exactly would be meaningful input for you to get a sense of what is going on, I'd be happy to provide that. Would a full strace log (analogous to the above) be suitable?

I don't think a strace will tell me anything now.
Is this the same repository that this issue was regarding? https://github.com/datalad/datalad/issues/5613 If not, that does seem like the same problem so perhaps we could learn something from that repository.
Was this repository upgraded from git-annex repository format 7 to 8? If so, I see one way this could have happened. That upgrade regenerates the keys db, but does not add inode caches for annex objects.
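As an aside, for anyone checking their own repository: git-annex records the repository format version in the local git config, so it can be read back directly (generic example, not output from the affected repo):

    git config annex.version   # prints the current repository format version, e.g. 8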
I posted an update to https://github.com/datalad/datalad/issues/5613#issuecomment-888813877 Below is an edited version:
The repository this issue is referring to is indeed old (started in 2015). So an upgrade from 7 (or earlier) is a potential cause. It is currently a version 8 repo.
Moreover, I can still replicate the exact same issue with git-annex version: 8.20210310. I have not attempted to use the newest git-annex, in order to preserve the state of this repo.

I can confirm, however, that this issue is unrelated to the datalad special remote implementation. I created a plain new remote on the local filesystem, and I can see the same behavior when attempting to copy the key to it:
That being said, the cause cannot (only) be a v7 to v8 upgrade. The repository concerning https://github.com/datalad/datalad/issues/5750 was created in April 2021 (v8 from the very start), and there are more recent examples too.
In both cases a git annex fsck was attempted in the repository from which a key was to be transferred. In both cases that did not fix the issue. As far as I understand, the fix made in 8.20210715-g3b5a3e168 forces a file checksumming before transfer. This is an expensive change, and the fsck should have already performed the same checksumming. Should fsck not also be able to update the respective key property caches?

There are probably multiple causes. The v8 upgrade does seem likely to be the main one, but I see that adswa seems to have found a way to reproduce it from scratch: https://github.com/datalad/datalad/issues/5750#issuecomment-888819347 It's looking like perhaps inode instability in their case, due to use of a networked filesystem in a cluster.
I have done some detailed auditing, and I'm pretty sure git-annex doesn't just forget to record the inode in some case. Well, except for potentially one case involving git-annex lock and annex.thin, which I fixed, though I was not quite able to actually get it to reproduce the problem.
(I did find some potential races that would cause other problems, but not this one, so the auditing was good for something.)
I agree, fsck should notice and correct this problem. The current fix papers over the problem in a way that will prevent users noticing it. Fsck being noisy about it will help avoid sweeping it under the rug and forgetting about it. Implemented that.
The performance impact here is relatively small. Aside from whenever this problem occurs, the extra checksumming will only happen when files are unlocked and the object file is modified. So annex.thin would need to be set and a file modified in a way that affects the object file. Several other parts of git-annex also checksum in that situation; that's a perf price you pay for using annex.thin. When this problem does occur, it will only checksum the object file once, and then will get its inode information cached.