annex.stalldetection prevents git-annex get from restaging unlocked files

Please describe the problem.

git status and git annex status report .dandi/assets.json as modified:

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .dandi/assets.json

no changes added to commit (use "git add" and/or "git commit -a")

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json

although git shows no diff and the sha256 checksum of the file matches the key:

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date:   Fri Sep 16 22:22:29 2022 +0000

    [backups2datalad] 66 files added

diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14  .dandi/assets.json
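
The comparison above (the sha256sum of the working-tree file against the digest embedded in the key) can be scripted. This is only a sketch: it assumes the key layout visible in this report, `SHA256E-s<size>--<digest><extension>`, and a pointer of the form `/annex/objects/<key>`. The function names are hypothetical and not part of git-annex.

```python
import hashlib
import re
from pathlib import Path

# Assumed key layout, taken from the keys shown in this report:
# SHA256E-s<size>--<64 hex digits><optional extension>
KEY_RE = re.compile(r"^SHA256E-s(?P<size>\d+)--(?P<digest>[0-9a-f]{64})(?P<ext>\..*)?$")


def key_from_pointer(pointer_text: str) -> str:
    """Extract the key from pointer content such as '/annex/objects/<key>'."""
    return pointer_text.strip().rsplit("/", 1)[-1]


def parse_sha256e_key(key: str) -> tuple[int, str]:
    """Return (size in bytes, hex digest) encoded in a SHA256E key."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a SHA256E key: {key!r}")
    return int(match["size"]), match["digest"]


def content_matches_key(path: Path, key: str) -> bool:
    """True if the file's size and SHA-256 digest both match the key."""
    size, digest = parse_sha256e_key(key)
    if path.stat().st_size != size:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large annexed files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest
```

If `content_matches_key` returns True while git status still reports the file as modified, the content is fine and only git's view of the file is stale.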

I think the tricky part may be that this repository is at

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
10

although I thought that we had kept it at 8. I do have this user-wide config setting:

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
git-annex filter-process

That setting was recommended to me to speed up operations while avoiding the upgrade to 10, but I guess running the most recent version once led to the upgrade, since all the other repos are still at 8, as I expected:

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
      1         version = 10
    186         version = 8

Having the file reported as modified makes our script fail, because it runs a sanity check to ensure it operates only on a clean repo.

git reset --hard seems to have mitigated that:

(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
HEAD is now at b859efed7d [backups2datalad] 66 files added
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean

I will now rerun our script and see what state I end up in (although, once again, this repo is already at version 10, so the behavior may be different).

What steps will reproduce the problem?

I think I get it after I annex move that file away and then annex get it back. Just for my own reference: the git-annex repo is the result of running https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron

What version of git-annex are you using? On what operating system?

10.20220822-g84f1875 (conda build), originally observed on earlier 10.20220724-ge30d846

fixed --Joey

comment 1

Is .dandi/assets.json an unlocked file?

git diff --cached seems like the wrong thing to run, because that would show changes that you have staged for commit. This change is one that has not been staged for commit. So git diff should show it.

Comment by joey — Wed Sep 21 17:05:51 2022

comment 2

D'oh, I forgot to show that I had tried that one too. Here is everything at once, this time including git diff and the checksum again (the checksum in my previous example would also have differed if the file had changed only in the working tree but not in the index):

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .dandi/assets.json


It took 3.19 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
no changes added to commit (use "git add" and/or "git commit -a")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14  .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date:   Fri Sep 16 22:22:29 2022 +0000

    [backups2datalad] 66 files added

diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json

Comment by yarikoptic — Wed Sep 21 18:46:50 2022

comment 3

The workaround you suggested elsewhere for the "cosmetic" problem works here too:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .dandi/assets.json

no changes added to commit (use "git add" and/or "git commit -a")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git update-index -q --refresh .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean

But since we rely on the output of git status, it is not just a "cosmetic" issue. IMHO, if such an update-index call is needed, git-annex should do it automagically at some point.
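
Until git-annex handles this itself, a script that depends on a clean git status could refresh the index first, as in the transcript above. A minimal sketch, assuming the script runs inside the repository; `run_git` and `repo_is_clean` are hypothetical names, and the injectable `run` parameter exists only so the logic can be exercised without a real repository:

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run git in the current directory and return its stdout."""
    # check=False: `git update-index --refresh` exits non-zero when some
    # files really do need updating, which is not an error for this check.
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
    return result.stdout


def repo_is_clean(run: Runner = run_git) -> bool:
    """Refresh the index's cached stat information, then test for cleanliness."""
    # Same command as in the transcript, applied to the whole index.
    run(["update-index", "-q", "--refresh"])
    # --porcelain output is stable for scripts; empty output means clean.
    return run(["status", "--porcelain"]).strip() == ""
```

Refreshing the whole index can be slow in a large repository; passing specific paths, as in the transcript, limits the work.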

Comment by yarikoptic — Wed Sep 21 18:49:06 2022

comment 4

So you can reproduce this? I am pretty sure it's not as simple as a drop followed by a get, so more information about reproducing it seems crucial.

I assume you are not seeing the "This is only a cosmetic problem affecting git status" message?

I expect that running git update-index --refresh .dandi/assets.json will fix git status. Can you confirm?

The only way I know of that this can happen without the message is if a drop or a get is still running, or gets interrupted. One of the last things git-annex does before exiting is restage all the unlocked files that it has updated.

Short of that, it seems like it would have to be a bug that prevents restagePointerFile from working. Which might not be a bug in git-annex, if the problem involves git's handling of timestamps in the index, for example. (Which is known to have some odd behaviors.)

(git-annex could be improved to do the restaging later when interrupted and possibly after such a bug. But there's no way to make it recover in git status, because git doesn't run it in this situation.)

Comment by joey — Wed Sep 21 19:19:08 2022

comment 5

Seems likely that the --time-limit option, when combined with -J, could result in git-annex exiting before a worker thread gets a chance to call stagePointerFile. I have not verified this, and it would be unlikely to result in the same file being affected reproducibly.

Comment by joey — Wed Sep 21 22:06:49 2022

comment 6

Maybe it can be caused by one of those options, but in my case it is just a straight get of that single unlocked file:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ cat .dandi/assets.json
/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex get .dandi/assets.json
get .dandi/assets.json (from dandi-dandisets-dropbox...)
(checksum...) ok
(recording state in git...)
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .dandi/assets.json

no changes added to commit (use "git add" and/or "git commit -a")

Comment by yarikoptic — Thu Sep 22 01:03:18 2022

comment 7

Sorry, I did not respond to your earlier comment 4, but I think my clarification above answers your questions ;)

FWIW here is the get --debug output:

[2022-09-21 21:29:59.904218] (Utility.Process) process [3968193] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--",".dandi/assets.json"]
[2022-09-21 21:29:59.904725] (Utility.Process) process [3968194] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-09-21 21:29:59.905645] (Utility.Process) process [3968195] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-09-21 21:29:59.906012] (Utility.Process) process [3968196] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2022-09-21 21:29:59.907578] (Utility.Process) process [3968196] done ExitSuccess
[2022-09-21 21:29:59.907891] (Utility.Process) process [3968197] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2022-09-21 21:29:59.913611] (Utility.Process) process [3968197] done ExitSuccess
[2022-09-21 21:29:59.914676] (Utility.Process) process [3968198] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..5f5efa8544ff02c9261dd1590425dcea37a55526","--pretty=%H","-n1"]
[2022-09-21 21:29:59.916707] (Utility.Process) process [3968198] done ExitSuccess
[2022-09-21 21:29:59.916968] (Utility.Process) process [3968199] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..18497e6e9cab7754a85256416c361fee36ba65b2","--pretty=%H","-n1"]
[2022-09-21 21:29:59.918722] (Utility.Process) process [3968199] done ExitSuccess
[2022-09-21 21:29:59.919069] (Utility.Process) process [3968200] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
get .dandi/assets.json [2022-09-21 21:29:59.921463] (Utility.Process) process [3968202] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(from dandi-dandisets-dropbox...) [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex ["transferrer","-c","annex.debug=true"]
[2022-09-21 21:29:59.93162] (Annex.TransferrerPool) > d rdandi-dandisets-dropbox SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json .dandi/assets.json
[2022-09-21 21:29:59.942599] (Annex.TransferrerPool) < opb

[2022-09-21 21:29:59.942718] (Annex.TransferrerPool) < ops 69507227
[2022-09-21 21:30:03.103409] (Annex.TransferrerPool) < ope
[2022-09-21 21:30:03.103539] (Annex.TransferrerPool) < om (checksum...) 
(checksum...) [2022-09-21 21:30:03.768599] (Annex.TransferrerPool) < t
[2022-09-21 21:30:03.768843] (Annex.Branch) read 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
[2022-09-21 21:30:03.770259] (Annex.Branch) set 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
ok
[2022-09-21 21:30:03.770361] (Utility.Process) process [3968200] done ExitSuccess
[2022-09-21 21:30:03.770425] (Utility.Process) process [3968195] done ExitSuccess
[2022-09-21 21:30:03.770484] (Utility.Process) process [3968194] done ExitSuccess
[2022-09-21 21:30:03.770531] (Utility.Process) process [3968193] done ExitSuccess
[2022-09-21 21:30:03.771187] (Utility.Process) process [3968452] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","hash-object","-w","--stdin-paths","--no-filters"]
[2022-09-21 21:30:03.77319] (Utility.Process) process [3968453] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-index","-z","--index-info"]
[2022-09-21 21:30:04.063182] (Utility.Process) process [3968453] done ExitSuccess
[2022-09-21 21:30:04.063779] (Utility.Process) process [3968463] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2022-09-21 21:30:04.065352] (Utility.Process) process [3968463] done ExitSuccess
(recording state in git...)
[2022-09-21 21:30:04.06587] (Utility.Process) process [3968464] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
[2022-09-21 21:30:04.407935] (Utility.Process) process [3968464] done ExitSuccess
[2022-09-21 21:30:04.408528] (Utility.Process) process [3968468] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","56c62dcc21145201f9454a2dd6e75cc37f072ee4","--no-gpg-sign","-p","refs/heads/git-annex"]
[2022-09-21 21:30:04.410591] (Utility.Process) process [3968468] done ExitSuccess
[2022-09-21 21:30:04.413623] (Utility.Process) process [3968469] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","c3a1f9208649b47621b1424b055bd9871aa2fc79"]
[2022-09-21 21:30:04.415318] (Utility.Process) process [3968469] done ExitSuccess
[2022-09-21 21:30:04.416301] (Utility.Process) process [3968202] done ExitSuccess
[2022-09-21 21:30:04.416574] (Utility.Process) process [3968452] done ExitSuccess
[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1

Comment by yarikoptic — Thu Sep 22 01:33:24 2022

comment 8

I've fixed the issue I found with --time-limit combined with -J, which I do think could have resulted in the same kind of problem. But you've shown that is not the cause in your case.

Comment by joey — Thu Sep 22 17:02:04 2022

comment 9

Thanks for the --debug. It shows that git-annex is not running git update-index --refresh at all.

And it shows that the transfer happens in a git-annex transferrer process. So, I think you have annex.stalldetection set.

[2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex ["transferrer","-c","annex.debug=true"]

And interestingly, that transferrer process fails at the end:

[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1

Aha! I can reproduce it by setting annex.stalldetection.
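
The two observations above (no `update-index --refresh` call, and a transferrer process ending in ExitFailure) can be pulled out of a --debug log mechanically. The following is a sketch that assumes the `(Utility.Process)` line layout seen in the log posted in this thread; the function names are hypothetical.

```python
import re

# Line layouts assumed from the --debug output quoted in this thread.
START_RE = re.compile(
    r"\(Utility\.Process\) process \[(\d+)\] (?:read|chat|feed|call): (.*)$"
)
DONE_RE = re.compile(
    r"\(Utility\.Process\) process \[(\d+)\] done (ExitSuccess|ExitFailure \d+)$"
)


def failed_processes(log_text: str) -> list[tuple[int, str, str]]:
    """Return (pid, command, exit status) for every process that failed."""
    commands: dict[int, str] = {}
    failures = []
    for raw_line in log_text.splitlines():
        line = raw_line.rstrip()
        # search() rather than match(): progress text can precede the timestamp.
        started = START_RE.search(line)
        if started:
            commands[int(started.group(1))] = started.group(2)
            continue
        done = DONE_RE.search(line)
        if done and done.group(2) != "ExitSuccess":
            pid = int(done.group(1))
            failures.append((pid, commands.get(pid, "<unknown>"), done.group(2)))
    return failures


def index_was_refreshed(log_text: str) -> bool:
    """True if any logged git invocation ran update-index with --refresh."""
    return any(
        '"update-index"' in line and '"--refresh"' in line
        for line in log_text.splitlines()
    )
```

Run against the log from comment 7, this would flag process 3968203 (the transferrer) and report that the index was never refreshed.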

Comment by joey — Thu Sep 22 17:04:35 2022

comment 10

Damn, I should have shared my config! I do have annex.stalldetection set:

[annex]
    stalldetection = 1KB/120s

I never thought it might be related. We should look into having a matrix test run with such a config set.

Comment by yarikoptic — Thu Sep 22 17:34:35 2022

comment 11

Yeah, a whole git-annex test run with stalldetection set would have found this bug. Which seems a bit heavy-weight for the test suite to try as a separate pass by default. But then again, stalldetection does significantly change how git-annex operates since it has to fork off child processes that it can kill when they stall.

Comment by joey — Thu Sep 22 17:38:45 2022

comment 12

So, git-annex transferrer, after downloading the content, does handle populating pointer files. So it calls restagePointerFile to register a cleanup action.

Whatever is making that process exit 1 must be preventing the cleanup action from being run. And I think what that is, is that its stdout handle gets closed at the same time its stdin handle is closed. I tried running git-annex transferrer manually and feeding it a transfer request on stdin. After its stdin was closed, it proceeded to send "om (recording state in git...)\n" to stdout, and that would fail with stdout already closed.

Worse, I suspect there's another problem. When a stall actually is detected, git-annex kills the git-annex transferrer process that has stalled. But suppose that process has already successfully downloaded some content and populated pointer files. Killing it would prevent it from running restagePointerFile on those. It seems that to solve this, it would need to communicate back to the parent what pointer files need to be restaged. (Which would also solve the exit 1 problem, although not necessarily in the best way.)

Also, I think that multiple processes running the restagePointerFile cleanup action at the same time can be a problem, because one will lock the index and the rest will fail to restage. Not what's happening here, but with -J, there would be multiple git-annex transferrer processes doing that at the same time at the end.
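
The exit 1 failure mode described in this comment can be illustrated with a toy model. This is not git-annex code (git-annex is written in Haskell); it only shows, under the assumption that the progress message is written before the queued cleanup actions run, how a closed output stream can stop those actions from ever running. All names are hypothetical.

```python
import io


def shutdown(out: io.TextIOBase, cleanup_actions: list) -> None:
    """Toy shutdown sequence: report progress first, then run cleanup actions."""
    # Writing to an already-closed stream raises ValueError in Python; this
    # stands in for the failed write to a closed stdout handle.
    out.write("om (recording state in git...)\n")
    for action in cleanup_actions:
        action()


def run_worker(out: io.TextIOBase, cleanup_actions: list) -> int:
    """Return a process-style exit code: 0 on success, 1 if shutdown failed."""
    try:
        shutdown(out, cleanup_actions)
    except ValueError:
        # The cleanup actions queued after the failed write are skipped.
        return 1
    return 0


if __name__ == "__main__":
    restaged = []
    closed = io.StringIO()
    closed.close()
    code = run_worker(closed, [lambda: restaged.append(".dandi/assets.json")])
    print(code, restaged)
```

In the model, the worker with a closed stream exits 1 and the restage list stays empty, which mirrors the symptom seen in the --debug output.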

Comment by joey — Thu Sep 22 17:40:57 2022

comment 12

I am adding a matrix run with custom config settings to our datalad/git-annex CI. Let's see how that goes. Are there other interesting config settings to add there, e.g. retries? Or is the global ~/.gitconfig not used (mocked away) during tests? (We do that in datalad, so I had to work around it in a PR against datalad to test with this setting set.)

Comment by yarikoptic — Thu Sep 22 18:14:15 2022

comment 13

Avoided the early stdout handle close, and that did fix this bug as reported.

The related problems I identified in comment #12 are still unfixed, so leaving this open for now.

I think what ought to be done to wrap this up is make restagePointerFile record the files that need to be restaged in a log file. Then at shutdown, git-annex can read the log file, and restage everything listed in it. This will solve multiple problems:

When a previous git-annex process was interrupted after a get/drop of an unlocked file, the file will be in the log, so git-annex can notice that and handle the restaging.
When a stalled git-annex transferrer is killed, the parent git-annex will read the log and handle the restaging that it was not able to do.
When multiple processes are trying to restage files at the same time, an exclusive lock can be used to make only one of them run, and it can handle restaging the files that the others have recorded in the log too.
As a bonus, in the situations where git-annex is legitimately unable to restage files, it can still record them to be restaged later. And the "only a cosmetic problem" message can tell the user to run a single simple git-annex command, rather than a complicated git update-index command per file.
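
The log-file design proposed above can be sketched roughly as follows. This is an illustration of the idea only, not git-annex's implementation: the file names, the one-path-per-line format, and the use of `flock` are all assumptions, and the functions are hypothetical. It is POSIX-only because it relies on `fcntl`.

```python
import fcntl
from pathlib import Path
from typing import Optional


def record_for_restage(log_path: Path, file_path: str) -> None:
    """Append one path to the restage log (done by every get/drop worker)."""
    with open(log_path, "a", encoding="utf-8") as log:
        # Blocks briefly if a restager is currently reading and truncating.
        fcntl.flock(log, fcntl.LOCK_EX)
        log.write(file_path + "\n")
        log.flush()


def take_restage_batch(log_path: Path, lock_path: Path) -> Optional[list[str]]:
    """Claim all logged paths, or return None if another process is restaging."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: only one process becomes the restager.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        with open(log_path, "a+", encoding="utf-8") as log:
            # Hold the log lock so no append lands between read and truncate.
            fcntl.flock(log, fcntl.LOCK_EX)
            log.seek(0)
            paths = log.read().splitlines()
            log.truncate(0)
        # Drop duplicates while keeping first-seen order.
        return list(dict.fromkeys(paths))
```

The process that obtains a batch would then restage every returned path while still holding the lock in a real implementation; a killed or interrupted worker leaves its entries in the log for whichever process runs next.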

Comment by joey — Thu Sep 22 18:16:22 2022

comment 15

@yarikoptic oh, git-annex test does prevent global gitconfig from influencing the tests. So your matrix test won't work if you're running git-annex test in it. If you're running other git-annex commands in datalad's test suite, it would work though.

I've opened "specify gitconfig for test suite".

Comment by joey — Thu Sep 22 18:42:06 2022

status update

I've implemented the log file. The stalled transferrer case is now handled. This bug is fixed.

As to a few other cases I considered in comments upthread:

When a get/drop was interrupted before it could restage, the next get/drop will cause the necessary restaging for the interrupted process to happen. However, this doesn't help if there's nothing left to get/drop. Should git-annex always run restagePointerFiles on shutdown? That would make any git-annex command handle the restaging. But it doesn't seem right for query commands to do potentially a lot of work to handle this case. Anyway, I don't think this needs to be dealt with in this bug report.

When multiple processes try to restage at the same time, one will restage everything that all of them logged. The others will still display a warning to the user that they couldn't restage. It would be hard to avoid displaying that warning, since it does need to warn when it was unable to restage because git has the index locked at the time. Anyway, I think it's ok to display the message despite the files having been restaged, because it's the same as a later git-annex process handling the restaging. (It does seem like two transferrers belonging to the same parent could collide in this way, and one would display the warning, which isn't great.)

I also implemented a "git-annex restage" command that is an easier way to restage in the cases where git-annex is not able to do it itself.

Comment by joey — Fri Sep 23 19:57:38 2022
