Please describe the problem.
git status reports a file as modified (changes not staged for commit) while there are no changes relative to the index
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use "git add" and/or "git commit -a")
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
although git shows no diff and the sha256 checksum corresponds to the key:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
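The manual comparison above (digest in the key vs. sha256sum of the file) could be scripted roughly as follows. This is only a sketch: the helper names annex_key_digest and content_matches_key are hypothetical, and the key layout (SHA256E-s<size>--<hexdigest><.ext>) is assumed from the diff shown above rather than taken from git-annex documentation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the key layout
# (BACKEND-s<size>--<hexdigest><.ext>) is assumed from the diff above.

# Print the hex digest embedded in a SHA256E key or pointer path.
annex_key_digest() {
    key=${1##*/}                    # drop a leading path such as /annex/objects/
    digest=${key##*--}              # keep the part after the "--" separator
    printf '%s\n' "${digest%%.*}"   # strip the trailing extension, if any
}

# Succeed when the sha256 of file $2 equals the digest embedded in key $1.
content_matches_key() {
    want=$(annex_key_digest "$1")
    have=$(sha256sum "$2" | cut -d ' ' -f 1)
    [ "$want" = "$have" ]
}
```

Assuming the committed blob for an unlocked file is its pointer, something like content_matches_key "$(git cat-file -p HEAD:.dandi/assets.json)" .dandi/assets.json would repeat the check done by hand above.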
I think the tricky part may be that this repo is at annex.version
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
10
although I thought that we kept it at 8. I do have a user-wide config setting
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
git-annex filter-process
That setting was recommended to me to speed up operations while avoiding the upgrade to 10, but I guess running the most recent version once led to the upgrade, since all the other repos are still at 8, as I expected:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
1 version = 10
186 version = 8
Having it reported as modified causes our script, which does a sanity check so that it operates only on a clean repo, to fail. git reset --hard seems to have mitigated that:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
HEAD is now at b859efed7d [backups2datalad] 66 files added
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
I will now rerun our script and see what state I end up in (although, once again, this repo is already at version 10, so the behavior may be different).
What steps will reproduce the problem?
I think I get it after I annex move and then annex get that file back. Just for my own reference: this git-annex repo is the result of https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron
What version of git-annex are you using? On what operating system?
10.20220822-g84f1875 (conda build), originally observed on earlier 10.20220724-ge30d846
Is .dandi/assets.json an unlocked file?
git diff --cached seems like the wrong thing to run, because that would show changes that you have staged for commit. This change is one that has not been staged for commit. So git diff should show it.
d'oh, forgot to show that I have tried that one too. Here is everything at once again with git diff and again doing checksums (that should have been different in my prev examples as well if different only in tree but not in index):
The workaround you suggest elsewhere for the "cosmetic" problem works here too, but since we are relying on output from status, it is not just a "cosmetic" issue. IMHO if such an update-index is needed, it should have been done by git-annex automagically somehow/sometime.
So you can reproduce this? I am pretty sure it's not as simple as a drop followed by a get, so more information about reproducing it seems crucial.
I assume you are not seeing the "This is only a cosmetic problem affecting git status" message?
I expect that running
git update-index --refresh .dandi/assets.json
will fix git status. Can you confirm?
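For a script that only operates on a clean repo, the refresh suggested above could be folded into the cleanliness check. The following is a minimal sketch, not something taken from the dandisets tooling: the function name repo_is_clean_after_refresh is hypothetical, and it refreshes the whole index rather than a single file.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Succeed when the repo at $1 (default: current directory) has a clean
# `git status` after refreshing the index's stat information.
repo_is_clean_after_refresh() {
    repo=${1:-.}
    # The exit status is ignored on purpose; the status call below decides.
    git -C "$repo" update-index -q --refresh >/dev/null 2>&1 || true
    # --porcelain prints one line per changed or untracked path,
    # so empty output means a clean working tree.
    [ -z "$(git -C "$repo" status --porcelain)" ]
}
```

Note that untracked files also make this check fail, since git status --porcelain lists them too.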
The only way I know of that this can happen without the message is if a drop or a get is still running, or gets interrupted. One of the last things git-annex does before exiting is restage all the unlocked files that it has updated.
Short of that, it seems like it would have to be a bug that prevents restagePointerFile from working. Which might not be a bug in git-annex, if the problem involves git's handling of timestamps in the index, for example. (Which is known to have some odd behaviors.)
(git-annex could be improved to do the restaging later when interrupted and possibly after such a bug. But there's no way to make it recover in git status, because git doesn't run it in this situation.)
Seems likely that the --time-limit option, when combined with -J, could result in git-annex exiting before a worker thread gets a chance to call stagePointerFile. I have not verified this, and it would be unlikely to result in the same file being affected reproducibly.
Maybe it is one of those options, but in my case it is just a straight get on that single unlocked file:
Sorry, I have not mentioned your earlier comment 4, but I think my clarification above gives the answers to your questions ;)
FWIW here is the get --debug output
I've fixed the issue I found with --time-limit combined with -J. Which I do think could have resulted in the same kind of problem. But you've shown that is not the cause in your case.
Thanks for the --debug. It shows that git-annex is not running git update-index --refresh at all.
And it shows that the transfer happens in a git-annex transferrer process. So, I think you have annex.stalldetection set.
And interestingly, that transferrer process fails at the end:
Aha! I can reproduce it by setting annex.stalldetection.
damn, I should have shared my config! I also do have annex.stalldetection set! Never thought it might be related. We should look into having some matrix test run with such config set.
Yeah, a whole git-annex test run with stalldetection set would have found this bug. Which seems a bit heavy-weight for the test suite to try as a separate pass by default. But then again, stalldetection does significantly change how git-annex operates since it has to fork off child processes that it can kill when they stall.
So, git-annex transferrer, after downloading the content, does handle populating pointer files. So it calls restagePointerFile to register a cleanup action.
Whatever is making that process exit 1 must be preventing the cleanup action from being run. And I think what that is, is that its stdout handle gets closed at the same time its stdin handle is closed. I tried running git-annex transferrer manually and feeding it a transfer request on stdin. After its stdin was closed, it proceeded to send "om (recording state in git...)\n" to stdout, and that would fail with stdout already closed.
Worse, I suspect there's another problem. When a stall actually is detected, git-annex kills the git-annex transferrer process that has stalled. But suppose that process has already successfully downloaded some content and populated pointer files. Killing it would prevent it from running restagePointerFile on those. It seems that to solve this, it would need to communicate back to the parent what pointer files need to be restaged. (Which would also solve the exit 1 problem, although not necessarily in the best way.)
Also, I think that multiple processes running the restagePointerFile cleanup action at the same time can be a problem, because one will lock the index and the rest will fail to restage. Not what's happening here, but with -J, there would be multiple git-annex transferrer processes doing that at the same time at the end.
Is ~/.gitconfig not used/mocked away during tests? (e.g. we do that in datalad, so I had to trick that in a PR against datalad to test against this setting being set)
Avoided the early stdout handle close, and that did fix this bug as reported.
The related problems I identified in comment #12 are still unfixed, so leaving this open for now.
I think what ought to be done to wrap this up is make restagePointerFile record the files that need to be restaged in a log file. Then at shutdown, git-annex can read the log file, and restage everything listed in it. This will solve multiple problems:
When git-annex transferrer is killed, the parent git-annex will read the log and handle the restaging that it was not able to do.
git update-index command per file.
@yarikoptic oh, git-annex test does prevent global gitconfig from influencing the tests. So your matrix test won't work if you're running git-annex test in it. If you're running other git-annex commands in datalad's test suite, it would work though.
I've opened "specify gitconfig for test suite".
I've implemented the log file. The stalled transferrer case is now handled. This bug is fixed.
As to a few other cases I considered in comments upthread:
When a get/drop was interrupted before it could restage, the next get/drop will cause the necessary restaging for the interrupted process to happen. However, this doesn't help if there's nothing left to get/drop. Should git-annex always run restagePointerFiles on shutdown? That would make any git-annex command handle the restaging. But it doesn't seem right for query commands to do potentially a lot of work to handle this case. Anyway, I don't think this needs to be dealt with in this bug report.
When multiple processes try to restage at the same time, one will restage everything that all of them logged. The others will still display a warning to the user that they couldn't restage. It would be hard to avoid displaying that warning, since it does need to warn when it was unable to restage because git has the index locked at the time. Anyway, I think it's ok to display the message despite the files having been restaged, because it's the same as a later git-annex process handling the restaging. (It does seem like two transferrers belonging to the same parent could collide in this way, and one would display the warning, which isn't great.)
I also implemented a "git-annex restage" command that is an easier way to restage in the cases where git-annex is not able to do it itself.
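The log-based restaging described in this thread can be illustrated with a rough shell sketch. It is not git-annex's actual implementation (which is in Haskell): the function names, the one-path-per-line log format, and the log location are all hypothetical, and paths containing newlines are not handled. The idea shown is only that workers append paths to a log, and whichever process reaches shutdown first claims the log and restages everything in it with a single git update-index call.

```shell
#!/bin/sh
# Sketch only: function names and the one-path-per-line log format are
# hypothetical; this is not git-annex's actual implementation.

# Record a path that still needs restaging. $1 = log file, $2 = path.
log_for_restage() {
    printf '%s\n' "$2" >> "$1"
}

# The restage step itself, kept separate so it can be swapped out.
restage_cmd() {
    git update-index -q --refresh -- "$@"
}

# At shutdown: claim the log, restage every logged path in one
# invocation, and put the entries back if restaging failed
# (for example because another process holds the index lock).
restage_logged() {
    log=$1
    [ -s "$log" ] || return 0                      # nothing logged
    claimed="$log.$$"
    mv "$log" "$claimed" 2>/dev/null || return 0   # another process claimed it
    sort -u "$claimed" > "$claimed.uniq"
    set --
    while IFS= read -r entry; do
        set -- "$@" "$entry"
    done < "$claimed.uniq"
    if restage_cmd "$@"; then
        rc=0
    else
        rc=1
        cat "$claimed" >> "$log"                   # leave the work for a later run
    fi
    rm -f "$claimed" "$claimed.uniq"
    return "$rc"
}
```

Putting failed entries back into the log mirrors the behavior described above, where a later git-annex process handles restaging that an earlier one could not do.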