Please describe the problem.
Today I noticed odd commits happening such as
❯ git show 4a157861f3d27a40b38ae441dfe306e45e448c66
commit 4a157861f3d27a40b38ae441dfe306e45e448c66
Author: ReproStim User <changeme@example.com>
Date: Wed Apr 17 09:22:04 2024 -0400
git-annex in reprostim@reproiner:/data/reprostim
diff --git a/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log b/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
index fc930f54..92b79020 100644
--- a/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
+++ b/Videos/2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
@@ -1 +1 @@
-/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log
+/annex/objects/MD5E-s69--08983cc11522233e5d4815e4ef62275a.mkv.log
-- today is April but commits are for files in March...
There is git annex webapp running, which is configured to offload all content to another host.
And the actual patch shows that it pretty much annexed the "unlocked link" file after the file was offloaded to the remote host.
Do not have a minimal reproducer yet, but I think it happened while
- I had initially .log files which are text going to git
- then I added to .gitattributes the line
  *.log annex.largefiles=anything
  but it was never committed (? I assumed that annex webapp/assistant would do that -- it didn't) -- only now I did that.
- not sure how this morning was special...
The most interesting part is that if I annex get it -- I do get the correct file...
It is like an inception!!!
On the fresh clone, if I look inside that file I see short key:
❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
/annex/objects/MD5E-s69--08983cc11522233e5d4815e4ef62275a.mkv.log
then, if I annex get it -- I get content with the long key
❯ git annex get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (from rolando...)
ok
(recording state in git...)
❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log
then upon subsequent get -- I will get the actual content:
❯ git annex get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
get 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (from rolando...)
ok
(recording state in git...)
❯ head -n 1 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
2024-03-17 14:09:12.551 [info] [685899] Session logging begin : reprostim-videocapture 1.5.0.119, session_logger_2024.03.17.14.09.12.550, start_ts=2024.03.17.14.09.12.550
and dropping it would lead me just to the "long key"
❯ git annex drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log (locking rolando...) ok
(recording state in git...)
❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log
and will not be able to come out into reality from the 2nd level of inception:
❯ git annex drop 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
❯ cat 2024.03.17.14.09.12.550_2024.03.17.14.09.18.818.mkv.log
/annex/objects/MD5E-s68799--29541299bea3691f430d855d2fb432fb.mkv.log
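The nesting shown above can be inspected without git-annex by reading the first line of the working-tree file. A minimal sketch, assuming a pointer file is a single line of the form /annex/objects/KEY as in the output above; pointer_key is a hypothetical helper name, not a git-annex command. Note that the size field of the short key (s69) equals the byte length of the long pointer line plus its newline, which is consistent with the pointer file itself having been ingested into the annex.

```shell
# pointer_key FILE
# Hypothetical helper: print the annex key if FILE's first line looks like
# a git-annex pointer (/annex/objects/KEY); return non-zero otherwise.
pointer_key() {
    file=$1
    [ -f "$file" ] || return 1
    # Only the first line matters; strip NUL bytes in case FILE is binary.
    first_line=$(head -n 1 -- "$file" | tr -d '\000')
    case $first_line in
        /annex/objects/?*) printf '%s\n' "${first_line#/annex/objects/}" ;;
        *) return 1 ;;
    esac
}

# Usage (path is an example):
#   pointer_key some-file.mkv.log && echo "still a pointer, not real content"
```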
What version of git-annex are you using? On what operating system?
on original server with webapp: 10.20240227-1~ndall+1
on intermediate server through which transfer of files happens: I think it might be old
[bids@rolando VIDS] > git annex version
git-annex version: 6.20180808-ga1327779a
on laptop where I dive into inception: 10.20240129
with bunch of commits moving files from annex into git etc. Need to kill assistant for now. System has 10.20240129-1~ndall+1
Interestingly, on the client git restore --staged PATH managed to recover the link to become "proper". And git-annex restage did nothing to fix the situation with the Modified file:
First I wanted to see if I could get this to happen without the assistant.
So no, it must be only the assistant that can mess up and add an annexed link to the annex.
Secondly, here's a way to manually create a repository with this behavior w/o using the assistant.
Nothing has gone wrong yet, funky is an unlocked file and it happens to have the content of an annex pointer file, but git-annex is not treating that content as an annex pointer file. If it were, the git-annex get funky above would get the SHA256 key from remote x.
But in a fresh clone, it's another story:
Which reproduces what you showed. I think this on its own is a bug, leaving aside whatever caused the assistant to generate this.
git-annex add (and smudge) use isPointerFile to check if a file that is being added is an annex pointer file. And in that case they stage the pointer file, rather than injecting it into the annex.
The assistant also checks isPointerFile though. And in the simple case, it also commits a newly added pointer file correctly:
So this makes me think of a race condition. What if the file is not a pointer file when the assistant checks isPointerFile. But then it gets turned into one before it ingests it.
In git-annex add, it first stats the file before checking if it's a pointer file, and later it checks if the file has changed while it was being added, which should avoid such races.
Looking at the assistant, I'm not at all confident it handles such a race.
It might even be another thread of the assistant that triggered the race. Could be that something caused the assistant to drop the file, then get it again, then drop it again. (Eg something wrong with configuration causing a non-stable state... like "not present" in preferred content).
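The stat-then-recheck pattern described above for git-annex add can be sketched in shell. This only illustrates the idea and is not how git-annex implements it; file_snapshot and file_unchanged_since are hypothetical names, and GNU coreutils stat is assumed for the -c format option.

```shell
# file_snapshot FILE
# Print inode, size and modification time (GNU stat assumed).
file_snapshot() {
    stat -c '%i %s %y' -- "$1"
}

# file_unchanged_since FILE SNAPSHOT
# Succeed only if FILE still matches a snapshot taken earlier.
file_unchanged_since() {
    [ "$(file_snapshot "$1" 2>/dev/null)" = "$2" ]
}

# Intended order of operations, so a file rewritten in between is caught:
#   before=$(file_snapshot "$f")                  # 1. stat first
#   ...check whether "$f" is a pointer file...    # 2. inspect content
#   ...ingest "$f" into the annex...              # 3. do the slow work
#   file_unchanged_since "$f" "$before" ||        # 4. recheck before trusting it
#       echo "changed while being added: $f" >&2
```

Taking the snapshot before the pointer check matters: if the file is replaced by a pointer after the check but before ingestion, the final comparison fails instead of the pointer being stored as content.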
I've tried running a get/drop/get/drop loop while the assistant is running, and have not seen this happen to a file yet. But the race window is probably small. An interesting thing I did notice is that sometimes when such a loop runs for a while, the file will be left as a pointer file after git-annex get.
Looking at the behavior of git-annex get, the first one leaves the index in a diff state:
To the second git-annex get, this is indistinguishable from a different unlocked file having been moved over top of funky. So the behavior of the second one is fine.
The problem is with the first git-annex get leaving the index in that state.
What's happening is, it doesn't restage the index, because the restage itself can't tell the difference between this state and an unlocked file having been moved over top of funky. In particular, git update-index --refresh --stdin when run after the first git-annex get, and fed "funky", leaves the index in diff state.
So git update-index is running git-annex filter-process, which is doing the same as git-annex smudge --clean funky in this case. And in Command.Smudge.clean, there is a parseLinkTargetOrPointerLazy' call which is intended to avoid storing a pointer file in the annex... The very thing that the assistant is somehow incorrectly doing. In this case though, that notices that funky's content looks like an annex pointer file, so it outputs that pointer. So git stages that pointer.
To avoid this, the first git-annex get would need to notice that the content it got looks like a pointer file. And it would need to communicate that through the git update-index somehow to git-annex filter-process. Then when that saw the same pointer file, it could output the original key, and this situation would be avoided. Also bear in mind that the git update-index can be interrupted and get restarted later and it would still need to remember that it was dealing with this case then. This seems... doable, but it will not be easy.
PS, Full script to synthesize a repository with this situation follows:
Added a 60 second sleep right after the assistant checks isPointerFile, then started the assistant and ran:
Result was 2 commits, first:
Followed by:
574def is the sha256sum of the annex link that I wrote to the file. So this does replicate the bug. Although it's odd that it then put back the annex link in the subsequent commit.
I've fixed this race in the assistant.
Question now is, can this bug be closed, or does it need to be left open, and git-annex made to recover from this situation? Given the complexity of making git-annex notice this, I'm sort of inclined to not have it auto-recover. Manual recovery seems pretty simple, just delete the file and re-add it with the right key.
Thoughts?
I got back to this issue, since even after upgrade of git-annex to 10.20240831-1~ndall+1 and trying on a sample file which I guess was screwed up
so, I need to figure out how to actually get that key/file here.
but maybe it is actually a separate issue of the unlocked mode since it does drop the file
but then when I get it, it does not actually copy into the tree:
Re git-annex auto recovering, I was talking about that when I wrote:
That seems like a huge can of worms to open. Especially at this late date.
As for manually recovering, you were able to see that Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv is a file with the problem. The content of the file is an annex pointer file, and the key that it points to is the content you want that file to have. So then a simple recovery script is:
The other way to recover, which works for me in the test case I posted in comment 5, is to just run git-annex get twice on the file. Then run git commit on the file.
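The get-twice-then-commit route can be wrapped in a small shell function. A sketch only: recover_by_double_get is a hypothetical name, the commit message is arbitrary, and it assumes the file is in the state described in this report, where the first get yields the inner pointer and the second yields the real content.

```shell
# recover_by_double_get FILE
# Hypothetical wrapper: get twice (inner pointer first, then real content),
# then commit the file. Stops at the first failing step.
recover_by_double_get() {
    file=$1
    git annex get "$file" &&
    git annex get "$file" &&
    git commit -m "Recover content of $file" -- "$file"
}

# Usage (path is an example):
#   recover_by_double_get some-file.mkv.log
```

It is worth checking the file content after the second get before relying on the commit, since this was only confirmed on the synthetic test case.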