Please describe the problem.
I have found a really weird commit in my git-annex branch:
* a59dd1c update (il y a 8 heures) <Antoine Beaupré>
* 57f887a update (recovery from race) (recovery from race) (recovery from race) [...]
it repeats that for a looong time. about 12 000 times, to be more precise:
anarcat@marcos:video$ git show 57f887a | wc
5 12686 88850
What steps will reproduce the problem?
Now i have absolutely no idea how I managed that. I got through some pretty dark moments last night trying various levels of git-annex voodoo (including a duplicate repo which was rsync'd to a backup drive so the unique identifier applied to two distinct paths), so I have no idea exactly what happened here.
What version of git-annex are you using? On what operating system?
debian jessie amd64 5.20140412
Please provide any additional information below.
anarcat@marcos:video$ git show 57f887a | tail -c 100
very from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race)
anarcat@marcos:video$ git show 57f887a | head -c 512
commit 57f887a9d766829d00832ad1ee23b2785212d055
Author: Antoine Beaupré <anarcat@koumbit.org>
Date: Sat Apr 19 01:48:18 2014 -0400
update (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery from race) (recovery
that's 80KB for only one commit here - maybe that should be cleaned up? --anarcat
Ah! more information: it seems that 01:48 was the moment i shutdown the assistant in yet another panic...
anarcat@marcos:video$ ls -al .git/annex/daemon.log*
-rw-r--r-- 1 anarcat anarcat 17075 avril 19 09:28 .git/annex/daemon.log
-rw-r--r-- 1 anarcat anarcat 128367 avril 19 01:48 .git/annex/daemon.log.1
an extract from that second logfile:
19/Apr/2014:01:31:38 -0400 [Error#yesod-core] unknown response from git cat-file ("9a73bf01-ed01-450d-a0ab-f20fff47ed32 encryption=none name=stephc rsyncurl=192.168.0.104:video/ type=rsync timestamp=1397865844.925354s","refs/heads/git-annex:remote.log") @(yesod-core-1.2.3:Yesod.Core.Class.Yesod ./Yesod/Core/Class/Yesod.hs:471:5)
19/Apr/2014:01:31:50 -0400 [Error#yesod-core] unknown response from git cat-file ("fe428a7a-25a2-4c2e-b01f-315c490cbe45 encryption=none name=myrsync rsyncurl=/home/anarcat/video/ type=rsync timestamp=1397868063.038898s","refs/heads/git-annex:remote.log") @(yesod-core-1.2.3:Yesod.Core.Class.Yesod ./Yesod/Core/Class/Yesod.hs:471:5)
19/Apr/2014:01:31:57 -0400 [Error#yesod-core] unknown response from git cat-file ("","refs/heads/git-annex:remote.log") @(yesod-core-1.2.3:Yesod.Core.Class.Yesod ./Yesod/Core/Class/Yesod.hs:471:5)
[2014-04-19 01:32:03 EDT] TransferScanner: Syncing with test, mnt
Depuis /mnt/video
* [nouvelle branche] synced/git-annex -> test/synced/git-annex
* [nouvelle branche] synced/master -> test/synced/master
fatal: 'mnt' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Already up-to-date.
[2014-04-19 01:32:21 EDT] main: warning git-annex has been shut down
(Recording state in git...)
(Recording state in git...)
(Recording state in git...)
(Recording state in git...)
the last line repeats about 4000 times.
i would love to paste the daemon.log.1 file, but it seems like it containts encryption credentials... which i have no idea how to get rid of or change.
Is there one commit with this long message, or 12 thousand commits each adding another (recovery from race) to the pyramid?
If there is only one message, then it must have tried 12k times to commit to the git-annex branch and each time something else pushed or commited to the git-annex branch and overwrote its commit. This seems statistically unlikely. (Also there's locking to prevent multiple local git-annex processes from committing at the same time.)
There have been a few other unexplained reports of this race detection code repeatedly triggering.
This implies some hasty, perhaps unusual shutdown method, and some unusual situation. I think you could tell me more about what was going on.
this was a single commit.
i am not sure i can extract much more information from my memory: details are hazy as i was working late on a problem that night... sorry!
I've been experiencing the exact same problem and searching for recovery from race lead me to this bug report. Thanks for reporting it!
For a few months, a repo storing ~19'000 files (mostly immutable pictures) started to launch memory hungry "git log" processes. For example:
Thanks to the hints found in this report, I was able to find many huge commit messages such as this one:
There were probably many new files added on Jun 14th and looking for a way to increase to sync speed, especially to a S3-like remote, I found the solution on this wiki for multiple concurrent transfers.
This looks like a likely culprit for generating race conditions. What do you think?
git annex forget
repair stuck on ls-tree command is another case of that, and I got ahold of that repository for analysis.
In that case, there was indeed an inverse pyramid effect where each commit added one more " (recovery from race)" to its parent commit.
The code can clearly loop if it keeps detecting a race and somehow fails to recover from it. Leading to a whole stack of commits with progressively longer messages. I don't see any way to get just one commit with a long message, which comment #1 seems to say happened.
Apparently loops for a while and then succeeds in recovering from the race, since it then stops looping.
I have added additional debug info to the commit message, in hopes of detecting what might be going wrong that causes it to loop.
Seems to me there are two possibilities.
One is that something else is continually changing the git-annex branch in a way that keeps triggering the race. If so, it might make sense for git-annex to do a brief random sleep (a few hundredths of a second) before recovering, to let whatever it is quiet down. I've done so.
The other is some kind of bug where it detects a race when none occurred. Perhaps it's misparsing the commit or git cat-file is failing to output it, and so it's not finding the expected parent refs, for example. But in that case, why would it detect a race for many commits in a row, and then eventually not detect a race anymore?
Also, I've made these messages no longer stack up even if it does go into a loop, which will at least help with the object size bloat, though not with the number of commits bloat.
The only other thing I can think to do about this is to make git-annex block direct pushes to the git-annex branch, by setting up a pre-receive hook.
Then, there should be no way for the race to occur (except for a locking failure, or a pull into the branch?), and git-annex wouldn't need to check for it