bugs/Assistant's transfer bar often stuck at 100% and only SHA instead of file content git-annexhttp://git-annex.branchable.com/bugs/Assistant__39__s_transfer_bar_often_stuck_at_100__37___and_only_SHA_instead_of_file_content_/git-annexikiwiki2020-06-17T01:18:32Zcomment 1http://git-annex.branchable.com/bugs/Assistant__39__s_transfer_bar_often_stuck_at_100__37___and_only_SHA_instead_of_file_content_/comment_1_f88d08035baf75efa1dbbddc296b591a/joey2020-06-17T01:18:32Z2020-05-01T17:53:57Z
<p>I was able to reproduce the progress bar sticking at 100%. I made two local
repos in the assistant, each a remote of the other (not using ssh), and
added a file to one. It reached 100% fairly quickly, and then sat there
for a minute or so, before getting unstuck.</p>
<p>(The bug submitter's log includes "hPutStr: illegal operation (handle is
closed)" and I did not always see that in my test. May be unrelated.)</p>
<p>While stuck like that, I noticed that git-annex info also said the transfer
was still in progress. So apparently transferkeys is being delayed from
finishing the transfer for some reason.</p>
<p>Looking at ps, I noticed that "git-annex smudge --clean -- thefile" was a
child process of transferkeys. (The file was unlocked, and was the file I
had just added and that it was stuck on.) This may be a red herring; I
don't always see that when it's stuck.</p>
<p>On a hunch, I stopped the assistant process that was running in the sibling
repo of the one where I was adding files. So there was only a single
assistant running, not two. That seems to have gotten rid of the delay, or
at least most of it. It still takes around 6s longer than I would have
expected at the end.</p>
<p>Running transferkeys manually, and feeding it requests in its internal
protocol, transfers happened in well under a second. It did not do any
smudging then.</p>
<p>My guess so far is that something about the two assistants causes a race,
which it has to delay for a while to recover from.</p>
comment 2http://git-annex.branchable.com/bugs/Assistant__39__s_transfer_bar_often_stuck_at_100__37___and_only_SHA_instead_of_file_content_/comment_2_c1b86731f0e2fe8ce70931fe4a0d1656/joey2020-06-17T01:18:32Z2020-05-01T18:57:22Z
<p>stracing transferkeys after it got stuck, it was stuck until
this completed:</p>
<pre><code>wait4(461218, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 461218
</code></pre>
<p>That pid was "git --git-dir=../annex2/.git --work-tree=../annex2
--literal-pathspecs --literal-pathspecs -c core.safecrlf=false update-index
-q --refresh -z --stdin", and it did have a git-annex smudge --clean that
seemed stuck for a while.</p>
<p>Then it proceeded doing this:</p>
<pre><code>rename("../annex2/.git/annexindex.0/index", "../annex2/.git/index") = 0
</code></pre>
<p>annex2 is the remote. That temp directory and the git update-index process
says restagePointerFile is what got stuck.</p>
<p>I managed to strace the stuck git-annex smudge --clean, and it's in a loop:</p>
<pre><code>rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
</code></pre>
<p>(Just before that loop, it seemed to be waiting for some
child processes, but I didn't catch which ones they were.)</p>
<p>After many iterations:</p>
<pre><code>rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
munmap(0x7f4438549000, 41947136) = 0
munmap(0x7f443ad4a000, 41947136) = 0
exit_group(0) = ?
+++ exited with 0 +++
</code></pre>