git-annex get from a p2phttp remote sometimes stalls out.
This has been observed when using loopback. E.g., run in one repo, which contains about 1000 annexed files of 1 MiB each:

    git-annex p2phttp -J2 --bind 127.0.0.1 --wideopen
Then in a clone:

    git config remote.origin.annexUrl annex+http://localhost/git-annex/
    while true; do git-annex get --from origin -J20; git-annex drop; done
The concurrency is probably not strictly needed to reproduce this, but it makes the stall more likely to occur sooner.
The total stall looks like this:

    1% 7.82 KiB 6 MiB/s 0s

Here is another one:

    1% 7.82 KiB 6 MiB/s 0s

The progress display never updates. Every time I've seen the total stall, it has been at 7.82 KiB, which seems odd.
Looking at the object in .git/annex/tmp, its content is correct as far as it goes, but it is 4368 bytes short of the full 1048576 byte size (so 1044208 bytes were received). I've verified this is the case every time, so it looks like the client did not receive the final chunk of the file in the response.
Note that, despite p2phttp being run with -J2, and so only supporting 2 concurrent get operations, interrupting the git-annex get that stalled out and running it again does not block waiting for the server. So p2phttp seems to have finished processing the request, or possibly to have failed in a way that returned a worker to the pool.
--Joey
Initial investigation of serveGet seems to show it successfully sending the whole object, at least up to fromActionStep; I've not verified that servant always does the right thing with that, or that it never closes the connection early.
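For reference, the streaming pattern in play looks roughly like this sketch (hypothetical names; not git-annex's actual serveGet, just servant's fromAction/fromActionStep idiom of reading chunks until a stop sentinel):

    import qualified Data.ByteString as B
    import Servant.Types.SourceT (SourceT, fromAction)
    import System.IO (Handle)

    -- Read 64 KiB chunks from the handle; fromAction stops as soon as
    -- the action produces a chunk matching the stop predicate, so the
    -- empty ByteString returned at EOF ends the stream.
    streamHandle :: Handle -> SourceT IO B.ByteString
    streamHandle h = fromAction B.null (B.hGetSome h 65536)

If the connection were torn down before the final chunk of such a stream is consumed, the client would see exactly this kind of truncated tail.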
Using curl as the client and checking whether it always receives the whole object would be a good next step. --Joey
I saw this bug with git-annex built using Haskell packages from current Debian unstable.
On a hunch, I tried a stack build, and it does not stall. However, at about the same frequency as the stall, and occurring during the git-annex get, the http server reports a failure (the runtime detecting a deadlocked STM transaction), and at the same time the client reports that the download was incomplete, though the get ultimately succeeds.
(I assume that it succeeded because it did an automatic retry when the first download was incomplete.)
I also tried using the stack build for the server, and the cabal build for the client, with the same result. With the cabal build for the server and stack build for the client, it stalls as before.
So it's a bug on the server side, and whatever it is kills one of the threads in a way that causes another STM transaction to deadlock. The runtime happens to detect and resolve the deadlock when built with stack.
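That detection is GHC's BlockedIndefinitelyOnSTM exception, which the RTS throws when a thread blocked in STM can never be woken by any other thread. A minimal standalone demonstration (not git-annex code):

    import Control.Concurrent.STM

    main :: IO ()
    main = do
      endv <- newEmptyTMVarIO
      atomically $ putTMVar endv ()  -- succeeds; endv is now full
      atomically $ putTMVar endv ()  -- nothing will ever empty endv, so
                                     -- the RTS throws: "thread blocked
                                     -- indefinitely in an STM transaction"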
Using DebugLocks, I found that the deadlock is in checkvalidity, the second time it calls `putTMVar endv ()`. That call was added in 7bd616e169827568c4ca6bc6e4f8ae5bf796d2d8 "a bugfix to serveGet, it hung at the end".
It looks like a race between checkvalidity and waitfinal, which both fill endv: waitfinal does not deadlock when endv is already full, but checkvalidity does.
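If that's the race, one way to make filling endv idempotent is the non-blocking tryPutTMVar, which returns False instead of retrying when the TMVar is already full. A sketch (hypothetical helper names, not the actual patch):

    import Control.Concurrent.STM
    import Control.Monad (void)

    -- Blocks (and here would deadlock) when another thread has
    -- already filled endv:
    signalEnd :: TMVar () -> STM ()
    signalEnd endv = putTMVar endv ()

    -- Never blocks; harmlessly does nothing when endv is already full:
    signalEnd' :: TMVar () -> STM ()
    signalEnd' endv = void (tryPutTMVar endv ())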