Recent comments posted to this site:

comment 1

Seems I was misremembering details of how ghc's "capabilities" work. From its manual:

Each capability can run one Haskell thread at a time, so the number of capabilities is equal to the number of Haskell threads that can run physically in parallel. A capability is animated by one or more OS threads; the runtime manages a pool of OS threads for each capability, so that if a Haskell thread makes a foreign call (see Multi-threading and the FFI) another OS thread can take over that capability.

Currently git-annex raises the number of capabilities to the -J value.

Probably the thread pool starts at 2 threads to have one spare preallocated for the first FFI call, explaining why the number of OS threads is roughly twice the -J value.

I think it would make sense to have a separate option that controls the number of capabilities. Then you could set that to e.g. 2, and set a large -J value, in order to have git-annex p2phttp serve a large number of concurrent requests, threaded on only 2 cores.
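
For illustration, here is a minimal standalone sketch (not git-annex code, with made-up numbers) of what decoupling the two knobs looks like at the GHC level: the capability count is pinned to 2 with setNumCapabilities, while a much larger pool of green-thread workers runs on top of it. It assumes a program compiled with the -threaded runtime.

    import Control.Concurrent (setNumCapabilities, getNumCapabilities, forkIO, threadDelay)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM, forM_)

    main :: IO ()
    main = do
      -- The hypothetical separate option: pin the RTS to 2 capabilities
      -- (at most 2 cores running Haskell code), independent of -J.
      setNumCapabilities 2
      caps <- getNumCapabilities
      putStrLn ("capabilities: " ++ show caps)

      -- A large -J style concurrency level: 100 green-thread workers,
      -- all multiplexed onto the 2 capabilities set above.
      dones <- forM [1 .. 100 :: Int] $ \n -> do
        done <- newEmptyMVar
        _ <- forkIO $ do
          threadDelay 10000          -- stand-in for serving one request
          putMVar done n
        pure done
      forM_ dones takeMVar

The worker count and the capability count are independent here, which is roughly the separation such an option would give.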

Also, it does not seem to make sense for the default number of capabilities, with a high -J value, to exceed the number of cores. As you noticed, each capability uses some FDs, for eventfd and eventpoll, and I'm not sure what else.

Comment by joey
comment 1

I don't reproduce this:

joey@darkstar:~/tmp/bench/r>git-annex export --fast master --to rsync
(recording state in git...)

Nothing is exported by export --fast (which matches its documentation), and examining the files in the remote's directory, none are deleted or overwritten.

When I later run git-annex push, all files in the tree get exported. In cases where the remote's directory already contained a file with the same name, it is overwritten. That is as expected.

My plan was to add the directory as an exporttree remote, make git-annex think that the current main branch's tree should be available there via the fast export, and then do a git annex fsck --from <remote> to discover what's actually there.

You could do that with a special remote also configured with importtree=yes. No need to do anything special, just import from the remote, and git-annex will learn what files are on it.

Unfortunately, importtree is not supported by the rsync special remote

Your use case sounds like it might be one that importtree-only remotes would support.

Comment by joey
comment 15

Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens.

I think that would make sense, or you could even bump it by 1 or 2 orders of magnitude.

Comment by joey
comment 14

Re the number of threads, -J will affect the number of green threads used. (Which will be some constant-ish multiple of the -J value.) Green threads won't show up in htop; only OS-native threads will.

The maximum number of OS-native threads should be capped at the number of cores.

Exactly how many OS-native threads are spawned is under the control of the Haskell runtime, and it probably spawns an additional OS-native thread per green thread, up to the limit.

(It would be possible to limit the maximum number of OS-native threads to less than the number of cores, if that would somehow be useful. It would need a new config setting.)
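
As a rough way to see the multiplexing (again just a standalone experiment, not git-annex's code), threadCapability reports which capability a green thread ran on. Spawning a few hundred green threads with the program below, compiled with -threaded and run with e.g. +RTS -N2, shows them all landing on the small set of capabilities rather than each getting an OS-native thread of its own:

    import Control.Concurrent
    import Control.Monad (forM)
    import Data.List (nub, sort)

    main :: IO ()
    main = do
      caps <- getNumCapabilities
      putStrLn ("capabilities: " ++ show caps)
      -- Spawn 500 green threads; record which capability each one runs on.
      vars <- forM [1 .. 500 :: Int] $ \_ -> do
        var <- newEmptyMVar
        _ <- forkIO $ do
          tid <- myThreadId
          (cap, _pinned) <- threadCapability tid
          putMVar var cap
        pure var
      used <- mapM takeMVar vars
      putStrLn ("distinct capabilities used: " ++ show (sort (nub used)))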

Comment by joey
comment 13

I might have some misunderstandings about what the -J flag does exactly... So far I assumed that it just sets the number of OS threads that are used as a worker pool to handle requests. In Forgejo-aneksajo it is set to -J2 because of that assumption, and because there is one p2phttp process per repository (p2phttp is started on demand when a repository is used and stopped after a while of non-usage), so larger values could multiply pretty fast. Your description sounds like it should actually just be a limit on the number of requests that can be handled concurrently, independent of the size of the worker pool. What I am observing, though, is that htop shows two new threads each time I increment the value by one.

Could there be a fixed-size (small) worker pool, and a higher number of concurrent requests allowed? I agree that limiting the total resource usage makes a lot of sense, but does it have to be tied to the thread count?

Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens.

Comment by matrss
comment 12

Re serving more requests than workers, the point of limiting the number of workers is that each worker can take a certain amount of resources. Usually that may only be a file descriptor and a bit of CPU and memory; with proxying it could also include making outgoing connections, running gpg, etc. The worker limit is about being able to control the total amount of resources used.


It would be possible to have an option where p2phttp does not limit the number of workers at all, and the slowloris attack prevention could be left enabled in that mode. Of course then enough clients could overwhelm the server, but maybe that's better for some use cases.

IIRC forgejo-aneksajo runs one p2phttp per repository and proxies requests to them. If so, you need a lower worker limit per p2phttp process. I suppose it would be possible to make the proxy enforce its own limit on the number of concurrent p2phttp requests, and then it might make sense to not have p2phttp limit the number of workers.


p2phttp (or a proxy in front of it) could send a 503 response if it is unable to get a worker. That would avoid this slowloris attack prevention problem. It would leave it up to the git-annex client to retry, which currently depends on the annex.retry setting. It might make sense to have some automatic retrying on 503 in the p2phttp client.
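
A rough sketch of that 503 idea (standalone, using the stm library and made-up names like Reply and handle, not p2phttp's actual code): worker slots are a counter, a request tries to grab one without blocking, and if none is free it gets an immediate 503-style answer instead of holding a connection open.

    import Control.Concurrent.STM
    import Control.Exception (finally)

    -- Stand-in reply type; a real server would produce an HTTP response.
    data Reply = Served String | ServiceUnavailable deriving Show

    -- Try to take a worker slot without blocking.
    tryAcquire :: TVar Int -> STM Bool
    tryAcquire slots = do
      n <- readTVar slots
      if n > 0 then writeTVar slots (n - 1) >> pure True else pure False

    release :: TVar Int -> STM ()
    release slots = modifyTVar' slots (+ 1)

    -- Serve the request if a slot is free; otherwise answer immediately,
    -- leaving it to the client to decide whether and when to retry.
    handle :: TVar Int -> IO String -> IO Reply
    handle slots serve = do
      ok <- atomically (tryAcquire slots)
      if ok
        then (Served <$> serve) `finally` atomically (release slots)
        else pure ServiceUnavailable

    main :: IO ()
    main = do
      slots <- newTVarIO 2                 -- e.g. only 2 worker slots
      r <- handle slots (pure "response body")
      print r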

One benefit of the way it works now is that a git-annex get -J10 will automatically use as many workers as the p2phttp server has available, and if 2 people are both running that, it naturally balances out fairly evenly between them and keeps the server as busy as it wants to be, in an efficient way. Client-side retry would not work as nicely: there would need to be retry delays, and it would have to time out at some point.

Comment by joey
comment 11

The kill -SIGINT was my mistake; I ran the script using dash, and it was dash's builtin kill that does not accept that.

So your test case was supposed to interrupt it after all. I tested it again with interruption, and my fix does seem to have fixed it, as best I can tell.

Comment by joey
comment 10

kill -s SIGINT is not valid syntax (at least not with procps's kill), so kill fails to do anything and a bunch of git-annex processes stack up all trying to get the same files. Probably you meant kill -s INT

That's weird; I checked, and both the shell built-in kill that I was using and the kill from procps-ng (Ubuntu's build: procps/noble-updates,now 2:4.0.4-4ubuntu3.2 amd64) installed on my laptop accept -s SIGINT.

Anyway, thank you for investigating! I agree being susceptible to DoS attacks is not great, but better than accidentally DoS'ing ourselves in normal usage...

I wonder, would it be architecturally possible to serve multiple requests concurrently with fewer workers than requests? E.g. do some async/multitasking magic between requests? If that were the case then I suspect this issue wouldn't come up, because all requests would progress steadily instead of waiting for a potentially long time.

Comment by matrss
comment 9

Disabled the slowloris protection. :-/

I also checked with the original test case, fixed to call kill -s INT, and it also passed. I'm assuming this was never a bug about interruption.

Comment by joey