Beating my head against the threaded runtime some more. I can reproduce one of the hangs consistently by running 1000 git annex add commands in a loop. It hangs around 1% of the time, reading from git cat-file.

Interestingly, git cat-file is not yet running at this point -- git-annex has forked a child process, but the child has not yet exec'd it. Stracing the child git-annex, I see it stuck in a futex. Adding tracing, I see the child never manages to run any code at all.

This really looks like the problem is once again in MissingH, which uses forkProcess. Which happens to come with a big warning about being very unsafe, in very subtle ways. Looking at the C code that the newer process library uses when spawning a pipe to a process, it messes around with lots of things: blocking signals, stopping a timer, etc. Hundreds of lines of C code to safely start a child process, all doing things that MissingH omits.

That's the second time I've seemingly isolated a hang in the GHC threaded runtime to MissingH.

And so I've started converting git-annex to use the new process library, for running all its external commands. John Goerzen had mentioned process to me once before when I found a nasty bug in MissingH, as the cool new thing that would probably eliminate the System.Cmd.Utils part of MissingH, but I'd not otherwise heard much about it. (It also seems to have the benefit of supporting Windows.)

This is a big change and it's early days, but each time I see a hang, I'm converting the code to use process, and so far the hangs have just gone away when I do that.
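For the record, here's roughly what reading a command's output looks like with process. This is a minimal sketch, not the actual git-annex code: `readFromCommand` is a made-up name for illustration, and the error handling is the simplest thing that could work. Only `createProcess`, `proc`, `std_out`, `CreatePipe` and `waitForProcess` come from the library itself.

```haskell
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (hGetContents)
import System.Process
  ( CreateProcess (..)
  , StdStream (CreatePipe)
  , createProcess
  , proc
  , waitForProcess
  )

-- | Run a command and return everything it writes to stdout.
-- Hypothetical helper for illustration only.
readFromCommand :: FilePath -> [String] -> IO String
readFromCommand cmd args = do
  -- createProcess does the fork and exec itself, so no Haskell code
  -- has to run in the forked child.
  (_, Just hout, _, pid) <-
    createProcess (proc cmd args) { std_out = CreatePipe }
  out <- hGetContents hout
  -- hGetContents is lazy; force the whole string so the pipe is
  -- drained before waiting on the child.
  _ <- evaluate (length out)
  code <- waitForProcess pid
  case code of
    ExitSuccess -> return out
    ExitFailure n ->
      ioError (userError (cmd ++ " exited with " ++ show n))
```

Calling something like `readFromCommand "git" ["cat-file", "-p", "HEAD"]` would then return the command's output as a String, or throw if the command fails.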


Hours later... I've converted all of git-annex to use process.

In the er, process, the --debug switch stopped printing all the commands it runs. I may try to restore that later.
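One way that could be restored is a thin wrapper around `createProcess` that prints the command before running it. This is only a sketch of the idea, assuming a plain Bool for the debug flag; `createProcessDebug` and `describeCommand` are hypothetical names, not anything in git-annex or in process.

```haskell
import Control.Monad (when)
import System.IO (Handle, hPutStrLn, stderr)
import System.Process

-- | Render the command a CreateProcess would run, for display only.
-- Hypothetical helper for illustration.
describeCommand :: CreateProcess -> String
describeCommand p = case cmdspec p of
  ShellCommand s -> s
  RawCommand c args -> showCommandForUser c args

-- | Like createProcess, but when the debug flag is set, first print
-- the command to stderr.
createProcessDebug
  :: Bool
  -> CreateProcess
  -> IO (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle)
createProcessDebug debug p = do
  when debug $ hPutStrLn stderr ("running: " ++ describeCommand p)
  createProcess p
```

If every external command goes through a wrapper like this, the debug output comes back in one place instead of being scattered across call sites.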

I've not tested everything, but the test suite passes, even when using the threaded runtime. MILESTONE

Looking forward to getting out of these weeds and back to useful work.


Hours later yet.... The assistant branch in git now uses the threaded runtime. It works beautifully, using proper threads to run file transfers in.

That should fix the problem I was seeing on OSX yesterday. Too tired to test it now.

--

Amazingly, the assistant's own dozen or so threads, thread sync variables, etc. all work great under the threaded runtime. I had assumed I'd see yet more concurrency problems there when switching to it, but it all looks good. (Or whatever problems there are are subtle ones?)

I'm very relieved. The threaded logjam is broken! I had been getting increasingly worried that not having the threaded runtime available would make it very difficult to make the assistant perform really well, and cause problems with the webapp, perhaps preventing me from using Yesod.

Now it looks like smooth sailing ahead. Still some hard problems, but it feels like with inotify and kqueue and the threaded runtime all dealt with, the really hard infrastructure-level problems are behind me.