Back working on
git annex get --jobs=N today. It was going very well,
until I realized I had a hard problem on my hands.
The hard problem is that the AnnexState structure at the core of git-annex is not able to be shared amoung multiple threads at all. There's too much complicated mutable state going on in there for that to be feasible at all.
In the git-annex assistant, which uses many threads, I long ago worked around this problem, by having a single shared AnnexState and when a thread needs to run an Annex action, it blocks until no other thread is using it. This worked ok for the assistant, with a little bit of thought to avoid long-duration Annex actions that could stall the rest of it.
That won't work for concurrent
get etc. I spent a while investigating maybe
making AnnexState thread safe, but it's just not built for it. Too many
ways that can go wrong. For example, there's a CatFileHandle in the
AnnexState. If two threads are running, they can both try to talk to the
git cat-file --batch command at once, with bad results. Worse, yet,
some parts of the code do things like modifying the AnnexState's Git repo
to add environment variables to use when running git commands.
It's not all gloom and doom though. Only very isolated parts of the code
change the working directory or set environment variables. And the
assistant has surely smoked out other thread concurrency problems already.
git-annex programs can be run concurrently with no problems
at all; it uses file locking to avoid different processes getting in
each-others' way. So AnnexState is the only remaining obstacle to concurrency.
So, here's how I've worked around it: When
git annex get -J10 is run,
it will start by allocating 10 job slots. A fresh AnnexState will be
created, and copied into each slot. Each time a job runs, it uses its
slot's own AnnexState. This means 10
git cat-file processes,
and maybe some contention over lock files, but generally, a nice, easy,
and hopefully trouble-free multithreaded mode.
And indeed, I've gotten
git annex get -J10 working robustly!
And from there it was trivial to enable -J for
The only real blocker to merging the concurrentprogress branch is some bugs in the ascii-progress library that make it draw very scrambled progress bars the way git-annex uses it.