Are there locks that git-annex holds for the entirety of e.g. git-annex-get --batch?
As part of auto-fetching annexed files on open(), I'd like to keep one or more long-running git-annex-get --batch processes and pipe the keys I need fetched as needed, rather than start a new git-annex process for each key. But this only makes sense if, while waiting to read more keys from stdin, git-annex is not holding locks that could interfere with other git-annex calls.
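For concreteness, the pattern described above might look roughly like the following Python sketch. The class name BatchWorker and its methods are hypothetical, and the assumption that each request is answered with exactly one line of output is mine; git-annex's actual batch output may be framed differently, so check the git-annex-get man page before relying on it.

```python
import subprocess


class BatchWorker:
    """Keep one long-running batch process and feed it one request per line.

    Hypothetical helper for illustration; not part of git-annex.
    """

    # Default command taken from the question; pass another argv to experiment.
    DEFAULT_CMD = ["git", "annex", "get", "--batch"]

    def __init__(self, cmd=None, cwd=None):
        self.proc = subprocess.Popen(
            cmd or self.DEFAULT_CMD,
            cwd=cwd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipe
        )

    def request(self, item):
        """Send one line, then block until one line of output comes back."""
        self.proc.stdin.write(item + "\n")
        self.proc.stdin.flush()
        # Assumption: the child answers each request with exactly one line.
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close stdin to signal end of input, then wait for the child."""
        self.proc.stdin.close()
        returncode = self.proc.wait()
        self.proc.stdout.close()
        return returncode
```

Between calls to request(), the child process simply sits blocked reading stdin, which is exactly the idle state the question is asking about.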
No, there are not.
I'd like to turn this question around. You, and others, seem to have low expectations of git-annex's locking and handling of concurrency. Is there a change to the documentation somewhere that would convey that git-annex handles multiple concurrent operations, without stepping on each other's toes, with extensive fine-grained locking at both the thread and file level?
Because I keep answering what seems like the same question, over and over. And I don't understand why people seem to expect the worst; it's kind of a downer and it's tedious to keep having to reassure people. Is it something I did? Is it a general expectation that concurrency is hard and so everything will get it wrong? Is it specifically that git certainly has poor locking in several places, notably around the index, and people assume that git-annex will inherit it?
Joey,
I very much value all the work/thought you've put into making git-annex robust, starting with choosing Haskell.
As to why the question keeps coming up...
I often find myself wanting to use git-annex in what seem to me non-standard ways, so it's possible the usage pattern wasn't planned/optimized/tested for. E.g. with git-annex-get --batch the typical usage would be to feed a large batch of keys at once, and to not have other git-annex processes running at the time. The git-annex test suite does not test under concurrency. I've run into intermittent failures with concurrent operations that were fixed by disabling concurrency. I'll try at some point to isolate reproducible examples of these failures, but they do happen quite consistently on my system.
I understand that git-annex is parallelism-safe in that parallelism does not cause data loss. But things short of data loss, like intermittent failures/deadlocks, are something I still need to work around when using git-annex as a building block in larger automated workflows.