Please describe the problem.
Originally reported against datalad #4976. I ran into it again and decided to reproduce it by simply running git config in parallel (that should not crash, right?), and it leads to the same problem as reported:
#!/bin/bash
set -eu
export PS4='> '
set -x
cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")"
which git
grep runshell "$(which git)"
git init
git annex init
for s in {1..30}; do
    git config annex.version &
done
wait
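One caveat about the reproducer: a bare wait always returns 0, so failures of the background git config processes only show up as messages on the terminal and never make the script itself fail. A minimal sketch for counting failures, assuming POSIX sh; run_parallel is a hypothetical helper, not part of git-annex or datalad:

```shell
#!/bin/sh
# Hypothetical helper: run a command N times in parallel and print how many
# invocations exited non-zero. Each background PID is waited on separately,
# because "wait" without arguments always returns 0.
run_parallel() {
    n=$1
    shift
    pids=""
    i=1
    while [ "$i" -le "$n" ]; do
        # Discard stdout (e.g. the printed config value) but keep stderr,
        # so error messages from the command are still visible.
        "$@" >/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    failures=0
    for pid in $pids; do
        if ! wait "$pid"; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}
```

For example, run_parallel 30 git config annex.version inside the freshly initialized repository prints the number of invocations that exited non-zero, which makes the race easier to detect in an automated test.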
I also wonder whether it is possible to avoid dealing with locales on every invocation of git/git-annex. Couldn't it be done just once, e.g. per installed annex? (Sorry if we had this discussion already, but it feels like it has not been resolved optimally yet.)
What version of git-annex are you using? On what operating system?
8.20200908-gcfc74c2f4
Just to note, the standalone build currently ends up being used for many conda installs, so any fixes for it would be great.
Is there a list somewhere of currently known issues affecting the standalone build but not the standard build? This would be useful in helping people decide between a more recent standalone build or an older standard one.
Fixed the rm to not redirect errors to stdout. I traced this back to a8a0f7fc58, which involved a case where the rm was failing due to permissions, so the intent must have been to also send stdout to /dev/null, but that part was omitted at the time.
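The difference comes down to redirection order. The following is an illustration of shell redirection semantics only, not the actual runshell code; the function names are hypothetical:

```shell
#!/bin/sh
# Illustration of how redirections are applied (left to right).
# Uses a private temporary directory so the rm target surely does not exist.
demo_dir=$(mktemp -d)

# "2>&1" alone: stderr is duplicated onto stdout, so rm's error message
# ends up on stdout instead of disappearing.
errors_on_stdout() {
    rm "$demo_dir/does-not-exist" 2>&1
}

# ">/dev/null 2>&1": stdout is sent to /dev/null first, then stderr is
# pointed at the same place, so both streams are discarded.
fully_silenced() {
    rm "$demo_dir/does-not-exist" >/dev/null 2>&1
}

# "2>/dev/null" alone: only stderr is discarded.
stderr_silenced() {
    rm "$demo_dir/does-not-exist" 2>/dev/null
}
```

Writing the two redirections the other way round (2>&1 >/dev/null) would again leave errors on the original stdout, because stderr is duplicated before stdout is redirected.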
I thought you were mostly using git-annex-standalone.deb, @yoh, and I think that sets GIT_ANNEX_PACKAGE_INSTALL, which prevents doing anything with locales.

It does a minimum amount of work every time to make the locales available, which is necessary because the user might have deleted the cache since a previous run. If the cached files are available, it does not do anything expensive. Clearing the old cache, which is what failed here, has to happen at some point, and it happens only once per old cache directory, so I see no way to improve the performance of that.
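The "cheap when cached" behaviour described above can be sketched as a fast-path check on a completion marker. This is a minimal sketch under assumptions: ensure_locale_cache is a hypothetical name, and using a non-empty base file as the marker is inferred from the discussion of the base file in this thread, not copied from runshell:

```shell
#!/bin/sh
# Hypothetical sketch of a per-invocation locale cache check.
ensure_locale_cache() {
    cachedir=$1
    # Fast path: a non-empty base file is taken to mean the cache is set up,
    # so nothing expensive happens on a normal invocation.
    if [ -s "$cachedir/base" ]; then
        return 0
    fi
    # Slow path: the cache is missing (e.g. the user deleted it since the
    # previous run). A real implementation would generate locale files here;
    # this sketch only writes the marker, last, after everything else.
    mkdir -p "$cachedir"
    echo "generated" > "$cachedir/base"
    echo "regenerated $cachedir" >&2
}
```

Only the first call after the cache disappears takes the slow path; every later call returns after a single file test.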
Anyway, clearly something needs to be done to make runshell's updating of this stuff idempotent.
Hmm, that rm -rf is itself idempotent once its stderr is hidden: if a race causes one rm to fail, the other one will probably succeed, and if both manage to fail, the nonzero exit code is ignored and the next run cleans up by running it again. AFAICS the cache eventually gets cleaned up in all circumstances.
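The reasoning above amounts to a best-effort cleanup that is safe to run repeatedly or concurrently. A minimal sketch of that pattern; cleanup_old_cache is a hypothetical name, not runshell's code:

```shell
#!/bin/sh
# Hypothetical best-effort cleanup of an old cache directory.
cleanup_old_cache() {
    # Hide stderr and ignore the exit status: if a concurrent run removes
    # some files first, this rm may fail ("Directory not empty" and the
    # like), but a later run simply tries again.
    rm -rf "$1" 2>/dev/null || true
}
```

Because rm -rf on a path that no longer exists succeeds silently, running it again after a partial failure converges on the directory being gone.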
I think what happened there is that the first process created the cache directory and was in the middle of writing (or overwriting) the base file when the second process ran. So the second process saw what looked like a cache directory with a missing or empty base file, and decided to clean it up. In the meantime the first process had written base and other files, so the rm failed. Also, the first process may succeed and end up running git-annex with some locale files missing (if the rm happened to delete those), resulting in incompatible system locales being used.
So, it ought to defer cleaning up old caches until after it's made sure its current cache is all set up. Then that race goes away.
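One way to express "set up the current cache first, clean up old caches afterwards" is sketched below. This is an assumed approach for illustration, not the actual fix: setup_cache and the directory layout under the cache root are hypothetical, and the completion marker is written via a temporary file plus rename so other processes never see a half-written base file:

```shell
#!/bin/sh
# Hypothetical sketch: finish the current cache before touching old ones.
setup_cache() {
    root=$1      # cache root, e.g. a per-user locale cache directory
    current=$2   # name of the cache this run needs
    target="$root/$current"
    if [ ! -s "$target/base" ]; then
        mkdir -p "$target"
        # ... generate locale files into "$target" here (omitted) ...
        # Write the marker last, via a temp file in the same directory and
        # a rename, which replaces any existing base file atomically.
        tmp=$(mktemp "$target/.base.XXXXXX")
        echo "$current" > "$tmp"
        mv -f "$tmp" "$target/base"
    fi
    # Only now remove other caches; the current one is never deleted, so a
    # concurrent run cannot wipe a cache that is still being populated.
    for old in "$root"/*; do
        [ -e "$old" ] || continue   # the glob matched nothing
        [ "$old" = "$target" ] && continue
        rm -rf "$old" 2>/dev/null || true
    done
}
```

The key design choice is that the cleanup loop excludes the cache the process is about to use, which removes the race where one process deletes another's in-progress cache.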
But there's a problem with this part, which comes right after the main cache cleanup:
That runs if the git-annex.linux directory gets overwritten with a new version, so the cache directory stays the same. And it's also prone to similar races. I think, to fix this, it probably needs to use the buildid in the LOCPATH.
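Keying the cache on a build identifier means that overwriting the bundle selects a fresh cache directory instead of mutating one that running processes may be using. A minimal sketch, assuming the build identifier is readable from a file named buildid inside the bundle directory (an assumption for illustration); locale_cache_dir is a hypothetical helper. LOCPATH itself is the environment variable glibc consults for locale data:

```shell
#!/bin/sh
# Hypothetical helper: compute a per-build locale cache directory.
locale_cache_dir() {
    bundledir=$1   # e.g. the git-annex.linux directory
    cacheroot=$2   # e.g. a per-user locale cache root
    buildid=""
    if [ -f "$bundledir/buildid" ]; then
        # Keep only filename-safe characters (this also drops the newline).
        buildid=$(tr -cd 'A-Za-z0-9._-' < "$bundledir/buildid")
    fi
    [ -n "$buildid" ] || buildid=unknown
    echo "$cacheroot/$buildid"
}
```

A caller would then do something like LOCPATH=$(locale_cache_dir "$bundledir" "$cacheroot"); export LOCPATH, so each build gets its own locale directory.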
The test case works a bit inconsistently. I had the best luck first removing ~/.cache/git-annex/locales/ and then running the test case twice in a row, which made it reliably reproduce the error on the second run.
After the above fixes, the test case no longer reproduces the problem.
@Ilya_Shlyakhter, there is no separate bug tracker for runshell, but I don't know of any other open bugs involving it either.
Thank you, Joey! FTR (if I located it properly): fixed in 8.20200908-163-gd74d97896
As for which "flavor" of git-annex build I use: every day is different these days. But yeah, I tend to use a year-old NeuroDebian-built .deb when I can afford it and don't feel adventurous, but I think in these cases I was indeed just using a standalone build.