Since 7.20181121 Debian has been seeing test suite failures, on the mips autobuilders and on the amd64 CI infrastructure, but not on the amd64 autobuilders.
I attempted to reproduce the failure on a mips and could not; instead, four different tests failed in a different way.
The most recent failures on the CI infrastructure and the mips autobuilders look to be the same set of tests that have failed.
I am inclined to think that there are several bugs here: the test suite is overly sensitive to its environment and can fail in different ways in different environments. What's mysterious is why all this started happening so recently.
It seems unlikely to me that anyone has time to fix these bugs before the upcoming Debian freeze. Getting git-annex v7.x into buster is a priority. Maybe the test suite should be disabled during the package build, since it's flaky? I'd like to hear Joey's opinion on doing that.
--spwhitton
The logs are no longer available, but I recently did a lot of pounding on the test suite to fix intermittent failures, and the mention of
git annex copy --from
sounds like one of the failures I fixed. fa62c3223383d8377d27576a0e32f7bfec0c826d seems to have fixed the problem that made copying from v7 repos sometimes fail. Closing on that basis; if you see a new test suite failure, please open a new bug. done --Joey
The failure on mips seems to be due to NFS locking issues preventing deleting a directory. Running the test suite on NFS is likely to turn up this kind of problem. It should not be too hard to fix; Utility.Gpg.testHarness probably just needs to catch more exceptions than the IO exceptions it already catches.
The other two failures probably have the same underlying cause: a race condition or something like that involving unlocked files.
I've been seeing this failure intermittently on the git-annex autobuilders for months, so it's not a new problem. It has probably been happening for longer than that, but there was another intermittent problem, since fixed, that occurred more often, and so I probably didn't notice these.
I think that disabling that part of the test suite would be a reasonable workaround, since if this is like the previous race it's unlikely to occur except when git-annex is used in the test suite or perhaps a script that runs a problematic sequence of commands at a given speed.
Unfortunately it's not at all clear how it's failing; there's no useful error message about what went wrong. If I had a good way to reproduce it I think I could get to the bottom of it in fairly short order.
The NFS problem is probably fixed by 14971414dc263fcb8124b4cf6b14b9b7a19189af
If I could reproduce the remaining failure, what I'd do is run git-annex test with --keep-failures, and then after it fails and prints out the test repo it failed in, go in there and try running the
git annex copy --from
or other command that appears to be failing, see how it's failing, and take it from there.
I can't reliably reproduce the current failure, but if that changes, I will investigate further.
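Since the failure is intermittent, one way to catch it for the --keep-failures approach described above is to rerun the test suite in a loop until a run fails. The following is only a rough sketch: the function name run_until_failure and the attempt-N.log file names are made up for illustration, and it assumes nothing about git-annex beyond the --keep-failures option mentioned above.

```shell
# Hypothetical helper: run a command repeatedly until it fails or the
# attempt limit is reached. Output of each attempt goes to attempt-N.log;
# logs of passing attempts are deleted, the failing one is kept.
# Returns 0 if a failure was captured, 1 if every attempt passed.
run_until_failure() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if ! "$@" > "attempt-$attempt.log" 2>&1; then
            echo "attempt $attempt failed; see attempt-$attempt.log"
            return 0
        fi
        rm -f "attempt-$attempt.log"
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 1
}

# Example invocation (not run here):
#   run_until_failure 50 git annex test --keep-failures
```

The kept log should then contain whatever the failing run printed, including the test repo it failed in, which is the starting point for rerunning the failing command by hand.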