Please describe the problem.
I am trying to set up a daily cron job on GitHub Actions to test datalad against bleeding-edge git-annex. The few commands I use are all in the workflow file: https://github.com/datalad/datalad-extensions/pull/7/files#diff-8364c688b76bfaf5df947cfd4d74eef7R42
To build git-annex I am using a Singularity container (based on buster, with all build-dependencies installed). While building a standalone binary package (from a previously prepared .dsc), 3 tests fail:
3 out of 260 tests failed (195.33s)
Outside the container the system is some Ubuntu; inside it is Debian stable (buster). Singularity bind-mounts HOME and /tmp and passes all environment variables into the container.
If you search for "Z FAIL" you will find this hit:
2020-03-24T23:41:28.7154039Z crypto: [adjusted/master(unlocked) f647310] empty
2020-03-24T23:41:28.7485256Z adjust ok
2020-03-24T23:41:34.7685919Z gpg: can't connect to the agent: File name too long
2020-03-24T23:41:34.7687700Z gpg: error getting the KEK: No agent running
2020-03-24T23:41:34.7689081Z gpg: error reading '[stdin]': No agent running
2020-03-24T23:41:34.7690114Z gpg: import from '[stdin]' failed: No agent running
2020-03-24T23:41:34.7693611Z FAIL
I wonder if it relates to the mismatch between the gpg-agent running outside the container and the gpg inside it, which I noticed when I saw
2020-03-24T23:34:08.8873100Z gpg: WARNING: server 'gpg-agent' is older than us (2.2.4 < 2.2.12)
2020-03-24T23:34:08.8873946Z gpg: Note: Outdated servers may lack important security fixes.
2020-03-24T23:34:08.8875072Z gpg: Note: Use the command "gpgconf --kill all" to restart them.
2020-03-24T23:34:08.9223394Z signfile git-annex_8.20200309+git101-ga51a94f61-1~ndall+1_source.buildinfo
at the beginning of the run. Or maybe it is just that inside the container it shouldn't use gpg-agent at all.
...
I wonder if there is an easy way to disable tests that rely on a connection to gpg-agent?
FWIW, a similar situation happened a while back and was then mitigated. The same package version builds fine with our conventional NeuroDebian build setup using cowbuilder (no Singularity).
After wasting 20 minutes on github's atrocious mess of an interface, I managed to download some raw logs that I can look at in something that does not freeze the javascript interpreter constantly while searching for "FAIL". (This is quite a mess to inflict on yourself all in the name of proprietary monoculture, just saying.)
Full relevant excerpt:
Very odd that it omits any mention of what subtest failed. I have a feeling this log only contains stdout and not stderr, or something else weird is happening. Probably if that were not missing it would say "test harness self-test failed".
gpg communicates with the agent over a unix socket. On linux, the path to a socket is limited to 109 bytes. The test is being run in "/home/runner/work/datalad-extensions/datalad-extensions/build/git-annex-8.20200309+git101-ga51a94f61" which is 100 bytes. Add ".t/gpgtest/" and the name of the socket, and it's too long.
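A quick way to check the arithmetic is to compute the would-be socket path directly. This is a minimal sketch that assumes Linux's 108-byte sun_path (the exact limit differs a little between platforms and depending on whether the trailing NUL is counted) and assumes the socket sits directly in .t/gpgtest/ under gpg-agent's usual name S.gpg-agent; socket_path_fits is a hypothetical helper.

```python
import os

# Assumption: on Linux, struct sockaddr_un.sun_path is a 108-byte array.
# Leaving room for a terminating NUL, at most 107 bytes of path fit.
SUN_PATH_SIZE = 108


def socket_path_fits(path: str) -> bool:
    """Return True if `path` plus a NUL byte fits in sun_path."""
    return len(os.fsencode(path)) + 1 <= SUN_PATH_SIZE


# Build directory taken from the CI log.
build_dir = ("/home/runner/work/datalad-extensions/datalad-extensions/"
             "build/git-annex-8.20200309+git101-ga51a94f61")

# ".t/gpgtest" is from the analysis above; "S.gpg-agent" is gpg-agent's
# usual socket name. Any extra subdirectories would only make it longer.
agent_socket = os.path.join(build_dir, ".t", "gpgtest", "S.gpg-agent")

print(len(build_dir))                  # 100
print(len(agent_socket))               # 123
print(socket_path_fits(agent_socket))  # False
```

Even in the best case, with no further subdirectories, the socket path overshoots the limit by more than a dozen bytes.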
Quick fix is to cd to /tmp or something before running the test suite.
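The workaround can be sketched as running the suite from a short, freshly created directory. This is a minimal Python sketch, not part of git-annex; run_in_short_dir is a hypothetical helper, and it deliberately uses /tmp rather than TMPDIR because TMPDIR may itself be a long path.

```python
import subprocess
import tempfile


def run_in_short_dir(cmd, base="/tmp"):
    """Run `cmd` with a fresh, short working directory under `base`.

    `base` defaults to /tmp rather than $TMPDIR on purpose, since TMPDIR
    may itself be long. Assumes /tmp exists and is writable.
    """
    with tempfile.TemporaryDirectory(dir=base, prefix="ga-") as workdir:
        return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
```

For example, run_in_short_dir(["git-annex", "test"]) would run the test suite from something like /tmp/ga-XXXXXXXX, assuming git-annex is on PATH.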
I tried to reproduce it here by running in a directory with the same length, but got no failures. It could have to do with the old version of gpg-agent.
(I don't see how gpg inside the container could talk to gpg-agent outside the container at all, unless the container shares a filesystem, and even then, gpg inside the container is being run with a nonstandard home directory, so it will not try to talk to a gpg-agent socket in the usual home directory.)
I have a patch that makes the path to the socket file relative, but can't verify if it will fix the problem. I've applied it anyway.
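Presumably the point of a relative path is that only the string passed to bind() has to fit in sun_path; the kernel resolves a relative name against the current directory. The sketch below (Python, with hypothetical helper names, assuming a Unix host with a writable /tmp) binds one socket by a short relative name and one by its long absolute path inside the same deep directory. It illustrates the socket mechanism only and says nothing about what gpg itself does with GNUPGHOME.

```python
import os
import socket
import tempfile


def try_bind(path):
    """Bind an AF_UNIX socket at `path`; return None on success, else the error."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return None
    except OSError as err:
        return str(err)
    finally:
        sock.close()


def relative_vs_absolute():
    """Compare binding by relative and by absolute path in a very deep directory."""
    with tempfile.TemporaryDirectory(dir="/tmp") as base:
        # Six 20-character components push the absolute path well past 108 bytes.
        deep = os.path.join(base, *(["d" * 20] * 6))
        os.makedirs(deep)
        old_cwd = os.getcwd()
        os.chdir(deep)
        try:
            # Only "S.relative" is stored in sun_path; the kernel resolves it
            # against the current directory.
            rel_error = try_bind("S.relative")
        finally:
            os.chdir(old_cwd)
        abs_error = try_bind(os.path.join(deep, "S.absolute"))
    return rel_error, abs_error


# Expect the relative bind to succeed (None) and the absolute one to fail.
print(relative_vs_absolute())
```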
Thank you Joey! TL;DR: if you don't manage to reproduce it with the call below, let's consider that the issue doesn't exist; at least I found a workaround.
If you would like to try, after
apt-get install singularity-container
(It might ask for your gpg key's passphrase at some point, since it builds signed packages.) It might or might not fail for you!
Ghosts: it fails if I just run
scripts/ci/build_git_annex build
in my original local git repository. If, instead of build/, I provide some path outside of the current directory (i.e. not just build) while I am under $HOME, it seems to work. So I started to build under /tmp/git-annex, and GitHub Actions seems to pass now: https://github.com/datalad/datalad-extensions/runs/547292760?check_suite_focus=true
But it also passes for me when I do the above under /tmp/, or when I provide a build directory still somewhere in my home, e.g. ~/build. I think it is some Singularity bind-mount madness somehow interfering.
I guess your command there would need to be run in a directory /home/runner/work/datalad-extensions to replicate the path I saw earlier.
I did try it, but the singularity pull is failing with "While pulling shub image: download did not succeed: 400 Bad Request".
That running it in some directory like /tmp avoids the problem more or less confirms my hypothesis that it's a socket path length problem. Both /tmp/build and /home/yoh/build are much shorter than /home/runner/work/datalad-extensions/datalad-extensions/build.
I should try to remember that ("On linux, the path to a socket is limited to 109 bytes") since I bet it might be biting me in other cases too. And on other systems the limit seems to be even shorter. Someone might want to check whether it also counts the path to the image or some mount point outside of the Singularity container, which would limit it even further.
Re singularity: dunno. I have now retried with 2.6.1-2~nd100+1 from neurodebian and 3.5.2+ds1-1 from Debian, in both cases
singularity pull --name buildenv.sif shub://datalad/datalad-extensions:buildenv-git-annex-buster
seems to start downloading.
Reproduced on a host that is not behind a satellite HTTP cache.
My earlier change did not fix it. With my change, GNUPGHOME is something like "../gpgtmp/1", so the version of gpg in this Singularity image must be converting that relative path to an absolute path, which is too long for a socket path. That seems like kind of ridiculous behavior, but gpg does chdir() in places, so maybe that's why.
I tried a few changes, including removing --use-agent and --no-tty (it still tries to use the agent) and using --no-use-agent (the option is obsolete and it still tries to use the agent).
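For the relative GNUPGHOME of "../gpgtmp/1", the arithmetic looks like this if gpg resolves it against the test's working directory. The working directory .t/repo is a hypothetical stand-in (the ".." cancels it out anyway); the rest of the path comes from the CI log, and S.gpg-agent is assumed as the socket name.

```python
import os

# Hypothetical working directory of a test repository inside the build tree;
# only the ".t" part is taken from the discussion.
test_cwd = ("/home/runner/work/datalad-extensions/datalad-extensions/"
            "build/git-annex-8.20200309+git101-ga51a94f61/.t/repo")

gnupghome = "../gpgtmp/1"  # relative GNUPGHOME, as after the patch

# What a program that absolutizes GNUPGHOME would end up binding.
absolute_home = os.path.normpath(os.path.join(test_cwd, gnupghome))
socket_path = os.path.join(absolute_home, "S.gpg-agent")

print(len(gnupghome) + len("/S.gpg-agent"))  # 23: fine as a relative path
print(len(socket_path))                      # 124: too long for sun_path
```

So the relative form would fit comfortably, but once absolutized it is no shorter than the original failing path.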
The affected version of gnupg is 2.2.12, and I think 2.2.20 does not behave that way, because it always puts the gpg-agent socket in /run/user/<uid>/gnupg/, not in GNUPGHOME. (OTOH, the docs for gpg say it uses the standard socket since 2.1.) Or something about the Singularity environment could be altering gpg's behavior.
Anyway, this is kind of ridiculous behavior from gpg, and the only thing I can see that git-annex could do to avoid it is set GNUPGHOME to some short path in /tmp. But then git-annex would have to not honor TMPDIR for that, because that could be set to some path that is too long. Ignoring TMPDIR would be easily argued to be a bug in git-annex, while this is pretty clearly a bug in an old version of gpg.
Hmm, it could be that /run is not mounted in the container and then gpg falls back to putting the socket in the home directory.
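Rather than guessing where a particular gpg puts its agent socket, one can ask gpgconf: GnuPG 2.2, the version line discussed here, supports `gpgconf --list-dirs agent-socket`. A sketch, assuming gpgconf comes from the same GnuPG installation as the gpg under test; both helper names are hypothetical. Running it inside and outside the container with the same GNUPGHOME would show whether the socket lands under /run/user/<uid>/gnupg/ or falls back to GNUPGHOME.

```python
import os
import shutil
import subprocess


def agent_socket_path(gnupghome):
    """Ask gpgconf where gpg-agent's socket lives for `gnupghome`.

    Returns None if gpgconf is not installed or reports an error. gpgconf
    may percent-escape some characters in the paths it prints.
    """
    gpgconf = shutil.which("gpgconf")
    if gpgconf is None:
        return None
    proc = subprocess.run(
        [gpgconf, "--list-dirs", "agent-socket"],
        env=dict(os.environ, GNUPGHOME=gnupghome),
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def run_user_dir_exists():
    """True if /run/user/<uid> exists, i.e. a per-user runtime dir is present."""
    return os.path.isdir(f"/run/user/{os.getuid()}")
```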
Ok, I'm just going to make this test be skipped if it fails to import the test key.
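git-annex's test suite is written in Haskell, so the following is only a Python analogue of the skip-on-import-failure pattern, not the actual change. TEST_KEY, gpg_import_ok and CryptoTest are hypothetical names, and the empty key is a placeholder for a real armored test key.

```python
import os
import shutil
import subprocess
import tempfile
import unittest

# Placeholder: a real suite would embed an armored test key here.
TEST_KEY = b""


def gpg_import_ok(key_data, gnupghome):
    """Return True if gpg imports `key_data` into `gnupghome` successfully."""
    gpg = shutil.which("gpg")
    if gpg is None:
        return False
    proc = subprocess.run(
        [gpg, "--batch", "--import"],
        input=key_data,
        env=dict(os.environ, GNUPGHOME=gnupghome),
        capture_output=True,
    )
    return proc.returncode == 0


class CryptoTest(unittest.TestCase):
    def test_crypto(self):
        # TemporaryDirectory creates the directory with mode 0700, as gpg expects.
        with tempfile.TemporaryDirectory() as gnupghome:
            if not gpg_import_ok(TEST_KEY, gnupghome):
                self.skipTest("could not import test key; gpg/gpg-agent unusable here")
            # The actual crypto assertions would go here.
            self.assertTrue(os.path.isdir(gnupghome))
```

Skipping on import failure keeps an environment-specific gpg problem from failing the whole suite, at the cost of silently losing crypto coverage in such environments.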