projects/datalad/bugs-done/tests fail (gpg-agent related?) when running build inside singularity container yoh http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/ git-annex ikiwiki 2023-01-05T17:30:31Z
comment 1 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_1_d36c27dca4d266d1e6639e3435198016/ joey 2023-01-05T17:30:31Z 2020-03-26T16:19:02Z
<p>After wasting 20 minutes on github's atrocious mess of an interface, I
managed to download some raw logs that I can look at in something that does
not freeze the javascript interpreter constantly while searching for "FAIL".
(This is quite a mess to inflict on yourself all in the name of proprietary
monoculture, just saying.)</p>
<p>Full relevant excerpt:</p>
<pre><code>2020-03-24T23:41:28.7154004Z crypto: [adjusted/master(unlocked) f647310] empty
2020-03-24T23:41:28.7485238Z adjust ok
2020-03-24T23:41:34.7685883Z gpg: can't connect to the agent: File name too long
2020-03-24T23:41:34.7687691Z gpg: error getting the KEK: No agent running
2020-03-24T23:41:34.7689073Z gpg: error reading '[stdin]': No agent running
2020-03-24T23:41:34.7690106Z gpg: import from '[stdin]' failed: No agent running
2020-03-24T23:41:34.7693591Z FAIL
2020-03-24T23:41:34.7695318Z Exception: user error (gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--import","-q"] exited 2)
</code></pre>
<p>Very odd that it omits any mention of which subtest failed. I have a feeling
this log only contains stdout and not stderr, or something else weird is
happening. Probably, if that were not missing, it would say "test harness
self-test failed".</p>
<p>gpg communicates with the agent over a unix socket. On linux, the path
to a socket is limited to 109 bytes. The test is being run in
"/home/runner/work/datalad-extensions/datalad-extensions/build/git-annex-8.20200309+git101-ga51a94f61"
which is 100 bytes. Add ".t/gpgtest/" and the name of the socket, and it's too
long.</p>
<p>Quick fix is to cd to /tmp or something before running the test suite.</p>
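<p>The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the socket file name <code>S.gpg-agent</code> and the limit of 108 bytes (the size of <code>sun_path</code> in Linux's <code>sockaddr_un</code>, including the trailing NUL) are assumptions not taken from the log, and the helper <code>socket_path_length</code> is hypothetical.</p>

```python
import posixpath

# Assumption: Linux's sockaddr_un.sun_path holds 108 bytes including the
# trailing NUL. The comment above quotes 109; the CI path exceeds both.
SUN_PATH_MAX = 108

# Assumption: gpg-agent's socket file is named "S.gpg-agent" in GNUPGHOME.
SOCKET_NAME = "S.gpg-agent"


def socket_path_length(build_dir, gnupg_subdir=".t/gpgtest"):
    """Return the byte length of the agent socket path under build_dir."""
    path = posixpath.join(build_dir, gnupg_subdir, SOCKET_NAME)
    return len(path.encode("utf-8"))


ci_dir = ("/home/runner/work/datalad-extensions/datalad-extensions/"
          "build/git-annex-8.20200309+git101-ga51a94f61")

for directory in (ci_dir, "/tmp"):
    n = socket_path_length(directory)
    # Leave one byte for the NUL terminator.
    print(f"{n:4d} bytes  fits={n < SUN_PATH_MAX}  {directory}")
```

<p>Under these assumptions the CI directory overshoots the limit by a wide margin, while a build under /tmp leaves plenty of room.</p>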
comment 2 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_2_c61b14b76cf06c0b10257c1a84d9478a/ joey 2023-01-05T17:30:31Z 2020-03-26T16:55:30Z
<p>Tried to reproduce it here running in a directory with the same length, but
no failures. It could have to do with the old version of gpg-agent.</p>
<p>(I don't see how gpg inside the container could talk to gpg-agent outside
the container at all, unless the container shares a filesystem, and even
then, gpg inside the container is being run with a nonstandard home
directory, so it will not try to talk to a gpg-agent socket in the usual
home directory.)</p>
<p>I have a patch that makes the path to the socket file relative, but can't
verify if it will fix the problem. I've applied it anyway.</p>
comment 3 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_3_ea210a22073919be434884f006212e6a/ yarikoptic 2023-01-05T17:30:31Z 2020-03-31T01:22:53Z
<p>Thank you Joey!
TL;DR: If you don't manage to reproduce it with the call below -- let's consider the issue nonexistent <img src="http://git-annex.branchable.com/smileys/smile4.png" alt=";)" /> At least I found a workaround.</p>
<p>If you would like to try, after <code>apt-get install singularity-container</code>:</p>
<pre><code>git clone git://github.com/datalad/datalad-extensions -b enh-git-annex && cd datalad-extensions && scripts/ci/build_git_annex build
</code></pre>
<p>(It might ask for the passphrase to your gpg key at some point, since it builds signed pkgs.) It might or might not fail for you!</p>
<p>Ghosts: it fails if I just run <code>scripts/ci/build_git_annex build</code> in my original local git repository. If, while under $HOME, I instead give it a path outside the current directory (i.e. not just <code>build</code>), it seems to work. So I started building under /tmp/git-annex, and GitHub Actions seems to pass now: https://github.com/datalad/datalad-extensions/runs/547292760?check_suite_focus=true</p>
<p>But it also passes for me when I run the above from under <code>/tmp/</code>, or when I give a build directory that is still somewhere in my home, e.g. <code>~/build</code>.
I think some singularity bind-mount madness is somehow interfering.</p>
comment 4 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_4_0cfd7d0d93d0747d42d20f13b788fe43/ joey 2023-01-05T17:30:31Z 2020-04-27T18:10:54Z
<p>I guess your command there would need to be run in a directory /home/runner/work/datalad-extensions
to replicate the path I saw earlier.</p>
<p>I did try it, but the singularity pull is failing with "While pulling shub
image: download did not succeed: 400 Bad Request".</p>
<p>That running it in some directory like /tmp avoids the problem
more or less confirms my hypothesis that it's a socket path length
problem. Both /tmp/build and /home/yoh/build are much shorter than
/home/runner/work/datalad-extensions/datalad-extensions/build.</p>
comment 5 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_5_944a4474401b2fa0b4f3631bac63a07d/ yarikoptic 2023-01-05T17:30:31Z 2020-04-27T19:46:22Z
<p>I should try to remember that ("On linux, the path to a socket is limited to 109 bytes") since I bet it is biting me in some other cases... And <a href="https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars">on other systems it seems it might be even shorter</a>. Someone might also want to check whether the limit counts the path to the image or to some mountpoint outside of the singularity container, which would then limit it even further.</p>
<p>Re singularity: dunno. I have now retried with 2.6.1-2~nd100+1 from neurodebian and 3.5.2+ds1-1 from Debian, in both cases <code>singularity pull --name buildenv.sif shub://datalad/datalad-extensions:buildenv-git-annex-buster</code> seems to start downloading.</p>
comment 5 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_5_fa8c65c2223d2be21aa95e5889c1f352/ joey 2023-01-05T17:30:31Z 2020-04-28T17:32:26Z
<p>Reproduced on a host that is not behind a satellite http cache.</p>
<p>My earlier change did not fix it. With my change, GNUPGHOME is something like
"../gpgtmp/1". So the version of gpg in this singularity must be converting
that relative path to an absolute path, too long for a socket path.
Which seems like kind of ridiculous behavior, but gpg does <code>chdir()</code> in
places so maybe that's why.</p>
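<p>A rough sketch of why resolving the relative GNUPGHOME defeats the change. The working directory <code>.t/tmprepo1</code> and the socket name <code>S.gpg-agent</code> are assumptions made for illustration; only <code>../gpgtmp/1</code> and the build directory come from this thread.</p>

```python
import posixpath

# Assumption: gpg-agent's socket file is named "S.gpg-agent".
SOCKET_NAME = "S.gpg-agent"

ci_dir = ("/home/runner/work/datalad-extensions/datalad-extensions/"
          "build/git-annex-8.20200309+git101-ga51a94f61")

# Hypothetical working directory of a test repository in the build tree.
cwd = posixpath.join(ci_dir, ".t", "tmprepo1")

relative_home = "../gpgtmp/1"
# What a program effectively gets if it resolves GNUPGHOME against cwd.
absolute_home = posixpath.normpath(posixpath.join(cwd, relative_home))

relative_socket = posixpath.join(relative_home, SOCKET_NAME)
absolute_socket = posixpath.join(absolute_home, SOCKET_NAME)

print(len(relative_socket), relative_socket)
print(len(absolute_socket), absolute_socket)
```

<p>The relative form is tiny, but once it is made absolute the full length of the build directory is back in the socket path.</p>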
<p>Tried a few changes, including removing --use-agent and --no-tty
(it still tries to use the agent), using --no-use-agent
(option is obsolete and it still tries to use the agent).</p>
<p>Affected version of gnupg is 2.2.12, and I think 2.2.20 does not behave
that way, because it always puts the gpg agent socket in
/run/user/UID/gnupg/ not in GNUPGHOME. (OTOH, the docs for gpg say
it uses the standard socket since 2.1.) Or something about the singularity
environment could be altering gpg's behavior.</p>
<p>Anyway, this is kind of ridiculous behavior from gpg, and the only thing
I can see that git-annex could do to avoid it is set GNUPGHOME to some
short path in /tmp. But then git-annex would have to not honor TMPDIR
for that, because that could be set to some path that is too long. Ignoring
TMPDIR would be easily argued to be a bug in git-annex, while this is pretty
clearly a bug in an old version of gpg.</p>
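<p>To make the trade-off concrete, here is a hypothetical middle ground, not something git-annex does: honor TMPDIR when the resulting socket path would still fit, and fall back to /tmp only when it would not. The 108-byte limit, the socket name, the directory-name budget and the function <code>choose_gnupghome_parent</code> are all assumptions for the sketch.</p>

```python
# Assumptions: 108-byte sun_path (including NUL) and the socket file name.
SUN_PATH_MAX = 108
SOCKET_NAME = "S.gpg-agent"

# Hypothetical: room reserved for a mkdtemp-style directory name.
HOME_NAME_BUDGET = len("gpgtmpXXXXXXXX")


def choose_gnupghome_parent(environ, fallback="/tmp"):
    """Return TMPDIR if a socket below it would fit, else the fallback."""
    tmpdir = environ.get("TMPDIR") or fallback
    # parent + "/" + home directory name + "/" + socket name
    worst_case = (len(tmpdir.encode("utf-8")) + 1 + HOME_NAME_BUDGET
                  + 1 + len(SOCKET_NAME))
    # Leave one byte for the NUL terminator.
    return tmpdir if worst_case < SUN_PATH_MAX else fallback


print(choose_gnupghome_parent({"TMPDIR": "/var/tmp"}))
print(choose_gnupghome_parent({"TMPDIR": "/x" * 60}))
```

<p>Even this variant silently ignores a user's TMPDIR in the long-path case, which is exactly the behavior that could be argued to be a bug.</p>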
comment 6 http://git-annex.branchable.com/projects/datalad/bugs-done/tests_fail___40__gpg-agent_related__63____41___when_running_build_inside_singularity_container/comment_6_d358bf55d89f5d0609d792a26757d0f4/ joey 2023-01-05T17:30:31Z 2020-04-28T19:29:05Z
<p>Hmm, it could be that /run is not mounted in the container and then gpg
falls back to putting the socket in the home directory.</p>
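<p>That hypothesis can be written down as a small model. This is a simplification and not gpg's actual code: the function <code>guess_socket_dir</code> is hypothetical, and real gpg may add a further subdirectory under /run/user when GNUPGHOME is nonstandard.</p>

```python
import os


def guess_socket_dir(gnupghome, uid, isdir=os.path.isdir):
    """Model of the hypothesis: use /run/user/UID if present, else GNUPGHOME.

    The isdir parameter is injectable so the model can be tried without
    a real /run mount.
    """
    run_dir = f"/run/user/{uid}"
    if isdir(run_dir):
        return f"{run_dir}/gnupg"
    # No /run/user/UID (e.g. /run not mounted in the container):
    # the socket ends up under the possibly very long home directory.
    return gnupghome


long_home = "/home/runner/work/datalad-extensions/datalad-extensions/build/.t/gpgtest"
print(guess_socket_dir(long_home, 1000, isdir=lambda path: True))
print(guess_socket_dir(long_home, 1000, isdir=lambda path: False))
```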
<p>Ok, I'm just going to make this test be skipped if it fails to import the
test key.</p>
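<p>The shape of that workaround, sketched in Python rather than in git-annex's own test suite code. Everything here is hypothetical (<code>KeyImportError</code>, <code>run_crypto_test</code>, the messages); it only illustrates "skip instead of fail when the key import does not work".</p>

```python
class KeyImportError(Exception):
    """Hypothetical error raised when the test key cannot be imported."""


def run_crypto_test(import_test_key, test_body):
    """Run test_body only if the test key imports; otherwise report a skip."""
    try:
        import_test_key()
    except KeyImportError as exc:
        return f"SKIPPED (key import failed: {exc})"
    test_body()
    return "OK"


def failing_import():
    # Stands in for gpg exiting non-zero, as in the log excerpt.
    raise KeyImportError("gpg exited 2")


print(run_crypto_test(failing_import, lambda: None))
print(run_crypto_test(lambda: None, lambda: None))
```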