projects/datalad/bugs-done/get -J cannot be used with password-based authenticationyohhttp://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/git-annexikiwiki2023-01-05T17:30:31Zcomment 1http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_1_236d2b897a550e7db4b266814d4e778d/joey2023-01-05T17:30:31Z2017-04-07T19:58:41Z
<p>Well, let's see. Fixing this would need some way for ssh to outsource its
password prompting to another program, which could then serialize
concurrent password requests, and perhaps reuse the same password when
reconnecting to the same host.</p>
<p>Sounds an awful lot like ssh-agent, doesn't it?</p>
<p>Now, it does happen to be the case that without -J, the password is only
prompted for once to download multiple files from the same host. That works
because of ssh connection caching. But in the -J case, the
connection caching does not help, because multiple ssh processes are started before
there's a connection to reuse, so each tries to make a new connection and
prompts.</p>
<p>Even if connection caching worked with -J, the general problem would remain
when it did concurrent downloads from different hosts.</p>
<p>So I tend to feel that this is just not fixable; if the user wants to use
-J, they ought to use ssh-agent so it doesn't prompt for passwords.</p>
may be?http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_2_c877de08f959dee4ace34e66f42c8615/yarikoptic2023-01-05T17:30:31Z2017-04-07T21:01:56Z
<p>well, it kinda depends either on the level at which parallelization happens, or on how parallel jobs are handled, or may be ...</p>
<p>level of parallelization:
I guess ATM annex just parallelizes at the level of "get --key KEY" jobs.
But if the central process decided to run "get --from=remote --key KEY" (the call it submits to the parallel work pool), then it could first check whether the remote is an ssh remote and whether connection caching is established, and if not, establish it before submitting this and any subsequent get call.
This would over-complicate the design considerably though, I guess, so it probably shouldn't be approached.</p>
<p>jobs handling:
if parallel jobs could 'yield' back to the original process (e.g. through some protocol exchange between them and the master process, somewhat similar to git annex special remotes in a way) to demand some action (e.g. "authenticate me to the host") and then resume their work, that could work out I guess.
But I guess that is also not how it is currently implemented.</p>
<p>may be...:
since I guess (didn't check) GIT_SSH_COMMAND is used (or not yet, but could be?) for ssh transfers, establishing the shared ssh connection could be deferred to it (with some proper locking/waiting for parallel invocations)... or am I wrong?</p>
comment 3http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_3_daac1d424bc9e5b56772fa49707bc5a5/joey2023-01-05T17:30:31Z2017-04-07T21:06:51Z
How do you check if ssh has established a cached ssh connection?
comment 4http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_4_05bf20db275b911e2d89311182f289f6/joey2023-01-05T17:30:31Z2017-04-07T21:07:44Z
<code>GIT_SSH_COMMAND</code> is used for <em>every</em> call to ssh in git-annex.
comment 5http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_5_b094509fe0194313666b5b1db0a68156/yarikoptic2023-01-05T17:30:31Z2017-04-08T03:16:48Z
<blockquote><p>How do you check if ssh has established a cached ssh connection?</p></blockquote>
<p><code>ssh -O check</code>: somewhat of an additional overhead, but possible</p>
<blockquote><p>GIT_SSH_COMMAND is used for every call to ssh in git-annex.</p></blockquote>
<p>so then theoretically we could implement the "may be ..." strategy on our end in our sshrun.</p>
comment 6http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_6_f5fe8d4cecfceec5cb4a03dd054d2e0a/joey2023-01-05T17:30:31Z2017-05-11T18:09:04Z
<p>All cases could be dealt with by having a single process-level prompt lock
(not a lock file, but an MVar), that's taken when doing something
that might prompt for input.</p>
<p>Then <code>Annex.Ssh.prepSocket</code> could block to take the prompt lock, and once
it has the prompt lock, start the ssh connection multiplexer and wait for
the ssh connection to be established.</p>
<p>Thus, even if <code>git annex get -J</code> is connecting to multiple hosts that each
need passwords, password prompting would be serialized.</p>
<p>All message output could also be blocked while the prompt lock is held,
and then concurrent output would not scramble with the ssh password prompt.</p>
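<p>A minimal sketch of the prompt lock idea, in Python rather than git-annex's
Haskell. Here <code>threading.Lock</code> stands in for the MVar, and
<code>with_prompt_lock</code> and <code>connect_all</code> are hypothetical names
used only for illustration; the <code>connect</code> callback is whatever action
might prompt.</p>

```python
import threading

# Hypothetical stand-in for the process-level prompt lock (an MVar in git-annex).
prompt_lock = threading.Lock()


def with_prompt_lock(action):
    """Run an action that might prompt for input while holding the prompt lock."""
    with prompt_lock:
        return action()


def connect_all(hosts, connect):
    """Start one worker per host; the lock ensures only one connect runs at a time."""
    workers = [
        threading.Thread(target=with_prompt_lock, args=(lambda h=host: connect(h),))
        for host in hosts
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

<p>Because every worker goes through the same lock, prompts for different hosts
cannot overlap, whichever order the workers happen to run in.</p>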
<p><code>ssh -S path -O check</code> does indeed exit nonzero when ssh has not yet
connected and is at a password prompt. Also, I noticed that the socket file
is only created after the password prompt, so a less expensive check
(though perhaps not as accurate) is to see if the socket file exists.
(But, it seems we don't need to check, see below.)</p>
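<p>For illustration, the two checks could be sketched in Python as follows.
The function names are hypothetical, and the sketch assumes ssh wants a
destination argument after <code>-O check</code>, which the abbreviated command
above leaves out.</p>

```python
import os
import subprocess


def control_check_argv(socket_path, host):
    """Build the argv for asking a control master whether it is running."""
    return ["ssh", "-S", socket_path, "-O", "check", host]


def master_is_up(socket_path, host):
    """Accurate check: True only if `ssh -O check` exits zero."""
    result = subprocess.run(
        control_check_argv(socket_path, host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def socket_file_exists(socket_path):
    """Cheaper, possibly less accurate check: does the socket file exist yet?"""
    return os.path.exists(socket_path)
```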
<p>The real problem is starting the ssh connection multiplexer without
blocking for eg a whole rsync transfer to run. There's
not a <code>-O</code> command that starts the multiplexer. The only way to do it seems
to be something like <code>ssh -S path -o ControlMaster=auto -o
ControlPersist=yes host true</code>. So, run a no-op command on the remote host just
to get the connection up. Then prepSocket will know the cached connection
is up, and can drop the prompt lock and return.</p>
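<p>A rough Python sketch of that step, not the actual
<code>Annex.Ssh.prepSocket</code> code: take a prompt lock, run the no-op
command to bring the connection up, then release the lock. The
<code>prompt_lock</code>, <code>start_master_argv</code> and
<code>prep_socket</code> names are assumptions, and the <code>run</code>
parameter exists only so the ssh call can be swapped out.</p>

```python
import subprocess
import threading

# Hypothetical process-level prompt lock.
prompt_lock = threading.Lock()


def start_master_argv(socket_path, host):
    """Argv that runs a no-op remotely just to get the cached connection up."""
    return [
        "ssh", "-S", socket_path,
        "-o", "ControlMaster=auto",
        "-o", "ControlPersist=yes",
        host, "true",
    ]


def prep_socket(socket_path, host, run=subprocess.call):
    """Hold the prompt lock while ssh connects (and possibly prompts).

    Returns True if the no-op command exited zero, meaning the connection
    was established; the lock is dropped on return either way.
    """
    with prompt_lock:
        return run(start_master_argv(socket_path, host)) == 0
```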
<p>It would only need to do this when concurrency is enabled, so
non-concurrent use keeps the current, faster path.</p>
<p>prepSocket takes a shared
file level lock of the socket's lock file, which is used to tell when
another git-annex process is using the connection multiplexer.
So, an optimisation would be for prepSocket to check if it's already
taken that shared lock, and then it does not need to start the multiplexer.</p>
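<p>The bookkeeping for that optimisation might look roughly like this in
Python. This only models the "have we already taken the shared lock?" check
with an in-memory set; it does no real file locking, and all names are
hypothetical.</p>

```python
import threading

# Guard held across check-and-start, standing in for the prompt lock.
_guard = threading.Lock()
# Socket paths whose shared lock this process is assumed to hold already.
_shared_locks_held = set()


def prep_socket(socket_path, start_multiplexer):
    """Start the multiplexer only the first time a socket is prepared.

    Returns True if the multiplexer was started by this call, False if the
    shared lock was already held and the start was skipped.
    """
    with _guard:
        if socket_path in _shared_locks_held:
            return False
        start_multiplexer(socket_path)
        _shared_locks_held.add(socket_path)
        return True
```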
comment 7http://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_7_0118a107147f6b94a7da907e599e58db/joey2023-01-05T17:30:31Z2017-05-11T19:23:02Z
<p>What about when <code>GIT_SSH</code> is used? <code>prepSocket</code> is not used then,
and git-annex can only use the <code>GIT_SSH</code> interface to ssh to the host.
So, the approach above won't work.</p>
<p>git-annex could then try to use <code>GIT_SSH</code> to ssh to the host and run eg <code>true</code>,
in hopes that <code>GIT_SSH</code> is enabling ssh connection caching and that will
get the ssh connection set up. If <code>GIT_SSH</code> is not enabling connection
caching, that might add an additional password prompt, and not avoid
other password prompts from overlapping.</p>
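<p>As a sketch of that fallback in Python: build the command that would run
<code>true</code> on the host through <code>GIT_SSH</code>, using the calling
convention git documents for it (an optional <code>-p port</code>, then the
host, then the command). The function name and the <code>environ</code>
parameter are hypothetical.</p>

```python
import os


def git_ssh_warmup_argv(host, port=None, environ=os.environ):
    """Build the argv for a no-op warm-up via GIT_SSH, or None if it is unset.

    Whether this actually sets up a cached connection depends entirely on
    what the GIT_SSH program does; this sketch cannot know that.
    """
    program = environ.get("GIT_SSH")
    if not program:
        return None
    argv = [program]
    if port is not None:
        argv += ["-p", str(port)]
    return argv + [host, "true"]
```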
statushttp://git-annex.branchable.com/projects/datalad/bugs-done/get_-J_cannot_be_used_with_password-based_authentication/comment_8_5da63cf5fa93120c85b98077fba51488/joey2023-01-05T17:30:31Z2017-05-11T21:57:24Z
<p>Current status: It's implemented, but not for <code>GIT_SSH</code> yet.</p>
<p>The display is a bit ugly, because the ssh password prompt line
confuses the concurrent-output region manager. Opened
<span class="createlink">minor display glitch with ssh password prompting and -J</span> bug for that.</p>