Please describe the problem.
can't fetch in parallel from a host over ssh if authentication is password-based
What steps will reproduce the problem?
Run git annex get -J4 against a host that only allows password-based ssh authentication (no key).
What version of git-annex are you using? On what operating system?
6.20170101+gitg93d69b1-1~ndall+1. With a newer version (6.20170220+gitg75a15e1ad-1~ndall+1) the output looks slightly different, but to the same effect.
Please provide any additional information below.
$> git annex get -J4
get sourcedata/sub-sid000004/ses-siemens0/anat/sub-sid000004_ses-siemens0_acq-MPRAGE_run-01_T1w.dicom.tgz get sourcedata/sub-sid000004/ses-siemens0/fmap/sub-sid000004_ses-siemens0_acq-3mm_run-01_phasediff.dicom.tgz get sourcedata/sub-sid000005/ses-siemens1/func/sub-sid000005_ses-siemens1_task-life_acq-2mm692_run-04_bold.dicom.tgz get sourcedata/sub-sid000005/ses-siemens1/func/sub-sid000005_ses-siemens1_task-life_acq-2mm748_run-03_bold.dicom.tgz (transfer already in progress, or unable to take transfer lock)
Unable to access these remotes: origin
(from origin...) (from origin...)
Try making some of these repositories available:
2e44be07-8f1a-4c11-a7cb-464802b87b26 -- mvdoc@smaug:/mnt/btrfs/dbic/inbox/dbic-ds-3mm/dbic/pulse_sequences
b2ff2964-c31b-4784-b094-2bebb336da91 -- mvdoc@smaug:/mnt/btrfs/dbic/inbox/dbic-ds/dbic/pulse_sequences
d486ea11-98dc-42d3-9640-e5713acfb675 -- yoh@rolando:/inbox/BIDS/dbic/1000-dbic-dataset [origin]
failed
get sourcedata/sub-sid000005/ses-siemens1/func/sub-sid000005_ses-siemens1_task-life_acq-2mm754_run-05_bold.dicom.tgz (from origin...)
(from origin...)
...
yohtest@rolando.cns's password: yohtest@rolando.cns's password: yohtest@rolando.cns's password: yohtest@rolando.cns's password
I entered the password just once; I didn't try entering it multiple times into the void. It would be neat if annex could handle this situation gracefully (e.g. initiate a central ssh controller first, before spawning parallel getters) and demand the password only once.
done; this was fixed for ssh, except in the case of GIT_SSH. I don't think it can be supported for GIT_SSH, and don't think it's worth leaving this open for something that can't be fixed. --Joey
Well let's see.. To fix this would need some way for ssh to outsource its password prompting to another program, which could then serialize concurrent password requests, and perhaps reuse the same password when reconnecting to the same host.
Sounds an awful lot like ssh-agent, doesn't it?
Now, it does happen to be the case that without -J, the password is only prompted for once to download multiple files from the same host. That works because of ssh connection caching. But in the -J case, the connection caching does not help, because multiple ssh processes are started before there's a connection to reuse, so each tries to make a new connection and prompts.
Even if connection caching worked with -J, the general problem would remain when it did concurrent downloads from different hosts.
So I tend to feel that this is just not fixable; if the user wants to use -J, they ought to use ssh-agent so it doesn't prompt for passwords.
Well, it kind of depends on either the level at which parallelization happens, or how parallel jobs are handled, or maybe ...
level of parallelization: I guess ATM annex just parallelizes at the level of "get --key KEY" jobs. But if the central process decided to try a "get --from=remote --key KEY" call, which it submits to the parallel work pool, then it could first check whether the remote is an ssh remote and whether connection caching is established, and if not, establish it and then submit this and/or any subsequent get call. This would, I guess, over-complicate the design considerably, so it probably shouldn't be approached.
jobs handling: if parallel jobs could 'yield' back to the original process (e.g. if there was some protocoled exchange between them and the master process, somewhat similar to git annex special remotes in a way), demanding some action (e.g. authenticate me to the host) and then proceeding with their work, that could work out I guess. But I guess that is also not the current implementation.
may be...: since I guess (didn't check) GIT_SSH_COMMAND is used (or not yet but could be?) for ssh transfers, such activity as establishing a shared ssh connection could be deferred to it (with some proper locking/waiting for parallel invocations)... or am I wrong?
GIT_SSH_COMMAND is used for every call to ssh in git-annex.
ssh -O check -- somewhat of an additional overhead, but possible
so then theoretically we could implement the "may be ..." strategy on our end in our sshrun.
All cases could be dealt with by having a single process-level prompt lock (not a lock file, but an MVar), that's taken when doing something that might prompt for input.
Then Annex.Ssh.prepSocket could block to take the prompt lock, and once it has the prompt lock, start the ssh connection multiplexer and wait for the ssh connection to be established.
Thus, even if git annex get -J is connecting to multiple hosts that each need passwords, password prompting would be serialized.
All message output could also be blocked while the prompt lock is held, and then concurrent output would not scramble with the ssh password prompt.
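A minimal shell sketch of the serialization idea above. It is a cross-process stand-in, not the in-process MVar proposed here: the helper name with_prompt_lock, the PROMPT_LOCK variable, and the mkdir-based lock are all assumptions made for illustration and are not part of git-annex.

```shell
# Hypothetical prompt lock. mkdir is atomic, so only one caller at a time
# can create the directory; everyone else waits until it is removed.
PROMPT_LOCK="${PROMPT_LOCK:-${TMPDIR:-/tmp}/prompt-lock.$$}"

# Run the given command while holding the prompt lock, so that anything
# that might prompt for input never overlaps with another prompt.
with_prompt_lock() {
    # Spin until we manage to create (and therefore own) the lock directory.
    until mkdir "$PROMPT_LOCK" 2>/dev/null; do
        sleep 0.1
    done
    status=0
    "$@" || status=$?
    # Release the lock. A real implementation would also release it
    # when interrupted; this sketch does not.
    rmdir "$PROMPT_LOCK"
    return "$status"
}
```

For example, several background jobs could each call with_prompt_lock around the step that may ask for a password, and the prompts would then appear one after another instead of all at once.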
ssh -S path -O check does indeed exit nonzero when ssh has not yet connected and is at a password prompt. Also, I noticed that the socket file is only created after the password prompt, so a less expensive check (though perhaps not as accurate) is to see if the socket file exists. (But, it seems we don't need to check, see below.)
The real problem is starting the ssh connection multiplexer without blocking for eg a whole rsync transfer to run. There's not a -O command that starts the multiplexer. The only way to do it seems to be something like ssh -S path -o ControlMaster=auto -o ControlPersist=yes host true. So, run a no-op command on the remote host just to get the connection up. Then prepSocket will know the cached connection is up, and can drop the prompt lock and return.
It would only need to do this when concurrency is enabled, so non-concurrent uses the current, faster path.
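The warm-up step described above might look roughly like this in shell. The function name warm_ssh_master and the overridable SSH variable are hypothetical; only the ssh options themselves come from the comment. The up-front check is optional, since the comment notes it may not be needed.

```shell
# Hypothetical helper. SSH can be overridden, e.g. SSH=echo for a dry run.
SSH="${SSH:-ssh}"

# Bring up a persistent ssh connection master on the given control socket
# by running a no-op command on the remote host.
warm_ssh_master() {
    host="$1"
    sock="$2"
    # Optional shortcut: the socket file only appears after authentication,
    # so test for it first and then ask the master whether it is alive.
    if [ -S "$sock" ] && "$SSH" -S "$sock" -O check "$host" 2>/dev/null; then
        return 0
    fi
    # This may prompt for a password. It returns once "true" has run on
    # the remote side; ControlPersist=yes keeps the master in the background.
    "$SSH" -S "$sock" -o ControlMaster=auto -o ControlPersist=yes "$host" true
}
```

A later ssh or rsync invocation that uses the same control socket path should then reuse the connection without prompting again.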
prepSocket takes a shared file level lock of the socket's lock file, which is used to tell when another git-annex process is using the connection multiplexer. So, an optimisation would be for prepSocket to check if it's already taken that shared lock, and then it does not need to start the multiplexer.
What about when GIT_SSH is used? prepSocket is not used then, and git-annex can only use the GIT_SSH interface to ssh to the host. So, the approach above won't work.
git-annex could then try to use GIT_SSH to ssh to the host and run eg true, in hopes that GIT_SSH is enabling ssh connection caching and that will get the ssh connection set up. If GIT_SSH is not enabling connection caching, that might add an additional password prompt, and not avoid other password prompts from overlapping.
Current status: It's implemented, but not for GIT_SSH yet.
The display is a bit ugly, because the ssh password prompt line confuses the concurrent-output region manager. Opened the "minor display glitch with ssh password prompting and -J" bug for that.
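For the GIT_SSH case discussed above, the warm-up attempt could be sketched as follows. The function name warm_via_git_ssh is hypothetical, and the sketch only helps if the program named by GIT_SSH enables connection caching on its own; otherwise it just adds one more password prompt.

```shell
# Hypothetical helper: run a no-op on the remote host through GIT_SSH,
# which is invoked as "$GIT_SSH host command".
warm_via_git_ssh() {
    host="$1"
    # Nothing to do when GIT_SSH is not set.
    [ -n "${GIT_SSH:-}" ] || return 0
    "$GIT_SSH" "$host" true
}
```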