One of my collaborators needs to orchestrate data between a local desktop and an HPC cluster (you have probably heard enough already about some of the "experiences" with that NFS). The connection to the cluster goes through a VPN, which is flaky (it can drop within an hour or two) and requires 2FA to get in, so it is not that easy to transfer large amounts of data back and forth. BUT we were told that the same data is available via globus, without requiring the VPN. So, looking at https://docs.globus.org/cli/examples/, I wonder if there is anything that would preclude having an additional special remote that provides alternative access to the same remote (same UUID) and just takes care of depositing, obtaining, and maybe removing files via globus instead of ssh. We already have somewhat similar scenarios where we publish an annex via ssh but make it available via http for downloads. If the remote location is a typical indirect annex (not a super thin version of it without a duplicate copy under .git/annex/objects), it should be quite easy, I guess, to figure out the full path to the key (although we might need to watch out for bare ones): it should be the same as it is locally, so we could just get the file via the globus cli instead of an ssh session. I decided to ask before jumping into trying to implement it (not that I have any globus access ATM; I think all signs of life of it disappeared from dartmouth sites a while back).
It's not currently possible for two special remotes to have the same uuid, because the remote.log is indexed by uuid, and so their configurations would overlap, including the type= and remotetype= settings.
But I think that may not be a problem in this case. It seems you have a regular remote accessed via ssh, and you want to add a special remote with the same uuid that transfers from the same remote using globus. This is like accessing the same repo via two ssh remotes, and should work ok.
You can pass uuid=whatever to git-annex initremote to force it to use the same uuid as the ssh remote.
(Returning to the question of two special remotes with the same uuid, supporting that would need some way to separate their configurations in remote.log into different namespaces. Seems doable.)
initremote --sameas can now be used to tell git-annex that two special remotes use the same underlying data.
Is there anything else needing to be done in git-annex to let this globus special remote be implemented?
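As a rough illustration of the --sameas invocation mentioned above, the Python sketch below only assembles the command line and leaves running it to the caller. The remote names (globus, cluster-ssh), the externaltype=globus value and the helper names are hypothetical, and it assumes the globus transport is an external special remote program called git-annex-remote-globus. Depending on the setup, further parameters (e.g. encryption=none) may be needed; they can be passed via extra.

```python
import subprocess


def sameas_initremote_argv(name, sameas, externaltype="globus", extra=()):
    """Build argv for `git annex initremote NAME --sameas=OTHER type=external ...`.

    `name` is the new remote, `sameas` the existing remote whose uuid is reused.
    `externaltype` and `extra` are illustrative assumptions, not fixed values.
    """
    argv = [
        "git", "annex", "initremote", name,
        f"--sameas={sameas}",
        "type=external",
        f"externaltype={externaltype}",
    ]
    argv.extend(extra)
    return argv


def run_initremote(argv, repo="."):
    """Run the assembled command inside the given repository (needs git-annex)."""
    subprocess.run(argv, cwd=repo, check=True)
```

For example, `run_initremote(sameas_initremote_argv("globus", "cluster-ssh"))` would try to register a remote named globus that shares the uuid of cluster-ssh.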
The recently added --sameas functionality provides support at the "UUID logistics" level, and the examples in the comments exercise it for two external remotes (rsync + directory) with the same layout of annex objects. The original use case I am pursuing is for a regular git repository (e.g. non-bare) with the "git repository" layout of the store (i.e. under .git/annex/objects/) to use a special remote primarily as a transport mechanism. In our case it will be globus. I really doubt it would work "out of the box" since, AFAIK, any special remote has only two possible ideas about the layout of objects: its regular "special remote layout" (e.g. a flat list of keys, or with some hash directories) or exported (such as a file tree). Only in the case of a git special remote would the layout be the same; otherwise the special remote layout would be different, and "export" wouldn't really be the one desired (especially for placing files on the remote). So it seems that the only way to accomplish my mission would be to implement, in the globus custom special remote, support for an additional layout by parametrizing the special remote upon initremote with e.g. layout=local, which would look up the location for the key in the local repository (under .git/annex/objects) and use it as the path for the key on the remote.
Is that a correct idea, Joey? Or do you see a better way?
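To make the layout=local idea a bit more concrete, here is a minimal Python sketch of how such a remote might compose the path of a key on the far side. The function names and the repo_root argument are hypothetical. It assumes the usual HASHDIR/KEY/KEY nesting under .git/annex/objects (or annex/objects for a bare repository). Only the lowercase hash directory (first six hex characters of the md5 of the key, as used by bare repositories) is computed here; for the mixed-case directories of a default non-bare repository the sketch expects the caller to supply the value, for instance by asking git-annex for it (the external special remote protocol has DIRHASH and DIRHASH-LOWER requests for this, as far as I know).

```python
import hashlib
import posixpath


def hashdirlower(key):
    """Lowercase hash directory: two 3-char dirs from the hex md5 of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:3]}/{digest[3:6]}"


def remote_object_path(repo_root, key, hashdir, bare=False):
    """Path of a key's content inside a git-annex repository on the remote.

    `hashdir` is supplied by the caller (e.g. "Xx/Yy" or the result of
    hashdirlower); `repo_root` is the repository location on the endpoint.
    """
    objects = "annex/objects" if bare else ".git/annex/objects"
    return posixpath.join(repo_root, objects, hashdir, key, key)
```

A transfer would then use the returned string as the source or destination path on the globus endpoint.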
If the repository being accessed over globus uses .git/annex/objects/ locations, it sounds to me like it's a git-annex repo, being accessed over a protocol other than ssh. A special remote that accesses remote annex objects could be created, and --sameas used to make the special remote have the same uuid as the (remote) git-annex repo.
That is correct.
That is correct too. The question is whether it should be a dedicated git-annex-remote-globus-gitannex special remote, which would probably need to use the same functionality as git-annex-remote-globus for the actual authentication and interaction with globus (with the difference largely in the paths to assume), or just an option to git-annex-remote-globus...?
I'm not clear how the answer to that question would impact git-annex.
Assuming this is built with external special remotes and/or plain git remotes, is there something lacking in git-annex to implement it now?
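For the external special remote route, a minimal Python sketch of the request handling could look like the following. It is not the actual git-annex-remote-globus code: the Transport class and its method names are hypothetical placeholders for whatever wraps globus, and only a small subset of the external special remote protocol is handled (anything else is answered with UNSUPPORTED-REQUEST).

```python
class Transport:
    """Hypothetical interface; a globus-backed subclass would do the real work."""

    def store(self, key, local_file):
        raise NotImplementedError

    def retrieve(self, key, local_file):
        raise NotImplementedError

    def present(self, key):
        raise NotImplementedError

    def remove(self, key):
        raise NotImplementedError


def _msg(exc):
    # Replies are single lines, so strip newlines from error text.
    return str(exc).replace("\n", " ")


def handle(line, transport):
    """Return the reply line for one request line sent by git-annex."""
    parts = line.split(" ", 3)  # the file name (4th field) may contain spaces
    cmd = parts[0]
    if cmd == "INITREMOTE":
        return "INITREMOTE-SUCCESS"
    if cmd == "PREPARE":
        return "PREPARE-SUCCESS"
    if cmd == "TRANSFER" and len(parts) == 4:
        direction, key, path = parts[1:]
        if direction not in ("STORE", "RETRIEVE"):
            return "UNSUPPORTED-REQUEST"
        try:
            if direction == "STORE":
                transport.store(key, path)
            else:
                transport.retrieve(key, path)
        except Exception as exc:
            return f"TRANSFER-FAILURE {direction} {key} {_msg(exc)}"
        return f"TRANSFER-SUCCESS {direction} {key}"
    if cmd == "CHECKPRESENT" and len(parts) == 2:
        key = parts[1]
        try:
            found = transport.present(key)
        except Exception as exc:
            return f"CHECKPRESENT-UNKNOWN {key} {_msg(exc)}"
        return f"CHECKPRESENT-{'SUCCESS' if found else 'FAILURE'} {key}"
    if cmd == "REMOVE" and len(parts) == 2:
        key = parts[1]
        try:
            transport.remove(key)
        except Exception as exc:
            return f"REMOVE-FAILURE {key} {_msg(exc)}"
        return f"REMOVE-SUCCESS {key}"
    return "UNSUPPORTED-REQUEST"


def serve(stdin, stdout, transport):
    """Announce the protocol version, then answer requests line by line."""
    print("VERSION 1", file=stdout, flush=True)
    for line in stdin:
        print(handle(line.rstrip("\n"), transport), file=stdout, flush=True)
```

An executable named git-annex-remote-globus on PATH would call `serve(sys.stdin, sys.stdout, SomeGlobusTransport())`, where SomeGlobusTransport is whatever concrete subclass gets written.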