Recent comments posted to this site:
I may have actually come up with a solution. Instead of creating a second remote, I was able to make my ~/.ssh/config dynamic based on the results of a dig command: https://fmartingr.com/blog/2022/08/12/using-ssh-config-match-to-connect-to-a-host-using-multiple-ip-or-hostnames/
Thanks for going with me on this journey!
sameas and if cost will work in that case as well.
After a bug fix, it's now possible to make a sameas remote that is private to the local repository.
git-annex initremote bar --sameas=foo --private type=...
While not ephemeral as such, if you git remote remove bar,
the only trace left of it will probably
be in .git/annex/journal-private/remote.log, and possibly
any creds that got cached for it.
It would be possible to have a command that removes the remote, and also
clears that.
If that is close enough to ephemeral, then we could think about the second part, extending the external special remote protocol with REDIRECT-REMOTE.
That is similar to Special remote redirect to URL. And a few comments over there go in a similar direction. In particular, the discussion of CLAIMURL. If TRANSFER-RETRIEVE-URL and TRANSFER-CHECKPRESENT-URL supported CLAIMURL, then if the ephemeral special remote had some type of url that it claimed, those could be used rather than REDIRECT-REMOTE.
That would not cover TRANSFER STORE and REMOVE though. And it probably doesn't make sense to extend those to urls generally. (There are too many ways to store to an url or remove an url, everything isn't WebDAV..)
I don't know if it is really elegant to drag urls into this anyway. The user may be left making up an url scheme for something that does not involve urls at all.
Some of this strikes me as perhaps coming at Ephemeral special remotes from a different direction?
Re the inflation of the git-annex branch when using sameas,
I fixed a bug (sameas private) and you'll be able to use
git-annex initremote --sameas=foo --private to keep the configuration
of the new sameas remote out of the git-annex branch.
So, it seems to me that your broker, if it knows of several different urls
that can be used to access myplace, can be told at initremote
time which set of urls to use. And you can initialize multiple instances
of the broker, each configured to use a different set of urls, with
--sameas --private.
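A minimal Python sketch of that idea, assuming a hypothetical broker whose instances are each given a setting called urlset at initremote time. The setting name, the URL_SETS table, and the example urls are all made up for illustration; none of this is git-annex API.

```python
# Hypothetical table: which url prefixes reach "myplace" in each environment.
URL_SETS = {
    "campus": ["https://mirror.campus.example/annex/"],
    "public": [
        "https://data.example.org/annex/",
        "https://backup.example.net/annex/",
    ],
}


def urls_for_key(key, urlset):
    """Return candidate urls for key, using the url set this broker
    instance was configured with (hypothetical 'urlset' setting)."""
    prefixes = URL_SETS.get(urlset)
    if prefixes is None:
        raise ValueError(f"unknown urlset: {urlset}")
    return [prefix + key for prefix in prefixes]
```

Each broker instance created with --sameas --private would then be given a different urlset value, so picking a remote by name picks the set of urls.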
CLAIMURL is not currently used for TRANSFER-RETRIEVE-URL. (It's also
not quite accurate to say that the web special remote is used.)
Supporting that would mean that, each time a remote replies with TRANSFER-RETRIEVE-URL, git-annex would need to query each other remote in turn to see if they claim the url. That could mean starting up a lot of external special remote programs (when not running yet) and doing a roundtrip through them, so latency might start to become a problem.
Also, there would be the possibility of loops between 2 or more remotes. Eg, remote A replies with TRANSFER-RETRIEVE-URL with an url that remote B CLAIMURLs, only to then reply with TRANSFER-RETRIEVE-URL, with an url that remote A CLAIMURLs.
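To make the loop concrete, here is a small Python sketch of following such redirects with a guard against cycles. The redirects and claims dicts are hypothetical stand-ins for "what url does this remote reply with" and "which remote would CLAIMURL this url"; this is not how git-annex is implemented.

```python
def resolve(start_remote, redirects, claims):
    """Follow url redirects between remotes until an unclaimed url is found.

    redirects: dict mapping remote name -> url it replies with (hypothetical)
    claims:    dict mapping url -> name of the remote that claims it (hypothetical)
    Raises RuntimeError when the chain comes back to a remote already visited.
    """
    seen = set()
    remote = start_remote
    while True:
        if remote in seen:
            raise RuntimeError(f"redirect loop detected at remote {remote}")
        seen.add(remote)
        url = redirects[remote]
        claimer = claims.get(url)
        if claimer is None:
            # No remote claims this url, so it can be downloaded directly.
            return url
        remote = claimer
```

With remote A redirecting to an url claimed by B, and B redirecting to an url claimed by A, the second visit to A raises instead of spinning forever.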
TRANSFER-RETRIEVE-URL was designed as a redirect, so it only redirects to one place. And git-annex won't try again to retrieve from the same remote if the url fails to download.
I could imagine extending TRANSFER-RETRIEVE-URL to have a list of urls. But I can also imagine needing to extend it with HTTP headers to use for the url, and these things conflict, given the simple line and word based protocol.
I think that sameas remotes that use other urls might be a solution.
Running eg git-annex get without specifying a remote, it will keep trying
different remotes until one succeeds.
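Roughly, that fallback behaves like the following Python sketch. It assumes each remote is a hypothetical (name, cost, retrieve) tuple where retrieve raises OSError on failure; the lowest-cost-first ordering is a simplification for illustration, not a description of git-annex internals.

```python
def get_from_any(key, remotes):
    """Try remotes in order of increasing cost; return the name of the
    first remote whose retrieve(key) succeeds, or None if all fail."""
    for name, _cost, retrieve in sorted(remotes, key=lambda r: r[1]):
        try:
            retrieve(key)
            return name
        except OSError:
            # This remote failed; fall through to the next one.
            continue
    return None
```

Several sameas remotes that differ only in the urls they use would each appear as a separate entry in such a list.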
Yes, CHECKPRESENT still needs the special remote to do HTTP.
I do think that was an oversight. The original todo mentioned "taking advantage of the testing and security hardening of the git-annex implementation" and if a special remote is read-only, CHECKPRESENT may be the only time it needs to do HTTP.
A protocol extension for this would look like:
EXTENSIONS CHECKPRESENT-URL
CHECKPRESENT-URL Key Url
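A sketch in Python of how an external special remote might answer CHECKPRESENT if such an extension existed. CHECKPRESENT-URL is only the reply proposed above, not something the protocol is stated to support; the urls dict and the extension_enabled flag are hypothetical names for illustration.

```python
def handle_checkpresent(line, urls, extension_enabled):
    """Build the reply to one 'CHECKPRESENT Key' request line.

    urls: dict mapping key -> url where its content lives (hypothetical)
    extension_enabled: whether the proposed CHECKPRESENT-URL extension
    was negotiated via EXTENSIONS (assumed, not an existing feature)
    """
    parts = line.split(" ", 1)
    if len(parts) != 2 or parts[0] != "CHECKPRESENT":
        return "UNSUPPORTED-REQUEST"
    key = parts[1]
    url = urls.get(key)
    if url is None:
        return f"CHECKPRESENT-UNKNOWN {key} no url known for this key"
    if extension_enabled:
        # Proposed reply: hand the url to git-annex and let it do the HTTP.
        return f"CHECKPRESENT-URL {key} {url}"
    return f"CHECKPRESENT-UNKNOWN {key} CHECKPRESENT-URL not negotiated"
```

The point of the design is that the remote never opens an HTTP connection itself; it only maps keys to urls.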
Would it impact the usage of such a special remote, if it would be configured with sameas=otherremote? Would both remote implementations need to implement CHECKPRESENT (consistently), or would one (in this case otherremote) be enough?
git-annex won't try to use the otherremote when it's been asked to use the sameas remote.
If one implemented CHECKPRESENT and the other always replied with
"CHECKPRESENT-UNKNOWN", then a command like git-annex fsck --fast --from
when used with the former remote would be able to verify that the content
is present, and when used with the latter remote it would error out.
So you could perhaps get away with not implementing that. For a readonly remote, fsck is I think the only thing that uses CHECKPRESENT on a user-specified remote. It's more used on remotes that can be written to.
I looked into adopting this new feature for a special remote implementation. Four questions arose:
1. In order to implement CHECKPRESENT, it appears that a special remote still needs to implement the logic for the equivalent of an HTTP HEAD request. From my POV this limits the utility of a git-annex based download, because significant logic still needs to be implemented in the special remote itself. Would it impact the usage of such a special remote, if it would be configured with sameas=otherremote? Would both remote implementations need to implement CHECKPRESENT (consistently), or would one (in this case otherremote) be enough?
2. I am uncertain re the signaling in case of multiple possible URL targets for a key, and an eventual download failure regarding one URL communicated via TRANSFER-RETRIEVE-URL. I believe that, when git-annex fails to download from a reported URL, it can only send another TRANSFER-RETRIEVE request to the special remote (possibly going to the next remote first). This would mean that the special remote either needs to maintain state on which URL has been reported before, or it would need to implement the capacity to test for availability (essentially the topic of Q1), and can never report more than one URL. Is this correct?
3. What is the logic git-annex uses to act on a URL communicated via TRANSFER-RETRIEVE-URL? Would it match it against all available special remotes via CLAIMURL, or give it straight to web (and only that)?
4. I am wondering if it would be possible and sensible to use this feature for implementing a download URL "broker". A use case would be an informed selection of a download URL from a set of URLs associated with a key. This is similar to the urlinclude/exclude feature of the web special remote, but (depending on Q3) is relevant also to other special remotes acting as downloader implementations.
Elaborating on (4) a bit more: My thinking is focused on the optimal long-term accessibility of keys -- across infrastructure transitions and different concurrent environments. From my POV git-annex provides me with the following options for making myplace as a special remote optimally work across space and time.
- Via sameas=myplace, I can have multiple special remotes point to myplace. In each environment I can use the additional remotes (by name) to optimally access myplace. The decision making process is independent of git-annex. However, the possible access options need to be encoded in the annex branch to make this work. This creates a problem of inflation of this space in case of repositories that are used in many different contexts (think public (research) data that want to capitalize on the decentralized nature of git-annex).
- Via enableremote I can swap out the type and parameterization of myplace entirely. However, unlike with initremote there is no --private, so this is more geared toward the use case of "previous access method is no longer available", rather than a temporary optimization.
- When key access is (temporarily) preferred via URLs, I could generate a temporary web special remote via initremote --private and a urlinclude pattern.
In all cases, I cannot simply run git annex get, but I need to identify a specific remote that may need to be created first, or set a low cost for it.
I'd be glad to be pointed at omissions in this assessment. Thanks!
The assistant has some very tricky, and probably also fragile code that gathers related inotify events. That would need to be factored out for this.