Connecting to a discussion we had at distribits....
It would be useful to extend the external special remote protocol with the ability to create ephemeral special remotes. Ephemeral in the sense that they are created by and during the runtime of a special remote, and only exist until that special remote process is terminated by git-annex.
There could be a new protocol command that takes the same parameters as initremote as arguments. Its response would be the UUID of the created special remote.
The second part of the protocol extension would be a third response value for CHECKPRESENT, TRANSFER*, REMOVE. The addition to SUCCESS and FAILURE would be REDIRECT-REMOTE <UUID>, which instructs git-annex to perform the same request against the special remote given by UUID instead.
The corresponding change in key availability would be recorded for the original special remote.
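To make the proposed third response concrete, here is a minimal Python sketch of how the reading side might classify a reply line. `REDIRECT-REMOTE` is only a proposal at this point; the function name `classify_reply`, the `Reply` type, and the assumption that other replies end in `-SUCCESS` or `-FAILURE` are illustrative choices, not part of git-annex.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    outcome: str                       # "success", "failure" or "redirect"
    redirect_uuid: Optional[str] = None


def classify_reply(line: str) -> Reply:
    """Classify one reply line from an external special remote.

    Assumption: replies start with "<REQUEST>-SUCCESS", "<REQUEST>-FAILURE",
    or the proposed "REDIRECT-REMOTE <UUID>".
    """
    words = line.strip().split()
    if not words:
        raise ValueError("empty reply")
    if words[0] == "REDIRECT-REMOTE":
        if len(words) != 2:
            raise ValueError("REDIRECT-REMOTE needs exactly one UUID")
        # The caller would repeat the same request against this remote.
        return Reply("redirect", words[1])
    if words[0].endswith("-SUCCESS"):
        return Reply("success")
    if words[0].endswith("-FAILURE"):
        return Reply("failure")
    raise ValueError(f"unrecognised reply: {line!r}")
```

On a `redirect` outcome the caller would look up the remote by UUID, rerun the request there, and record the resulting availability change for the original special remote.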
A use case would be "orchestration" special remotes that represent, say, a particular infrastructure. They dynamically deploy appropriate transfer setups, and do not commit them to a repository. This can be useful for setups with short-lived tokens/urls. This is in some way also an alternative to the sameas approach, where the alternatives are hidden in the implementation of a special remote, rather than in each repository.
The major difficulty in implementing this seems to be the setup stage, which is the per-special-remote code that runs during initremote/enableremote. That code can write to disk, or perform expensive operations.
A few examples:
My gut feeling is that it won't be practical to make it possible to ephemeralize every type of special remote. But it would not be too hard to make some subset of special remotes able to be used ephemerally.
It might be possible to maintain a cache of recently used ephemeral special remotes across runs of git-annex, and so avoid needing to re-run the setup stage.
There are also some common setup stage tasks that pose problems but could all be fixed in one place:
After a bug fix, it's now possible to make a sameas remote that is private to the local repository.
While not ephemeral as such, if you `git remote remove bar`, the only trace left of it will probably be in `.git/annex/journal-private/remote.log`, and possibly any creds that got cached for it. It would be possible to have a command that removes the remote, and also clears that.

If that is close enough to ephemeral, then we could think about the second part, extending the external special remote protocol with REDIRECT-REMOTE.
That is similar to Special remote redirect to URL, and a few comments over there go in a similar direction, in particular the discussion of CLAIMURL. If TRANSFER-RETRIEVE-URL and TRANSFER-CHECKPRESENT-URL supported CLAIMURL, then if the ephemeral special remote had some type of url that it claimed, those could be used rather than REDIRECT-REMOTE.
That would not cover TRANSFER STORE and REMOVE though. And it probably doesn't make sense to extend those to urls generally. (There are too many ways to store to an url or remove an url, everything isn't WebDAV..)
I don't know if it is really elegant to drag urls into this anyway. The user may be left making up an url scheme for something that does not involve urls at all.
Continuing my line of thought, `REDIRECT_REMOTE` would I guess be provided with a remote name, not a uuid, since with --sameas the remote would have the same uuid.

While special remote "foo" could use "foo-bar", "foo-baz" etc as the name of its not-really-ephemeral helper remotes, that is not entirely satisfactory, since the user might have their own "foo-bar" remote. Or the user might notice "foo-bar" exists, and start using it, and then it would be painful if "foo" later removes it.
And a new protocol command like `initremote` does seem to be needed, because if a special remote runs `git-annex initremote` itself, the git-annex process that is using the special remote won't know about the new remote.

If there's an `initremote`-like protocol command, the special remotes it inits could be in a separate namespace, and `REDIRECT_REMOTE` could automatically use that namespace.

For example:
That might make a remote named eg "foo-$foouuid-blah" where $foouuid is the uuid of the special remote foo that owns it. So there is no possibility of collision. That would be in `.git/config` for the reasons I discussed earlier.

Depending on the type of remote, it might be cheap enough to INITREMOTE and REMOVEREMOTE in the same session, making it ephemeral, although with some disk writes happening behind the scenes to update the git config etc. Or, the REMOVEREMOTE could be skipped to leave it set up for the next session. Then an `INITREMOTE` with the same settings would be optimised to a no-op.

That would have `git remote remove foo` leave behind the configs for the not-so-ephemeral remotes that it set up. Not a big problem, the user can go in and delete them, or a `git-annex removeremote` could handle it, as well as deleting `.git/annex/journal-private/remote.log`, cached creds, etc.

I'm here going with the name "aspect" to refer to a sameas remote that is in a private namespace belonging to the external special remote that uses it. This name is a bit of a placeholder, but I think some name is needed, because it would be surprising if "INITREMOTE" did a different thing than `git-annex initremote`.

Add to external special remote protocol, enabled by the `REDIRECTREMOTE` extension:

Add response to TRANSFER, REMOVE, CHECKPRESENT, TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, REMOVEEXPORTDIRECTORY, RENAMEEXPORT:
With ephemeral=yes, the aspect is automatically removed when the external special remote program shuts down (unless another one is using it.) With ephemeral=no, the aspect remains initialized for use next time.
Note that INITASPECT will successfully do nothing if the aspect already exists with the same config. If an aspect exists with that name but a different config, it will fail. I earlier thought it could remove the old one and make a new one, but that risks removing an aspect that is still in use by another process, which could result in unexpected behavior when that aspect reads its git config or cached creds or etc. It should be easy enough in most cases to avoid reusing the same aspect name for two different configs.
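The INITASPECT rules described above (no-op on identical config, failure on a conflicting config, automatic removal of ephemeral aspects at shutdown) can be sketched as a small in-memory registry. This is only an illustration in Python: `AspectRegistry`, `AspectError` and the method names are hypothetical, and the "unless another one is using it" check is deliberately left out.

```python
from typing import Dict, List, Tuple


class AspectError(Exception):
    """Raised when an aspect name is reused with a different config."""


class AspectRegistry:
    def __init__(self) -> None:
        # aspect name -> (config, ephemeral flag)
        self._aspects: Dict[str, Tuple[Dict[str, str], bool]] = {}

    def init_aspect(self, name: str, config: Dict[str, str],
                    ephemeral: bool) -> bool:
        """Return True if the aspect was newly set up, False if it
        already existed with the same config (nothing to do)."""
        existing = self._aspects.get(name)
        if existing is None:
            self._aspects[name] = (dict(config), ephemeral)
            return True
        if existing[0] == config:
            # Same name, same config: succeed without doing anything.
            # (This sketch keeps the originally recorded ephemeral flag.)
            return False
        # Never replace: another process may still be using the old one.
        raise AspectError(f"aspect {name!r} exists with a different config")

    def shutdown(self) -> None:
        """Drop ephemeral aspects; keep the ones with ephemeral=no."""
        self._aspects = {n: v for n, v in self._aspects.items() if not v[1]}

    def names(self) -> List[str]:
        return sorted(self._aspects)
```

Refusing to replace a conflicting aspect mirrors the reasoning above: silently swapping it out could pull the config out from under another process.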
I concur with your reasoning, in particular with the observation that making this about URLs would be a mistake. I was already trying to make the "redirect" approach do things it was not meant to be used for.
Here is my understanding of the proposed design:
I could use this to implement an "orchestration" special remote that, rather than implementing store and retrieve procedures, is focused on which other implementations shall be used. For this, it can rely on the full set of special remotes available on a system. It would be possible to have a single remote (using this new feature) abstract a data holding site that can be talked to via various protocols, and the specific access approach can be selected dynamically. This would, therefore, include the ability to use a redirect special remote for URL-based downloads.
A few questions which I could not answer with confidence:
- […] `initremote`?
- […] `INITASPECT` (have to) be used? The `PREPARE` stage, I guess.
- […] `INITASPECT` be? Would it (immediately) trigger init/prepare of the aspect-remote?
- `INITASPECT-OK|INITASPECT-FAILURE` are responses sent by the main git-annex process to the special remote, right? Any implementation would need to implement some kind of error handling (try another aspect, or error out too).

I think it would be best for it not to be visible to the user. Since these remotes can still set their own git configs though, they will necessarily show up in `git remote list`. (Any `git config remote.foo.bar` setting is enough for that.) It would be possible for git-annex to not treat them as valid remotes when used outside of the aspect context though. Easiest would be to set annex-ignore on them.

It would be possible to point `GIT_CONFIG` at a different config file when setting up and using the ephemeral special remote. That would have the problem though that if the special remote looks at some user-set git configs, it wouldn't see them. An example that comes to mind that a special remote would be expected to see is the "credential.helper" configuration. Maybe git-annex could merge .git/config into the ephemeral remote's version when using it? Seems complex and potentially slow though.

(BTW, even ephemeral aspects will be user-visible while git-annex is running.)
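For the annex-ignore idea above, the per-remote git config key is `remote.<name>.annex-ignore`. A minimal Python sketch of setting it follows; the helper names `annex_ignore_argv` and `hide_remote`, and the example remote name, are hypothetical, and git-annex itself would do this internally rather than by shelling out.

```python
import subprocess
from typing import List


def annex_ignore_argv(remote_name: str) -> List[str]:
    """Build the git command that marks a remote as annex-ignore."""
    return ["git", "config", f"remote.{remote_name}.annex-ignore", "true"]


def hide_remote(remote_name: str, repo_dir: str = ".") -> None:
    """Run the command in the given repository (needs git installed)."""
    subprocess.run(annex_ignore_argv(remote_name), cwd=repo_dir, check=True)
```

Note that this only stops git-annex from using the remote directly; the remote's config section stays visible to git, as discussed above.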
I think it could be used at any point.
As expensive as `git-annex initremote` initially, but subsequently close to a noop when the remote configuration includes ephemeral=no.

Also, calling it repeatedly in the same session with the same configuration should be a noop after the first time. So you could call it immediately before USEASPECT.
That does suggest a simplification: Rather than having a separate INITASPECT command:
USEASPECT type=whatever ephemeral=yes|no [params]
Neat, this avoids needing to name the aspect! And avoids any problem with the aspect name having been used before with a different config.
It also means that any failure to initialize will necessarily make the USEASPECT response be an error message, so error handling takes care of itself.
git-annex would still need a remote name internally; it could eg hash the configuration to get a name.
I'm inclined to go with this simplification.
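Hashing the configuration to get an internal remote name could look roughly like the following Python sketch. The function name, the "aspect-" prefix, the 16-character truncation, and the choice to leave `ephemeral` out of the hash are all assumptions made for illustration; nothing here reflects what git-annex actually does.

```python
import hashlib
import json
from typing import Dict


def aspect_remote_name(remote_type: str, params: Dict[str, str],
                       prefix: str = "aspect") -> str:
    """Derive a stable remote name from a remote type and its params.

    The config is serialised with sorted keys so that the same settings
    always give the same name, whatever order they were passed in.
    """
    canonical = json.dumps(
        {"type": remote_type, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncated for readability; a real implementation might keep more.
    return f"{prefix}-{digest[:16]}"
```

Because the name is a pure function of the configuration, a second USEASPECT with the same settings would map to the same remote, and a changed setting would map to a different one, which is what sidesteps the name-reuse problem.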
It's per-operation. If you want different aspects for different types of keys it would be up to you to pick between them.
Add to external special remote protocol, enabled by the `DELEGATE` extension:

Which can be used as a response to TRANSFER, REMOVE, CHECKPRESENT, TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, REMOVEEXPORTDIRECTORY, RENAMEEXPORT.
This initializes a delegate special remote in a private namespace, and uses it to perform the operation.
Subsequent uses of DELEGATE with the same configuration avoid the overhead of reinitialization.
With ephemeral=yes, the delegate is automatically removed when the external special remote program shuts down (unless another one is using it.) With ephemeral=no, the delegate remains initialized for use next time.
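From the side of an orchestration remote, answering a request with DELEGATE might look like the Python sketch below. The argument layout is an assumption that simply mirrors the `USEASPECT type=whatever ephemeral=yes|no [params]` line from earlier in this thread; the routing table, hosts, and paths are made-up examples, and value quoting is not handled.

```python
from typing import Dict, Tuple

# Hypothetical routing table: which delegate handles which request.
DELEGATES: Dict[str, Tuple[str, Dict[str, str]]] = {
    "TRANSFER": ("rsync", {"rsyncurl": "example.org:/srv/annex",
                           "encryption": "none"}),
    "CHECKPRESENT": ("directory", {"directory": "/mnt/annex-mirror",
                                   "encryption": "none"}),
}


def delegate_reply(remote_type: str, ephemeral: bool,
                   params: Dict[str, str]) -> str:
    """Format a DELEGATE reply line (assumed syntax, see lead-in)."""
    words = ["DELEGATE", f"type={remote_type}",
             f"ephemeral={'yes' if ephemeral else 'no'}"]
    for key in sorted(params):
        value = params[key]
        if any(ch.isspace() for ch in key + value):
            raise ValueError("whitespace is not handled in this sketch")
        words.append(f"{key}={value}")
    return " ".join(words)


def reply_for(request_line: str, ephemeral: bool = True) -> str:
    """Pick a delegate per operation, based on the request's first word."""
    words = request_line.split(maxsplit=1)
    if not words or words[0] not in DELEGATES:
        return "UNSUPPORTED-REQUEST"
    remote_type, params = DELEGATES[words[0]]
    return delegate_reply(remote_type, ephemeral, params)
```

Since the choice is made per operation, the same remote can send transfers one way and presence checks another; emitting parameters in sorted order keeps the configuration identical across calls, which is what would let repeated uses skip reinitialization.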
Started developing this in the `delegate` branch.

It seems to also make sense to allow DELEGATE as a response to WHEREIS.
I'm on the fence about delegating GETORDERED. Probably most remotes won't bother to respond to GETORDERED at all, and the only time it makes sense to delegate it is when always delegating to the same type of special remote. If delegating to different special remotes at different times, it doesn't make sense to delegate it to a single one of them.
Similarly I don't think it makes sense to delegate GETINFO unless only delegating to a single special remote. Will probably wait to see if someone has a use case before supporting GETINFO, GETAVAILABILITY, CLAIMURL, etc.