ATM, special remotes store content (keys) in a keystore that does not reflect the original file hierarchy.
In many cases, though, it would be plausible and reasonable to replicate the original file hierarchy. For example, when uploading path/file1.ext, the remote could store it under that same name plus an extra path component to allow for multiple versions, e.g. path/file1-Key[:5].ext (a collision is possible but unlikely) or path/file1.ext/Key. This way we could replicate the original file hierarchy on a special remote, making it useful/usable without annex. I guess (didn't check yet) that File in a TRANSFER STORE request points to the original file location (not the resolved key location), so the special remote would know where the file originally lived. Since it would be virtually impossible (or expensive) to locate content solely by its Key, we could use the URL mechanism to associate a given uploaded Key with a new custom URL (e.g. custom-schema:path-file1.ext/Key), so that later this special remote could provide the content by claiming that URL. Sure, a custom remote could just call 'addurl' itself while handling a "TRANSFER" request, but I wondered if maybe the protocol could be adjusted to support a "TRANSFER-SUCCESS STORE Key URL" response, with which, upon a successful STORE, the special remote provides a URL from which the content should be registered as available.
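The two naming schemes proposed above can be sketched as follows. This is only an illustration, not anything git-annex or its protocol provides, and the function names are hypothetical. For the truncated variant, the sketch assumes the short fragment is taken from the key's hash part (the text after "--" in a key such as MD5E-s2416581890--662e0713d0ce42bcdbadb8251b893b8a.tgz) rather than from the start of the whole key, since every key from the same backend begins with the same backend name.

```python
import posixpath


def versioned_dir_location(worktree_path: str, key: str) -> str:
    """Scheme "path/file1.ext/Key": one directory per worktree path, one entry per key."""
    return posixpath.join(worktree_path, key)


def truncated_key_location(worktree_path: str, key: str, n: int = 5) -> str:
    """Scheme "path/file1-XXXXX.ext": insert a short key fragment before the extension.

    Assumption: the fragment comes from the key's hash part (after "--"),
    because the leading characters of a key (backend name, size field)
    are shared by many keys. Collisions remain possible.
    """
    hash_part = key.split("--", 1)[-1]
    root, ext = posixpath.splitext(worktree_path)
    return f"{root}-{hash_part[:n]}{ext}"


if __name__ == "__main__":
    key = "MD5E-s2416581890--662e0713d0ce42bcdbadb8251b893b8a.tgz"
    print(versioned_dir_location("path/file1.ext", key))
    # path/file1.ext/MD5E-s2416581890--662e0713d0ce42bcdbadb8251b893b8a.tgz
    print(truncated_key_location("path/file1.ext", key))
    # path/file1-662e0.ext
```

The directory-per-path scheme never collides between versions, because each key gets its own entry; the truncated scheme keeps the original extension visible at the cost of possible collisions.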
I think this was trying to implement something like what git annex export does, and since that's implemented now, we shouldn't need to worry about this. done --Joey
Well no, the filename passed to "TRANSFER STORE" is wherever the content of the file is; in most circumstances it will not be a file in the working tree.
(And even if the filename is a worktree file in some case, the special remote needs to support storing multiple versions of a file. So reusing the working-tree name on the special remote seems very problematic.)
In any case, the external special remote protocol already has SETURLPRESENT, which can be used if a TRANSFER STORE makes a key available at a url.
SETURLPRESENT sounds like it could indeed be used, but:
"remote needs to support storing multiple versions of a file"
Sure (e.g. as I described before).
"in most circumstances it will not be a file in the working tree"
Could there then be a reasonable (scalable, via the git annex interface) way to discover the original path(s) within the repository that were given to "git add PATH"?
Once again, the idea is to make some special remotes useful on their own, without relying on an annex and the original git/annex repository to associate their content with specific files.
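The SETURLPRESENT approach can be sketched at the level of single protocol lines: while handling a TRANSFER STORE request, the special remote stores the file, reports the resulting location with SETURLPRESENT, and only then replies TRANSFER-SUCCESS. This is a minimal sketch, not a complete external special remote: the stdin/stdout loop and the other requests are left out, and the upload callback (assumed to return a URL, or raise OSError on failure) and the example URLs are hypothetical. The request is split at most three times because the File part may contain spaces.

```python
from typing import Callable, List


def handle_transfer_store(line: str, upload: Callable[[str, str], str]) -> List[str]:
    """Turn one "TRANSFER STORE Key File" request into the lines to send back.

    `upload(key, file)` is a hypothetical callback that stores the file on the
    remote and returns a URL where the content can later be fetched.
    """
    # maxsplit=3 keeps a File containing spaces in one piece.
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) != 4 or parts[:2] != ["TRANSFER", "STORE"]:
        raise ValueError(f"not a TRANSFER STORE request: {line!r}")
    _, _, key, path = parts
    try:
        url = upload(key, path)
    except OSError as e:
        # The protocol is line based, so collapse the error message to one line.
        msg = " ".join(str(e).split()) or "upload failed"
        return [f"TRANSFER-FAILURE STORE {key} {msg}"]
    return [
        # Sent before the final reply, while the request is still being handled.
        f"SETURLPRESENT {key} {url}",
        f"TRANSFER-SUCCESS STORE {key}",
    ]


if __name__ == "__main__":
    def fake_upload(key: str, path: str) -> str:
        # Hypothetical storage location; nothing is actually uploaded.
        return "https://example.org/store/" + key

    for reply in handle_transfer_store("TRANSFER STORE SHA256E-s5--abc.txt /tmp/my file.txt", fake_upload):
        print(reply)
```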
Related: I see that there is now also SETURIPRESENT. Is there a difference in how annex handles URIs compared to URLs? In the example I saw with ipfs: URIs, it feels like those are still URLs, since they prescribe "how" the content should be retrieved (via ipfs). We have similarly used the addurl command to register our urls for content from archives (e.g. dl+archive:MD5E-s2416581890--662e0713d0ce42bcdbadb8251b893b8a.tgz#path=ds001/sub-01/anat/sub-01_T1w.nii.gz).
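One way a special remote could pick between the two messages, going by the distinction that git-annex assumes a location registered with SETURIPRESENT cannot be downloaded via HTTP/FTP: use SETURLPRESENT for http, https and ftp locations and SETURIPRESENT for anything else, such as an ipfs: URI. This is a sketch under that assumption; the exact scheme list is this example's choice, and the ipfs identifier used in the usage lines is a placeholder.

```python
from urllib.parse import urlsplit

# Assumption: schemes git-annex can fetch itself over HTTP/FTP.
DOWNLOADABLE_SCHEMES = {"http", "https", "ftp"}


def presence_message(key: str, location: str) -> str:
    """Return the protocol line that records `location` for `key`."""
    scheme = urlsplit(location).scheme.lower()
    verb = "SETURLPRESENT" if scheme in DOWNLOADABLE_SCHEMES else "SETURIPRESENT"
    return f"{verb} {key} {location}"


if __name__ == "__main__":
    print(presence_message("SHA256E-s5--abc.txt", "https://example.org/abc.txt"))
    print(presence_message("SHA256E-s5--abc.txt", "ipfs:QmPlaceholder"))
```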
git-annex does not keep track of which urls belong to which remote. Urls are, after all, Universal; it shouldn't matter which remote set a url.
So, if SETURLPRESENT was used, and git-annex thinks that the web special remote is recorded as having the content, it will try to download from that url, as well as any other urls that might be set.
But SETURLPRESENT does not make it think that the web special remote has the content. So, if the special remote that git-annex does think has the content is not enabled, git annex get won't try the web special remote.
So, what you can do is run
git annex setpresentkey $key 00000000-0000-0000-0000-000000000001 1
to make it think the web special remote has the content after SETURLPRESENT. Then it'll be the same as if addurl had been used; it will download from the web.
(There's also a way to enable an external special remote in readonly mode. In this mode, the special remote program does not have to be in PATH, and when git-annex wants to get content from the remote it will download content from any urls.)
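The setpresentkey workaround can be scripted from Python. This sketch only wraps the git annex setpresentkey call; it assumes git-annex is installed and that the given directory is inside the repository, and it passes the trailing 1 (present) argument that the setpresentkey manual page documents. The helper names are hypothetical.

```python
import subprocess
from typing import List

# UUID that git-annex uses for its built-in web special remote.
WEB_REMOTE_UUID = "00000000-0000-0000-0000-000000000001"


def setpresentkey_argv(key: str, uuid: str = WEB_REMOTE_UUID, present: bool = True) -> List[str]:
    """Build the argv for `git annex setpresentkey KEY UUID 1|0`."""
    return ["git", "annex", "setpresentkey", key, uuid, "1" if present else "0"]


def mark_present_in_web_remote(key: str, repo: str = ".") -> None:
    """Record that the web remote has `key`, so `git annex get` will try its urls."""
    # check=True raises CalledProcessError if git-annex rejects the call.
    subprocess.run(setpresentkey_argv(key), cwd=repo, check=True)
```

Only the argv builder is pure; mark_present_in_web_remote changes git-annex's location log, so run it after the SETURLPRESENT for that key has been recorded.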
(The difference with SETURIPRESENT is that it's assumed the URI cannot be downloaded via HTTP/FTP. So, while git annex whereis displays URIs, git-annex won't try to download them itself.)
git-annex keeps track of the AssociatedFile, which (when available) is the worktree file corresponding to the Key that's being operated on.
This information is not exposed in the external special remote interface. I'm worried that, if it were, people would try to do stuff that just can't work, like http://git-annex.branchable.com/todo/dumb__44___unsafe__44___human-readable_backend/
Worktree files can be renamed or deleted or copied at any time and can have multiple versions, and any special remote that used this information to try to create something that resembles the worktree would have massive problems.
I am having a hard time thinking of any use that an external special remote could make of the information that would not be a mistake.