A `publicurl` configuration option was added to the S3 special remote to facilitate public access to the files deposited to S3 via their HTTP "frontend". For many remotes (e.g. `rsync`, `directory`, etc.) it might happen that the remote location directory is also served by a regular HTTP server. So it sounds very reasonable to enable regular HTTP(/HTTPS) access to those files publicly by providing those special remotes with a `publicurl` setting, so annex could simply try to access those files via HTTP. This is particularly relevant for special remotes with `exporttree=true`.
done by implementing another design, not the one suggested here --Joey
I'm not sure how to implement this in git-annex's Remote API. retrieveKeyFile/retrieveExport would need to check it and download the url, so that would need modifications of those methods of every remote that implements this. And it would need to be possible to enable the remote in readonly mode.
It might be possible to use a mixin to modify a Remote to support this?
This will need the remote to provide a function `Key -> FilePath`, in order to support whatever hash directories or filename mangling the remote does. It might be better to generalize the function to `Url -> Key -> Url`, where the first url is the publicurl value. (When `exporttree=true`, the function is probably not needed.)
To support that function in external special remotes, the protocol would need to be extended. Hmm, that means that, in order to get a file, the external program would need to be installed, even though the actual file download only needs http. Contrast with the current readonly mode that doesn't need the external program to be installed since the url is recorded on the git-annex branch.
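A minimal sketch of what such a mixin might look like, using a simplified stand-in for the Remote record and hypothetical `withPublicUrl`/`directoryPublicUrl`/`downloadUrl` helpers (the real git-annex Remote API, key-to-path layout, and url download machinery are more involved than this):

```haskell
module PublicUrlMixin where

-- Hypothetical sketch, not the actual git-annex Remote API.
type Url = String

newtype Key = Key { keyName :: String }

-- Simplified stand-in for git-annex's Remote record.
data Remote = Remote
  { remoteName      :: String
  , retrieveKeyFile :: Key -> FilePath -> IO Bool
    -- ^ download a key's content to a local file; False on failure
  }

-- The Url -> Key -> Url function discussed above: map the configured
-- publicurl plus a key to the url its content should be found at.
-- A directory-remote-style layout is assumed here, ignoring hash
-- directories for brevity.
directoryPublicUrl :: Url -> Key -> Url
directoryPublicUrl base k =
  trimSlash base ++ "/" ++ keyName k ++ "/" ++ keyName k
  where
    trimSlash = reverse . dropWhile (== '/') . reverse

-- Placeholder http download; git-annex would use its own url downloader.
downloadUrl :: Url -> FilePath -> IO Bool
downloadUrl url dest = do
  putStrLn ("GET " ++ url ++ " -> " ++ dest)
  return True  -- pretend the download succeeded

-- The mixin: wrap an existing Remote so retrieval first tries the
-- public http url, falling back to the remote's own method.
withPublicUrl :: (Url -> Key -> Url) -> Url -> Remote -> Remote
withPublicUrl mkUrl publicurl r = r
  { retrieveKeyFile = \k dest -> do
      ok <- downloadUrl (mkUrl publicurl k) dest
      if ok then return True else retrieveKeyFile r k dest
  }
```

A remote configured with a publicurl could then be wrapped along the lines of `withPublicUrl directoryPublicUrl "https://example.com/annex" someRemote`; since only the wrapped retrieval path needs to work, this would also lend itself to enabling the remote in readonly mode.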
I think that the only built-in remotes that would make sense to support this are rsync, directory[1], and webdav. s3 already supports it but could be refactored. git remotes already support http access which is effectively the same result, and git-lfs already supports unauthed downloads, assuming the server allows it.
[1] a bit problematic because old versions used a different hash directory than current versions, so unless it can return two urls, things stored with an old version won't be accessible
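One way around that footnote's concern (again only a sketch with made-up layout helpers, not git-annex code) would be to let the mapping return every candidate url, so retrieval can try the current layout first and fall back to the old one:

```haskell
module CandidateUrls where

type Url = String

newtype Key = Key { keyName :: String }

-- Stand-ins for the two directory-remote layouts; the real hash
-- directory schemes differ between old and current git-annex versions.
oldLayoutPath, newLayoutPath :: Key -> FilePath
oldLayoutPath k = "old-hashdir/" ++ keyName k ++ "/" ++ keyName k
newLayoutPath k = "new-hashdir/" ++ keyName k ++ "/" ++ keyName k

-- Instead of a single url, supply every url the key might be published
-- at, newest layout first.
candidateUrls :: Url -> Key -> [Url]
candidateUrls base k =
  [ base ++ "/" ++ p | p <- [newLayoutPath k, oldLayoutPath k] ]

-- Try each candidate until one download succeeds.
tryCandidates :: (Url -> FilePath -> IO Bool) -> [Url] -> FilePath -> IO Bool
tryCandidates _ [] _ = return False
tryCandidates download (u:us) dest = do
  ok <- download u dest
  if ok then return True else tryCandidates download us dest
```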
The generic readonly http remote idea, if implemented, would accomplish the same thing as this todo.
The advantage of that idea is that it doesn't need modifications to be made to every special remote that might end up exposed over http. As long as the special remote is not too special about how it mangles keys into paths, it can work for a lot of special remotes.
The directory special remote is a good example. It would be weird for that to have http-specific configuration and complications.
And, that approach doesn't complicate the external special remote protocol, and avoids the problems discussed in comment #4.
So, I'm inclined to that approach over this one.
I've implemented the http special remote, which can be combined with other special remotes to access them using anonymous http.
I think that probably addresses this todo well enough to close it. (Although I didn't get around to making the http special remote support exporttree remotes yet, and this todo mentions supporting exporttree. Should be easy to add later though.)
There are probably some special remotes that are unusual enough that the http special remote can't support them, which it would make sense to add a `publicurl=` config to, like S3 has. (Although I think S3 itself could now be used with the http special remote, so its option is vestigial now.)
I guess that `publicurl=` config would best be added to the individual special remote, so it doesn't need any particular support in git-annex to add it.
http (e.g. probably could remap to public `rsync://`, `ftp://`, or `s3://`, which git-annex knows how to "talk to") even if ATM supporting only `http`/`https`, but I guess time will show if that would be needed.