In backups2datalad we are implementing support for "embargoed" data: files which would initially require authorization. To provide authentication support we enable and configure `git-annex-remote-datalad` to handle them. But in the future, when the data gets unembargoed, we would need to remove or disable the `datalad` special remote and migrate the (same) URLs so that they are no longer associated with it and are instead handled by the `web` remote. More generally, a use case might call for moving URLs between remotes (e.g. from `datalad` to `datalad-next`), or migrating from `web` into an external remote.

Is this already reasonably possible, or would it require messing with `.web` files in the git-annex branch?
I see you're using CLAIMURL. What `git-annex addurl` does when a special remote claims an url is record the url for the key in the git-annex branch, but mangled to indicate that it is not an url used by the web special remote.

The mangling is just to prefix the url with ":".
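As a minimal sketch of the mangling described above, assuming the only difference between the two recorded forms is the leading ":" (the function names here are hypothetical and not part of git-annex):

```python
from typing import Tuple


def mangle_claimed_url(url: str) -> str:
    """Recorded form of a url claimed by a non-web special remote."""
    return ":" + url


def demangle_url(recorded: str) -> Tuple[str, bool]:
    """Return (url, claimed_by_other_remote) for a recorded url.

    A leading ":" means the url is not one used by the web special remote.
    """
    if recorded.startswith(":"):
        return recorded[1:], True
    return recorded, False
```

This only illustrates the prefix convention; it does not read or write anything in the git-annex branch.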
If you `git-annex registerurl --remote=web` the same url again but without that prefix, and `git-annex setpresentkey $key 00000000-0000-0000-0000-000000000001 1`, the url will be able to be downloaded by the web special remote.

Then you can unregisterurl the (unmangled) url from your remote that you no longer want to use, and use setpresentkey with your remote's uuid and "0" to remove the mangled url.
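The sequence above could be sketched for a single key roughly like this. It is an illustration rather than a tested migration tool: the helper names are hypothetical, and passing `--remote=<name>` to `unregisterurl` is an assumption about how to address the url on the old remote. By default the commands are only printed.

```python
import subprocess
from typing import List

# UUID of the web special remote, as used in the setpresentkey call above.
WEB_UUID = "00000000-0000-0000-0000-000000000001"


def migration_commands(key: str, url: str, remote: str,
                       remote_uuid: str) -> List[List[str]]:
    """Build the four git-annex calls that move one url to the web remote."""
    return [
        # Register the unmangled url with the web special remote.
        ["git-annex", "registerurl", "--remote=web", key, url],
        # Mark the key as present on the web special remote.
        ["git-annex", "setpresentkey", key, WEB_UUID, "1"],
        # Unregister the url from the old remote (--remote is an assumption).
        ["git-annex", "unregisterurl", f"--remote={remote}", key, url],
        # Mark the key as no longer present on the old remote.
        ["git-annex", "setpresentkey", key, remote_uuid, "0"],
    ]


def migrate(key: str, url: str, remote: str, remote_uuid: str,
            dry_run: bool = True) -> None:
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in migration_commands(key, url, remote, remote_uuid):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `migrate(key, url, "datalad", remote_uuid)` with real values prints the four commands so they can be reviewed before anything in the repository is changed.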
I don't think there's any plumbing currently that makes it easy to access and demangle the urls.
`git-annex whereis --json` will list the demangled url in the "urls" field, but in among any other urls that other special remotes might have for the same content. Without --json there is a nice display that shows the remote that claims the url:

Improving the whereis --json or adding some other machine-readable way to list urls claimed by a remote seems like maybe worth doing? Let me know.
In theory, yes. In practice, ATM I would probably just `git annex whereis | awk -e "/^ *$remote:/{print \$2;}"` or alike in bash, and in Python it would be quite trivial to filter for a specific remote. The main "hurdle" here would be the need to do that dance with registerurl/unregisterurl, and given that we might need to do that for many thousands of keys (imagine going through such repos as dandizarrs), I felt like some internal dedicated function would be worthwhile.

The commands you would need all support --batch, so it seems very scriptable.
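For many thousands of keys, batch mode might be driven along these lines. This is a sketch under assumptions: the stdin line formats ("key url" for `registerurl`/`unregisterurl`, "key uuid 1|0" for `setpresentkey`) are taken from my reading of the man pages and should be checked against the installed version, `--remote` on `unregisterurl` is likewise assumed, and all helper names are hypothetical.

```python
import subprocess
from typing import Dict, Iterable, List, Tuple

WEB_UUID = "00000000-0000-0000-0000-000000000001"  # web special remote


def batch_payloads(pairs: Iterable[Tuple[str, str]],
                   remote_uuid: str) -> Dict[str, str]:
    """Build stdin payloads for the --batch runs from (key, url) pairs."""
    pairs = list(pairs)
    urls = "".join(f"{key} {url}\n" for key, url in pairs)
    present = "".join(f"{key} {WEB_UUID} 1\n" for key, _ in pairs)
    absent = "".join(f"{key} {remote_uuid} 0\n" for key, _ in pairs)
    return {"urls": urls, "present": present, "absent": absent}


def run_batch(args: List[str], payload: str) -> None:
    """Feed one payload to a single git-annex process in batch mode."""
    subprocess.run(["git-annex", *args, "--batch"],
                   input=payload, text=True, check=True)


def migrate_all(pairs: Iterable[Tuple[str, str]], remote: str,
                remote_uuid: str) -> None:
    """Move every (key, url) pair from the old remote to the web remote."""
    p = batch_payloads(pairs, remote_uuid)
    run_batch(["registerurl", "--remote=web"], p["urls"])
    run_batch(["setpresentkey"], p["present"])
    run_batch(["unregisterurl", f"--remote={remote}"], p["urls"])
    run_batch(["setpresentkey"], p["absent"])
```

Each command is started once for the whole list instead of once per key, which is the main reason to use batch mode on large repositories.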
I'm struggling a bit with the idea of a dedicated git-annex command for this, because it seems a fairly unusual situation.
Maybe `git-annex registerurl` could have a switch that says to move urls from remote foo to the web special remote? That would avoid needing to query for the urls.
Since registering an url also marks it present, all you would need after that is to mark it as not present any longer on your special remote.
Looks good. It is `--move-from`, not `--copy-from`, so wouldn't it mark it not present any longer on `foo`?

Thing is that `unregisterurl` does not mark content as not present in a special remote. Except for the web, which is a special case. The reason is that not having an url registered by a special remote does not prevent getting content from that special remote in general.

So my idea for `--move-from=foo` was that it should behave the same way, since it's the same as `registerurl` + `unregisterurl`.

If you're going to remove/disable the special remote anyway, it won't matter whether git-annex thinks it contains content, I suppose? Or you could use setpresentkey of course.
Went far enough down implementing `registerurl --move-from` to be sure that it would complicate the code far more than just adding a new `moveurl` command. So despite it being a fairly unusual situation, a new command is better than that option.

And implemented it: