A while ago, I added a bunch of files from archive.org to my repository, using `git annex addurl --fast`. This worked fine. Unfortunately, since then the relevant archive has been marked access-restricted, meaning you need to log in to archive.org to download it.
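For concreteness, the original import was done with commands along these lines (the item and file names here are made up):

```
git annex addurl --fast 'https://archive.org/download/some-item/file.flac'
```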
archive.org uses the now-standard web authentication method of going to a login page and setting an authentication cookie. This is, of course, hard to automate. However, I can just log in from the browser and then export the resulting cookies; I use curlfire for this.
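To confirm the exported cookie jar works outside the browser, a quick check with curl (the jar path and URL here are hypothetical; the jar is whatever curlfire wrote out):

```
curl --cookie ~/.archive-org-cookies -o file.flac \
    'https://archive.org/download/some-item/file.flac'
```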
The problem is that there apparently isn't any way to tell git-annex to do this. The old `annex.web-download-command` is apparently defunct; the new `annex.web-options` doesn't let you change which program is used, only pass options to curl. What's the best way to handle this?
Well, curl does have a `--cookie` option. But setting that would make all downloads from the web special remote send the same cookies, exposing them to any other web servers you also use with that remote.
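That is, the obvious one-liner works but is too broad (cookie jar path hypothetical):

```
# applies the cookie jar to every url the web special remote downloads,
# not just archive.org, so the cookies leak to unrelated hosts
git config annex.web-options "--cookie $HOME/.archive-org-cookies"
```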
I think that, generally, things involving authentication are a good use case for writing a little external special remote of your own that handles the particulars of a given service, especially if you can share it with others. example.sh is a good starting place for writing that.
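In case it helps, here is a compressed sketch of the shape such a remote could take: a read-only remote that claims archive.org urls and fetches them with the exported cookie jar. This is an illustration of the external special remote protocol, not a tested program; the cookie jar path is an assumption, and handling of spaces in filenames, multiple recorded urls, and errors is left out. example.sh covers the protocol properly.

```sh
#!/bin/sh
# Sketch: read-only external special remote for cookie-authenticated
# archive.org downloads. Assumes a cookie jar exported from the browser.
set -u

jar="$HOME/.archive-org-cookies"   # hypothetical cookie jar from curlfire

echo VERSION 1
while read -r cmd rest; do
    case "$cmd" in
        EXTENSIONS)   echo EXTENSIONS ;;
        INITREMOTE)   echo INITREMOTE-SUCCESS ;;
        PREPARE)      echo PREPARE-SUCCESS ;;
        CLAIMURL)
            # claim only archive.org urls, so this remote rather than
            # the web special remote handles them
            case "$rest" in
                *archive.org/*) echo CLAIMURL-SUCCESS ;;
                *)              echo CLAIMURL-FAILURE ;;
            esac ;;
        CHECKPRESENT)
            # consider a key present if an archive.org url is recorded for it
            echo "GETURLS $rest https://archive.org/"
            url=
            while read -r _ value; do
                [ -z "$value" ] && break   # empty VALUE ends the url list
                url=$value
            done
            if [ -n "$url" ]; then
                echo "CHECKPRESENT-SUCCESS $rest"
            else
                echo "CHECKPRESENT-FAILURE $rest"
            fi ;;
        TRANSFER)
            set -- $rest   # e.g. "RETRIEVE <key> <file>"
            if [ "$1" = RETRIEVE ]; then
                key=$2; file=$3
                # ask git-annex for the url(s) recorded for this key
                echo "GETURLS $key https://archive.org/"
                url=
                while read -r _ value; do
                    [ -z "$value" ] && break
                    [ -z "$url" ] && url=$value
                done
                if [ -n "$url" ] && curl -fsS --cookie "$jar" -o "$file" "$url"; then
                    echo "TRANSFER-SUCCESS RETRIEVE $key"
                else
                    echo "TRANSFER-FAILURE RETRIEVE $key download failed"
                fi
            else
                echo "TRANSFER-FAILURE $1 $2 this remote is read-only"
            fi ;;
        REMOVE)       echo "REMOVE-FAILURE $rest this remote is read-only" ;;
        *)            echo UNSUPPORTED-REQUEST ;;
    esac
done
```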
That said, this is also right on the edge of something git-annex might be able to support better, without you needing to do that work. It's actually possible to initremote a second web special remote that is limited to a single host and is used in preference to the regular web special remote.
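Something along these lines, I believe (the remote name archiveorg is arbitrary, and the urlinclude glob is an assumption about how your archive.org urls are recorded):

```
git annex initremote --sameas=web archiveorg type=web urlinclude='*archive.org/*'
```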
If `annex.web-options` had a per-remote config, like some other configs do (which it currently does not), you could then just set that to pass the cookies to curl when using that archiveorg special remote.
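Assuming the per-remote form follows git-annex's usual `remote.<name>.annex-*` naming, and with the hypothetical cookie jar from before, that would look like:

```
git config remote.archiveorg.annex-web-options "--cookie $HOME/.archive-org-cookies"
```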
Since that seems like a good idea, I've implemented it! Get it in the next release or a daily build.
PS, you'll also need one more setting, which does have its own security ramifications.
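I believe the setting in question is `annex.security.allowed-ip-addresses`: git-annex only uses curl (and hence `annex.web-options`) when it is set, because curl can't enforce git-annex's usual restriction on urls that resolve to private or localhost addresses. Setting it to `all` turns that protection off, which is the security ramification:

```
git config annex.security.allowed-ip-addresses all
```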