Got git-annex downloading versioned files from S3, without needing S3 credentials. This makes an S3 special remote as capable as a git-annex repository exported over http, except of course that it does not include the git objects.
An example of this new feature:
export AWS_SECRET_ACCESS_KEY=... AWS_ACCESS_KEY_ID=...
git annex initremote s3 type=S3 public=yes exporttree=yes versioning=yes encryption=none
git annex export --tracking master --to s3
git tag 1.0
# modify some files here
git annex sync --content s3
And then in a clone without the credentials:
git annex enableremote s3
git checkout 1.0
git annex get somefile
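Roughly speaking, downloading an old version without credentials comes down to an anonymous HTTP GET of the object with S3's versionId query parameter, which S3 allows when that version is publicly readable. The POSIX sh sketch below only builds such a URL. It assumes the default BUCKET.s3.amazonaws.com endpoint and ASCII object names, the helper names are made up for illustration, and it is not a description of git-annex's own code.

```sh
#!/bin/sh
# Sketch only: build the anonymous URL for one stored version of an S3 object.
# Assumes the default virtual-hosted endpoint and ASCII object names.
# Helper names are hypothetical.

# urlencode_path STRING: percent-encode everything except "/" and
# unreserved characters (letters, digits, "." "_" "~" "-").
urlencode_path() {
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [[:alnum:]._~/-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. " " -> %20, "+" -> %2B
    esac
  done
  printf '%s' "$out"
}

# s3_version_url BUCKET OBJECT VERSIONID
s3_version_url() {
  printf 'https://%s.s3.amazonaws.com/%s?versionId=%s\n' \
    "$1" "$(urlencode_path "$2")" "$(urlencode_path "$3")"
}
```

For example, with a hypothetical bucket and version ID, curl -fL -o somefile.1.0 "$(s3_version_url mybucket somefile EXAMPLEVERSIONID)" would fetch that version of somefile.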
This is nice; I only wish other special remotes supported it too. It seems that any special remote could be made to support it, but ones without some kind of versioning would need to store each file twice, and many would also need each file to be uploaded to them twice. But perhaps some others do have a form of versioning; WebDAV, for one, has a versioning extension in RFC 3253.
Also did a final review of a patch Antoine Beaupré is working on to backport the recent git-annex security fixes to Debian oldstable (git-annex 5.20141125). He described the backport on his blog:
This time again, Haskell was nice to work with: by changing type configurations and APIs, the compiler makes sure that everything works out and there are no inconsistencies. This logic is somewhat backwards to what we are used to: normally, in security updates, we avoid breaking APIs at all costs. But in Haskell, it's a fundamental way to make sure the system is still coherent.
Today's work was sponsored by Trenton Cronholm on Patreon.
OpenNeuro folks have been exporting their datasets for a while now using an older git-annex, which didn't have all of the recent "public S3" support implemented. What changes should be made on their end so that git-annex can get those files? (At the moment, the --debug output and the error message give no information on why access is failing.) Here is an example repository/dataset:
So the file is available from S3 and matches the checksum, but
git annex get
just fails to get it, without giving any clue as to why. If I add versioning=yes to enableremote, it starts asking for the AWS credentials.
Thank you in advance for looking into it.
What they'll need to do is:
git-annex setpresentkey
The third item is surely going to involve some work, since they'll have to gather the S3 version IDs, match each one up with the git-annex key of the file content that was uploaded at that point, and manually generate files to commit to the git-annex branch. The rmet log file for S3 looks like:
For example, for a file that was uploaded to the bucket as "myfile":
Note that if the filename contains spaces, it gets more complicated, and it would probably be worth adding some plumbing command to git-annex to set per-remote metadata.
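The version-gathering part of that work could start from something like the following sketch. It assumes the AWS CLI's aws s3api list-object-versions command with credentials that are allowed to list the bucket's versions; the helper names are hypothetical; and rmet_path assumes the rmet log sits in the same lowercase hash directory as the other per-key logs on the git-annex branch, for ordinary keys whose names need no escaping. It does not generate the log contents.

```sh
#!/bin/sh
# Sketch only: collect S3 version IDs for one object, and locate the
# (assumed) rmet log path for a key. Needs the AWS CLI and git-annex.
# Helper names are hypothetical.

# list_versions BUCKET OBJECT
# Prints "LastModified<TAB>VersionId" for each stored version of OBJECT,
# oldest first. --prefix can also match longer names, so keep exact matches.
list_versions() {
  aws s3api list-object-versions --bucket "$1" --prefix "$2" \
      --query 'Versions[].[Key,VersionId,LastModified]' --output text |
    OBJ=$2 awk -F '\t' '$1 == ENVIRON["OBJ"] { print $3 "\t" $2 }' |
    LC_ALL=C sort    # ISO 8601 timestamps sort chronologically as text
}

# rmet_path KEY
# Prints the assumed path of KEY's per-remote metadata log on the
# git-annex branch (lowercase hash directory + key + ".log.rmet").
rmet_path() {
  git annex examinekey --format='${hashdirlower}${key}.log.rmet\n' "$1"
}
```

Listing the versions oldest first makes it easier to roughly line them up with the commits that changed the file, for example those shown by git log --format='%H %cI' -- myfile, keeping in mind that S3's LastModified is the upload time, not the commit time.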
As to the problems you're having with that S3 remote, it looks like it does not have public=yes in its configuration at all, which is why git-annex is failing to download from it. It kind of looks like you have some other S3 creds in the environment; otherwise I'd expect git-annex to complain that the remote is not public and no creds are set. Or maybe there's a bug in the handling of that case. This devblog doesn't seem like the place to dig into that.
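To check that diagnosis, the remote's stored configuration can be read from remote.log on the git-annex branch, where each line holds a special remote's UUID, its key=value settings, and a timestamp. The hypothetical helper below reports whether the line whose name= matches also carries public=yes. It assumes a single current line per remote and values without embedded spaces, so treat it as a quick check rather than a full parser.

```sh
#!/bin/sh
# Sketch only: does the stored config of special remote NAME include public=yes?
# Assumes one current line per remote in remote.log and no quoted values.
# The helper name is hypothetical.
remote_is_public() {
  git show git-annex:remote.log |
    NAME=$1 awk '
      {
        named = 0; pub = 0
        for (i = 2; i <= NF; i++) {          # field 1 is the remote UUID
          if ($i == ("name=" ENVIRON["NAME"])) named = 1
          if ($i == "public=yes") pub = 1
        }
        if (named && pub) found = 1
      }
      END { exit !found }                   # status 0 only if found
    '
}
```

For example, remote_is_public s3 || echo 'public=yes is not set' would flag the situation described above.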