Should be possible for any public bucket which allows anonymous access. ATM git-annex demands AWS credentials for importtree on such a public bucket:
(dandisets) dandi@drogon:/mnt/backup/dandi/tmp/dandisets/test-importtree-s3$ git annex initremote s3-origin type=S3 importtree=yes encryption=none autoenable=true bucket=dandiarchive fileprefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/
initremote s3-origin (checking bucket...)
Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3
git-annex: No S3 credentials configured
failed
initremote: 1 failed
(dandisets) dandi@drogon:/mnt/backup/dandi/tmp/dandisets/test-importtree-s3$ git annex version
git-annex version: 10.20220525-gf1fdc90
In my searches, information on "anonymous access to S3" is scarce, but in DataLad we rely on an old-ish version of the boto library, which does it for us; more info/pointers e.g. here on SO.
I've checked, and the Haskell aws library does not currently support this. Since the library currently needs a maintainer, I have not filed an issue to implement this.
It might be possible to work around it by using s3SignQuery with dummy credentials, and then modifying the SignedQuery that it returns to remove the authentication headers. Or by bypassing s3SignQuery and constructing a SignedQuery that is not actually signed. Update: No, it's not possible, because s3SignQuery is used internally in aws.
Do you have a sample bucket that does allow anonymous access, not only to individual files, but to listing the content of the bucket?
Doesn't that dandiarchive allow for listing? If not, also try fcp-indi. Also, if looking for a small one, check out our test one, datalad-test0-versioned - I think it is fully public. At least for fcp-indi and datalad-test0-versioned, a query like curl -v https://s3.amazonaws.com/fcp-indi/ works and seems to list it.

Ok, I hacked up the aws library to omit the authentication headers, and provided git-annex with dummy AWS credentials. I was able to import from datalad-test0-versioned after a small fix to git-annex.
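For reference, the same sort of anonymous listing can be exercised directly against the S3 REST API with curl; this is a minimal sketch using the bucket and prefix from the transcript above, assuming the bucket policy allows public s3:ListBucket:

```
# Anonymous ListObjectsV2 request: no credentials, no Authorization header.
curl -s "https://s3.amazonaws.com/dandiarchive/?list-type=2&prefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/"

# The small test bucket mentioned above can be listed the same way:
curl -s "https://s3.amazonaws.com/datalad-test0-versioned/?list-type=2"
```

This is essentially what the hacked-up aws library ends up doing: the request goes out unsigned, and the bucket policy decides whether the listing is allowed.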
Here's the patch I used. This is certainly not upstreamable as-is, but is a nice proof of concept.
We are hosting two datasets on S3 that allow anonymous downloads: https://github.com/spine-generic/data-multi-subject and https://github.com/spine-generic/data-single-subject.
You can try it right now:
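The exact commands are not reproduced here; as a rough sketch, assuming the repositories are plain git-annex repositories whose public S3 remote autoenables on clone, trying one of them could look like:

```
# Clone one of the public datasets and fetch annexed content anonymously.
git clone https://github.com/spine-generic/data-multi-subject
cd data-multi-subject
git annex get .   # or a single file path; downloads come from the public S3 remote
```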
The trick was simply to set public=yes and publicurl when running initremote. The final config I have stored is:
Why would importtree behave so differently?

Interesting idea. Maybe my invocation is still incomplete (no host or datacenter), but with the following one I am still queried for the credentials:
Or maybe whenever you ran initremote you had those credential variables exported already?

@nick.guenther importtree needs a way to list the files stored in a bucket, and doing that involves using the S3 API.
In the case of your remote, it knows what files are stored in the bucket, since git-annex stored them there previously.
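To make the distinction concrete, here is a hedged sketch (the bucket name, URL, and prefix are placeholders, not anyone's actual config): a download-only public remote can rely on public=yes and publicurl because fetching a known key is a plain HTTPS GET, whereas importtree=yes also has to list the bucket through the S3 API, which is the step that was demanding credentials:

```
# Download-only public remote: clients that clone the repo fetch known keys
# over plain HTTPS via publicurl, with no AWS credentials of their own
# (placeholder bucket/URL; initremote itself may still be run by the data
# owner with credentials exported).
git annex initremote pub-s3 type=S3 encryption=none \
    bucket=example-public-bucket public=yes \
    publicurl=https://example-public-bucket.s3.amazonaws.com/

# An importtree=yes remote must additionally enumerate the bucket contents
# (an S3 API listing), which is the operation that was demanding credentials.
git annex initremote s3-import type=S3 encryption=none importtree=yes \
    bucket=example-public-bucket fileprefix=some/prefix/
```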
I've finished the work on aws, which is in https://github.com/aristidb/aws/pull/281 and which I hope will be merged soon.
git-annex now has a branch, anons3, that implements this when the S3 remote is configured with signature=anonymous.

Also, I've fixed it to only list files in the fileprefix, which sped up the listing a lot in this bucket with many other files.
I've finished up the work on this. To use it, you will need a git-annex built with aws 0.23. It will take some time before that is available in some builds, but when building git-annex with stack, it will use it already.
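Assuming such a build (git-annex with aws 0.23 and the anons3 work), the failing invocation from the top of the thread should in principle work once signature=anonymous is added, along the lines of:

```
# Anonymous importtree remote: signature=anonymous skips AWS credentials,
# and fileprefix limits the listing to the relevant part of the bucket.
git annex initremote s3-origin type=S3 importtree=yes encryption=none \
    autoenable=true bucket=dandiarchive \
    fileprefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/ \
    signature=anonymous

# Then import the tree from the remote, e.g.:
git annex import master --from s3-origin
```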