Please describe the problem.
Amazon has deprecated ACLs:
A majority of modern use cases in Amazon S3 no longer require the use of ACLs, and we recommend that you disable ACLs except in unusual circumstances where you need to control access for each object individually. With Object Ownership, you can disable ACLs and rely on policies for access control. When you disable ACLs, you can easily maintain a bucket with objects uploaded by different AWS accounts. You, as the bucket owner, own all the objects in the bucket and can manage access to them using policies.
They are encouraging everyone to migrate to bucket policies instead.
But if such a bucket is public (meaning writes need credentials while reads are open to the world):
- public=yes causes git annex copy --to to always set an ACL, which fails, which in turn fails the entire upload.
- But setting public=no causes publicurl to be ignored by git annex copy --from, failing the download.
Feature Request
- If public=yes, instead of trying to set an ACL, first try an HTTP HEAD on the newly uploaded object without using the AWS credentials. Only if that fails, fall back to trying to set an ACL using credentials. And if you get AccessControlListNotSupported (i.e. the error due to BucketOwnerEnforced), then give a warning that the bucket policy is not configured for public access.
- Make publicurl orthogonal to public: if set, git annex copy --from should always use it unconditionally.
- Update the docs here to explain how to set up a public bucket policy as recommended by Amazon, and that public=yes will either try to confirm that the bucket policy is public, or will fall back to using (legacy) ACLs.
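The fallback order proposed in the first bullet can be sketched as follows. This is only an illustration in Python, not git-annex's actual (Haskell) implementation: `store_public` and its four callable arguments are hypothetical names, and `AclNotSupported` is a stand-in for the S3 AccessControlListNotSupported error.

```python
class AclNotSupported(Exception):
    """Stand-in for S3's AccessControlListNotSupported error (hypothetical)."""


def store_public(upload, anonymous_head_ok, set_public_acl, warn):
    """Upload an object, then try to ensure it is world-readable.

    All four arguments are hypothetical callables supplied by the caller.
    Returns a short string naming how (or whether) public access was found.
    """
    upload()  # plain upload, with no ACL attached
    if anonymous_head_ok():
        # An unauthenticated HEAD succeeded: the bucket policy is public.
        return "policy"
    try:
        # Legacy path: needs credentials and a bucket with ACLs enabled.
        set_public_acl()
        return "acl"
    except AclNotSupported:
        # ACLs are disabled (bucket owner enforced) and the HEAD failed,
        # so the object is stored but not publicly readable.
        warn("bucket policy is not configured for public access")
        return "private"
```

In this sketch the upload itself never fails because of ACLs; the worst case is a warning that the stored object is not publicly readable.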
What steps will reproduce the problem?
In a bucket I run, I reset the ACLs on that bucket to Amazon's default permissions:
- Bucket owner (your AWS account):
    - Objects:
        - List
        - Write
    - Bucket ACL (i.e. what ACLs are applied by default to all objects):
        - Read
        - Write
and with that set, Amazon let me also set Object ownership: Bucket owner enforced.
This should be the default configuration for any new bucket created now, so you only need to do the above if you're migrating an existing bucket like I was; for reproducing, just creating an empty bucket should be enough.
It should look like this:
[screenshot: S3 bucket.png]
With that in place, I wrote and attached this Bucket Policy to make it public:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::BUCKET-NAME",
                "arn:aws:s3:::BUCKET-NAME/*"
            ]
        }
    ]
}
[screenshot: S3 bucket-perms.png]
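For anyone scripting the same setup, a policy of this shape can be generated per bucket. The following is a minimal sketch; `public_read_policy` is a hypothetical helper name. It assumes that listing only the object ARN pattern is sufficient, since s3:GetObject is an object-level action, whereas the policy above also lists the bucket ARN itself.

```python
import json


def public_read_policy(bucket):
    """Return a bucket policy document granting anonymous read of all objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Assumption of this sketch: s3:GetObject is an object-level
                # action, so only the object ARN pattern is listed.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # BUCKET-NAME is a placeholder, as in the policy above.
    print(json.dumps(public_read_policy("BUCKET-NAME"), indent=4))
```

The printed JSON can be pasted into the bucket's policy editor.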
I told git-annex about it with:
git annex initremote amazon type=S3 bucket=BUCKET_NAME public=yes publicurl=https://BUCKET_NAME.s3.amazonaws.com autoenable=true
But, attempting to use the new setup failed:
$ git annex copy --to amazon what.nii.gz
copy what.nii.gz (checking amazon...) (to amazon...)
41% 8.15 MiB 20 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "a6+ieujj4z3Z4P8ooA306DdbGAoxWDiXd6O2ZwjdfapGnuOGPyL5/WQ4UBEytR80FG+5b6xdlsM=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
32% 6.43 MiB 16 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "bFOgMomROCOes9yI6HZHysQGoZaTbsPI5b7rHjcTI0wA8Yx5Dm1JOky9BvXvpcXxzY1kVt48FRQ=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
37% 7.37 MiB 21 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "hqd4HRNk5yp3tKJ6yMhcECEpCjBw8qB6oTpKF3PaOsYFeVG0C+dGI06xq3zgmvnPoFUttI040sY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
39% 7.81 MiB 21 MiB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "7m7wwG5woSPmICIuXr9QnBOEjUikuyzHSebMLuaNyZMc2Ki2vaqKpU9U+GOTYmR/NzFjOeyxngk=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed
What version of git-annex are you using? On what operating system?
$ git annex version
git-annex version: 8.20210223
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Please provide any additional information below.
Disabling public allows uploads:
$ git annex enableremote amazon public=no
enableremote amazon ok
(recording state in git...)
$ git annex copy --to amazon what.nii.gz
copy what.nii.gz (checking amazon...) (to amazon...)
ok
But that then causes a new problem: downloads fail when other users try to get the updated dataset:
$ git annex get what.nii.gz
get what.nii.gz (from amazon...)
S3 bucket does not allow public access; Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3
cannot download content
Unable to access these remotes: amazon
(Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
We use git-annex to share large datasets with the scientific community at https://github.com/spine-generic/data-multi-subject !
This only affects new S3 buckets. Existing S3 buckets that were created before April 2023 and were set up to allow public access should keep working, including ACL settings when storing new files in them. Per Amazon's announcement, "There is no change for existing buckets."
I've made publicurl orthogonal to public.
As for the idea of an HTTP HEAD before trying to set the ACL: the ACL is currently sent as part of the PutObject request. And either there is no way to change the ACL later, or the aws Haskell library is missing support for the API to do that.
While git-annex could HEAD without creds when public=yes to verify that the user has configured the bucket correctly, and at least warn about a misconfiguration, that would add some overhead. And I guess if the user has not configured the bucket correctly, they will notice in some other way eventually and can fix their bucket policy after the fact. So I'm inclined not to do that.
Instead I've simply deprecated public, noting that it should not be set on new buckets. The user will have to deal with setting up the Bucket Policy themselves.
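Since verifying the bucket policy is left to the user, a quick anonymous check can be done outside git-annex. The following is a minimal sketch using only the Python standard library; `head_status` and `is_publicly_readable` are hypothetical helper names, and the object URL is assumed to be whatever address the object is reachable at under the remote's publicurl.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send an unauthenticated HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their status code.
        return err.code


def is_publicly_readable(url, **kwargs):
    """True if an anonymous HEAD on the object URL returns 200."""
    return head_status(url, **kwargs) == 200
```

For example, `is_publicly_readable("https://BUCKET_NAME.s3.amazonaws.com/OBJECT_KEY")`, where OBJECT_KEY is a placeholder for an uploaded object's key. A 403 response typically means the bucket policy does not grant anonymous read access.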