Please describe the problem.

Amazon has deprecated ACLs

A majority of modern use cases in Amazon S3 no longer require the use of ACLs, and we recommend that you disable ACLs except in unusual circumstances where you need to control access for each object individually. With Object Ownership, you can disable ACLs and rely on policies for access control. When you disable ACLs, you can easily maintain a bucket with objects uploaded by different AWS accounts. You, as the bucket owner, own all the objects in the bucket and can manage access to them using policies.

They are encouraging everyone to migrate to bucket policies instead.

But if such a bucket is public (meaning: write needs credentials, reads are open to the world):

  • public=yes causes git annex copy --to to always set an ACL, which fails, which fails the entire upload
  • But setting public=no causes publicurl to be ignored by git annex copy --from, failing the download

Feature Request

  • If public=yes, instead of trying to set an ACL, first try HTTP HEAD on the newly uploaded object without using the AWS credentials. Only if that fails, fall over to trying to set an ACL using credential. And if you get AccessControlListNotSupported (i.e. the error due to BucketOwnerEnforced), then give a warning that the bucket policy is not configured for public access.
  • Make publicurl orthogonal to public: if set, git annex copy --from should always use it unconditionally.
  • Update the docs here to explain how to set up a public bucket policy as recommended by Amazon, and that public=yes will either try to confirm that the bucket policy is public, or will fallback to using (legacy) ACLs.

What steps will reproduce the problem?

In a bucket I run, I reset the ACLs on that bucket to Amazon's default permissions:

  • Bucket owner (your AWS account):
    • Objects:
      • List
      • Write
    • Bucket ACL (i.e. what ACLs are applied by default to all objects):
      • Read
      • Write

and with that set Amazon let me also set

Object ownership: Bucket owner enforced

This should be the default configuration for any new bucket created now, so you only need to do the above if you're migrating an existing bucket like I was; for reproducing, just creating an empty bucket should be enough.

It should look like this:

?S3 bucket.png

With that in place, I wrote and attached this Bucket Policy to make it public:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::BUCKET-NAME",
                "arn:aws:s3:::BUCKET-NAME/*"
            ]
        }
    ]
}

?S3 bucket-perms.png

I told git-annex about it with

git annex initremote amazon type=S3 bucket=BUCKET_NAME public=yes publicurl=https://BUCKET_NAME.s3.amazonaws.com autoenable=true

But, attempting to use the new setup failed:

$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
41%   8.15 MiB         20 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "a6+ieujj4z3Z4P8ooA306DdbGAoxWDiXd6O2ZwjdfapGnuOGPyL5/WQ4UBEytR80FG+5b6xdlsM=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

32%   6.43 MiB         16 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "bFOgMomROCOes9yI6HZHysQGoZaTbsPI5b7rHjcTI0wA8Yx5Dm1JOky9BvXvpcXxzY1kVt48FRQ=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

37%   7.37 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "hqd4HRNk5yp3tKJ6yMhcECEpCjBw8qB6oTpKF3PaOsYFeVG0C+dGI06xq3zgmvnPoFUttI040sY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

39%   7.81 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "7m7wwG5woSPmICIuXr9QnBOEjUikuyzHSebMLuaNyZMc2Ki2vaqKpU9U+GOTYmR/NzFjOeyxngk=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed

What version of git-annex are you using? On what operating system?

$ git annex version
git-annex version: 8.20210223
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Please provide any additional information below.

Disabling public allows uploads:

$ git annex enableremote amazon public=no
enableremote amazon ok
(recording state in git...)
$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                

But then causes the new problem that downloads fail when other users try to download the updated dataset:

$ git annex get what.nii.gz
get what.nii.gz (from amazon...) 

  S3 bucket does not allow public access; Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3

  cannot download content

  Unable to access these remotes: amazon

  (Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed

Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

We use git-annex to share large datasets with the scientific community at https://github.com/spine-generic/data-multi-subject !

fixed --Joey