Please describe the problem.
What steps will reproduce the problem?
I followed the s3 special remote tip to setup a public S3 special remote:
export AWS_ACCESS_KEY_ID...
export AWS_SECRET_ACCESS_KEY...
git annex initremote pubs3 type=S3 encryption=none public=yes
initremote pubs3 (checking bucket...) (creating bucket in US...)
git-annex: S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidBucketAclWithObjectOwnership", s3ErrorMessage = "Bucket cannot have ACLs set with ObjectOwnership's BucketOwnerEnforced setting", s3ErrorResource = Nothing, s3ErrorHostId = Just "CvCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx8ic=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
initremote: 1 failed
The public parameter is documented with: "Set to "yes" to allow public read access to files sent to the S3 remote. This is accomplished by setting an ACL when each file is uploaded to the remote. So, changes to this setting will only affect subsequent uploads."
This AWS post says that Amazon disables setting ACLs on new S3 buckets from April 2023 onward, following their earlier recommendation to stop using ACLs, so this is likely the cause. I know too little about AWS S3 to suggest an alternative approach, so this is just a bug report that the "public" parameter causes trouble when set to "yes".
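For what it's worth, a possible workaround sketch (untested; the bucket name and region are placeholders) would be to create the bucket manually, switch its Object Ownership setting away from the new BucketOwnerEnforced default so that ACLs are accepted again, relax the public-access block, and only then run initremote against the existing bucket:

```shell
# Hypothetical workaround -- bucket name and region are placeholders.
aws s3api create-bucket --bucket my-annex-bucket --region us-east-1

# New buckets default to ObjectOwnership=BucketOwnerEnforced, which rejects
# ACLs; BucketOwnerPreferred still accepts them.
aws s3api put-bucket-ownership-controls --bucket my-annex-bucket \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerPreferred}]'

# New buckets also block public ACLs by default; that has to be relaxed too.
aws s3api put-public-access-block --bucket my-annex-bucket \
    --public-access-block-configuration \
    'BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false'

# Then point initremote at the pre-created bucket.
git annex initremote pubs3 type=S3 encryption=none public=yes bucket=my-annex-bucket
```

I have not verified that git-annex skips the ACL on bucket creation when the bucket already exists, so this is only a sketch.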
There is no issue if I set public=no.
What version of git-annex are you using? On what operating system?
$ git annex version
git-annex version: 10.20221003
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I work daily with git-annex and I never fail to be amazed by it. Thank you for your work!
This only affects new S3 buckets. Existing S3 buckets that were created before April 2023 and were set up to allow public access should keep working, including ACL settings when storing new files in them. Per Amazon's announcement, "There is no change for existing buckets."
So users who create new buckets will need to set public=no (the default) and set a bucket policy instead. See this comment for an example policy. That comment also suggests:
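For illustration, a public-read bucket policy along the lines of Amazon's documented example could look like this (the bucket name is a placeholder):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-annex-bucket/*"
        }
    ]
}
```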
However, the ACL is currently sent as part of the PutObject request. And either there is not a way to change the ACL later, or the aws haskell library is missing support for the API to do that.
I think what needs to be done is to discourage initializing new S3 remotes with public=yes, since it won't work. (Unless some S3 implementation other than Amazon's keeps on supporting ACLs.)
And allow setting publicurl=yes without public=yes, so users who create new buckets and configure a bucket policy to allow public access can tell git-annex it's set up that way, so it will download from the bucket w/o S3 credentials.
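If that change is made, the workflow might look something like this (hypothetical; publicurl=yes without public=yes is not supported yet, and the bucket name is a placeholder):

```shell
# Create the remote without requesting per-object ACLs.
git annex initremote pubs3 type=S3 encryption=none public=no bucket=my-annex-bucket

# After configuring a public-read bucket policy in AWS, tell git-annex
# the bucket is publicly readable so it can download without credentials.
git annex enableremote pubs3 publicurl=yes
```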
While git-annex could HEAD without credentials when publicurl=yes to verify that the user has configured the bucket correctly, that would add some overhead, and if the user has not configured the bucket correctly, they will presumably notice in some other way eventually and can fix the bucket policy after the fact. So I'm inclined not to do that.