A colleague used a wrong config that pointed to the MinIO console rather than the S3 endpoint. When they ran initremote, the console wrongly replied 200 OK to the PUT of the annex-uuid file, and did the same when they then pushed the data. The MinIO console always redirects to a login page and does not fail on PUT (which is non-compliant). As a result, the dataset recorded all the data as present in that remote, while there was no trace of any bucket or object in the S3 storage.

steps to reproduce:

git init test_s3
cd test_s3/
git-annex init
export AWS_ACCESS_KEY_ID=john AWS_SECRET_ACCESS_KEY=doe
git annex initremote -d test_remote host="play.min.io" bucket="test_bucket" type=S3 encryption=none autoenable=true port=9443 protocol=https chunk=1GiB requeststyle=path
echo test > test_annexed_file
git-annex add test_annexed_file
git commit -m 'add annexed file'
git-annex copy --fast --to test_remote

I am showing it with the --fast flag here, as this is what datalad uses by default. Without --fast, the copy fails with (HeaderException {headerErrorMessage = "ETag missing"}), which is better.
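
For anyone who wants to check after the fact whether a remote really holds what was recorded, one option is `git annex fsck --from`. The sketch below wraps it in a hypothetical helper named `verify_remote` (the name is mine, not part of git-annex). It assumes the keys use a checksum backend such as the default one, so that the downloaded content can be verified.

```shell
# Hypothetical helper (not part of git-annex): re-verify that a remote
# really holds the content that the location log says it holds.
verify_remote() {
    remote="$1"
    # Without --fast, "fsck --from" downloads each object from the remote
    # and checks it, so a server that merely answers 200 OK without
    # storing anything should be reported as a failure.
    git annex fsck --from "$remote"
}
```

Usage would be `verify_remote test_remote` from inside the repository. This transfers all the data back, so it is a diagnostic step rather than something to run on every push.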

So to sum it up, the unfortunate circumstances are:

  1. the initremote PUT of annex-uuid does not check that the annex-uuid file was actually stored in a bucket.
  2. the minio console replies with 200 OK to all HTTP requests.
  3. datalad pushes with --fast by default, which records files as pushed without performing a HEAD after the push. I guess that's for performance reasons, but it is dangerous if a server or reverse proxy ends up responding 200 OK to all requests after init.
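
To illustrate points 1 and 2, here is a rough sketch of a sanity probe that could be run by hand before initremote. It is only a heuristic under my own assumptions: that a real S3 endpoint answers an unsigned GET on a non-existent bucket with a 403 or 404 and an XML error body, whereas a console or catch-all proxy answers 200 with HTML. The function names and the probe bucket name are made up for the example.

```shell
# Hypothetical helper: decide whether an HTTP response looks like an S3 error.
# $1 = HTTP status code, $2 = Content-Type header value
looks_like_s3() {
    status="$1"
    ctype="$2"
    case "$status" in
        403|404)
            # Assumption: S3 rejects the unsigned request with an XML error document
            case "$ctype" in
                application/xml*|text/xml*) return 0 ;;
            esac
            ;;
    esac
    # Anything else (e.g. 200 with text/html) is treated as "not S3"
    return 1
}

# Hypothetical helper: probe an endpoint URL such as https://play.min.io:9443
probe_endpoint() {
    url="$1"
    # curl discards the body and prints "<status> <content-type>"
    out=$(curl -s -o /dev/null -w '%{http_code} %{content_type}' \
        "$url/annex-probe-nonexistent-bucket/") || return 2
    # Split the output at the first space into status and content type
    looks_like_s3 "${out%% *}" "${out#* }"
}
```

For example, `probe_endpoint https://play.min.io:9443 || echo "this does not look like an S3 endpoint"`. A check along these lines would not replace a read-back of annex-uuid after the PUT, but it shows how cheaply the "200 OK to everything" case can be detected.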

Thanks for your help!