https://github.com/OpenNeuroOrg/openneuro/issues/3446#issuecomment-2892398583
This is a case where a truncated file was exported as part of a tree to S3.
In particular, to a bucket with versioning=yes.
Note that git-annex export does not verify checksums before sending, and
so it's possible for this to happen if a corrupted object has somehow
gotten into the local repository. It might be possible to improve this to
deal better with object corruption, including object corruption that occurs
while exporting.
Currently there is no good way for a user to recover from this. Exporting a tree that deletes the corrupted file, followed by a tree that adds back the right version of the file, will generally work. But it will not work for a versioned S3 bucket, because removing an export from a versioned S3 bucket does not remove the recorded S3 versionId. While re-exporting the file will record the new versionId, the old one remains recorded, and when multiple versionIds are recorded for the same key, either may be used when retrieving it.
What needs to be done is to remove the old versionId. But it does not seem right to generally do this when removing an exported file from an S3 bucket, because usually, when the file is not corrupted, that versionId is still valid, and can still be used to retrieve that object.
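To make the retrieval problem concrete, here is a minimal standalone sketch, with made-up types and helpers that only stand in for git-annex internals: when several versionIds are recorded for one key, retrieval simply uses one of them, so a stale versionId pointing at truncated content can be the one that gets used.

    -- Hypothetical stand-ins; not git-annex's real types or API.
    newtype VersionId = VersionId String
    newtype Content   = Content String

    -- With versioning, retrieval uses one of the recorded versionIds.
    -- If the stale (truncated) versionId is still recorded alongside a
    -- good one, it may be the one chosen, and verification of the
    -- download then fails even though a good version exists in the bucket.
    retrieveVersioned
      :: (VersionId -> IO Content)  -- download a specific version
      -> (Content -> Bool)          -- checksum verification of the download
      -> [VersionId]                -- versionIds recorded for the key
      -> IO (Maybe Content)
    retrieveVersioned download verified vids = case vids of
      []      -> pure Nothing
      (v : _) -> do
        c <- download v
        pure (if verified c then Just c else Nothing)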
git-annex fsck --from=s3 will detect the problem, but it is unable to do
anything to resolve it, since it can only try to drop the corrupted key,
and dropping by key is not supported with an exporttree=yes remote.
Could fsck be extended to handle this? It should be possible for fsck to:
- removeExport the corrupted file, and update the export log to say that the export of the tree to the special remote is incomplete.
- Handle the special case of the versioned S3 bucket with, eg, a new Remote method that is used when a key on the remote is corrupted. In the case of a versioned S3 bucket, that new method would remove the versionId.
--Joey
Note that it would also be possible for a valid object to be sent, but then get corrupted in the remote storage. I don't think that's what happened here.
If that did happen, a similar recovery process would also be needed.
Which I think says that focusing on a recovery process, rather than on prevention, is more useful.
The OpenNeuro dataset ds005256 is an S3 bucket with versioning=yes, a publicurl set, and exporttree=yes. With that combination, when S3 credentials are not set, the versionId is used in the public url for downloading.
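For reference, S3 allows a specific version of an object to be fetched by appending a versionId query parameter to the GET URL, so presumably the public download URL is built along these lines. This is a sketch only; the function and exact URL shape are assumptions, not git-annex's actual code.

    -- Sketch: build a public download URL that pins a specific S3 object
    -- version via S3's standard versionId query parameter. (Assumed URL
    -- shape; not taken from git-annex's S3 remote.)
    publicVersionedUrl
      :: String        -- publicurl of the bucket, ending in "/"
      -> FilePath      -- name of the exported file within the bucket
      -> Maybe String  -- recorded S3 versionId, if any
      -> String
    publicVersionedUrl publicurl file mvid =
      publicurl ++ file ++ maybe "" ("?versionId=" ++) mvid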
Note that this first does a download that fails incomplete with "Verification of content failed". Then it complains "Unable to access these remotes: s3-PUBLIC". It's trying two different download methods; the second one can only work with S3 credentials set.
Note that this doesn't download, but fails at the checkPresent stage. At that point, the HTTP HEAD reports the size of the object, and it's too short.
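That kind of check can be sketched as a plain HTTP HEAD whose Content-Length is compared with the size the key is supposed to have. This is a standalone sketch using http-client, not git-annex's actual checkPresent code:

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.ByteString.Char8 as B8
    import Network.HTTP.Client (httpNoBody, method, newManager, parseRequest, responseHeaders)
    import Network.HTTP.Client.TLS (tlsManagerSettings)
    import Network.HTTP.Types.Header (hContentLength)

    -- Do a HEAD request and compare the reported Content-Length with the
    -- expected object size. A shorter Content-Length is what shows up
    -- here at the checkPresent stage for the truncated object.
    headSizeMatches :: String -> Integer -> IO Bool
    headSizeMatches url expectedSize = do
      mgr  <- newManager tlsManagerSettings
      req  <- parseRequest url
      resp <- httpNoBody req { method = "HEAD" } mgr
      pure $ case lookup hContentLength (responseHeaders resp) of
        Just len -> B8.readInteger len == Just (expectedSize, "")
        Nothing  -> False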
If drop from export remote were implemented, that would take care of #1.
The user can export a tree that removes the file themselves. fsck even suggests doing that when it finds a corrupted file on an exporttree remote, since it's unable to drop it in that case.
But notice that the fsck run above does not suggest doing that. Granted, with an S3 bucket with versioning, exporting a tree won't remove the corrupted version of the file from the remote anyway.
It seems that dealing with #2 here is enough to recover the problem dataset, and #1 can be left to that other todo.
After a lot of thought and struggling with layering issues between fsck and the S3 remote, here is a design to solve #2:
Add a new method:

    repairCorruptedKey :: Key -> Annex Bool

fsck calls this when it finds a remote does not have a key it expected it to have, or when it downloads corrupted content.

If `repairCorruptedKey` returns True, it was able to repair the problem, and the Key should still be able to be downloaded from the remote. If it returns False, it was not able to repair the problem.

Most special remotes will make this `pure False`. For S3 with versioning=yes, it will download the object from the bucket, using each recorded versionId. Any versionId that does not work will be removed. It returns True if any download did succeed.

In a case where the object size is right, but it's corrupt, fsck will download the object, and then `repairCorruptedKey` will download it a second time. If there were 2 files with the same content, it would end up being downloaded 3 times! So this can be pretty expensive, but it's simple and will work.
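Here is a minimal sketch of what the versioned-S3 case of that method could look like. The lookup/download/record functions are parameters precisely because they stand in for git-annex internals, and the monad is plain IO rather than Annex; this illustrates the design above, it is not an implementation of it.

    import Control.Monad (filterM)

    -- Hypothetical stand-ins; not git-annex's real types.
    newtype Key = Key String
    type VersionId = String

    -- repairCorruptedKey for a versioned S3 remote, sketched in IO: try
    -- each recorded versionId, keep only those whose content still
    -- downloads and verifies, record the survivors, and report whether
    -- anything usable remains.
    repairCorruptedKey
      :: (Key -> IO [VersionId])        -- look up recorded versionIds
      -> (Key -> VersionId -> IO Bool)  -- download and verify one version
      -> (Key -> [VersionId] -> IO ())  -- record the surviving versionIds
      -> Key
      -> IO Bool
    repairCorruptedKey getVids fetchOk putVids key = do
      vids <- getVids key
      good <- filterM (fetchOk key) vids
      putVids key good
      pure (not (null good))

The helpers are passed as parameters only to keep the sketch self-contained; in git-annex the method would be part of the Remote interface, with the S3 remote supplying the versionId handling.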
#1 is not needed for the case of a versioned S3 bucket, because after `git-annex fsck --from S3` corrects the problem, `git-annex export --to S3` will see that the file is not in S3, and re-upload it.

In the general case, #1 is still needed. I think drop from export remote would solve this, and so no need to deal with it here.
I thought about making `git-annex export` checksum files before uploading, but I don't see why export needs that any more than a regular copy to a remote does. In either case, annex.verify will notice the bad content when getting from the remote, and fscking the remote will also detect it, and now, recover from it.

It seems unlikely to me that the annex object file got truncated before it was sent to ds005256 in any case. Seems more likely that the upload was somehow not of the whole file.