I would like to use the automated AWS lifecycle rules to move the git-annex files stored on S3 to Glacier after some time. git-annex must support this kind of S3 file explicitly in order for it to work.
This is different from adding a Glacier remote to git-annex, for the reasons explained in http://aws.typepad.com/aws/2012/11/archive-s3-to-glacier.html.
Basically, the files moved by AWS from S3 to Glacier are not available through the normal Glacier API. Instead, the moved files are still listed in S3, but under the GLACIER storage class, and they need a RESTORE request before they can be fetched with GET like other S3 files. Trying to GET an S3 file that has been moved to Glacier will not restore it from Glacier and will result in a 403 error.
I suppose DELETE needs special care as well.
done, configure the S3 remote with restore=yes. There are also some git configs. --Joey
Or alternatively, support for S3 Lifecycle Rules. I'm not exactly sure what the git-annex assistant does if it backs up into an S3 bucket with, e.g., a rule to archive everything into Glacier 1 month later. Can it restore/access the files?
Glacier is in the process of being deprecated; instead there is the Deep Archive S3 storage class. https://aws.amazon.com/blogs/aws/new-amazon-s3-storage-class-glacier-deep-archive/
While it is possible to configure an S3 special remote with storageclass=DEEP_ARCHIVE, or configure a bucket with lifecycle rules to move objects to deep archive, git-annex won't be able to retrieve objects stored in deep archive.
To support that, the S3 special remote would need to send a request to S3 to restore an object from deep archive. Then later (on a subsequent git-annex run) it can download the object from S3.
This is the API: https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html
It includes a Tier tag which controls whether the restore is expedited. There would probably need to be a git config for that, since the user may want to get a file fast or pay less for a slower retrieval.
And there is a Days tag, which controls how long the restored object should be left accessible in S3. It would also make sense to have a git config for this.
I have opened this issue, which is a prerequisite to implementing this: https://github.com/aristidb/aws/issues/297
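For illustration only, here is a rough sketch of the restore-then-download flow described above, written in Python against boto3 rather than the Haskell aws library that git-annex uses. It is not how git-annex implements anything. The helper names (build_restore_request, parse_restore_header, request_or_check_restore) and the default values for days and tier are made up for this example; the s3 argument is assumed to be a boto3 S3 client, whose head_object and restore_object calls correspond to the RestoreObject API linked above.

```python
import re

# Retrieval tiers named by the RestoreObject API.
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days, tier="Standard"):
    """Build the RestoreRequest argument for an S3 restore call.

    days: how long the restored copy should stay accessible in S3.
    tier: trades retrieval speed against cost.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def parse_restore_header(value):
    """Classify the x-amz-restore value reported for an object.

    Returns "not-requested", "in-progress" or "restored".
    """
    if not value:
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', value)
    if match is None:
        raise ValueError(f"unrecognised restore header: {value!r}")
    return "in-progress" if match.group(1) == "true" else "restored"


def request_or_check_restore(s3, bucket, key, days=7, tier="Bulk"):
    """One pass of the two-step flow; call again on a later run.

    Returns "available" when a GET should succeed, "requested" when a
    restore was just requested, or "in-progress" when one is pending.
    """
    head = s3.head_object(Bucket=bucket, Key=key)
    if head.get("StorageClass") not in ("GLACIER", "DEEP_ARCHIVE"):
        return "available"  # not archived, a plain GET works
    state = parse_restore_header(head.get("Restore"))
    if state == "restored":
        return "available"
    if state == "not-requested":
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest=build_restore_request(days, tier),
        )
        return "requested"
    return "in-progress"
```

A caller would run request_or_check_restore(boto3.client("s3"), bucket, key) once to request the restore, and again on a later run to find out whether the object can be downloaded. The days and tier arguments stand in for the two git configs suggested above.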