Some remotes, like S3, support versioning of data stored in them. When git-annex updates an export, it deletes the old content from e.g. the S3 bucket, but with versioning enabled, S3 retains the content and it can be accessed using a version ID (that S3 returns when storing the content). So it should be possible for git-annex to allow downloading old versions of files from such a remote.

https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html

Basically, store the S3 version ID in git-annex branch and support downloading using it.
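A minimal sketch of that idea, in Python rather than git-annex's Haskell. The in-memory `version_log` is a hypothetical stand-in for whatever gets recorded in the git-annex branch, and the function names are made up for illustration. The only S3 behaviour assumed is that a GET of an object accepts a `versionId` query parameter.

```python
from urllib.parse import quote, urlencode

# Hypothetical stand-in for the log kept in the git-annex branch:
# key -> list of (object name, S3 version ID) pairs.
version_log = {}


def record_version(key, object_name, version_id):
    """Remember the version ID that S3 returned when the content was stored."""
    version_log.setdefault(key, []).append((object_name, version_id))


def versioned_object_url(bucket, object_name, version_id):
    """Build a virtual-hosted-style URL addressing one specific object version."""
    path = quote(object_name, safe="/")
    query = urlencode({"versionId": version_id})
    return f"https://{bucket}.s3.amazonaws.com/{path}?{query}"


def download_urls(bucket, key):
    """All URLs from which the content of a key could still be fetched."""
    return [
        versioned_object_url(bucket, name, vid)
        for name, vid in version_log.get(key, [])
    ]
```

Since the same key can be exported under several filenames over time, the sketch keeps a list of versions per key rather than a single one.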

But this has the problem that dropping makes git-annex think the content is not in S3 any more, while what we want for export is for it to be removed from the current bucket, but still tracked as present in S3.

The drop from S3 could fail, or "succeed" in a way that prevents the location tracking being updated to say it lacks the content. Failing is how bup deals with it. It seems confusing to have a drop appear to succeed but not really drop, especially since dropping again would seem to do something a second time.

This does mean that git-annex drop/sync --content/assistant might try to do a lot of drops from the remote, and generate a lot of noise when they fail. Which is kind of ok for drop, since the user should be told that they can't delete the data. Could add a way to say "this remote does not support drop", and make sync --content and the assistant use that.
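Roughly, the behaviour described above could look like this. It is only a sketch: `supports_drop`, `DropNotSupported` and the set-based `location_log` are hypothetical names for illustration, not anything in git-annex.

```python
class DropNotSupported(Exception):
    """Raised when a remote retains everything stored in it."""


class Remote:
    def __init__(self, name, supports_drop=True):
        self.name = name
        self.supports_drop = supports_drop

    def drop(self, key):
        if not self.supports_drop:
            raise DropNotSupported(f"{self.name} cannot delete {key}")
        # A real remote would delete the content here.


def drop(remote, key, location_log):
    """Explicit drop: fail loudly and leave location tracking untouched."""
    try:
        remote.drop(key)
    except DropNotSupported as err:
        print(f"drop {key} failed: {err}")
        return False
    location_log.discard((remote.name, key))
    return True


def sync_content_drops(remotes, key, location_log):
    """sync --content / assistant: skip remotes that cannot drop, avoiding noise."""
    return [
        r.name for r in remotes
        if r.supports_drop and drop(r, key, location_log)
    ]
```

An explicit drop still reports the failure to the user, while the automatic paths never attempt it.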

Note that git-annex export does not rely on location tracking to determine which files still need to be sent to an export. It uses the export database to keep track of that. This is important, because the location tracking won't be updated, as discussed above.

The Haskell aws library does not seem to support enabling versioning when creating a bucket, so it would need to be done from the web console.

other considerations

If the user enables versioning in git-annex but forgets to enable it in the bucket (or later suspends versioning in the bucket), it's no big problem; old files will not be retained and git-annex will notice this in the usual way (drop locking, fsck). So, it seems that initremote does not need to check if the versioning=yes setting matches the bucket configuration. For the same reasons, it's ok to enable versioning for an existing remote.

S3 does allow DELETE of a version of an object from a bucket. So it would be possible to support git annex drop of old versions of a file from an export remote. Dropping the current version though, would make the export database inconsistent; it would not know that a file in the exported tree was no longer present. I don't think that inconsistency can easily be resolved -- bear in mind that multiple repositories can have an export db, so it would need to look at location tracking for all objects in the export to find ones that some other repository dropped. And dropping of only keys that are not used in the current export doesn't help because another repository may have changed the exported tree and be relying on the dropped key being present in the export. Unless... Could export conflict resolution somehow detect that?

final plan

Add an "appendOnly" field to Remote, indicating it retains all content stored in it. done

Make git annex export check appendOnly when removing a file from an export, and not update the location log, since the remote still contains the content. done

Make git-annex sync and the assistant skip trying to drop from appendOnly remotes since it's just going to fail. done

Make exporttree=yes remotes that are appendOnly not be untrusted, and not force verification of content, since the usual concerns about losing data when an export is updated by someone else don't apply. done

Let S3 remotes be configured with versioning=yes which enables appendOnly. done

Make S3 store version IDs for exported files in the per-remote log when so configured. done

Use version IDs when retrieving keys and for checkpresent. done
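To illustrate the export part of the plan, here is a small Python model of removing a file from an export. It is a sketch under assumptions, not the actual implementation: `append_only` mirrors the appendOnly field, while `RepoState`, `remove_from_export` and the shape of the two data structures are invented for the example. It also assumes that, for an ordinary remote, the location log is only updated once no other exported file uses the same key.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    name: str
    append_only: bool = False  # models the appendOnly field


@dataclass
class RepoState:
    # (remote name, key) pairs that location tracking believes are present
    location_log: set = field(default_factory=set)
    # remote name -> {exported path: key}; models the export database
    export_db: dict = field(default_factory=dict)


def remove_from_export(state, remote, path):
    """Remove one file from the exported tree of a remote."""
    exported = state.export_db.setdefault(remote.name, {})
    key = exported.pop(path, None)
    if key is None:
        return
    # An append-only remote still holds the content (as an old version),
    # so location tracking keeps saying the key is present there.
    if not remote.append_only and key not in exported.values():
        state.location_log.discard((remote.name, key))
```

In both cases the export database is updated, which is what export uses to decide what still needs to be sent; only the location log treatment differs.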

all done! (but see support public versioned S3 access) --Joey