Currently, some issues impede the use of export remotes: (1) they're untrusted, except for versioned ones -- and from those, keys cannot be dropped; (2) using them is different from using normal remotes: one can't just copy or move keys to them, one has to first make a tree-ish. Maybe this could be fixed, as follows.

To copy a key to an export remote, if the key is not yet present in it, put it under .keys/aaa/bbb/keyname on the remote. That is, take the tree-ish currently on the remote, merge .keys/aaa/bbb/keyname with it, and put that on the remote.

To drop a key from an export remote, take the tree-ish currently on the remote, drop all instances of the key from it, and push the changed tree-ish to the remote.

To git-annex-export, add an option --add, which will add the tree-ish to the tree-ish currently on the remote, without losing any keys currently on the remote: take the tree-ish currently on the remote; overlay on it the tree-ish being exported; for any files that would be overwritten, if no copies of that key would be left, move it to .keys/aaa/bbb/keyname in the tree-ish that is then pushed to the remote.
This way, one can always just copy any tree to the remote, without worrying about losing data.
This is essentially using namespacing on the remote to implement an equivalent of S3 versioning, though with less efficiency.
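To make the three operations concrete, here is a rough sketch in Python. It models a tree-ish as a plain dict from path to key, which is only an illustration and not how git-annex represents trees. The function names are hypothetical, the md5-based aaa/bbb directories are a stand-in for whatever hash directory scheme would actually be used, and "no copies would be left" is read here as "no copies left on this remote", which is an assumption.

```python
import hashlib

def key_path(key):
    # Hypothetical layout for .keys/aaa/bbb/keyname; the real hash
    # directory scheme is not specified here and may well differ.
    h = hashlib.md5(key.encode()).hexdigest()
    return f".keys/{h[:3]}/{h[3:6]}/{key}"

def copy_key(tree, key):
    """Return a tree that contains key, adding it under .keys/ only if absent."""
    new = dict(tree)
    if key not in new.values():
        new[key_path(key)] = key
    return new

def drop_key(tree, key):
    """Return a tree with every instance of key removed."""
    return {path: k for path, k in tree.items() if k != key}

def export_add(remote_tree, new_tree):
    """Overlay new_tree on remote_tree without losing any key.

    Assumption: a key counts as lost only if no path in the resulting
    tree still refers to it; copies on other remotes are not considered.
    """
    result = dict(remote_tree)
    result.update(new_tree)
    for path, old_key in remote_tree.items():
        overwritten = path in new_tree and new_tree[path] != old_key
        if overwritten and old_key not in result.values():
            result[key_path(old_key)] = old_key
    return result
```

For example, exporting {"a.txt": "K9"} with export_add onto a remote holding {"a.txt": "K1", "b.txt": "K2"} would leave b.txt alone, replace a.txt, and keep K1 reachable under .keys/.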
A remote's implementation could do the same thing and claim it supports versioning, without any change to the current remote interface or user interface.
Except for removing versioned content, which indeed would need to update the tree to reflect the removal. From Remote.Helper.ExportImport:
I think that there are two separate things here that of course would work well together, but neither depends on the other. Generic versioning via namespacing could be done with or without support for removeKey (and vice-versa).
I'm doubtful that this would actually let the interface be simplified, there are too many differences in the capabilities of different remotes.
For example, if an S3 bucket has versioning disabled, and git-annex imports from it, then in this scheme it would need to re-upload the import to the key-value location. But, if an S3 bucket has versioning enabled, that upload would be redundant and should not be done. And, if an S3 bucket is read-only, then an import can't re-upload.
Also, not all users are going to want export remotes to store past versions of files; if they're used for some kind of publication, you may not want the exposure/cost of publishing old versions of files there. Of course, you could drop the old versions from the remote later, but this would be a workflow change from how export remotes work now.
So it seems to me that this would need to be an optional thing.
"I'm doubtful that this would actually let the interface be simplified" -- I only meant that the minimum required interface would be simplified, in that git-annex could provide a default implementation of key-value remote methods in terms of the export remote interface; but any given remote could provide a more efficient implementation of these methods, overriding the default ones.
But the main benefit would be to simplify the user-facing interface: as far as the user is concerned, all special remotes could be trusted, and accessed with the same basic commands, whether configured as export or not.
"if a S3 bucket is read-only, then an import can't re-upload." -- if the special remote is configured as read-only, then git-annex itself would not attempt to upload things there, no?
"not all users are going to want export remotes to store past versions of files" -- maybe, there could be an option to store past versions encrypted, while storing current versions plain?
Another approach would be to let a key-value remote and an export remote be combined into one "combo" remote. To the user, this would look like one trusted, versioned remote supporting both key-value and export operations. Keys overwritten in the export remote would be stored in the key-value remote. Either keys or trees could be copied to the combo remote, keys going to the key-value remote and trees to the export remote. The downside is that files could not be moved directly between the backing remotes. But the inefficiency might not always matter. Also, TRANSFER and TRANSFEREXPORT could be extended to optionally accept URIs in lieu of content, and to do the transfer in the cloud.
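A toy model of the combo remote, in Python, might look like the following. The class and method names are made up for illustration, and both backing remotes are reduced to in-memory structures; nothing here corresponds to an existing git-annex interface.

```python
class ComboRemote:
    """Toy model: one export remote plus one key-value remote behind a single facade."""

    def __init__(self):
        self.export_tree = {}   # path -> key, stands in for the export remote
        self.key_store = set()  # keys held by the key-value remote

    def copy_key(self, key):
        # Individual keys always go to the key-value side.
        self.key_store.add(key)

    def copy_tree(self, tree):
        # Keys that the new tree no longer refers to are saved to the
        # key-value side before the export side is replaced.
        displaced = set(self.export_tree.values()) - set(tree.values())
        self.key_store |= displaced
        self.export_tree = dict(tree)

    def has_key(self, key):
        return key in self.key_store or key in self.export_tree.values()
```

In this sketch, a key overwritten by copy_tree stays available through has_key, which is the property that would let the combination be treated as trusted and versioned.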
More generically, maybe repository groups could be treated as special remotes? You'd configure the minimum number of copies of a key in a group. You could then put a key-value remote and an export remote in a group. When copying a tree to the group, if this would cause old keys to be overwritten, git-annex would first copy them to a key-value remote in the group, to preserve the per-group minimum number of copies constraint.
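The per-group constraint could be sketched like this, again in Python with invented names. The group is assumed to contain exactly one export remote and one key-value remote, so a key can have at most two copies in the group; a real implementation would have to deal with arbitrary group membership and with the actual transfers.

```python
def copy_tree_to_group(export_tree, kv_keys, new_tree, min_copies=1):
    """Return (new export tree, new key-value key set) for the group.

    Before the export side is overwritten, any key whose copy count in
    the group would fall below min_copies is copied to the key-value side.
    """
    kv = set(kv_keys)
    keys_after = set(new_tree.values())
    for key in set(export_tree.values()):
        # Booleans add up as 0 or 1, so this is the copy count after the export.
        copies_after = (key in keys_after) + (key in kv)
        if copies_after < min_copies:
            kv.add(key)
    return dict(new_tree), kv
```

With min_copies=1, only keys that would otherwise vanish from the group are copied, so the extra transfers are limited to overwritten content.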