simpler, trusted export remotes

Currently, some issues impede the use of export remotes: (1) they're untrusted, except for versioned ones, and keys cannot be dropped from those; (2) using them is different from using normal remotes: one can't just copy or move keys to them, one first has to make a tree-ish. Maybe this could be fixed as follows.

To copy a key to an export remote, if the key is not yet present in it, put it under .keys/aaa/bbb/keyname on the remote. That is, take the tree-ish currently on the remote, merge .keys/aaa/bbb/keyname into it, and put the result on the remote.

To drop a key from an export remote, take the tree-ish currently on the remote, drop all instances of the key from it, and push the changed tree-ish to the remote.

Add an option --add to git-annex-export, which will add the exported tree-ish to the tree-ish currently on the remote without losing any keys currently on the remote: take the tree-ish currently on the remote; overlay on it the tree-ish being exported; for any file that would be overwritten, if no copies of its key would be left, move it to .keys/aaa/bbb/keyname in the tree-ish that is then pushed to the remote.

This way, one can always just copy any tree to the remote without worrying about losing data.
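The three operations proposed above can be sketched on a toy model in which a tree-ish is just a dict mapping paths to keys. This is only an illustration under stated assumptions, not git-annex code: the function names are made up for the example, and the `key_path` helper derives the `aaa/bbb` directories from an md5 digest purely as a placeholder.

```python
import hashlib


def key_path(key):
    """Hypothetical fallback location .keys/aaa/bbb/keyname for a key.

    The md5-based aaa/bbb directories are an assumption for this sketch.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return f".keys/{digest[:3]}/{digest[3:6]}/{key}"


def copy_key(tree, key):
    """Copy a key to the remote: add it under .keys/ unless already present."""
    new_tree = dict(tree)
    if key not in new_tree.values():
        new_tree[key_path(key)] = key
    return new_tree


def drop_key(tree, key):
    """Drop a key: remove every path in the tree that holds it."""
    return {path: k for path, k in tree.items() if k != key}


def export_add(tree, exported):
    """Overlay `exported` on `tree` without losing any key already there."""
    new_tree = dict(tree)
    new_tree.update(exported)
    for path, old_key in tree.items():
        overwritten = path in exported and exported[path] != old_key
        # Only preserve the old key if the overlay left no other copy of it.
        if overwritten and old_key not in new_tree.values():
            new_tree[key_path(old_key)] = old_key
    return new_tree
```

For example, overlaying `{"doc/a.txt": "KEY3"}` on a remote tree that holds `KEY1` at `doc/a.txt` keeps `KEY1` reachable under `.keys/`, and a later `drop_key` for `KEY1` removes it again.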

comment 1

This is essentially using namespacing on the remote to implement an equivalent of S3 versioning, though with less efficiency.

A remote's implementation could do the same thing and claim it supports versioning, without any change to the current remote interface or user interface.

The exception is removing versioned content, which would indeed need to update the tree to reflect the removal. From Remote.Helper.ExportImport:

            -- Removing a key from an export would need to
            -- change the tree in the export log to not include
            -- the file. Otherwise, conflicts when removing
            -- files would not be dealt with correctly.
            -- There does not seem to be a good use case for
            -- removing a key from an export in any case.
            , removeKey = \_k -> do
                    warning "dropping content from an export is not supported; use `git annex export` to export a tree that lacks the
                    return False

I think that there are two separate things here that of course would work well together, but neither depends on the other. Generic versioning via namespacing could be done with or without support for removeKey (and vice-versa).

Comment by joey — Tue Mar 19 17:06:50 2019

comment 2

"A remote's implementation could do the same thing" -- right, but if git-annex implements this (through the current special remote protocol) then one implementation covers all export-supporting remotes. Also, implementing special remotes would become simpler, since they could just implement the export part of the protocol, and the key-value store then gets implemented in terms of that. But the main plus would be the uniformity: export remotes would be key-value remotes with an added ability (to store files in human-readable locations), rather than a separate untrustworthy species.

Comment by Ilya_Shlyakhter — Tue Mar 19 20:40:49 2019

comment 3

I'm doubtful that this would actually let the interface be simplified; there are too many differences in the capabilities of different remotes.

For example, if an S3 bucket has versioning disabled, and git-annex imports from it, then in this scheme it would need to re-upload the import to the key-value location. But, if an S3 bucket has versioning enabled, that upload would be redundant and should not be done. And, if an S3 bucket is read-only, then an import can't re-upload.
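The case analysis in this example can be written down as a small decision function. It is only a sketch of the reasoning, not anything git-annex does; `ImportAction` and `action_after_import` are hypothetical names, and the two booleans stand in for whatever the remote's configuration would report.

```python
from enum import Enum


class ImportAction(Enum):
    NOTHING = "bucket versioning already preserves old content"
    REUPLOAD = "re-upload imported content to the key-value location"
    CANNOT_PRESERVE = "read-only bucket, re-upload is impossible"


def action_after_import(versioning_enabled, read_only):
    """Decide what the proposed scheme would have to do after an import."""
    if versioning_enabled:
        # The bucket keeps old versions itself, so a re-upload is redundant.
        return ImportAction.NOTHING
    if read_only:
        # No versioning and no write access: nothing can be stored.
        return ImportAction.CANNOT_PRESERVE
    return ImportAction.REUPLOAD
```

The three outcomes differ per remote, which is the reason a single uniform behaviour is hard to pick.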

Also, not all users are going to want export remotes to store past versions of files; if they're used for some kind of publication, you may not want the exposure/cost of publishing old versions of files there. Of course, you could drop the old versions from the remote later, but this would be a workflow change from how export remotes work now.

So it seems to me that this would need to be an optional thing.

Comment by joey — Fri Mar 22 14:10:41 2019

simplifying the interface

"I'm doubtful that this would actually let the interface be simplified" -- I only meant that the minimum required interface would be simplified, in that git-annex could provide a default implementation of key-value remote methods in terms of the export remote interface; but any given remote could provide a more efficient implementation of these methods, overriding the default ones.
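A rough sketch of what "default implementation, overridable per remote" could look like, using an in-memory stand-in for a remote. Everything here is an assumption made for illustration: the class and method names are invented, the flat `.keys/` location omits the hash directories, and none of it reflects the actual special remote protocol.

```python
class ExportRemote:
    """Toy remote that only knows how to handle files by path."""

    def __init__(self):
        self.files = {}  # path -> content, standing in for remote storage

    # --- export interface: the minimum a remote would have to implement ---
    def store_export(self, path, content):
        self.files[path] = content

    def retrieve_export(self, path):
        return self.files[path]

    def remove_export(self, path):
        self.files.pop(path, None)

    def check_present_export(self, path):
        return path in self.files

    # --- default key-value methods, built only on the export interface ---
    def key_location(self, key):
        return f".keys/{key}"

    def store_key(self, key, content):
        self.store_export(self.key_location(key), content)

    def retrieve_key(self, key):
        return self.retrieve_export(self.key_location(key))

    def remove_key(self, key):
        self.remove_export(self.key_location(key))

    def check_present_key(self, key):
        return self.check_present_export(self.key_location(key))


class NativeKeyRemote(ExportRemote):
    """A remote that overrides the defaults with its own key storage."""

    def __init__(self):
        super().__init__()
        self.by_key = {}

    def store_key(self, key, content):
        self.by_key[key] = content

    def retrieve_key(self, key):
        return self.by_key[key]

    def remove_key(self, key):
        self.by_key.pop(key, None)

    def check_present_key(self, key):
        return key in self.by_key
```

A remote that implements only the four export methods gets working key-value behaviour for free, while `NativeKeyRemote` shows a remote replacing the defaults with something it can do more efficiently.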

But the main benefit would be to simplify the user-facing interface: as far as the user is concerned, all special remotes could be trusted, and accessed with the same basic commands, whether configured as export or not.

"if an S3 bucket is read-only, then an import can't re-upload." -- if the special remote is configured as read-only, then git-annex itself would not attempt to upload things there, no?

"not all users are going to want export remotes to store past versions of files" -- maybe there could be an option to store past versions encrypted, while storing current versions plain?

Comment by Ilya_Shlyakhter — Tue Mar 26 17:40:32 2019

comment 5

Another approach would be to let a key-value remote and an export remote be combined into one "combo" remote. To the user, this would look like one trusted, versioned remote supporting both key-value and export operations. Keys overwritten in the export remote would be stored in the key-value remote. Either keys or trees could be copied to the combo remote, keys going to the key-value remote and trees to the export remote. The downside is that files could not be moved directly between the backing remotes. But the inefficiency might not always matter. Also, TRANSFER and TRANSFEREXPORT could be extended to optionally accept URIs in lieu of content, and to do the transfer in the cloud.
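To make the combo idea concrete, here is a minimal in-memory sketch. The `ComboRemote` class and its methods are hypothetical, and it assumes that exporting a new tree replaces the old one, so any key that no longer appears anywhere in the new tree is first saved to the key-value half.

```python
class ComboRemote:
    """Toy pairing of an export remote with a key-value remote."""

    def __init__(self):
        self.export_tree = {}   # path -> key, the export half
        self.key_store = set()  # keys held by the key-value half

    def copy_key(self, key):
        """Keys copied to the combo remote go to the key-value half."""
        self.key_store.add(key)

    def export(self, new_tree):
        """Trees go to the export half; displaced keys are preserved first."""
        surviving = set(new_tree.values())
        for old_key in self.export_tree.values():
            if old_key not in surviving:
                self.key_store.add(old_key)
        self.export_tree = dict(new_tree)

    def has_key(self, key):
        """A key is present if either half holds it."""
        return key in self.key_store or key in self.export_tree.values()
```

Exporting `{"a": "K1"}` and then `{"a": "K2"}` leaves `K2` in the export half and `K1` in the key-value half, so from the user's side nothing is lost.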

More generically, maybe repository groups could be treated as special remotes? You'd configure the minimum number of copies of a key in a group. You could then put a key-value remote and an export remote in a group. When copying a tree to the group, if this would cause old keys to be overwritten, git-annex would first copy them to a key-value remote in the group, to preserve the per-group minimum number of copies constraint.

Comment by Ilya_Shlyakhter — Tue Mar 26 18:27:04 2019

comment 6

I think that the --sameas feature could be used to implement those combo remotes?

Comment by joey — Thu Jan 30 18:05:29 2020
