An exporttree=yes remote is limited to storing the files in the tree that is exported to it.

However, it would be possible to instead make these remotes be able to store any key. Keys that were not in the exported tree would be stored under .git/annex/objects/

When updating the tree exported to the remote, files that were deleted would have their content renamed to the .git/annex/objects/ location. And keys that were already stored there but were not used in the previous tree would be moved to the file in the new tree that uses them.

In fact, git-remote-annex already does this for its own (very special) keys, in order to support exporttree=yes remotes.

Another place this would be useful is proxying to exporttree=yes special remotes.

With this change, a user could just git-annex copy --to remote and copy whatever files they want into it. Then later git-annex export master --to remote would efficiently update the tree there.

If sending individual files like that works, when git-annex drop --from remote would also work. And dropping a key that is used by a file in the exported tree would delete that file. One problem with this is git-annex import would then import a tree with a deleted file. Probably not what the user wanted to do! So it seems that git-annex drop should fail if the file is part of the exported tree, and only allow dropping keys that are not, from the .git/annex/objects/ location in the special remote.

Some remotes don't support renameExport, and if this was used with such a remote, files would need to be re-uploaded at when updating the tree exported to the remote, and the .git/annex/objects/ files deleted. So users of such remotes would not want to use workflows that copy files to them before updating the export tree.

An old version of git-annex would not know about the new way to store keys on the remote, and so things like git-annex fsck --from remote would update bad data. Fsck is not really a problem since a fsck can be run with a new git-annex to recover. But compatability with old versions needs to be carefully considered if making this change.

There is also the potential for breaking existing workflows, eg a user might be running a command like git-annex push that doesn't currently send keys to the remote that are not in the exported tree. If it started sending all (wanted) keys to an exporttree=yes remote, that would be surprising for an existing user!

Perhaps this should not be "exportree=yes", but something else.

Currently, if a remote is configured with "exporttree=foo", that is treated the same as "exporttree=no". So this will need to be a config added to exporttree=yes in order to interoperate with old git-annex.

Call it "exporttree=yes annexobjects=yes" --Joey


Consider two repositories A and B that both have access to the same exporttree=yes special remote R.

  • A exports tree T1 to R
  • B pulls from A, so knows R has tree T1
  • A exports tree T2 to R, which deletes file foo. So it is moved to R's .git/annex/objects. Or, alternatively, foo is deleted, and the key is then copied to R again, also to .git/annex/objects.
  • B exports tree T2 to R also. So B deletes file foo. But it was not present anyway. If B then marks the key as not present in R, we will have lost track of the fact that A moved it to the objects location.

So, when calling removeExport, have to also check if the key is present in the objects location. If so, either don't record the key as missing, or also remove from the objects location.


Could a remote with annexobjects=yet and exporttree=yes but without importtree=yes not be forced to be untrusted?

If not, the retrieval from the annexobjects location needs to do strong verification of the content.

If the annexobjects directory only gets keys uploaded to it, and never had exported files renamed into it, its content will always be as expected, and perhaps the remote does not need to be untrusted.

OTOH, if an exported file that is being deleted in an updated export gets renamed into the annexobjects directory, it's possible that the file has in fact been overwritten with other content (by git-annex in another clone of the repository), and so the object in annexobjects would not be as expected. So unfortunately, it seems that rename can't be done without forcing untrusted.

Note that, exporting a new tree can still delete any file at any time. If the remote is not untrusted, that could violate numcopies. So, performUnexport would need to check numcopies first, when using such a remote.

Even if they are not untrusted, an exported file can't be counted as a copy. Only a file in the annexobjects location can be. So the remote's checkPresent will perhaps need to return false for files that are exported? But surely other things than numcopies use checkPresent. So this might need a change to checkPresent's type to indicate the difference.

Crazy idea: Split the remote into two uuids. Use one for the annexobjects directory, and the other for the exported files. This clean separation avoids the above problem. But would be confusing for the user. HOWEVER, what if the two were treated as parts of the same cluster....?

This may be worth revisiting later, but for now, I am leaning to keeping it untrusted, and following down that line to make it as performant as possible.


Implementing in the "exportreeplus" branch --Joey

done --Joey