Some git hosting sites (e.g. Forgejo instances tweaked to support git-annex) can store annexed contents. The goal here would be to encrypt the annexed file contents, but not the git repo. What would it take?
Git-annex encryption can be enabled for special remotes, but in this case there is only a "regular" git remote so there is no way to set the config.
My first intuition was to initialize a type=git special remote pointing to the same location, but it does not support encryption (initremote fails with "git-annex: Unexpected parameters: encryption keyid").
There is the gcrypt special remote (and it worked with the forgejo instance I tried), but it encrypts / obfuscates everything (file names, commits etc.) and turns each push into a force push.
The advantage of having the annexed files but not the git repo encrypted is that the file tree, commit history, readme and all the things typically displayed by the site would still be viewable (communicating repository layout and contents), but GPG keys would be used to control practical access (possibly on top of the site's access permissions).
Thanks in advance for considering! -- MSz
Another place this came up is https://git-annex.branchable.com/design/passthrough_proxy/#index14h2 where a proxy to an encrypted special remote necessarily does encryption server side, but the user may not want the server to see their unencrypted files.
There I suggested "adding a special remote that does its own client-side encryption in front of the proxy". Such a layered special remote could also be used with a git remote. There would be some complexity cost, since you would have two remote names, one used for git and the other for git-annex.
Implementing object encryption in git remotes is certainly possible, but it would be a special case and the existing code for encrypting special remotes (particularly Remote.Helper.Special.specialRemote) could not be reused.
There's also the problem that, if such a git repository is added as a regular remote, and the git-annex branch that indicates that it is encrypted has not yet been pulled, git-annex would not realize that it is supposed to be encrypted, so would send unencrypted objects to it. This seems like an easy situation to accidentally get into, e.g.:
Overall I prefer the idea of layering an encrypted special remote to complicating the git remote with encryption. Enabling that special remote could make git-annex treat the underlying remote as annex-ignore, to prevent accidentally sending unencrypted objects to it.
There could also be situations where you want to store some files unencrypted on a git hosting site to let them be accessible via its UI, but encrypt other files, and the layered special remote also allows for that kind of thing.
I've had the opportunity to revisit this old question of mine, and I'd like to ask some further questions.
I understand the preference for git+annex remotes to not support encryption. However, I'm not sure how to understand layering the special remote (in particular, in front of a proxy).
Is this possible with the existing git-annex tooling, or hypothetical? As far as I understand, a proxy 1) has to be a git repo thus cannot have encryption enabled, and 2) can push to an encrypted special remote (which must be a different type than git). It cannot pull encrypted annex keys from one special remote and put them unmodified into another (especially not into an annex-supporting git remote), right?
The (modified) Forgejo instances we use support git-annex, i.e. git remotes which do not ignore the annex and accept content pushes (I call that git+annex). AFAIK the internal layout of such a Forgejo repository is not different from a bare repository (DataLad blog: forgejo-anexajo - behind the curtain). The goal would be to have the annex objects sent encrypted to a Forgejo instance, inside or alongside the git repository. It seems that we would need a "layer" sitting on top of a normal git remote - but I don't see what that layer could look like.
The best proxy set-up I came up with was like this, with the encrypted remote behind the proxy (I'm using a bare repository as the push target - not sure if Forgejo could be bent to our will like that):
It worked, also with encryption, but the setup has limitations. First, encryption happens server-side. Second, only sharedpubkey encryption does not require private keys to be on the server -- in which case pushing to the proxied "origin-storage" works, but getting (necessarily) requires enabling "storage" locally.
What I was talking about is still hypothetical. But I think it would be fairly easy to implement.
This would be a regular special remote, so it supports encryption=yes and related settings as usual. When a file is stored to this special remote, it would take the object (which would be encrypted if it were so configured), and store it on the remote it is layered on top of. Retrieval would get the object from the layered remote. And so on.
That could probably be implemented outside git-annex as an external special remote. It might be better to build it into git-annex, to allow for better streaming of files through it.
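To make the store/retrieve flow described above a bit more concrete, here is a minimal Python sketch. It is only a toy model, not how git-annex or any actual special remote is implemented: the class names (UnderlyingRemote, LayeredRemote), the "ENC--" key prefix and the toy_cipher function are all hypothetical, and the XOR cipher is merely a stand-in for gpg and provides no real security.

```python
import hashlib
import hmac
from typing import Dict


class UnderlyingRemote:
    """Hypothetical stand-in for the remote being layered on (e.g. a git remote)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def retrieve(self, key: str) -> bytes:
        return self.objects[key]

    def present(self, key: str) -> bool:
        return key in self.objects


def toy_cipher(secret: bytes, data: bytes) -> bytes:
    """Placeholder for gpg: XOR with a SHA-256 keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


class LayeredRemote:
    """Encrypts client-side, then passes objects through to the underlying remote."""

    def __init__(self, underlying: UnderlyingRemote, secret: bytes) -> None:
        self.underlying = underlying
        self.secret = secret

    def _remote_key(self, key: str) -> str:
        # Hide the original key name behind an HMAC (illustrative naming only).
        digest = hmac.new(self.secret, key.encode(), hashlib.sha256).hexdigest()
        return "ENC--" + digest

    def store(self, key: str, data: bytes) -> None:
        self.underlying.store(self._remote_key(key), toy_cipher(self.secret, data))

    def retrieve(self, key: str) -> bytes:
        encrypted = self.underlying.retrieve(self._remote_key(key))
        return toy_cipher(self.secret, encrypted)

    def present(self, key: str) -> bool:
        return self.underlying.present(self._remote_key(key))


origin = UnderlyingRemote()
masked = LayeredRemote(origin, secret=b"example secret")
masked.store("SHA256E-s5--example", b"hello")
assert masked.retrieve("SHA256E-s5--example") == b"hello"
assert b"hello" not in origin.objects.values()
```

In this model the underlying remote only ever sees the obfuscated key name and the encrypted bytes, and it can still hold unencrypted objects stored into it directly.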
When used on top of a regular git remote, it would result in the remote having .git/annex/objects/ containing some encrypted keys. (It could also contain un-encrypted keys stored in it as usual.)
The proxy would not be needed to use it. A proxy is just another case where a layered special remote could be useful, when the user wants client-side encryption.
A few gotchas I can see:
encryption=shared will not add any security if the underlying remote is a git repository, because pushing the git-annex branch there will make the encryption key available to anyone who can access the git repository. Instead you will need to use encryption=pubkey. (Since this is a bit non-obvious, it should probably reject attempts to do that.)
I have some early work (documentation) in the maskremote branch.
Thank you, this is very informative!
Could such a special remote use the same "transport" as the underlying remote (thinking of p2p http in particular), which would mean the same authentication & the same set of permissions server side?
Implemented this as the mask special remote.
For example, to make a remote that stores annexed files encrypted in the origin remote:
Yes, the underlying remote is used as-is, whatever it is.