encrypt just the annex on git+annex hosting site

Some git hosting sites (e.g. forgejo instances tweaked to support git-annex) can store annexed contents. The goal here would be to encrypt the annexed file contents, but not the git repo. What would it take?

Git-annex encryption can be enabled for special remotes, but in this case there is only a "regular" git remote so there is no way to set the config.

My first intuition was to initialize a type=git special remote pointing to the same location, but it does not support encryption (initremote fails with git-annex: Unexpected parameters: encryption keyid).

There is the gcrypt special remote (and it worked with the forgejo instance I tried), but it encrypts / obfuscates everything (file names, commits etc.) and turns each push into a force push.

The advantage of having the annexed files but not the git repo encrypted is that the file tree, commit history, readme and all the things typically displayed by the site would still be viewable (communicating repository layout and contents), but GPG keys would be used to control practical access (possibly on top of the site's access permissions).

Thanks in advance for considering! -- MSz

done, by implementing the mask special remote --Joey

👍 +1 for encrypting the annex on regular git remotes

Funny, playing around with my own forgejo-aneksajo instance, I thought about exactly that 😀 Being able to encrypt only the annex but keeping the repo open would be cool.

Comment by nobodyinperson — Thu Sep 12 14:51:20 2024

comment 2

Another place this came up is https://git-annex.branchable.com/design/passthrough_proxy/#index14h2 where a proxy to an encrypted special remote necessarily does encryption server side, but the user may not want the server to see their unencrypted files.

There I suggested "adding a special remote that does its own client-side encryption in front of the proxy". Such a layered special remote could also be used with a git remote. There would be some complexity cost, since you would have two remote names, one used for git and the other for git-annex.

Implementing object encryption in git remotes is certainly possible, but it would be a special case and the existing code for encrypting special remotes (particularly Remote.Helper.Special.specialRemote) could not be reused.

There's also the problem that, if such a git repository is added as a regular remote, and the git-annex branch that indicates that it is encrypted has not yet been pulled, git-annex would not realize that it is supposed to be encrypted, and so would send unencrypted objects to it. This seems like an easy situation to accidentally get into, e.g.:

git remote add foo http://example.com/
git annex move --to foo # oops unencrypted

Overall I prefer the idea of layering an encrypted special remote to complicating the git remote with encryption. Enabling that special remote could make git-annex treat the underlying remote as annex-ignore, to prevent accidentally sending unencrypted objects to it.
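
The annex-ignore guard can also be set by hand right after adding the remote. A minimal sketch, assuming the remote name and URL from the example above; `add_ignored_remote` is a hypothetical helper name, while `remote.<name>.annex-ignore` is git-annex's existing per-remote setting:

```sh
# Hypothetical helper: add a git remote and immediately tell git-annex
# not to use it for annexed content.
add_ignored_remote() {
    git remote add "$1" "$2" &&
    git config "remote.$1.annex-ignore" true
}

# Usage (run inside the repository):
#   add_ignored_remote foo http://example.com/
```

With that set, git-annex should decline to use foo for annexed content, so the `git annex move --to foo` mistake above would not upload unencrypted objects.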

There could also be situations where you want to store some files unencrypted on a git hosting site to let them be accessible via its UI, but encrypt other files, and the layered special remote also allows for that kind of thing.

Comment by joey — Wed Sep 18 12:17:02 2024

comment 3

I've had the opportunity to revisit this old question of mine, and I'd like to ask some further questions.

I understand the preference for git+annex remotes to not support encryption. However, I'm not sure how to understand layering the special remote (in particular, in front of a proxy).

Is this possible with the existing git-annex tooling, or hypothetical? As far as I understand, a proxy 1) has to be a git repo and thus cannot have encryption enabled, and 2) can push to an encrypted special remote (which must be of a different type than git). It cannot pull encrypted annex keys from one special remote and put them unmodified into another (especially not into an annex-supporting git remote), right?

The (modified) Forgejo instances we use support git-annex, i.e. git remotes which do not ignore the annex and accept content pushes (I call that git+annex). AFAIK the internal layout of such a Forgejo repository is not different from a bare repository (DataLad blog: forgejo-aneksajo - behind the curtain). The goal would be to have the annex objects sent encrypted to a Forgejo instance, inside or alongside the git repository. It seems that we would need a "layer" sitting on top of a normal git remote - but I don't see what that layer could look like.

The best proxy set-up I came up with was like this, with the encrypted remote behind the proxy (I'm using a bare repository as the push target - not sure if Forgejo could be bent to our will like that):

local repository ----> (bare repository on a server) --proxy--> (directory special remote on the same server)
                                 "origin"                          "storage" / proxied as "origin-storage"

It worked, also with encryption, but the setup has limitations. First, encryption happens server-side. Second, only sharedpubkey encryption does not require private keys to be on the server -- in which case pushing to the proxied "origin-storage" works, but getting (necessarily) requires enabling "storage" locally.

Comment by msz — Mon Apr 7 17:06:15 2025

comment 4

What I was talking about is still hypothetical. But I think it would be fairly easy to implement.

This would be a regular special remote, so it supports encryption=yes and related settings as usual. When a file is stored to this special remote, it would take the object (which would be encrypted if it were so configured), and store it on the remote it is layered on top of. Retrieval would get the object from the layered remote. And so on.

That could probably be implemented outside git-annex as an external special remote. It might be better to build it into git-annex, to allow for better streaming of files through it.

When used on top of a regular git remote, it would result in the remote having .git/annex/objects/ containing some encrypted keys. (It could also contain un-encrypted keys stored in it as usual.)

The proxy would not be needed to use it. A proxy is just another case where a layered special remote could be useful, when the user wants client-side encryption.

A few gotchas I can see:

Running `git-annex unused` against the repository storing those encrypted keys would see them as unused.
If the special remote did not use encryption, it would be possible to get into situations where drop violates numcopies. Eg, a drop could verify that the key being dropped from the special remote is present in the remote it's layered on top of and so count it as a copy. But then dropping from the special remote would remove it from the other remote. Probably the solution is for the special remote to require encryption.
If a file is stored on both this special remote and on the underlying remote, that would count as 2 copies. But losing a single repository risks losing both copies at once. Same problem if multiple of these special remotes are set up all storing to the same underlying remote. I think this is minor, because there would be 2 actual copies, just copies that happen to be on the same drive.
encryption=shared will not add any security if the underlying remote is a git repository, because pushing the git-annex branch there will make the encryption key available to anyone who can access the git repository. Instead, encryption=pubkey will need to be used. (Since this is a bit non-obvious, the special remote should probably reject attempts to use encryption=shared.)
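
The checks suggested in the gotchas above (require encryption, reject encryption=shared) could be approximated with a wrapper around initremote. This is only a rough sketch: it assumes the type=mask and remote= parameter names that appear later in this thread, and `init_mask_remote` with its argument order is made up for illustration.

```sh
# Hypothetical wrapper around initremote for a layered (mask) remote.
# Arguments: NAME UNDERLYING_REMOTE ENCRYPTION KEYID
init_mask_remote() {
    case "$3" in
        ""|none)
            echo "refusing: without encryption, drop could violate numcopies" >&2
            return 1 ;;
        shared)
            echo "refusing: encryption=shared puts the key in the git-annex branch" >&2
            return 1 ;;
    esac
    git annex initremote "$1" type=mask "remote=$2" "encryption=$3" "keyid=$4"
}

# Usage:
#   init_mask_remote encryptedorigin origin pubkey KEYID
```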

I have some early work (documentation) in the maskremote branch.

Comment by joey — Wed Apr 9 15:17:25 2025

comment 5

Thank you, this is very informative!

Could such a special remote use the same "transport" as the underlying remote (thinking of p2p http in particular), which would mean the same authentication & the same set of permissions server side?

Comment by msz — Thu Apr 10 16:56:50 2025

comment 6

Implemented this as the mask special remote.

For example, to make a remote that stores annexed files encrypted in the origin remote:

git annex initremote encryptedorigin type=mask remote=origin encryption=hybrid keyid=id@joeyh.name
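
Day-to-day use might then look like the following sketch. It assumes the remote names from the command above; `publish_encrypted` and `fetch_encrypted` are hypothetical helper names, and pushing `HEAD` together with the `git-annex` branch is just one possible way to share history and location tracking.

```sh
# Remote names taken from the initremote example; override via environment.
MASK_REMOTE=${MASK_REMOTE:-encryptedorigin}
GIT_REMOTE=${GIT_REMOTE:-origin}

# Hypothetical helper: store the given annexed files via the mask remote,
# then push the (unencrypted) git history and the git-annex branch.
publish_encrypted() {
    git annex copy --to "$MASK_REMOTE" "$@" &&
    git push "$GIT_REMOTE" HEAD git-annex
}

# Hypothetical helper: retrieve the given files through the mask remote.
fetch_encrypted() {
    git annex get --from "$MASK_REMOTE" "$@"
}

# Usage:
#   publish_encrypted data/big.bin
#   fetch_encrypted data/big.bin
```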

Comment by joey — Fri Apr 11 17:20:08 2025

comment 7

Could such a special remote use the same "transport" as the underlying remote (thinking of p2p http in particular), which would mean the same authentication & the same set of permissions server side?

Yes, the underlying remote is used as-is, whatever it is.

Comment by joey — Fri Apr 11 17:23:59 2025

Thank you for implementing

Thank you! I just tried the mask remote with our forgejo-aneksajo instance with HTTPS push and the mask remote works just as intended. This will be very useful!

Comment by msz — Tue Apr 29 17:00:36 2025
