The idea is to have a type of key that is based on a hash of the annexed file, but with some form of encryption or password protection.

So someone who can access the git repository is prevented from checking known hashes of files, and learning about content in the repository that is not available to them. (However, they would be able to see filenames and commit metadata, which might also expose information about what the files are. This is not a fully encrypted git repository like can be made with gcrypt.)

This might be an added layer of security, or the git repository might be made public, including perhaps some of the annexed files, while this is used to obscure the hashes of some other annexed files. For example, a scientific dataset might make public files that are derived from patient data, and use this for the patient data files.


If two such repositories were merged, git-annex would need to somehow be able to tell how to decrypt a key, which could have come from either repository. So it seems that the key needs to include within it some identifier of the secret that is used to encrypt it. For example:

ENC--ident--foo

Where ident would be something like a UUID of the secret, and foo is the file's hash (and perhaps size, or indeed a whole regular key) that is encrypted by the secret.


Should the key's size be included only in encrypted form, or in plaintext?

Since git-annex constructs a Key without using the associated Backend, it currently parses the size field the same way for each type of key. So, it does not seem to be possible to encrypt the size field and use that encrypted size to populate keySize. What would be doable though, is to replace uses of keySize with a new Backend method that returns the size.


Should the encryption method be reversible?

If the encryption method is not reversible, the key's size would need to be included in plaintext, or left out entirely.

But it does not seem necessary for the encryption method to be reversible otherwise. Consider if scrypt was used. When adding a file, git-annex would first hash it, and then run it through scrypt. That is not reversible, so when fscking a file, just repeat the same process and compare the resulting scrypt keys.

Not being reversible is a nice benefit, because it makes it much harder for an attacker to brute-force. If it's reversible the attacker can brute-force the user's password, looking for a password that decrypts to something that looks right.

--Joey

wontfix; see description --Joey