While using HMAC instead of "plain" hash functions is inherently more secure, it's still a bad idea to re-use keys for different purposes.

Also, ttbomk, HMAC needs two keys, not one. Are you re-using the same key twice?

Compatibility with old buckets, and support for different ones, can be maintained by introducing a new option and simply copying the encryption key's identifier into this new option should it be missing.

Bug was filed prematurely, but it was a good bit of paranoia; gpg and HMAC are given different secret keys. done --Joey

Thanks :) -- RichiH

S3 doesn't support encryption at all, yet.

It certainly makes sense to use a different portion of the encrypted secret key for HMAC than is used as the gpg symmetric encryption key.

The two keys involved in HMAC would be the secret key (as the HMAC key) and the key/value key for the content being stored (as the message being hashed).
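A minimal sketch of the key-separation idea above, assuming the decrypted shared secret is simply split into two portions; the names, sizes, and split point here are illustrative, not git-annex's actual format:

```python
import hashlib
import hmac

# Stand-in for the gpg-decrypted shared secret (hypothetical value).
secret = b"\x01" * 512

cipher_portion = secret[:256]  # would be fed to gpg as the symmetric passphrase
mac_portion = secret[256:]     # used only as the HMAC key, never for the cipher

def opaque_name(key_value_key: bytes) -> str:
    """HMAC the per-content key/value key to get an opaque bucket name."""
    return hmac.new(mac_portion, key_value_key, hashlib.sha1).hexdigest()

# Distinct key/value keys yield distinct opaque names, and the mapping
# cannot be reversed without the secret.
name = opaque_name(b"SHA256-s1024--deadbeef")
assert name != opaque_name(b"SHA256-s1024--cafebabe")
```

Because the cipher and the MAC never share key material, compromising one use cannot weaken the other.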

There is a difficult problem with encrypting filenames in S3 buckets, and that is determining when some data in the bucket is unused for dropunused. I've considered two choices:

  1. gpg encrypt the filenames. This would allow dropunused to recover the original filenames, and is probably more robust encryption. But it would double the number of times gpg is run when moving content in/out, and to check for unused content, gpg would have to be run once for every item in the bucket, which just feels way excessive, even though it would not be prompting for a passphrase. Still, haven't ruled this out.

  2. HMAC or other hash. To determine what data was unused, the same hash and secret key would have to be used to hash all filenames currently used, and then that set of hashes could be intersected with the set in the bucket. But then git-annex could only say "here are some opaque hashes of content that appears unused by anything in your current git repository, but there's no way, short of downloading it and examining it, to tell what it is". (This could be improved by keeping a local mapping between filenames and S3 keys, but maintaining and committing that would bring pain of its own.)
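Option 2 can be sketched as a plain set intersection; the secret and key names below are made up for illustration:

```python
import hashlib
import hmac

# Hypothetical per-remote HMAC secret (illustrative value only).
SECRET = b"per-remote hmac secret"

def opaque(key: str) -> str:
    """Opaque bucket name for a git-annex key, per option 2."""
    return hmac.new(SECRET, key.encode(), hashlib.sha1).hexdigest()

# Keys referenced by the current git repository.
used_keys = {"SHA256-s10--aaaa", "SHA256-s20--bbbb"}

# Names actually present in the bucket (one of them stale).
bucket_names = {opaque("SHA256-s10--aaaa"), opaque("SHA256-s99--stale")}

# Anything in the bucket but not derivable from a used key is a
# dropunused candidate -- but only as an opaque hash, as noted above.
unused = bucket_names - {opaque(k) for k in used_keys}
```

This shows the trade-off directly: `unused` is computable cheaply without running gpg per item, but its members cannot be mapped back to filenames.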

Comment by joey Wed Mar 30 14:32:34 2011

After mulling this over, I think actually encrypting the filenames is preferable.

Did you consider encrypting the symmetric key with an asymmetric one? That's what TrueCrypt etc. use to allow different people access to a shared volume. This has the added benefit that you could, potentially, add new keys for data that new people should have access to, while making access to old data impossible. Or keys per subdirectory, or, or, or.

As an aside, could the same mechanism be extended to transparently encrypt data for a remote annex repo? A friend of mine is interested in hosting his data with me, but he wants to encrypt it for obvious reasons.

Comment by Richard Wed Mar 30 17:01:40 2011

Yes, encrypting the symmetric key with users' regular gpg keys is the plan.

I don't think that encryption of content in a git annex remote makes much sense; the filenames obviously cannot be encrypted there. It's more likely that the same encryption would get used for a bup remote, or with the directory remote I threw in today.

Comment by joey Wed Mar 30 18:15:18 2011

Picking up the automagic encryption idea for annex remotes, this would allow you to host a branchable-esque git-annex hosting service. (Nexenta with ZFS is a cheap and reliable option until btrfs becomes stable in a year or five.)

Comment by Richard Wed Mar 30 18:20:56 2011

This is brain-storming only, so the idea might be crap, but a branch could keep encrypted filenames while master keeps the real deal. This might fit into the whole scheme just nicely or break future stuff in a dozen places; I am not really sure yet. But at least I can't forget the idea now.

Comment by Richard Wed Mar 30 18:59:19 2011

OTOH, if encryption makes a bup backend more likely, disregard the idea above ;)

Comment by Richard Wed Mar 30 19:02:20 2011