Hi,
I'm trying to update the script provided in topic "Decrypting files in special remotes without git-annex" to add support for different MAC algorithms, chunks and sharedpubkey encryption scheme.
I can already decrypt the files using my GPG keys directly, and I'm trying to get the key lookup function to work.
I have difficulty parsing the git-annex source code, and I cannot find exactly how the special remote keys are computed. I used the script from "yibe" as a starting point (refer to comment #1) because it is pure bash, which I can understand:
From the doc, sharedpubkey cipher is unencrypted in the remote, it is only base64-ed and limited to 256 characters. So I added: cipher="$(echo -n "$cipher" | base64 -d | head -c 256)" like it is done for the "shared" encryption scheme.
According to Yibe's script, chunk keys have extra "-S-C--" fields inside the annex key. But I have doubts about this part. I tried with and without it: no success.
Yibe's script handles MAC algorithms correctly. I see no issue there.
In the end, I cannot get the right remote keys for my test file. Question: is there documentation somewhere describing this whole encoding chain?
internals is the place to start for all of this documentation.
I believe that the special remote key names (written by Joey) are the same as the keys you see in .git, which are documented here: key format. You can see this is independent of encryption scheme, and hashing talks about the nested directory structure, which you would also need to know.
Some more context on the Decrypting files in special remotes without git-annex tip is at the related forum post Future proofing / disaster recovery with an encrypted special remote. Encryption documentation is at encryption.
But, since you don't actually want to experiment with how keys are stored, and you don't mind relying on git-annex, I think an easier way to experiment with different encryption schemes would be to implement your own special remote. That page contains an example.sh code block which is well documented. With example.sh you would just need to change doretrieve to decrypt before retrieving and dostore to encrypt before storing.

Thanks. My apologies for the long answer delay.
Let's clarify what I'm trying to do: from the files of an encrypted special remote, with their data and filenames encrypted, I'd like to recover the original files, without using git-annex at all, just in case one day I'm not able to use git-annex anymore. This is the purpose of the script presented in Future proofing / disaster recovery with an encrypted special remote.
"I believe that the special remote key names (written by Joey) are the same as the keys you see in .git"
This is wrong, and the confusion is this: the local repository key does not change (it is not encrypted), but the file sent to the special remote uses an encrypted key (filename if you prefer). Yes, the file sent to the special remote has its data encrypted and its filename hashed and encrypted too.
"I think an easier way to experiment with different encryption schemes would be to implement your own special remote"
I actually did almost this: I'm using the git-annex "--debug" command switch, which shows all the commands run under the hood. In these, I can see the final encrypted key, which is different from the original one.
An example is better than a long speech: I'm using an rclone special remote with the "shared pubkey" encryption scheme (see Encryption, section sharedpubkey). In my local test repo, I have a single file. I can upload and download the file from the special remote as expected.
Local filename: ./test.pdf
Local file: .git/annex/objects/F3/pf/SHA256E-s127597--abc14a6cf4ebb79fdc2eb0d1bf9c304cfce30959661e72e98536faf1bb1b393b.pdf/SHA256E-s127597--abc14a6cf4ebb79fdc2eb0d1bf9c304cfce30959661e72e98536faf1bb1b393b.pdf
Local key: SHA256E-s127597--abc14a6cf4ebb79fdc2eb0d1bf9c304cfce30959661e72e98536faf1bb1b393b.pdf
Special remote debug log:
git annex get ./test.pdf --debug
As you can see, the hashed key is:
00a/620/GPGHMACSHA512--9cbf6fe8def32a6b434c8bfc8991916ff425a0c990be48fffe647c5ab7a6b294ba38e96fa58aebd59eb5b14d0e98475b241cd6098f08f2c10953b999d1bcd01c
It has been hashed (openssl HMACSHA512) and encrypted with GPG. I'd like to be able to recover the original key (SHA256E-s127597--abc14a6cf4ebb79fdc2eb0d1bf9c304cfce30959661e72e98536faf1bb1b393b.pdf) without git-annex. Since I have backed up the map between local keys and original filenames, I would then be able to get my file back without using git-annex.
Aaah. I see you have commented on the Future proofing / disaster recovery with an encrypted special remote as well. I'll try to test with sharedpubkey if I get a chance.
Have you tried looking at the generated cipher? git show git-annex:remote.log | grep 'name='"your-remote-name " ? Did you try adding IFS= read -rd '' cipher < <( printf "$cipher\n" ) ?

Yes, I had a look at the cipher, and I cannot tell much about it. It is ... random. And yes, I tried the removal of the '\n' characters like in the pubkey encryption scheme, with no success so far.
I'll post my test script, which is a basic variation of the one given in Decrypting_files_in_special_remotes_without_git-annex to make multiple calls to lookup_key().
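For looking at the cipher mentioned above, here is a minimal sketch that pulls a single field out of one remote.log line. It assumes the line is a UUID followed by space-separated key=value pairs and that values contain no spaces. remote_field is a made-up helper name and the sample line is invented for illustration only.

```sh
# Sketch: extract one key=value field from a remote.log line.
# Assumption: fields are separated by single spaces and values contain no spaces.
# The body runs in a subshell ( ... ) so the variables stay local.
remote_field() (
    line="$1"; field="$2"
    # One field per line, then keep only the value of the requested field.
    printf '%s\n' "$line" | tr ' ' '\n' | sed -n "s/^${field}=//p" | head -n 1
)

# Invented sample line, not taken from a real repository.
sample='00000000-0000-0000-0000-000000000001 cipher=QUJDRA== encryption=sharedpubkey name=myremote timestamp=1600000000s'
remote_field "$sample" cipher
remote_field "$sample" encryption
```

With a real repository the line would come from the git show git-annex:remote.log command quoted above.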
OK. Thanks for all of the patience with the back-and-forth. After trying to follow these steps and re-reading your posts I believe that what you are trying to do is unfortunately not possible with your current setup.
Although you are using public keys to encrypt your content, git-annex is hashing the original key names (using a one-way hash) aka SHA1 Digest signed with your public key before storing on the special remote. The public key is just used by the digest algorithm for signing and does not enable you to recover the hashed key. The git-annex special remote protocol does not require special remotes to actually store git-annex keys, it only requires that special remotes can retrieve content given a key.
In Joey's source code Crypto.hs, in reference to this key generation process, he does have a comment to this effect: "The encryption does not need to be reversable". I assume Joey used hashes for simplicity and so the filenames could stay short.
In Future proofing / disaster recovery with an encrypted special remote Joey also mentions “That's what's "special" about special remotes vs regular git remotes: They only store the content of annexed files and not the git repository. Back up the git repository separately”
So, it seems, in order to recover the original key names you will either have to keep a backup of the original repository or create a new special remote that stores these in a recoverable fashion (instead of using a digest). Perhaps some git commit hook that zips up the .git directory and adds it to your repository could be of use?

Hi, I understand your points, but I think there is a little misunderstanding.
I agree there may be no way to reverse the hashing, even given the public key and its corresponding private key (which I have, of course, to be able to decrypt the file contents). But this is not what I'm trying to do; actually, I want to do the opposite: from the local key, compute the hashed key. In other words, I want to do exactly what git-annex already does internally, but in a little shell script, independent of git-annex (in case git-annex is unusable one day, a.k.a. the "disaster").
This is the whole purpose of the function "lookup_key()" in the shell script on the page Future proofing / disaster recovery with an encrypted special remote: the function tries to hash the keys the same way git-annex does. It prints the final hashed key used in the special remote. This function allows a user to find which file in the special remote corresponds to a given file in the local repository.
This mapping "local file name" / "local key" / "special remote hashed key" is what I want to back up. If I have trouble running git-annex one day, this mapping would allow me to rename my special remote files to their original filenames, after downloading them using third-party tools (sftp, scp, rsync, whatever) and decrypting their content using my private key. All this without git-annex and its special remote third-party script.
Do you see what I'm trying to do? I rely heavily on this lookup_key() function, which is basically already implemented inside git-annex. My main problem is that I don't understand Haskell. I can only lurk around in the code, and I have not identified the sequence of operations applied to these hashed keys.
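For reference, here is a minimal sketch of the kind of computation lookup_key() performs, pieced together from this thread and not from the git-annex source. It assumes the HMAC secret has already been prepared from the cipher, that the remote key name is "GPGHMAC" plus the upper-cased MAC name plus "--" plus the hex HMAC of the annex key, and that the directory prefix is the first six hex characters of the md5 of that encrypted key (the "00a/620/" layout seen above, which depends on how the remote is configured). remote_path is a made-up name.

```sh
# Sketch: derive the path of one object on the encrypted remote.
# $1 = MAC name as accepted by "openssl dgst" (sha1, sha256, sha512, ...)
# $2 = HMAC secret (the already prepared cipher)
# $3 = annex key, e.g. SHA256E-s127597--<hash>.pdf
# Assumption: lower-case md5 directory layout (xxx/yyy/), as in the debug output above.
remote_path() (
    mac="$1"; secret="$2"; annex_key="$3"
    # HMAC of the key name; sed strips the "(stdin)= " style prefix openssl prints.
    hash="$(printf '%s' "$annex_key" | openssl dgst "-$mac" -hmac "$secret" | sed 's/^.*= *//')"
    enc_key="GPGHMAC$(printf '%s' "$mac" | tr 'a-z' 'A-Z')--$hash"
    # The two directory levels come from the md5 of the encrypted key name.
    sum="$(printf '%s' "$enc_key" | openssl dgst -md5 | sed 's/^.*= *//')"
    d1="$(printf '%s' "$sum" | cut -c1-3)"
    d2="$(printf '%s' "$sum" | cut -c4-6)"
    printf '%s/%s/%s\n' "$d1" "$d2" "$enc_key"
)
```

Calling remote_path sha512 "$secret" "$annex_key" prints a path of the form xxx/yyy/GPGHMACSHA512-- followed by 128 hex digits. Comparing that output with the path shown by --debug is the quickest way to check whether the secret was prepared correctly.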
I have addressed the sharedpubkey thing in the other thread.
Chunk keys may have a -S as well as the -C, if the special remote was set up with new-style chunking enabled.
A remote can have several different chunk sizes over its lifetime; the chunk size used for a given key is in the .log.cnk file in the git-annex branch, documented in internals.
The easy way to test if you are generating the right key, prior to HMAC encrypting it, is to set up a non-encrypted special remote with the same chunking configuration, and look at the chunk keys used when files are stored in it.
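To make the chunk key naming concrete, here is a small sketch that lists the chunk keys for one annex key. It assumes the -S (chunk size) and -C (chunk number) fields go right before the "--" separator, that chunk numbers start at 1, and that the number of chunks is the key size divided by the chunk size, rounded up. In a real repository the chunk size and count should be taken from the .log.cnk file as described above. chunk_keys is a made-up name.

```sh
# Sketch: list the chunk keys for one annex key.
# $1 = annex key with a size field, e.g. SHA256E-s127597--<hash>.pdf
# $2 = chunk size in bytes
# Assumptions: -S/-C are inserted just before "--", numbering starts at 1,
# and the chunk count is size / chunk size rounded up.
chunk_keys() (
    key="$1"; chunksize="$2"
    prefix="${key%%--*}"   # everything before the first "--"
    name="${key#*--}"      # everything after the first "--"
    size="$(printf '%s\n' "$prefix" | sed -n 's/^[^-]*-s\([0-9]*\).*$/\1/p')"
    count=$(( (size + chunksize - 1) / chunksize ))
    n=1
    while [ "$n" -le "$count" ]; do
        printf '%s-S%s-C%s--%s\n' "$prefix" "$chunksize" "$n" "$name"
        n=$((n + 1))
    done
)
```

For example, chunk_keys 'SHA256E-s127597--abc.pdf' 100000 prints SHA256E-s127597-S100000-C1--abc.pdf and SHA256E-s127597-S100000-C2--abc.pdf. These names can be compared with what a non-encrypted remote with the same chunking configuration actually stores.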
sharedpubkey seems to be using a different method than shared to create encrypted filenames.

I tested the script fragment that I posted.
The obvious difference between shared and sharedpubkey, if you look at that script fragment, is that shared only uses the first 256 bytes of the cipher to encrypt filenames, while sharedpubkey uses the entire cipher.
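That difference can be sketched as follows. This is only an illustration of the statement above, not the script fragment itself: it assumes the cipher field from remote.log is base64-encoded in both cases, and hmac_secret is a made-up name.

```sh
# Sketch: turn the base64 cipher field into the secret used for HMAC-ing filenames.
# $1 = encryption scheme (shared or sharedpubkey), $2 = base64 cipher from remote.log
# Assumption: shared keeps only the first 256 bytes, sharedpubkey keeps everything.
hmac_secret() (
    scheme="$1"; cipher_b64="$2"
    case "$scheme" in
        shared)       printf '%s' "$cipher_b64" | base64 -d | head -c 256 ;;
        sharedpubkey) printf '%s' "$cipher_b64" | base64 -d ;;
        *) echo "unsupported scheme: $scheme" >&2; exit 1 ;;
    esac
)
```

Note that capturing the output with $(...) drops trailing newlines, which is worth keeping in mind given the earlier discussion about '\n' characters in the cipher.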
I managed to get the lookup_key() working with sharedpubkey, chunks and mac SHA512, by combining Joey's cipher and yibe's script.
I have to clean the script before posting, but it is basically Yibe's script (at the bottom of this page) with the sharedpubkey cipher above, and nothing more.
It works well for my test file, but this file is made of 1 chunk only. Before relying on the script, more tests should be done with bigger files, i.e. with more chunks.
Anyway, thanks to all of you for your help.
The original script with sharedpubkey support is available here
Yibe's enhanced script with sharedpubkey, different mac and chunk support is available here. Chunk management should be verified before relying on it, I haven't done it yet except for a basic case (1 chunk per file). Mac support seems ok.