I am interested in using git annex to manage encrypted backups to Amazon S3/Glacier. The main file directory would be a git annex repository in direct mode, with an encrypted S3 or Glacier remote set up in archive mode, and `git annex add .` and `git annex sync` would be run periodically. The intent is for this setup to be a backup against catastrophic failure, so I want to make sure I handle future-proofing and disaster recovery properly. My basic question is: what would I need to have backed up, and what would I have to do, if the computer with the main repository died? I have tried to break that out into more specific questions below.

  1. S3/Glacier remotes store the contents of .git/annex/objects in encrypted form, with hashes for file names, and nothing else (other than a UUID). Those hashes do not match the keys in the main repo. Are they the same keys, encrypted? Is there a way to look up the S3 file name corresponding to a file in the repo?

  2. For shared encryption, I see the cipher text in remote.log in the git-annex branch. Assuming I didn't have access to git annex, what would I need to do to convert that cipher text into a form that I could use with gpg to decrypt files?

  3. Same question, but for hybrid encryption rather than shared. I assume the answer is similar, except that I would first need to decrypt the cipher with my gpg key? How do I do that?

  4. Assuming I did have access to git annex, what would I need in order to create a new repo on a new computer with access to all of the files in the S3/Glacier bucket? I think I would need my Amazon credentials (possibly already embedded in the git repo), my gpg key if using hybrid or public key encryption, and the .git folder as it was the last time files were pushed to the S3/Glacier remote (which would have the necessary decryption information for shared encryption). Is that right? Mainly I am checking that the remote does not store any metadata about the repo, so that for git annex to be able to pull files back out I would need an up-to-date backup of the .git directory (I can't just copy remote.log and have git annex work out the rest from the remote's contents). So for a full backup, my script would need to tar the .git directory, encrypt it, and push it to S3/Glacier separately after git annex does a sync. Then I could recover everything as long as I had a secure backup of my Amazon credentials and my encryption key(s).
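
To make the script in question 4 concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under assumptions, not a tested recovery procedure: the function names, the archive path and the gpg recipient are all hypothetical, public key encryption of the tarball is just one possible choice, and the final upload to S3/Glacier is deliberately left out because I do not know which tool is best for that step.

```python
import subprocess
import tarfile
from pathlib import Path


def archive_git_dir(repo: Path, out: Path) -> Path:
    """Write a gzip-compressed tarball of repo/.git to `out` (hypothetical helper)."""
    git_dir = repo / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"no .git directory in {repo}")
    with tarfile.open(out, "w:gz") as tar:
        # Store entries under ".git/..." so the archive unpacks into a repo root.
        tar.add(git_dir, arcname=".git")
    return out


def gpg_encrypt_command(archive: Path, recipient: str) -> list[str]:
    """Build (but do not run) a gpg command encrypting `archive` to `recipient`."""
    return [
        "gpg", "--encrypt",
        "--recipient", recipient,
        "--output", str(archive) + ".gpg",
        str(archive),
    ]


def backup(repo: Path, out: Path, recipient: str) -> Path:
    """Run the periodic git annex steps, then archive and encrypt .git."""
    # The same two commands I plan to run periodically.
    subprocess.run(["git", "annex", "add", "."], cwd=repo, check=True)
    subprocess.run(["git", "annex", "sync"], cwd=repo, check=True)
    # `out` should live outside the repository so it is not archived itself.
    archive = archive_git_dir(repo, out)
    subprocess.run(gpg_encrypt_command(archive, recipient), check=True)
    # Assumption: the encrypted file is then uploaded to S3/Glacier by some
    # separate tool; that step is not shown here.
    return Path(str(archive) + ".gpg")
```

A call would look something like `backup(Path("/path/to/repo"), Path("/tmp/git-backup.tar.gz"), "me@example.com")`, where all three arguments are placeholders. Whether a tarball like this is actually enough to restore from is exactly what I am asking.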