I currently use duply/duplicity to back up my networked computers to my home server. I have two external HDDs and, every week or so, I bring one of these home and copy the backup files to the hard drive (I leave them on the server to easily restore files and because I have a large hard drive in that). I use some hand-written scripts to keep a copy these backup files in the cloud (Ubuntu One) until they have been copied to both hard drives, ensuring that there are always two copies of the files somewhere offsite. Out of paranoia, I also have some "standalone backups" that are just huge encrypted archives of important folders (say, my entire Photos directory) as at a certain date - in case for some reason duplicity ever stops working or I need to roll something back to a version years earlier. I am less worried about these standalone backups and manually keep one copy of each somewhere.
It sounds like Git-Annex could automate things quite nicely (and give me some neat extras, like knowing where they were). This is how I understand I should do it, but please let me know if it is the right approach or if you have any suggestions:
- Create a folder on my server called "annex" and make a Git-Annex "large backup" repository in there.
- Create a folder within that called "archive" and put a "backup" folder within that. I understand that having the backups within an archive folder will mean that they aren't automatically copied to my desktop machines etc.
- Within that "backup" folder, create two folders, one called "duplicity" and one called "standalone". Put the backups in the respective folders.
Set up gcrypt Git-Annex repositories on my two external HDDs as "small backups". This seems to just start copying files across. That surprised me, as the files are in the archive folder and I thought the default was numcopies=1. Is there some autosync option that I need to turn off? Ideally, I would like it to encrypt/decrypt primarily with my server GPG key (which I'm not worried about copying around my computers), but also encrypt to my personal GPG key (where I'd only put my public key on the server, but I know I will not lose the secret key for that). Am I right that to do that I would need to set the repos up manually with:
git init --bare /mnt/externalHDD1
git annex initremote externalHDD1 type=gcrypt gitrepo=/mnt/externalHDD1 keyid=$serverkey keyid=$personalkey
git annex sync externalHDD1
Or should the gitrepo be the location of my main Git-Annex repository? How do I make it sync up with my other repos?
I understand that I would then need to set numcopies=3 in a .gitattributes file in the "archive/backup/duplicity" directory and, say, a numcopies=2 in the "archive/backup/standalone".
- I could then add a cloud repository as a "transfer" repository and Git-Annex should only keep files on that that are not already in the right number of places (similar to what my scripts are doing now).
- I have recently upgraded my hard drive, so I have my old 1TB internal hard drive that I will be putting in a cupboard somewhere. I was thinking that I could make this an archive drive for things like one copy of my duplicity/standalone backups. I wouldn't want it to be the only copy of anything. If I just set it as an archive drive, would this work?
- Are there more clever ways of doing this? I consider my external HDDs and the cloud repo as "offsite" repositories and ideally there would always be one copy of my backups offsite (in addition to at least three overall). There would also ideally be one of each of my files "live" (in most cases my server) that could instantly push files into a cloud repo and then to wherever I am. Is there any ability to put repositories in groups and write rules like that?
Any thoughts greatly appreciated!
Aaron
This all sounds reasonable to me.
gitrepo= is the location of the repository you are setting up with gcrypt, not your existing git-annex repository.
numcopies configures the lower bound of the number of copies, not the upper bound. There's no "small backup" type, you must mean either small archive, or incremental backup. Either type of repository is going to want to have files that have not previously been archived, or backed up.
You might eventually want to write your own preferred content expressions to handle offsite repositories. I'd recommend starting simple and building up.
Thanks Joey!
Great, thanks. If I set these up outside of the WebApp, what is the best way to link them up to my main repositories?
Yes, I meant incremental backup. Thanks, understood. The page on preferred content ( http://git-annex.branchable.com/preferred_content/ ) made things a bit clearer. I didn't realise that the incremental backups etc want a copy even if it isn't necessary to satisfy numcopies. Does that mean that once it is on one backup drive, it will not be sent to the other unless I increase numcopies?
The webapp can use remotes that you have configured at the command line just as well as it can use remotes you configure using the webapp.
Each file only gets backed up to one incremental backup repository. If you have a full backup repository it will (try to) get another copy of every single file.