After you've used git-annex for a while, you will have data in your repository that you don't want to keep in the limited disk space of a laptop or a server, but that you don't want to entirely delete.
This is where git-annex's support for offline archive drives shines.
You can move old files to an archive drive, which can be kept offline if
it's not practical to keep it spinning. Better, you can move old files to
two or more archive drives, in case one of them later fails to spin up.
(One consideration when future proofing your archive.)
To set up an archive drive, take any removable drive, format it with a filesystem you'll be able to read some years later, and then follow the walkthrough to set up a repository on it that is a git remote of the repository on your computer that you want to archive. In short:
cd /media/archive
git clone ~/annex
cd ~/annex
git remote add archivedrive /media/archive/annex
git annex sync archivedrive
Don't forget to tell git-annex this is an archive drive (or a backup drive; see preferred content). Also, give the drive a description that matches something you write on its label, so you can find it later:
git annex group archivedrive archive
git annex wanted archivedrive standard
git annex describe archivedrive "my first archive drive (SATA)"
Or you can use the assistant to set up the drive for you.
(Nice video tutorial here: git-annex assistant archiving)
(Keeping the archive drive in an offsite location? Consider encrypting it! See fully encrypted git repositories with gcrypt.)
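A rough sketch of what an encrypted archive drive could look like, assuming git-remote-gcrypt is installed and you already have a GPG key. The path /media/archive/encrypted-annex, the remote name encryptedarchive, and the $mykey variable are placeholders chosen for illustration, not names used elsewhere on this page:

```shell
# Assumption: git-remote-gcrypt is installed and $mykey holds your GPG key id.
# Create an empty bare repository on the drive to hold the encrypted data.
git init --bare /media/archive/encrypted-annex

cd ~/annex
# Register the drive as an encrypted special remote (hypothetical remote name).
git annex initremote encryptedarchive type=gcrypt \
    gitrepo=/media/archive/encrypted-annex keyid="$mykey"

# Push the git history and git-annex metadata to the encrypted repository.
git annex sync encryptedarchive
```

The group, wanted, and describe settings shown earlier would still need to be applied to this remote; see the gcrypt tip for the authoritative steps.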
Then, when the archive drive is plugged in, you can easily copy files to it:
cd ~/annex
git-annex copy --auto --to archivedrive
Or, if you're using the assistant, it will automatically notice when the drive gets plugged in and copy files that need to be archived.
When you want to get rid of the local file, leaving only the copy on the archive, you can just:
git annex drop file
The archive drive has to be plugged in for this to work, so git-annex can verify it still has the file. If you had configured git-annex to always store 2 copies, it will need 2 archive drives plugged in. You may find it useful to configure a trust setting for the drive to avoid needing to haul it out of storage to drop a file.
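As a minimal sketch, the two settings mentioned above can be configured as follows. Whether trusting an offline drive is acceptable depends on how much you rely on git-annex's location tracking being accurate, so treat this as an illustration rather than a recommendation:

```shell
cd ~/annex

# Require two verified copies elsewhere before a drop is allowed.
git annex numcopies 2

# Trust the location log for this drive without checking the drive itself,
# so files can be dropped while the drive stays in storage.
git annex trust archivedrive

# To return to the default behaviour later:
#   git annex semitrust archivedrive
```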
Now the really nice thing. When your archive drive gets filled up, you can simply remove it, store it somewhere safe, and replace it with a new drive, which can be mounted at the same location for simplicity. Set up the new drive the same way described above, and use it to archive even more files.
Finally, when you want to access one of the files you archived, you can just ask for it:
git annex get file
If necessary git-annex will tell you which archive drive you need to pull out of storage to get the file back. This is where the description you entered earlier comes in handy.
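If you would rather look up the location before fetching, something like the following should work, assuming the remote is still named archivedrive:

```shell
cd ~/annex

# List every repository known to hold a copy of the file,
# along with the descriptions you gave them.
git annex whereis file

# Once the right drive is plugged in, fetch from it explicitly.
git annex get file --from archivedrive
```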
Shouldn't it be git annex sync archivedrive instead of git annex sync archive in the first examples? As the name of the remote is "archivedrive", IMO "sync" should be called with the name of the remote.
Is the feature "git-annex copy --auto --to usb" working? I have created a backup repo on my usb drive but when I call "git-annex copy --auto --to usb" nothing happens.
I have called "git annex group usb backup" and "git annex sync usb" to set up the repo.
What is the correct way to get the data out to the backup repo?
Best regards, Georg
The example was missing a preferred content setting, without which --auto doesn't copy anything unless needed to satisfy numcopies:
git annex wanted archivedrive standard
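For the "usb" remote from the question, a quick way to check what is configured and then retry could look like this (a sketch; running group or wanted without a final argument just prints the current setting):

```shell
cd ~/annex

# Show which groups the remote belongs to.
git annex group usb

# Show its current preferred content expression (empty if none is set).
git annex wanted usb

# Use the standard expression for its group, then let --auto decide.
git annex wanted usb standard
git annex copy --auto --to usb
```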
I'd like to store multiple copies of my data but I'm not sure how to implement it with drives of various sizes.
Let's assume that I'd like 3 copies of every file and I'd like the data to be laid out in 3 disk groups:
The idea is to keep 1 group permanently at home while the 2 others would be stored in different remote locations and resynchronized from time to time. Each disk group would hold all the data, so that the loss of one of them wouldn't matter. Also, the composition of those disk groups would let me easily know which disks can be put aside while being sure that together they contain all the data.
How could this model be implemented with git-annex?
This is like the "archive" group, but you want 3 different ones. git-annex already has a built-in "smallarchive" group that is not quite what you want (see standard groups), so let's call them "archive1", "archive2", and "archive3".
Configure the groups like this, so any member of a group wants to contain a file only if it will be the only member of the group to contain a copy.
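A minimal sketch of such a configuration, assuming the copies=groupname:number preferred content term and the git annex groupwanted command. The exact expression is an assumption modeled on the built-in archive group, so check it against the preferred content documentation before relying on it:

```shell
cd ~/annex

# For each custom group, want a file only when no other member
# of the same group already holds a copy of it.
for g in archive1 archive2 archive3; do
    git annex groupwanted "$g" "not copies=$g:1"
done
```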
Then use git-annex group to put each drive's repository in its respective group, and set wanted to "standard" to make it use the groupwanted expression. For example:
Then,
git-annex copy --to bigdrive --auto
and other similar commands with --auto (or git-annex sync --content) will distribute files to the drives.
Thanks Joey for your clear and comprehensive answer.
Quoting a part of your answer:
According to the documentation the "standard" groupwanted expression only allows to use the predefined expressions, so I assume that the groupwanted expression shall be set to "groupwanted" instead:
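A sketch of what that could look like for one drive, assuming a remote named bigdrive that should belong to the archive1 group, and assuming a groupwanted expression has already been configured for that group:

```shell
cd ~/annex

# Put the drive's repository in its custom group.
git annex group bigdrive archive1

# Make the repository follow the expression configured for its group.
git annex wanted bigdrive groupwanted

# Let git-annex decide which files this drive should receive.
git annex copy --auto --to bigdrive
```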