If your git-annex repository contains 10 versions of a 100 megabyte file, it will need 1000 megabytes of disk space to store them all. To save space those old versions can be moved to a remote, but most remotes also don't store similar versions efficiently.
Borg is a deduplicating archiver with compression and encryption. This makes it a good solution to this problem, only the differences between the old versions of the file will be stored by borg.
Borg can be used with git-annex as an unusual kind of remote.
git-annex is not able to store files in borg itself. Instead the way this
works is you use borg to store your git-annex repository, and then
git-annex sync
scans the borg repository to find out what annexed files are
stored in it.
Let's set that up. Run this from the top directory of your git-annex repository to create a borg repository next to it that stores all the files in it, and let git-annex treat it as a remote.
# borg init --encryption=keyfile ../borgrepo
# git annex initremote borg type=borg borgrepo=../borgrepo
# borg create ../borgrepo `pwd`::{now}
# git annex sync borg
Now git-annex knows that all the files in the repository, including all the old versions, have been stored in borg. But when you try to drop a file, you'll find that git-annex does not trust the borg repository.
drop file (unsafe)
Could only verify the existence of 0 out of 1 necessary copies
Also these untrusted repositories may contain the file:
ca863c47-9ded-4dd0-bd7d-9b65e5624171 -- [borg]
Why is this? Well, you could use borg delete
or borg prune
to delete
the content of the file from the borg repository at any time, so git-annex
defaults to not trusting it. This is fine when you're using borg to take
backups, and need to delete old borg archives to free up space on the
backup drive. And it can be useful to use git-annex with such borg backups.
But our goal is instead to move old versions of files to borg. So, you need
to tell git-annex that you will only use borg to append to the borg
repository, not to delete things from it.
# git annex enableremote borg appendonly=yes
Now all the old versions of files can be dropped from the git-annex repository, freeing up disk space.
# git annex unused
# git annex drop --unused
You can continue running borg create
and git-annex sync
to store
changed files in borg and let git-annex know what's stored there.
It's possible to access the same borg repository from another clone of the
git-annex repository too. Just run git annex enableremote borg
in that
clone to set it up. This uses the same borgrepo
value that was passed
to initremote, but you can override it, if, for example, you want to access
the borg repository over ssh from this new clone.