Amazon Glacier provides low-cost storage, well suited for archiving and backup. But it takes around 4 hours to get content out of Glacier.
Recent versions of git-annex support Glacier. To use it, you need to have glacier-cli installed.
First, export your Amazon AWS credentials:
# export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
Now, create a gpg key, if you don't already have one. This will be used to encrypt everything stored in Glacier, for your privacy. Once you have a gpg key, run gpg --list-secret-keys to look up its key id, something like "2512E3C7".
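A script can get the same information from gpg's machine-readable listing. With --with-colons, each secret key appears as a "sec" record whose fifth field is the long (16 hex digit) key id, and the short id shown above is the last eight hex digits of that long id. This is a minimal sketch that assumes GnuPG's colon-listing format. The function names are hypothetical:

```shell
# Hypothetical helper: print the long key id of every secret key,
# parsed from gpg's machine-readable (colon-separated) listing.
secret_key_ids() {
  gpg --list-secret-keys --with-colons |
    awk -F: '$1 == "sec" { print $5 }'
}

# Hypothetical helper: reduce a long key id to the short 8-hex-digit form
# (its last eight characters), e.g. C910D9222512E3C7 -> 2512E3C7.
short_key_id() {
  printf '%s\n' "$1" | awk '{ print substr($0, length($0) - 7) }'
}
```

For example, short_key_id "$(secret_key_ids)" should print something like 2512E3C7 when you have exactly one secret key. With several keys, secret_key_ids prints one id per line, and you will need to pick one.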
Next, create the Glacier remote.
# git annex initremote glacier type=glacier keyid=2512E3C7
initremote glacier (encryption setup with gpg key C910D9222512E3C7) (gpg) ok
The configuration for the Glacier remote is stored in git, so it is easy to make another repository use the same Glacier remote:
# cd /media/usb/annex
# git pull laptop
# git annex enableremote glacier
initremote glacier (gpg) ok
Now the remote can be used like any other remote.
# git annex move my_cool_big_file --to glacier
move my_cool_big_file (gpg) (checking glacier...) (to glacier...) ok
But, when you try to get a file out of Glacier, it'll queue a retrieval job:
# git annex get my_cool_big_file
get my_cool_big_file (from glacier...) (gpg)
glacier: queued retrieval job for archive 'GPGHMACSHA1--862afd4e67e3946587a9ef7fa5beb4e8f1aeb6b8'
Recommend you wait up to 4 hours, and then run this command again.
failed
Like it says, you'll need to run the command again later. Let's remember to do that:
# at now + 4 hours
at> git annex get my_cool_big_file
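You can also schedule the retry without the interactive at> prompt, because at reads its job from standard input. This is a minimal sketch. get_from_glacier_later is a hypothetical name, and the sketch assumes the file name contains no single quotes, since those would break the quoting of the queued command:

```shell
# Hypothetical helper: ask Glacier for a file, and schedule a retry if needed.
get_from_glacier_later() {
  file=$1
  # The first attempt normally fails after queueing the retrieval job.
  if git annex get "$file"; then
    return 0
  fi
  # Schedule the retry. at(1) reads the job from stdin and runs it in the
  # current directory. Assumes the file name contains no single quotes.
  printf "git annex get '%s'\n" "$file" | at now + 4 hours
}
```

Run it from inside the repository, e.g. get_from_glacier_later my_cool_big_file. at keeps the working directory the job was scheduled from, so the retry runs in the same repository.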
Another oddity of Glacier is that git-annex is never entirely sure if a file is still in Glacier. Glacier inventories take hours to retrieve, and even when retrieved do not necessarily represent the current state.
So, git-annex plays it safe, and avoids trusting the inventory:
# git annex copy important_file --to glacier
copy important_file (gpg) (checking glacier...) (to glacier...) ok
# git annex drop important_file
drop important_file (gpg) (checking glacier...)
Glacier's inventory says it has a copy.
However, the inventory could be out of date, if it was recently removed.
(unsafe)
Could only verify the existence of 0 out of 1 necessary copies
To avoid this problem, you can either use git annex move to move content to Glacier, or you can set the remote to be trusted.
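Both workarounds are ordinary git-annex commands. This minimal sketch wraps each one in a hypothetical shell function and uses the remote name glacier from above:

```shell
# Option 1 (hypothetical helper): move instead of copy followed by drop.
# move drops the local copy as part of the same operation that uploaded it,
# which sidesteps the inventory check that made drop refuse above.
archive_to_glacier() {
  git annex move "$@" --to glacier
}

# Option 2 (hypothetical helper): mark the remote as trusted (a lasting
# setting, recorded in git), so drop relies on git-annex's own record of
# the Glacier copy instead of insisting on verifying it.
trust_glacier_then_drop() {
  git annex trust glacier &&
    git annex drop "$@"
}
```

The trade-off with the second option is that git-annex will believe a trusted remote even if the content has since been removed from Glacier behind its back.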
A final potential gotcha with Glacier is that glacier-cli keeps a local mapping of file names to Glacier archives. If this cache is lost, or you want to retrieve files on a different box than the one that put them in Glacier, you'll need to use glacier vault sync to rebuild this cache.
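Here is a minimal sketch of that step, wrapped in a hypothetical helper. VAULT is a placeholder for the name of the vault your glacier remote stores into. Glacier inventories take hours to retrieve, so the sync may not complete on the first run:

```shell
# Hypothetical helper: rebuild glacier-cli's local mapping of file names to
# archives for one vault, e.g. before using the remote from a second machine.
# The vault name is a placeholder: use the vault your glacier remote stores into.
rebuild_glacier_cache() {
  vault=${1:?usage: rebuild_glacier_cache VAULT}
  glacier vault sync "$vault"
}
```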
See the Glacier special remote documentation for details.
I set up a glacier remote on one machine, and it successfully created the vault and is syncing files to it.
On another machine, after git-annex sync'ing, I did:
So then I try:
What am I missing?
Also, why is it trying to create the vault? It's already there with content in it!
@greg, the only thing you might have missed is the need to use glacier vault sync to build a cache when enabling the glacier remote in another place. That whole issue of needing a local cache may also mean that few people are using glacier with more than one repository accessing the remote. However, this sounds like a bug. There is a comment in the source code saying that "glacier vault create will succeed even if the vault already exists." Perhaps that has changed since it was written, or perhaps the command failed for some other reason; I don't know.
Along with stupid python problems which are now fixed (all my fault, and hopefully didn't cause more noise here than needed), the only thing that didn't go as stated was:
The guide says that info is sync'd.
Hi, when I try to create a glacier remote, the command freezes without further output:
I can see the following processes in sleep state:
I'm on Fedora 22, git-annex version: 5.20140717. Any suggestions appreciated, thanks!
@ben, it's generating the encryption key and is blocked waiting on entropy. You can pass --fast to use lower-quality randomness.
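In practice, that means interrupting the stuck command with Ctrl-C and re-running initremote with --fast. This is a minimal sketch wrapped in a hypothetical helper. The remote name and key id are the ones from the walkthrough above:

```shell
# Hypothetical helper: create the encrypted Glacier remote using git-annex's
# --fast option, which uses lower-quality randomness for the generated
# encryption key instead of blocking while waiting for entropy.
init_glacier_fast() {
  keyid=${1:?usage: init_glacier_fast KEYID}
  git annex initremote --fast glacier type=glacier keyid="$keyid"
}
```

For example, init_glacier_fast 2512E3C7. The cost is exactly what the option says: the key material comes from a weaker source of randomness.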