This special remote type stores file contents in Amazon Glacier.
To use it, you need to have glacier-cli installed.
The unusual thing about Amazon Glacier is the multi-hour delay in retrieving data from it. To deal with this, commands like "git-annex get" ask Glacier to start the retrieval process and then fail, because the data is not yet available. You can then wait approximately four hours and re-run the same command; this time, it will actually download the data.
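The request-then-retry pattern described above can be scripted. The following is a minimal sketch, not part of git-annex: `get_with_retry` and `run_git_annex_get` are hypothetical helpers, and the four-hour wait and the attempt limit are assumptions based on the approximate delay mentioned above. It relies only on `git annex get` exiting with a non-zero status while the data is still unavailable.

```python
import subprocess
import time

FOUR_HOURS = 4 * 60 * 60  # approximate Glacier retrieval delay, in seconds


def run_git_annex_get(path):
    """Run `git annex get PATH`; return True if it exited successfully."""
    return subprocess.run(["git", "annex", "get", path]).returncode == 0


def get_with_retry(path, run=run_git_annex_get, sleep=time.sleep,
                   wait_seconds=FOUR_HOURS, max_attempts=3):
    """Re-run the get until it succeeds or max_attempts is reached.

    The first failed attempt is what asks Glacier to start the retrieval;
    a later attempt downloads the data once it has become available.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run(path):
            return attempt
        if attempt < max_attempts:
            # Wait for Glacier to finish staging the data before retrying.
            sleep(wait_seconds)
    return None
```

Run from inside the repository, `get_with_retry("some/annexed/file")` would block for several hours between attempts, so it is better suited to a background job than to an interactive shell. The `run` and `sleep` parameters exist only so the loop can be exercised without touching Glacier.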
configuration
The standard environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are used to supply login credentials for Amazon. You need to set these only when running git annex initremote (or enableremote), as they will be cached in a file only you can read inside the local git repository.
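Since the credentials are read from the environment when the remote is initialized or enabled, it can help to check for them first. This is a small sketch under that assumption; `missing_credentials` is a hypothetical helper, not something git-annex provides.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` with no argument inspects the current process environment; an empty list means both variables are set.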
A number of parameters can be passed to git annex initremote to configure the Glacier remote.
encryption - One of "none", "hybrid", "shared", or "pubkey". See encryption.
keyid - Specifies the gpg key to use for encryption.
embedcreds - Optional. Set to "yes" to embed the login credentials inside the git repository, which allows other clones to also access them. This is the default when gpg encryption is enabled; the credentials are stored encrypted and only those with the repository's keys can access them. It is not the default when using shared encryption, or no encryption. Think carefully about who can access your repository before using embedcreds without gpg encryption.
datacenter - Defaults to "us-east-1".
vault - By default, a vault name is chosen based on the remote name and UUID. This can be specified to pick a vault name.
fileprefix - By default, git-annex places files in a tree rooted at the top of the Glacier vault. When this is set, it's prefixed to the filenames used. For example, you could set it to "foo/" in one special remote, and to "bar/" in another special remote, and both special remotes could then use the same vault.
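To show how the parameters listed above fit together on the command line, here is a minimal sketch that assembles the arguments for git annex initremote. `initremote_command` is a hypothetical helper; the only parameter not taken from the list above is `type=glacier`, which selects this special remote type. The function only builds the argument list and leaves running it to the caller.

```python
ENCRYPTION_MODES = ("none", "hybrid", "shared", "pubkey")


def initremote_command(name, encryption, keyid=None, embedcreds=None,
                       datacenter=None, vault=None, fileprefix=None):
    """Build the argv list for `git annex initremote` with a Glacier remote.

    Optional parameters left as None are omitted, so git-annex applies
    its own defaults for them.
    """
    if encryption not in ENCRYPTION_MODES:
        raise ValueError(f"encryption must be one of {ENCRYPTION_MODES}")
    args = ["git", "annex", "initremote", name,
            "type=glacier", f"encryption={encryption}"]
    optional = {
        "keyid": keyid,
        "embedcreds": embedcreds,
        "datacenter": datacenter,
        "vault": vault,
        "fileprefix": fileprefix,
    }
    for key, value in optional.items():
        if value is not None:
            args.append(f"{key}={value}")
    return args
```

For the shared-vault case described above, `initremote_command("first", "shared", vault="myvault", fileprefix="foo/")` and a second call with `fileprefix="bar/"` would produce two command lines pointing at the same vault. The remote and vault names here are placeholders.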
The glacier-cli tool seems to have been abandoned, and there are a number of outstanding issues with it. boto has a glacier tool, but it doesn't seem to include caching, which seems to be something git annex needs.
Looking through the PRs, it seems like we should build a tool specifically tailored to git annex's needs. It seems that there are at least three of us willing to hack on this if it's in Python. I'm not sure any of us knows Haskell, though...
I'm the glacier-cli author. It is not abandoned!
glacier-cli is supposed to map to Glacier exactly, so that it is compatible with all other tools. Most of the outstanding PRs break this essential behaviour, so I have not merged them. Many of the feature requests and bugs relate to the upstream boto library, which is just about the best maintained client library that exists for AWS on any platform (and Amazon have adopted it now, IIRC). I have written appropriate reviews on all the PRs.
If there is specific behaviour that git-annex needs, then I am happy to accept PRs for this, provided that they do not break the ability (and default) for glacier-cli to talk to Glacier natively without an extra layer of interpretation. If an extra layer of interpretation is needed (e.g. forbidding duplicate "keys"), then this needs to be an option, or wrapped in a separate tool, or written into git-annex's Glacier special remote.
Hi!
The main issue I'm hitting is the "Multiple rows were found for one()" error. I think I get this when git-annex tries to upload the same file twice (which may be a bug in git-annex, which could apply de-duplication earlier), but I think I also get it when trying to upload a file whose upload I've canceled in the past.
I don't quite understand what git-annex needs here, and I totally understand that you're writing a general-purpose tool. But there does seem to be an issue that git-annex needs fixed one way or another.
I'm happy to try fixing it myself if you can help me understand what's going on (I didn't quite understand your review in the PR), but if I'm the only person in the world using git-annex to back up to glacier, that scares me a little!
Is there a way to estimate the cost of storing a repo on glacier?
I'm especially worried because of the cost of STORE and RETRIEVE requests; there are hundreds of thousands of small files in my annex repo, so that request cost could easily dominate storage cost. Does the glacier remote do anything to minimize the number of objects stored in glacier?
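One rough way to reason about this is to separate the per-request cost from the storage cost. The sketch below is only arithmetic: `estimate_upload_cost` is a hypothetical helper, it assumes one upload request per annexed file (which nothing on this page confirms), and every price is an input that you would need to take from Amazon's current pricing. No real prices are built in.

```python
def estimate_upload_cost(num_files, total_gb, months,
                         storage_price_per_gb_month,
                         upload_price_per_1000_requests):
    """Split an estimated bill into storage and upload-request parts.

    Assumes one upload request per file. Retrieval and any per-archive
    overhead are deliberately left out of this sketch.
    """
    storage = total_gb * months * storage_price_per_gb_month
    requests = (num_files / 1000) * upload_price_per_1000_requests
    return {
        "storage": storage,
        "requests": requests,
        "total": storage + requests,
    }
```

Comparing the "requests" and "storage" entries for your own file count and repository size shows whether request charges would dominate.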
I'm setting up git-annex with glacier-cli for the first time. I have installed git-annex via Yum and glacier-cli according to the instructions on GitHub. The glacier command is in my path. I did not set up hooks with Git annex as it appears that using hooks for glacier is no longer required.
Here is my version information for git-annex:
When I attempt to add my Glacier remote, here is what I see:
Is there something else I need to do in order to correctly install Glacier integration with git-annex? I'm having trouble finding up-to-date information that describes the installation process.
Have done a full setup as much as I can, with all the GPG / AWS stuff but it keeps doing non stop...
@forbesmyester, I think that the "glacier" program you have installed is the one from boto, not the one from glacier-cli. git-annex only supports the glacier-cli one.
Note that, since version 5.20150219, git-annex probes to see if the "glacier" program in PATH is the one from boto, and fails with a nicer error message.
Hi Joey,
Thanks for the hand, it started uploading once I had manually created the vault but then borked with:
This part:
Tells me that you need to read glacier-cli problem report #61.
There is a one-line code change in a library named boto (glacier-cli depends on it) which will fix this. (And probably that change will get merged in sometime, so you won't have to do this anymore.)
I sometimes receive the following error when trying to upload files to glacier:
It happens only sometimes. glacier-cli can upload files without problems. The progress of the file upload is also erratic: it jumps to ~90% and then gets stuck. Can I do something to resolve this?
I am using glacier-cli from git master.
For a couple of years already I've been regularly uploading my pictures to glacier through git-annex. After losing my local disk I'm trying to restore my files. However I get these errors:
No files have been deleted, the glacier bucket still exists and has all its data. Has anything changed in the naming scheme or anything? Or is it possible that this error is masking a different issue?
@Tim
Sounds like you need https://github.com/basak/glacier-cli/#user-content-cache-reconstruction