git-annex can transfer data to and from configured git remotes. Normally those remotes are normal git repositories (bare or non-bare; local or remote) that store the file contents in their own git-annex directory.

But git-annex also extends git's concept of remotes with special types of remotes. These can be used by git-annex just like any normal remote; they cannot be used by other git commands, though.

The above special remotes are built into git-annex, and can be used to tie git-annex into many cloud services.

Here are specific instructions for using git-annex with various services:

Want to add support for something else? Write your own!

Ways to use special remotes

There are many use cases for a special remote. You could use it as a backup. You could use it to archive files offline on a drive with encryption enabled, so that if the drive is stolen, your data is not. You could git annex move --to specialremote large files when your local drive is getting full, and then git annex move the files back when free space is available again. You could have one repository copy files to a special remote, and then git annex get them on another repository, to transfer the files between computers that do not communicate directly.

The git-annex assistant makes it easy to set up rsync remotes using this last scenario, which is referred to as a transfer repository, and arranges to drop files from the transfer repository once they have been transferred to all known clients.

None of these use cases is tied to a particular type of special remote; most special remotes can be used in all of these ways and more. For your purposes it largely doesn't matter what underlying transport the special remote uses.
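
For instance, here is a minimal sketch of the offload-and-retrieve pattern, reusing the remote name mys3 from the example below and a hypothetical large file bigfile.iso:

$ git annex move bigfile.iso --to mys3     # free up local disk space
$ git annex whereis bigfile.iso            # the content is now only on mys3
$ git annex move bigfile.iso --from mys3   # bring it back when space is available again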

Unused content on special remotes

Over time, special remotes can accumulate file content that is no longer referred to by files in git. Normally, unused content in the current repository is found by running git annex unused. To detect unused content on special remotes, instead use git annex unused --from. Example:

$ git annex unused --from mys3
unused mys3 (checking for unused data...) 
  Some annexed data on mys3 is not used by any files in this repository.
    NUMBER  KEY
    1       WORM-s3-m1301674316--foo
  (To see where data was previously used, try: git log --stat -S'KEY')
  (To remove unwanted data: git-annex dropunused --from mys3 NUMBER)
$ git annex dropunused --from mys3 1
dropunused 1 (from mys3...) ok
MediaFire offers 50GB of free storage (maximum file size 200MB). It would be great to support it as a new special remote.
Comment by Jon Ander Thu Jan 17 12:17:54 2013
Mediafire does not appear to offer any kind of API for its storage.
Comment by joeyh.name Thu Jan 17 16:44:25 2013
Wouldn't this be enough? http://developers.mediafire.com/index.php/REST_API
Comment by Jon Ander Thu Jan 17 16:53:41 2013

Similar to a JABOD, this would be Just A Bunch Of Files. I already have a NAS with a file structure conducive to serving media to my TV. However, it's not (currently) capable of running git-annex locally. It would be great to be able to tell annex the path to a file there, much like a web remote created with "git annex addurl". That way I could safely drop all the files I took with me on my trip, while annex still verifies and counts the copy on the NAS as a location.

There are some interesting things to figure out for this to be efficient. For example, SHAs of the files. Maybe store that in a metadata file in the directory of the files? Or perhaps use the WORM backend by default?

Comment by Andrew Sat Jan 19 08:34:32 2013
The web special remote recently became able to use file:// URLs, so you can just point to files on some arbitrary storage if you want to.
Comment by joeyh.name Sat Jan 19 16:05:13 2013
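
A minimal sketch of that approach, assuming the NAS is mounted locally at /mnt/nas (a hypothetical path):

$ git annex addurl --file=video.mkv file:///mnt/nas/media/video.mkv
$ git annex drop video.mkv   # ok to drop locally; the file:// URL still counts as a copy
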
It'd be awesome to be able to use Rackspace as remote storage as an alternative to S3. I would submit a patch, but I know zero Haskell :D
Comment by Greg Wed Jan 30 11:33:12 2013

Would it be possible to support Rapidshare as a new special remote? They offer unlimited storage for 6-10€ per month. It would be great for larger backups. Their API can be found here: http://images.rapidshare.com/apidoc.txt

Comment by Nico Sat Feb 2 16:49:58 2013

Is there any chance of a special remote that functions like a hybrid of 'web' and 'hook'? At least in theory, it should be relatively simple, since it would only support 'get', and the only meaningful parameters to pass would be the URL and the output file name.

Maybe make it something like git config annex.myprogram-webhook 'myprogram $ANNEX_URL $ANNEX_FILE', and fetching could work by adding a --handler or --type parameter to addurl.

The use case here is anywhere that a simple 'fetch the file over HTTP/FTP/etc' isn't workable - maybe it's on rapidshare and you need to use plowshare to download it; maybe it's a youtube video and you want to use youtube-dl; maybe it's a chapter of a manga and you want to turn it into a CBZ file when you fetch it.

Comment by Alex Sun Feb 24 15:05:27 2013
A ridiculously cool possibility would be to allow them to match against URLs and then handle those (youtube-dl for youtube video URLs, for instance), but that would be additional work on your end and isn't really necessary.
Comment by Alex Sun Feb 24 15:13:16 2013
It'd be really cool to have Rackspace cloud files support. Like the guy above me said, I would submit a patch but not if I have to learn Haskell first :)
Comment by Ashwin Fri Mar 22 08:20:40 2013
@Alex: You might see if the newly-added "wishlist: allow configuration of downloader for addurl" could be made to do what you need... I've not played around with it yet, but perhaps you could set the downloader to be something that can sort out the various URLs and send them to the correct downloading tool?
Comment by andy Fri Apr 12 08:54:47 2013
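
A minimal sketch of what that could look like, assuming the annex.web-download-command setting from that wishlist item behaves as described there, and a hypothetical dispatcher script annex-fetch that picks youtube-dl, plowshare, or plain wget based on the URL:

$ git config annex.web-download-command '/usr/local/bin/annex-fetch %url %file'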

Sorry if it is RTFM... If I have multiple original (reachable) remotes, how could I establish my preference for which one to be used in any given location?

Use case: if I clone a repository within an Amazon cloud instance, I would prefer that this repository (or all repositories -- some user-wide configuration somehow?) 'get's content from URLs originating in this zone's cloud (e.g. having us-east-1.s3.amazonaws.com/ in their URLs).

Comment by site-myopenid Wed May 22 14:06:48 2013

This should be implemented with costs

I refer you to: http://git-annex.branchable.com/design/assistant/blog/day_213__costs/

This has been implemented in the assistant, so if you use that, changing priority should be as simple as changing the order of the remotes in the web interface. Whichever remote is highest on the list is the one your client will fetch from.

Comment by develop Wed May 22 14:15:03 2013
You do not need to use the assistant to configure the costs of remotes. Just set remote.<name>.annex-cost to appropriate values. See also the documentation for remote.<name>.annex-cost-command, which allows your own code to calculate costs.
Comment by joey Wed May 22 14:30:00 2013
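
A minimal sketch of that, with two hypothetical remotes named nearby-s3 and faraway-s3 (lower cost is tried first):

$ git config remote.nearby-s3.annex-cost 150
$ git config remote.faraway-s3.annex-cost 250
$ git config remote.nearby-s3.annex-cost-command '/usr/local/bin/calc-cost'   # optional: compute the cost with your own (hypothetical) script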

Thank you -- that is nice!

Could costs be presented by the 'whereis' and 'status' commands? E.g. the way we can see APT repository priorities from apt-cache policy -- right now I do not see them (at least in 4.20130501... updating to sid's 0521 now)

Comment by site-myopenid Wed May 22 18:33:11 2013

Is there any remote which would not only compress during transfer (I believe rsync does that, right?) but also store objects compressed?

I thought bup would do both -- but it seems that git-annex receives data uncompressed from a bup remote, and a bup remote requires ssh access.

In my case I want to make publicly available files which are binary blobs that compress very well. It would be a pity to waste storage on my end and also incur significant traffic, both of which could be avoided if the data were transferred compressed. Maybe HTTP compression (http://en.wikipedia.org/wiki/HTTP_compression) could somehow be used efficiently for this purpose (not sure whether the content could already reside in compressed form, to avoid the server having to re-compress it)?

Comment by site-myopenid Wed May 22 18:48:59 2013

ha -- apparently it is trivial to configure apache to serve pre-compressed files (e.g. see http://stackoverflow.com/questions/75482/how-can-i-pre-compress-files-with-mod-deflate-in-apache-2-x), and they arrive at the client compressed, with

Content-Encoding: gzip

but unfortunately git-annex doesn't like those (it fails to "verify" them) -- do you think this could be implemented for the web special remote? That would be really nice -- then I could store such content on another website and addurl links to the compressed content.

Comment by site-myopenid Wed May 22 19:17:33 2013

All special remotes store files compressed when you enable encryption. Not otherwise, though.

As far as the web special remote and pre-compressed files, files are downloaded from the web using wget or (if wget is not available) curl. So if you can make it work with those commands, it should work.

Comment by joey Thu May 23 23:25:02 2013
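
For example, a minimal sketch of getting that compression via encryption, assuming a new S3 remote named cloud with shared encryption (the name is hypothetical; AWS credentials are expected in the environment):

$ git annex initremote cloud type=S3 encryption=shared
$ git annex copy bigfile.iso --to cloud   # the content is gpg-encrypted, which also compresses it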

FWIW -- eh -- unfortunately it seems not that transparent. wget seems not to support decompression at all; curl can with an explicit --compressed, but it doesn't distinguish a URL pointing at a "natively" .gz file from pre-compressed content, and I am not sure it is possible to reliably distinguish the two URLs at all. When fetching the pre-compressed file from my sample apache server, the only difference in the HTTP response header is a "compound" ETag: compare ETag: "3acb0e-17b38-4dd5343744660" (when asking directly for zeros100.gz) vs "3acb0e-17b38-4dd5343744660;4dd5344e1537e" (when requesting zeros100), where I guess the portion past ";" signals the caching tag for gzipping, but I am not sure, since it does not seem to be part of the standard. Also, for zeros100 I get "TCN: choice"... again, not sure whether that is in any way reliably indicative for my purpose. So I guess there is no good way ATM via the Content-Type of the response.

Comment by site-myopenid Sat May 25 06:41:37 2013

Is there a unit test or integration test to check for the behavior of a special remote implementation and/or validity?

I don't speak Haskell, so maybe there are some in the source that I wouldn't recognize; I haven't checked. If there are any tests, how should I use them?

Thank you, Bence

Comment by Bence Sun Nov 24 08:24:36 2013

@Bence the closest I have is some tests of particular special remotes inside Test.hs. The shell equivalent of that code is:

set -e
git annex copy file --to remote # tests store
git annex drop file # tests checkpresent when remote has file
git annex move file --from remote # tests retrieve and remove
Comment by joeyh.name Sun Nov 24 15:58:30 2013

Hi Joey,

I am thinking about using google drive as an encrypted backup for my important files. However, I fear that if all my git annex repositories become unrecoverable, the encrypted data on the special remote will not help me much. Assuming I have backed up my gpg key, I would still get a bunch of decrypted files, but the folder structure would be lost. Would it be possible to implement something like a safety feature that also uploads an (encrypted) tar of all the symlinks (pointing to the respective encrypted files) of the (current or master-branch) git working tree?

I am almost sure this is already implementable using hooks; however, I could not find information on which types of hooks are available. I am looking for one that is triggered once, after all copy/move operations to a special remote have finished. Can you point me in the right direction?

Marek

Comment by donkeyicydragon Tue Apr 22 21:08:49 2014