  • git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup.

  • git-annex is not a filesystem or DropBox clone. However, the git-annex assistant is addressing some of the same needs in its own unique ways. (There is also a FUSE filesystem built on top of git-annex, called ShareBox.)

  • git-annex is not unison, but if you're finding unison's checksumming too slow, or its strict mirroring of everything to both places too limiting, then git-annex could be a useful alternative.

  • git-annex is more than just a workaround for git scalability limitations that might eventually be fixed by efforts like git-bigfiles. In particular, git-annex's location tracking allows having many repositories with a partial set of files, that are copied around as desired.

  • git-annex is not some flaky script that was quickly thrown together. I wrote it in Haskell because I wanted it to be solid and to compile down to a binary. And it has a fairly extensive test suite. (Don't be fooled by "make test" only showing a few dozen test cases; each test involves checking dozens to hundreds of assertions.)

  • git-annex is not git-media, although they both approach the same problem from a similar direction. I only learned of git-media after writing git-annex, but I probably would have still written git-annex instead of using it. Currently, git-media has the advantage of using git smudge filters rather than git-annex's pile of symlinks, and it may be a tighter fit for certain situations. It lacks git-annex's support for widely distributed storage, using only a single backend data store. It also does not support partial checkouts of file contents, like git-annex does.

  • git-annex is similarly not git-fat, which also uses git smudge filters, and also lacks git-annex's widely distributed storage and partial checkouts.

  • git-annex is also not boar, although it shares many of its goals and characteristics. Boar implements its own version control system, rather than simply embracing and extending git. And while boar supports distributed clones of a repository, it does not support keeping different files in different clones of the same repository, which git-annex does, and is an important feature for large-scale archiving.

  • git-annex is not the Mercurial largefiles extension, although Mercurial and git have some of the same problems around large files, and both tools try to solve them in similar ways (stand-in files, mostly using hashes of the real content).
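
The stand-in and location-tracking ideas mentioned in the list above can be pictured with a minimal Python sketch. This is not git-annex's actual implementation or on-disk format: the key format, the `LocationLog` class, and the repository names are all hypothetical, chosen only to illustrate how a hash-based stand-in plus a record of "which repository holds which content" allows each repository to keep just a partial set of files.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a stand-in key from file content (illustrative format only)."""
    return "SHA256-s%d--%s" % (len(data), hashlib.sha256(data).hexdigest())


class LocationLog:
    """Toy location tracker: remembers which repositories hold which key."""

    def __init__(self):
        self._locations = {}

    def record_present(self, key, repo):
        # Note that `repo` now has a copy of the content behind `key`.
        self._locations.setdefault(key, set()).add(repo)

    def record_dropped(self, key, repo):
        # Note that `repo` no longer has a copy.
        self._locations.get(key, set()).discard(repo)

    def whereis(self, key):
        # Sorted list of repositories believed to hold the content.
        return sorted(self._locations.get(key, set()))


log = LocationLog()
movie = content_key(b"pretend this is a large video file")
notes = content_key(b"small text file")

# Hypothetical repositories, each holding a different subset of the files.
log.record_present(movie, "usb-drive")
log.record_present(movie, "home-server")
log.record_present(notes, "laptop")
log.record_dropped(movie, "usb-drive")

print(log.whereis(movie))
print(log.whereis(notes))
```

In this sketch the version-controlled part is only the small keys and the log, so every clone can know where the content lives without holding all of it.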

I haven't used git-media, but from the README it looks as though they now support several backends. Might want to update the (very helpful!) comparison.
Comment by bergey [dreamwidth.org] Sat Jul 14 15:42:05 2012
How do SparkleShare and git-annex (and the git-annex assistant) compare?
Comment by Bret Thu Sep 6 08:09:17 2012
My understanding of SparkleShare (I've not used it) is that it uses a regular git repository, so it has git's problems with large files and will not support partial checkouts. However, you might want to try it out and see if it works for you.
Comment by joeyh.name Thu Sep 6 14:50:43 2012

Hi,

I used SparkleShare recently in a project involving 3 computers and 2 people, and for ASCII text and even a few smaller binary files it works OK.

But it does "too much" for media. At least at the moment, it just uses git for saving the data. That has a positive and a negative aspect.

Positive:

  1. You have a full history. If you delete a file, it's not gone forever, and if you change it, the older version is still recoverable.
  2. If, for example, you use it on a laptop on a train without internet access, with a git server on the internet as the central server, and you change some files while somebody else edits the same text file (HTML, LaTeX, and so on), you would be able to merge the changes.
  3. It's not totally bad for backup, because you can restore old files even if you delete them locally, since it keeps all the history.

Negative:

  1. For bigger data it's crazy. Take movies, for example: in git-annex you would delete the stuff you never want to see again, deleting it everywhere, and it is really gone rather than still sitting in the history.
  2. Git as it is has issues with saving and transferring very big files, and it's slow even on mid-sized files; with, say, 100 files of 5 MB each it would be slow. Because SparkleShare currently uses git, all these disadvantages are there.
  3. However many clients you have, say a project with 10 people, each of them has all the files and all the history of this project/directory on their PC.
  4. You need a central data-store git folder. You can use a separate PC for that or keep it on a client; if you use a client for that, you have to store the data twice on that PC.

(So you see, for big files, even if git handled them faster, you would waste massive amounts of hard disk space.) But again, for PDFs, a few pictures, text files, even some office files and other stuff under 100 MB, it's great and easy to use.

To put it in a few words: SparkleShare is like Dropbox but with file history (I think Dropbox doesn't have that?). But because git is not designed (yet) for big files, it works somewhat OK for stuff under 100 MB; if you go much higher, above 1 GB, it will not be optimal.

git-annex doesn't save the data itself in git, only the locations and the checksums. So it's more like an address book of your data. It's an abstraction layer over your data: on as many devices as you want, even with no network or internet connection active and only a very small hard disk, you can see all the 5 terabytes of data you might have, move directories around, sort them, delete stuff you don't want (if you can decide that by the name), and then when you get your connection back you sync your actions and it applies them to the files.

And one big feature, like joey said, is that you can partially load files from the repos onto your device if it has, for example, only enough space for 1/10 of them.

There is another thing: because it is "only" an abstraction layer, it is theoretically easy to implement extensions to save your data on anything, not only git repositories.

SparkleShare may switch to something other than git, but then it will switch to that single protocol and stick to it, because it does not abstract things as much.

By the way, there is an alternative out there. It does not force you to use git as the VCS, though you do have to use a VCS (like git), and you don't have to use the client written in Mono, only a smaller Python script:

http://www.mayrhofer.eu.org/dvcs-autosync

But the idea behind it is the same, except for these 2 points ;)

Many free software developers don't like Mono, so the chance that it gets more love from more people is not totally unlikely.

So, way too long a post, but I hope that helps somebody ;)

Comment by Stefan Sat Sep 15 01:28:05 2012

Or, to make it simpler ;)

SparkleShare is for projects and maybe backing up your documents folder.

annex is for managing big binary files that are not modified most of the time and are only added/synced or deleted.

I hope that's on point. I'm trying to start using it now too, but I am a bit blown away by what it can and cannot do, by how to find a good use case, and by mixing media management with backup of my home directory, thinking of solving all that with annex without ever having used it ;)

Comment by Stefan Sat Sep 15 01:35:06 2012

Stefan: "annex is for managing big binary files that are not modified most of the time and are only added/synced or deleted."

While this is true, the Kickstarter title for the assistant was "Like dropbox", and Dropbox makes it transparent to edit files, and they work with the filesystem. So with the assistant, lock/unlock should be automated and transparent to the user. Otherwise it's confusing and not simple at all to use, and at least OS X keeps giving errors because of the way it handles aliases. So something like ShareBox is essential to be included in the assistant, in my opinion.

Comment by Tiago Thu Sep 27 10:17:18 2012

hi joey,

i'm excited by your project here but also confused by its direction. the kickstarter page has the header: "git-annex assistant: Like DropBox, but with your own cloud." this page says "git-annex is not a ... DropBox clone." these seem to be in direct opposition.

i'm looking for what is described by the header on your kickstarter page. i assume your backers are looking for the same thing (a self hosted DropBox). for my use, dropbox is perfect, except for the fact that i have to pay a monthly fee to store my data on someone else's server when i would like to buy my own storage medium and run some open source dropbox clone on my own server.

can you explain more clearly what dropbox features your project lacks (/will lack)? and why there is a difference between your fundraising page and this one?

maybe i'm just confused by the difference between git-annex and git-annex assistant. does git-annex assistant truly aim to be a dropbox clone?

Comment by l3iggs Sun Feb 3 03:57:05 2013

It's pretty much exactly what he said:

git-annex is not a filesystem or DropBox clone. However, the git-annex assistant is addressing some of the same needs in its own unique ways.

The git-annex assistant is not exactly like DropBox; it's not a drop-in replacement that works exactly the way Dropbox works. But as it stands, right now, it can (like Dropbox) run in the background and make sure that all of your files in a special directory are mirrored to another place (a USB drive, or a server to which you have SSH access, or another computer on your home network, or another computer somewhere else which has access to the same USB drive from time to time, or has access to the same SSH server or S3 repository, or...).

It works as is but is still under heavy development and features are being added rapidly. For example, up until a month or two ago, the files in your annex were replaced with softlinks whose content resided in a hidden directory. This caused some problems esp. on OS X where native programs don't handle softlinked files very gracefully. So Joey added an entirely new way of operating called "direct mode" which uses ordinary files, much like Dropbox does.

So -- what you should expect from git-annex assistant is a program which solves many of the same problems Dropbox does (keeping a set of files magically in sync across computers) but does it in its own way, which won't be exactly like Dropbox; it will be more flexible but might require a little learning to figure out exactly how to use it the way you want. It's possible to get a very Dropbox-like system out of the box, especially now that you don't need to use softlinks, if you've got a place on the network you can use as a central remote repository for your files, or if you only want to synchronize two or more computers on the same local network.

"git-annex" itself is the plumbing used by git-annex assistant, or to put it another way, the engine that the assistant has under the hood. Git-annex itself is extremely simple and stable but should only be used by people already familiar with the command line, perhaps even people already familiar with git.

That's my point of view as an enthusiastic user. Joey may have his own perspective to share. :)

Comment by edheil [wordpress.com] Mon Feb 4 03:17:06 2013

I'd like to understand the "not a backup system" point better. My current plan was to use git annex assistant to save my annex folder onto two hard drives, one at home and one at work. Step two is to move most big stuff into .../archive/... folders, and step three is to add an annex folder on my wife's laptop, so we can use the hard drives to back up everything.

Does that sound unreasonable?

Comment by Zellyn Tue Dec 10 20:55:05 2013

@Zellyn, what you describe does not sound unreasonable. But it's hard to say if it's a backup. For example, if you delete a file from the archive folder, and that happened to be the only copy of the file, it's gone.

It's definitely possible to use git-annex in backup-like ways, but what I want to discourage is users thinking that just putting files into git-annex means that they have a backup. Proper backups need to be designed, and tested. It helps to use software that is explicitly designed as a backup solution. git-annex is more about file distribution, and some archiving, than backups.

Comment by joeyh.name Wed Dec 11 15:40:35 2013
@joeyh - ok, good to know. So as long as I realize I need to sync with my hard drives before deleting, it should work fine as a backup solution. Sweet!
Comment by Zellyn Wed Dec 11 17:38:39 2013
@joeyh.name But if I set numcopies=2, it won't let me drop the file, right? I don't think we are meant to directly modify the archive; but if we do, would git-annex detect the corruption and discourage us from dropping the other file?
Comment by John Wed Feb 5 10:45:45 2014
Yes, git-annex ensures your configured numcopies is met before dropping a file.
Comment by joeyh.name Thu Feb 6 17:00:59 2014
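
The numcopies rule discussed in the last few comments can be pictured with a small Python sketch. It only illustrates the idea of "refuse a drop that would leave too few copies" and is not git-annex's code: the `can_drop` function and the repository names are hypothetical, and the sketch assumes the check counts the copies that would remain in other repositories after the drop.

```python
def can_drop(holders, dropping_from, numcopies):
    """Return True if enough copies remain elsewhere after the drop.

    holders: repositories currently believed to hold the file's content.
    dropping_from: the repository that wants to drop its copy.
    numcopies: minimum number of copies that must remain (assumed semantics).
    """
    remaining = set(holders) - {dropping_from}
    return len(remaining) >= numcopies


# Three copies exist; dropping one still leaves two elsewhere.
print(can_drop({"laptop", "home-drive", "work-drive"}, "laptop", numcopies=2))

# Only two copies exist; dropping one would leave a single copy.
print(can_drop({"laptop", "home-drive"}, "laptop", numcopies=2))
```

The first call allows the drop and the second refuses it, which matches the behaviour described above: the file cannot be dropped unless the configured number of copies would still exist.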