git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup.
git-annex is not a filesystem or Dropbox clone. However, the git-annex assistant is addressing some of the same needs in its own unique ways. (There is also a FUSE filesystem built on top of git-annex, called ShareBox.)
git-annex is not unison, but if you're finding unison's checksumming too slow, or its strict mirroring of everything to both places too limiting, then git-annex could be a useful alternative.
git-annex is also not a folder mirroring system like syncthing (although syncthing could be supported as a special remote) or gut, but it can be used to sync files in such a way, with certain limitations (for example, it doesn't like syncing .git directories so much).
git-annex is also not a distributed file system like BitTorrent or IPFS, but both are supported as special remotes, with more work on making git-annex more distributed underway.
git-annex is more than just a workaround for git scalability limitations that might eventually be fixed by efforts like git-bigfiles. In particular, git-annex's location tracking allows having many repositories with a partial set of files, that are copied around as desired.
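The idea of location tracking can be illustrated with a toy model. This is only a sketch of the concept, not git-annex's actual implementation or on-disk format; the class name `LocationLog`, its methods, and the repository names are all hypothetical.

```python
class LocationLog:
    """Toy model: remembers which repositories hold each file's content."""

    def __init__(self):
        # Maps a content key to the set of repositories that hold it.
        self._holders = {}

    def record_present(self, key, repo):
        self._holders.setdefault(key, set()).add(repo)

    def record_dropped(self, key, repo):
        self._holders.get(key, set()).discard(repo)

    def whereis(self, key):
        """Repositories known to hold the content for this key."""
        return sorted(self._holders.get(key, set()))

    def present_in(self, repo):
        """Keys whose content is present in one repository (a partial set)."""
        return sorted(k for k, repos in self._holders.items() if repo in repos)


log = LocationLog()
log.record_present("photo.raw", "laptop")
log.record_present("photo.raw", "usb-drive")
log.record_present("film.mkv", "usb-drive")
log.record_dropped("photo.raw", "laptop")  # laptop frees space, content lives on

print(log.whereis("photo.raw"))
print(log.present_in("laptop"))
```

Every repository in this model knows about every file, but each one holds only the contents it chose to keep, which is the "partial set of files" property described above.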
git-annex is not some flaky script that was quickly thrown together. I wrote it in Haskell because I wanted it to be solid and to compile down to a binary. And it has a fairly extensive test suite. (Don't be fooled by "make test" only showing a few dozen test cases; each test involves checking dozens to hundreds of assertions.)
git-annex is not git-media, although they both approach the same problem from a similar direction. I only learned of git-media after writing git-annex, but I probably would have still written git-annex instead of using it. git-media uses git smudge filters (recently supported in git-annex as well; see unlocked files) and may be a tighter fit for certain situations. It lacks git-annex's support for widely distributed storage, using only a single backend data store. It also does not support partial checkouts of file contents, like git-annex does.
git-annex is similarly not git-fat, which also uses git smudge filters, and also lacks git-annex's widely distributed storage and partial checkouts.
Similarly, git-annex is not git-lfs, which also uses git smudge filters, and appears to lack git-annex's widely distributed storage and partial checkouts.
git-annex is also not boar, although it shares many of its goals and characteristics. Boar implements its own version control system, rather than simply embracing and extending git. And while boar supports distributed clones of a repository, it does not support keeping different files in different clones of the same repository, which git-annex does, and is an important feature for large-scale archiving.
git-annex is not the Mercurial largefiles extension, although Mercurial and git have some of the same problems around large files, and both tools try to solve them in similar ways (stand-in files that mostly use hashes of the real content).
See also the related software page for software that git-annex is similar to.
Hi,
I used SparkleShare lately in a project involving 3 computers and 2 people, and for ASCII text and even a few smaller binary things it works OK.
But it does "too much" for media. At least at the moment, it just uses git for saving the data. That has a positive and a negative aspect.
Positive:
Negative:
(So you see, for big files, even if git handled them faster, you would waste a massive amount of hard disk space.) But again, for PDFs, a few pictures, text files, even some office files and stuff under 100 MB, it's great and easy to do.
To put it in a few words: SparkleShare is like Dropbox but with file history (I think Dropbox doesn't have that?). But because git is not designed (yet) for big files, it works somewhat OK for stuff under 100 MB; if you go much higher, above 1 GB, it will not be optimal.
git-annex doesn't save the data itself in git, only the locations and the checksums, so it's more like an address book of your data. It's an abstraction layer over your data: on as many devices as you want, even with no network or internet connection active and only a very small hard disk, you can see all the 5 terabytes of data you might have, move directories around, sort them, and delete stuff you don't want (if you can decide that by the name). Then, when you get a connection again, you sync your actions and it applies them to the files.
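A rough sketch of the "address book" idea in Python follows. The key layout (backend name, size, checksum) is only loosely modeled on git-annex's checksum-based keys and should be treated as an assumption; `content_key`, `rename`, and the file paths are hypothetical names for illustration.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a small key from file content: size plus SHA-256 checksum."""
    digest = hashlib.sha256(data).hexdigest()
    return f"SHA256-s{len(data)}--{digest}"


def rename(tree: dict, old: str, new: str) -> None:
    """Move an entry in the address book; the content is never touched."""
    tree[new] = tree.pop(old)


# The "address book": file names map to keys, not to the data itself.
big_file = b"pretend this is several gigabytes of video"
tree = {"unsorted/holiday.mkv": content_key(big_file)}

# Reorganising works offline and without the content being present.
rename(tree, "unsorted/holiday.mkv", "videos/2012/holiday.mkv")
print(tree)
```

Only the small mapping needs to be on every device; the content behind each key can live elsewhere until it is wanted.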
And one big feature, like joey said, is that you can load only part of the files from the repos onto your device if, for example, it only has enough space for 1/10 of them.
There is another thing: because it is "only" an abstraction layer, it is theoretically easy to implement extensions to save your data on anything, not only git repositories...
SparkleShare may switch to something other than git, but then it will switch to that single protocol and stick to it, because it does not abstract things as much.
By the way, there is an alternative out there. It doesn't force you to use git as the VCS, but you do have to use a VCS (like git), and you don't have to use the client written in Mono, only a smaller Python script:
http://www.mayrhofer.eu.org/dvcs-autosync
But the idea behind it is the same, except for these 2 points:
But many free software developers don't like Mono, so the chance that it gets more love from more people is not totally unlikely.
So, way too long a post, but I hope that helps somebody.
Or, to make it more simple:
SparkleShare is for projects and maybe backing up your documents folder.
annex is for managing big binary files that not get modified most of the time and only added/synced or deleted.
Hope that's on point. I'm trying to start using it now too, but I'm a bit blown away by what it can and can't do, by how to find a good use case, and by mixing media management with backup of home and thinking of solving all that with annex without ever having used it.
Stefan: "annex is for managing big binary files that not get modified most of the time and only added/synced or deleted."
While this is true, the Kickstarter title for the assistant was "Like Dropbox", and Dropbox makes editing files transparent: they just work with the filesystem. So with the assistant, lock/unlock should be automated and transparent to the user. Otherwise it's confusing and not simple at all to use, and at least OS X keeps giving errors because of the way it handles aliases. So in my opinion it is essential for something like ShareBox to be included in the assistant.
hi joey,
i'm excited by your project here but also confused by its direction. the kickstarter page has the header: "git-annex assistant: Like DropBox, but with your own cloud." this page says "git-annex is not a ... DropBox clone." these seem to be in direct opposition.
i'm looking for what is described by the header on your kickstarter page. i assume your backers are looking for the same thing (a self hosted DropBox). for my use, dropbox is perfect, except for the fact that i have to pay a monthly fee to store my data on someone else's server when i would like to buy my own storage medium and run some open source dropbox clone on my own server.
can you explain more clearly what dropbox features your project lacks (/will lack)? and why there is a difference between your fundraising page and this one?
maybe i'm just confused by the difference between git-annex and git-annex assistant. does git-annex assistant truly aim to be a dropbox clone?
It's pretty much exactly what he said:
The git-annex assistant is not exactly like Dropbox; it's not a drop-in replacement that works exactly the way Dropbox works. But as it stands, right now, it can (like Dropbox) run in the background and make sure that all of your files in a special directory are mirrored to another place (a USB drive, or a server to which you have SSH access, or another computer on your home network, or another computer somewhere else which has access to the same USB drive from time to time, or has access to the same SSH server or S3 repository, or...).
It works as is but is still under heavy development and features are being added rapidly. For example, up until a month or two ago, the files in your annex were replaced with softlinks whose content resided in a hidden directory. This caused some problems esp. on OS X where native programs don't handle softlinked files very gracefully. So Joey added an entirely new way of operating called "direct mode" which uses ordinary files, much like Dropbox does.
So -- what you should expect from git-annex assistant is a program which solves many of the same problems Dropbox does (keeping a set of files magically in sync across computers) but does it in its own way, which won't be exactly like Dropbox; it will be more flexible but might require a little learning to figure out exactly how to use it the way you want. It's possible to get a very Dropbox-like system out of the box, especially now that you don't need to use softlinks, if you've got a place on the network you can use as a central remote repository for your files, or if you only want to synchronize two or more computers on the same local network.
"git-annex" itself is the plumbing used by git-annex assistant, or to put it another way, the engine that the assistant has under the hood. Git-annex itself is extremely simple and stable but should only be used by people already familiar with the command line, perhaps even people already familiar with git.
That's my point of view as an enthusiastic user. Joey may have his own perspective to share.
I'd like to understand the "not a backup system" point better. My current plan was to use git annex assistant to save my annex folder onto two hard drives, one at home and one at work. Step two is to move most big stuff into .../archive/... folders, and step three is to add an annex folder on my wife's laptop, so we can use the hard drives to back up everything.
Does that sound unreasonable?
@Zellyn, what you describe does not sound unreasonable. But it's hard to say if it's a backup. For example, if you delete a file from the archive folder, and that happened to be the only copy of the file, it's gone.
It's definitely possible to use git-annex in backup-like ways, but what I want to discourage is users thinking that just putting files into git-annex means that they have a backup. Proper backups need to be designed, and tested. It helps to use software that is explicitly designed as a backup solution. git-annex is more about file distribution, and some archiving, than backups.
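As one small illustration of what designing and testing for this might involve, here is a sketch that audits a location map for files with too few copies. It is a standalone model with made-up data: the function `under_replicated`, the `min_copies` parameter, and the repository names are hypothetical, and it does not query git-annex (which has its own numcopies setting for a related purpose).

```python
def under_replicated(locations, min_copies=2):
    """Return the files held by fewer than min_copies repositories."""
    return {
        name: sorted(repos)
        for name, repos in locations.items()
        if len(repos) < min_copies
    }


# Hypothetical snapshot of where each file's content currently lives.
locations = {
    "tax-2012.pdf": {"home-drive", "work-drive"},
    "wedding.mkv": {"home-drive"},
}

at_risk = under_replicated(locations)
for name, repos in at_risk.items():
    # Deleting one of these from its only holder would lose it for good.
    print(f"{name}: only {len(repos)} copy, on {repos}")
```

A check like this only reports the problem; deciding how many copies are enough, and verifying that they can be restored, is still part of designing the backup.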
I'm doing something perhaps unreasonable and weird, and I'm wondering if there's a better way.
I'm running a wget -mbc of a particular web site. It replicates down to a tree. Then I'm ingesting the content into git annex via the normal 'git annex add' sequence.
Later, when I'm going to update my replica of the website, I am running a 'git annex unlock' on the whole tree (90 gig in this case), and then running the 'wget -mbc ; git annex add' command sequence again.
Is there any mechanism to convince git-annex to scan the file, and ingest (copy) it into objects if it is new content, while leaving the original files unlocked? This would give me the ability to avoid the 'git annex unlock' copy operation, which is lengthy.
I'm aware this is inherently space inefficient.
I'm sure there's some other problem with this idea that I'm missing.
Thanks.
@Jeff, why did you post that comment here? Please use the forum for questions.
(You may find it useful to use direct mode, or 'git annex import --skip-duplicates', or something.)
Suppose I have two Samba fileservers in two different locations. Can I use git-annex in thin mode + git-annex assistant to automatically synchronize these two fileservers? Specifically, I am trying to understand if:
1) git-annex preserves owner/group IDs and POSIX ACLs; 2) it can efficiently manage a very large number of files/directories (500K+ files); 3) it can be used alongside inotify to efficiently transfer only changed files.
One last thing: in which sense is git-annex not like Unison? It seems it can be configured to have very similar functionality. Am I missing something?
Thank you all.