Hello,
first of all, thank you very much for git-annex: I think it is a great tool to keep files in sync. The feature I like most is the possibility of setting up multiple remote servers and syncing in a decentralized fashion; this is great for me since I have multiple computers spread around, but not all of them are always up, almost none is "99.99% guaranteed", and I need high availability.
Getting to the point: I set up git-annex manually on my computers (most remotes are only accessible via ssh tunnels) and I am using it in direct mode, with the assistant running in the background on every node. From time to time I open the webapp to check that everything looks fine. Finally, I set a couple of nodes (with plenty of storage) as 'full backup', even though I don't really understand what that setting is for.
My question is: in my setup, where do deleted files and older versions go? If I ever need one in the future where should I look for it?
Thanks a million!
Have a look at the standard groups page. Backup Repositories want to get hold of anything.
So if you have the assistant running it will take care of pushing the old versions in the backup.
While Marco is right about backup repositories, note that the assistant can only move the content of deleted/old versions of files to a backup repository if that repository is set up as a remote of the repository where the assistant is running.
So, it's quite possible to have some backup repositories that are not well enough connected for deleted/old files to reach them. The deleted/old files will then be kept in whatever repositories happen to contain them.
Where are the deleted files stored? Underneath .git/annex/objects/
Also, the webapp has a configuration page to control what to do with unused files, and can be configured to delete them periodically if you'd prefer.
I'm sorry if I have overlooked something obvious, but after playing with a "simple" setup (2 desktops and 1 laptop all syncing their ~/annex directory with the assistant/webapp with direct ssh access and XMPP, including one "full backup" repository on an ssh server accessible by all) and moving a few directories in and out of ~/annex to verify that it syncs between the clients, the deleted files seem to be present only in the backup repository - simply judging by the size of the repositories on the different machines. However, how can I get them back after deleting them and letting that sync out by the assistant? On the clients, git annex list/log will only print the files still there, not the deleted ones. On the backup repository (which seems to be a bare repository), git annex list/log is mostly empty. The backup repository was automatically created by the webapp on the first client I set up, I only created the (empty) directory and assigned appropriate user ACLs on the ssh server. I have not manually created a (non-bare) git repository there.
Is this a supported/sane configuration for using assistant/webapp? The idea is to have the ~/annex directories behave like Dropbox (simplified point of view), but still be able to get to deleted or old versions of files. I am aware that there is no UI support for this yet, but I understood this to be supported by the "full backup" repositories.
Dear r,
I did lots of testing on recovering deleted files or previous versions with git-annex, but I didn't have much success. I found it really difficult and too time consuming to recover files from git-annex. In the end I found a cheap and dirty solution. I write it here in case it may be useful to you or to someone else. I now run a daily backup of my git-annex repo with Bup (http://bup.github.io/) on a couple of my computers, so I can easily restore to a previous state if needed. Bup implements data deduplication, so you can have daily backups without consuming too much space. Bup is extremely easy to use too: you can mount backups as fuse filesystems or view them via a web browser. The only problem with this approach is that if you change a file multiple times during a day you won't be able to recover those changes.
All you need is a clone of the repository that is not using direct mode. (The assistant normally sets up repositories using direct mode.) So either stop the assistant and run "git annex indirect" inside ~/annex, or "git clone" the backup repository to a temporary location.
In that clone, you can use regular git commands to check out past versions of the tree. Inside such a checkout, you can use "git annex get" to get any of the files in that checkout from your backup repository or wherever the content of the file has been stored.
For example, using the git-annex repository that I use to publish builds of git-annex:
Your backup repository can be used the same way.
Deleted and old versions of files will be retained in it, as long as the assistant was able to send them to the backup repo in the first place before they got deleted or modified. When direct mode is used, if you change a file before it gets copied to the backup, its old contents are gone for good. To avoid that and make sure that all annexed files always get backed up, switch your ~/annex to use indirect mode.
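The recovery steps described above can be sketched in shell roughly as follows. This is only a sketch under assumptions: the function names last_deletion_commit and checkout_before_deletion are made up for illustration, and the functions are meant to be run inside a clone that is not using direct mode. They use plain git to find the commit that deleted a file and to check the file out as it was just before that commit; fetching the actual content is then left to "git annex get".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of git or git-annex.
# Run these inside a clone that is NOT in direct mode.

# Print the hash of the most recent commit that deleted the given path.
# Prints nothing if no such commit is found in the current branch's history.
last_deletion_commit() {
    git log -n 1 --diff-filter=D --format=%H -- "$1"
}

# Put the path back into the work tree as it was in the parent of the
# commit that deleted it. For an annexed file this restores the symlink;
# the content itself still has to be fetched with "git annex get".
checkout_before_deletion() {
    path=$1
    commit=$(last_deletion_commit "$path")
    if [ -z "$commit" ]; then
        echo "no commit deleting $path was found" >&2
        return 1
    fi
    git checkout "$commit^" -- "$path"
}
```

For example, running checkout_before_deletion "docs/report.pdf" (a hypothetical path) and then git annex get "docs/report.pdf" should bring back first the symlink and then its content, provided the content reached a repository that the clone can access.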
Hello joey,
but that doesn't use any kind of data deduplication, am I correct? In my particular use case the indirect backup repo could grow huge, so I may be better off avoiding that.
git-annex automatically deduplicates files. The backup repo will store one copy of each version of each file, which may indeed get large depending on what you're doing.
That's right, bup can store multiple versions of a file as efficiently as git itself can (or maybe a little more).
You can use bup as a special remote of git-annex, and then your backup repository would be a bup repository. Restoring old versions of files from it would work the same way I described above.