So after more than a year I think I am slowly beginning to grasp git-annex. I'd like to put all my questions and ideas into this one thread; maybe you guys can help me sort things out. I'll number everything, so you're more than welcome to chime in with regard to a particular item only. Happy to get links to read up on stuff, so don't hesitate to just paste a link if you think it is useful.
*1.* Right or wrong: I can only "work" with files in "CLIENT" repositories, not with backup, archive or anything else?
*2.* How does the copying or syncing work? Is it comparable to rsync? How good is it at resolving conflicts?
*3.* Can I get some feedback on whether git-annex is a solution for my scenario?
I have 1 MacBook, 1 NAS (FreeNAS running FreeBSD) and access to S3 and box.com as well as a few external HDs. My current setup involves a lot of manual copying and (very careful usage of) rsync, and basically ends up with me having a copy/backup of everything on the NAS and the files I currently work with on my MacBook.
Ideally I'd want to achieve something like this:
- Have all files on the NAS
- Only have current work files on my Macbook
- Use external HDs only when necessary or for transferring stuff
- S3 and box.com can be used however needed.
*4.* DOCUMENTS
So in case I am travelling without my MacBook, I use another PC or my Android phone to connect to my router at home via VPN, then access a share on my NAS, work on a file, print it and save it back, so that whenever my MacBook is online again it syncs the latest version. To achieve this with git-annex I guess I need the repository on the NAS to be a client repo, right? But the problem I see is that if I move a sub-folder from Docs into Archive, e.g. a folder of manuals I don't need on my MacBook all the time, it also gets moved out of the client repo on the NAS into an archive repo, so how would I access it remotely if necessary?
Also, talking about archiving, doesn't this get messy if you have a complicated folder structure inside a repo? How would you put stuff back from the archive exactly where it was? Sorry if this sounds a bit silly, but I rely on a very precise folder system where everything is placed where I know it will be, so if I now drag/drop all sorts of files/folders into the archive I'll never figure out what's what again.
*5.* MUSIC
This should be fairly easy, all my music is on my NAS and on my macbook, whenever I add music to my macbook it'll automatically sync to the NAS.
*6.* PICTURES
This is where it gets complicated. I have a folder, Pictures, with a subfolder for each year, i.e. 2010, 2011 ... 2015, and inside each is a subfolder for each photographed event. Now these pics are getting too big for my MacBook, so I'd like all images to be on my NAS, available for access, and to be able to simply archive, say, 2014. Now the question is: if I later remember I need access to a particular event in 2014, can I browse that folder and un-archive that particular folder?
*7.* Also, say I'd like to be able to take a folder/sub-folder of images with me on an HD for external editing, and have them synced back when I plug the HD in again. What type of repository would I have to set this HD to?
*8.* Obviously I'd need to set up a different repository for each of Documents, Pictures and Music, each with its own settings, right?
*9.* At the moment it is not possible to have a separate minimum number of copies per repository?
So, what do you think, is git-annex suitable for my needs? Partially at least?
I think it can be a good fit. Let me try to answer some of your questions in the order that seems most reasonable to me.
8 It is not a strict requirement to have different repos for the different things, but I guess it might simplify things. Plus, you can start with one of them and, when that is properly set up, continue with the next one.
5 Your music solution seems simple enough. Numcopies=2 and everything should work.
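For illustration, here is a minimal sketch of the numcopies setting on the command line. It assumes git-annex is installed and uses a throwaway repository in a temp directory; the directory name, description and user identity are made up for the demo.

```shell
# Throwaway repository so the commands can be tried without touching real data.
demo=$(mktemp -d)
git init -q "$demo/music"
cd "$demo/music"
git config user.name "demo"
git config user.email "demo@example.com"
git annex init "macbook music"

# Ask git-annex to keep at least two copies of every file before
# any repository is allowed to drop its own copy.
git annex numcopies 2

# Run without a number to show the current setting.
git annex numcopies
```

In a real setup you would run only the `git annex numcopies 2` line inside your existing music repository.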
1, 4, 9 There is nothing magic about client remotes. Client is just a group that has some default rules. You can create your own groups and also change their rules. For my documents, I have two custom groups, "offsite" and "cloud". I have set up rules so that files prefer to be in one offsite remote and one cloud remote. However, until I have had a chance to plug my computer into an offsite disk and sync it, all files want to be in two cloud remotes.
4, 6 I personally do not run the assistant, but do everything on the command line. In my setup, old things that I will probably never access again live in an archive folder; if I do need one, I can get the file back with a simple "git annex get $filename". For my documents tree I only add files and folders or move folders into an archive folder. (Also note that archive folders can be located anywhere in your file tree, and that you can have arbitrarily many
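As a rough sketch of what custom groups can look like (not my actual rules): the example below assumes git-annex is installed and uses two directory special remotes in a temp directory as stand-ins for an offsite disk and a cloud store. The remote names `offsitedisk` and `cloudstore` and the two expressions are hypothetical and only approximate "one copy per group".

```shell
# Throwaway repository with two directory special remotes as stand-ins.
demo=$(mktemp -d)
mkdir "$demo/offsite" "$demo/cloud"
git init -q "$demo/docs"
cd "$demo/docs"
git config user.name "demo"
git config user.email "demo@example.com"
git annex init "laptop docs"
git annex initremote offsitedisk type=directory directory="$demo/offsite" encryption=none
git annex initremote cloudstore type=directory directory="$demo/cloud" encryption=none

# Put each remote into a custom group; group names are free-form.
git annex group offsitedisk offsite
git annex group cloudstore cloud

# Hypothetical rules: each remote wants a file only while no repository
# in its group has a copy yet.
git annex wanted offsitedisk "not copies=offsite:1"
git annex wanted cloudstore "not copies=cloud:1"

# Show what was configured.
git annex group cloudstore
git annex wanted offsitedisk
```

With real remotes you would skip the `initremote` lines and use the names of the remotes you already have.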
Hi Carl, thanks for the feedback.
I'll be very slow in replying, as I'm taking some late night classes this week, but I WILL ask more questions once I get around to trying some tests with dummy data.
Meanwhile here is one question that has been bugging me for days now:
Take the following scenario:
So NAS and MACbook are in sync when I decide I don't need a specific folder on the MACbook and archive it.
The big question: Will it be archived on the NAS too or not? (assistant running on both devices and please assume there are enough copies of this folder in backup and archive repos)
This is really an important part of understanding git-annex for me. If it archives on both devices, it would simply be a sync tool. If it does not, it solves my problem of being able to decide what data to carry with me on my MacBook while still being able to use a VPN and access a share on the NAS (the client repo folder).
As I understand your question, you will move some files into an "archive" folder. The client repo on your Mac will then check whether it is allowed to drop the files. It seems that it will be, as there are enough copies on S3 et al. After syncing, the file structure in the git repo will be updated on your NAS; as it is a client, it will also check whether it is allowed to drop them, and as enough copies are available, it will.
If this is not what you want, then you do not want to have your NAS as a client. My setup is not dissimilar from yours, and my home server (corresponding to your NAS) is in the backup group. It thus wants to hold on to everything.
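A minimal sketch of putting a repository into the backup group from the command line, assuming git-annex is installed. A throwaway repository in a temp directory stands in for the NAS; the directory name, description and user identity are made up.

```shell
# Throwaway repository standing in for the repository on the NAS.
demo=$(mktemp -d)
git init -q "$demo/nas"
cd "$demo/nas"
git config user.name "demo"
git config user.email "demo@example.com"
git annex init "nas"

# Put this repository into the backup group and use that group's
# standard preferred content rule, which wants to keep every file.
git annex group here backup
git annex wanted here standard

# Show the result.
git annex group here
git annex wanted here
```

A repository set up like this should not drop files just because they were moved into an archive folder elsewhere.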
Hi Carl,
thanks for clarifying. So the question is: if I don't use my NAS as a client, what other function can I assign to it? I'd really want to access its files via shares too. Can I make it a backup but keep the files "in the clear" so I am able to access them via a share?
You lost me there. I'm using the command line on the NAS, not the webapp, but on the MacBook I do use the webapp. So when creating a repo through the webapp I chose "Remote server - Set up a repository on a remote server using ssh", and then I can choose between: Unencrypted repository, Simple shared encryption and Encrypt with GnuPG key
Unfortunately the descriptions aren't all that self-explanatory. So far I have tested the 3rd option, but there the files get stored inside .git/annex/objects so I cannot access them straight away. I'll be testing a few more options while awaiting a reply here; maybe I'll figure it out on my own. The config of the created repo on the NAS contained this:
[core]
	repositoryformatversion = 0
	filemode = true
	bare = true
	sharedrepository = 1
Thanks for bearing with me, Carl. You mention the walkthrough and the command line, so let's see:
On my NAS I follow the walkthrough:
mkdir ~/annex
cd ~/annex
git init
git annex init
which yields a BARE repo:
cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[user]
	name = gitannex
	email = gitannex
[annex]
	uuid = 36bca403-027b-4e90-9504-123456789012
	version = 5
So where do I stand now? You mention I need a non-bare repo and suggested following the walkthrough. Kinda stuck here now, and I really would like to get to use git-annex.
Maybe Joey will give some advice?
Would you mind sharing more of your setup? Theory or practical advice welcome
Regarding an earlier question of mine about how the syncing works: do I understand correctly that it's not possible to have git-annex running somewhere centrally and have two clients that each access only parts of a repo?
For example, on the "server" have Documents with subfolders a) and b), then have 2 clients, Macbooka and Macbookb, which would each sync only the a) or b) folder respectively out of the complete repo?
git init will never create a bare git repository (unless you specify --bare), and a normal bare repository doesn't have a .git directory.
So I suspect that your repository is using git-annex's direct mode, which only sets core.bare as an implementation detail. Likely you trimmed off the end of the .git/config, where it said "direct = true".
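To make the difference concrete, here is a small sketch using plain git in a temp directory (the directory names are made up). It compares what `git init` and `git init --bare` produce, then shows one way to look for the direct mode setting.

```shell
# Two throwaway repositories: one normal, one bare.
demo=$(mktemp -d)
git init -q "$demo/normal"
git init -q --bare "$demo/bare.git"

# A normal repository has a .git directory and core.bare = false.
ls -d "$demo/normal/.git"
git -C "$demo/normal" config core.bare

# A bare repository has no .git directory and core.bare = true.
ls -d "$demo/bare.git/.git" 2>/dev/null || echo "no .git directory in the bare repo"
git -C "$demo/bare.git" config core.bare

# In a git-annex repository, this prints "true" if direct mode is on.
# The demo repository has no such setting, so the fallback message is shown.
git -C "$demo/normal" config annex.direct || echo "annex.direct is not set"
```

Running the last command inside ~/annex on the NAS would show whether direct mode is enabled there.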