Hello,
I am looking for a solution to replicate lots of files between 2 servers (each with a dedicated IP address) over WAN. (I already looked at GlusterFS, DRBD, BitTorrent Sync and ownCloud, but comments are welcome.) Now I am evaluating git-annex, but it is (kind of) hard to grasp the concepts as I am not familiar with git.
I could already connect the servers by following the steps of "remote sharing walkthrough" but I feel that using a 3rd special node (as "transfer repository" / rsync or ssh) is an overhead and should not be needed. But no matter how hard I tried, I couldn't make the 2 servers do a sync without a special node.
Could you please give me a hint how to do it? Or even better, some detailed steps.
Thanks,
David
The remote sharing walkthrough assumes that the computers are not servers, so they might not be turned on at the same time, and cannot directly contact each other.
With 2 servers, it's much simpler. Just add a git remote on each pointing at the git repository on the other server, and run the git-annex assistant on both to keep them in sync.
You can add the git remote with
git remote add otherserver ssh://otherserver/path/to/repo
or you can use the webapp to do it (Add another repository -> Remote server).
(If your WAN puts the servers on the same virtual subnet, i.e. a VPN, you can probably also use local pairing over the WAN to get to the same setup by a different route.)
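For the command-line route, the steps on one server might look roughly like the sketch below; the other server would run the same thing with the names swapped. The REPO, PEER_NAME, PEER_URL and START_ASSISTANT variables are my own placeholders, not anything git-annex defines, and REPO falls back to a throwaway temporary directory so the sketch can be tried safely before pointing it at real data.

```shell
#!/bin/sh
# Sketch only, under assumed names: run the equivalent on each server.
set -e

# Hypothetical placeholders; replace with your real repository path and peer.
REPO="${REPO:-$(mktemp -d)}"
PEER_NAME="${PEER_NAME:-otherserver}"
PEER_URL="${PEER_URL:-ssh://otherserver/path/to/repo}"

cd "$REPO"
git init -q .    # safe to run even if this is already a git repository

# Point this repository at the repository on the other server.
git remote add "$PEER_NAME" "$PEER_URL"
git remote -v

# Only touch git-annex when explicitly asked to and when it is installed.
if [ "${START_ASSISTANT:-no}" = yes ] && command -v git-annex >/dev/null 2>&1; then
    git annex init "$(hostname)"    # the description is just a label for this repository
    git annex assistant             # starts the assistant for this repository
fi
```

To run it against a real folder, something like REPO=/path/to/repo START_ASSISTANT=yes sh setup.sh would do (setup.sh being whatever you name the file). The ssh user on the other server needs read/write access to the repository there.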
Hello,
Thanks for your reply. I guessed that there is an easier, direct way to do it. But unfortunately I still need some clarification. This is how I tried:
And here comes the confusing part:
On server2 I clicked Add another repository / Remote server: and configured ssh server1 to the same folder: /media/mail. Then I received an error message:
Failed to make repository Something went wrong setting up the repository on the remote server.
Transcript:
init error: could not lock config file /media/mail-test/mail/.git/config: Permission denied git-annex: git [Param "config",Param "annex.version",Param "3"] failed
Could you please help me see what I did wrong?
Thanks,
David
Hey,
Yes, you were right. The ssh user did not have privileges to read/write the folder. Now I have corrected this issue and things started sync-ing.
But I am afraid, this is not exactly the thing I wanted to achieve. My goal was to sync from folder to folder ... and not from folder to repository both ways.
So, what I would like to do is that if I create a file at server1 (e.g. /media/mail/test01.txt) then it should appear at server2 at /media/mail/test01.txt.
How can I achieve this?
Thanks a lot.
David
Hello,
Yes, meanwhile files started appearing. Maybe I just did not wait enough time.
So, I have just one more question. Shall I set the remote ssh repositories to client or transfer? I read the docs and understand the difference, but in my case I simply cannot decide.
Thanks a lot for your hard work and great support.
David
Okay. One last question. Could you please help me understand how this sync works now?
So for example if I save a file on server1, it gets pushed to server2's repository. But how is it translated back to a normal file from the repository?
And in this case will this file be in server2's repository as well (so are there 2 copies on server2, one in the repository and one human-readable)?
Thanks!
Great. Understood. Thanks.
Although there is one more thing that I don't really understand. Why do we need 2 repositories? In my mind (without knowing git) I thought it would work with one repository, where both server1 and server2 can upload/download files and they get synced through this.
Now they use different repositories and I don't get why. Also I don't get how conflict-handling can be done if there are 2 repositories for the 2 transfer-ways.
Sorry to bother you with this.
When using git, each place you access your files is a separate repository. Thus, you have:
server1 (repository) <--> server2 (repository)
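The two-repository layout can be tried out with plain git on a single machine. This is only an illustration under assumptions: the server1 and server2 directories are local stand-ins for the two servers, plain git stands in for git-annex, and the commit and pull are done by hand here rather than by the assistant.

```shell
#!/bin/sh
# Illustration with plain git: two separate repositories that know about each other.
set -e

work="$(mktemp -d)"    # throwaway directory; both "servers" live under it
git init -q "$work/server1"
git init -q "$work/server2"

# Use the same branch name in both repositories, whatever git's default is.
git -C "$work/server1" symbolic-ref HEAD refs/heads/main
git -C "$work/server2" symbolic-ref HEAD refs/heads/main

# Each repository gets a remote pointing at the other one.
git -C "$work/server1" remote add server2 "$work/server2"
git -C "$work/server2" remote add server1 "$work/server1"

# Create and commit a file on server1.
echo "hello" > "$work/server1/test01.txt"
git -C "$work/server1" add test01.txt
git -C "$work/server1" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "add test01.txt"

# server2 pulls the change from server1; the file then appears in its folder.
git -C "$work/server2" -c user.name=demo -c user.email=demo@example.invalid \
    pull -q server1 main
ls "$work/server2"
```

Each side keeps its own full history, and changes made on both sides are reconciled when one repository merges what it fetched from the other.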
Hello,
Thanks for your help and explanation. Now I had some time to test git-annex and I had a sad experience. I tried to sync the files from server1 to server2 (as described before). We are talking about 87000 small files (these are maildir folders and each file is an e-mail) with the total size of 19 Gbytes. After a few hours git-annex assistant consumed all memory on both servers ... and then all swap ... and then kernel killed git-annex. The servers have 512 MB RAM and 1 GB swap.
According to the Scalability page (http://git-annex.branchable.com/scalability/) the "memory usage should be constant" and I am sure I didn't run "git-annex unused".
Is this memory consumption normal, or might it be a bug? What shall we do?
Thanks,
David
I see the assistant spike up to around 150 MB when adding a lot of files, then drop back down once it finishes one batch (limited to around 5000 files) and moves on to the next.
You might need to file a bug report with some more details so I can reproduce the problem.