I am currently trying to transfer git-annex managed files to my server.
I am getting them with git annex get /path. However, there are very many small files, and git annex seems to pause briefly between them, so the process takes considerably longer than transferring a single file of the same total size would.
Do you have any suggestions for improving my transfer speed in this case?
I did this using a couple of tools (off the top of my head):
    git annex find path/ --format '${key}\n' > /tmp/keyslist
(outputs a list of keys)
    find .git/annex/objects -type f | grep -wFf /tmp/keyslist > /tmp/filelist
(outputs a list of files)
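For anyone unsure what the grep step keeps, here is a minimal sketch of that filter on its own. The function name filter_by_keys is only for illustration, and the commented usage assumes the same paths as above.

```shell
#!/bin/sh
# Read object paths on stdin and keep only those that contain one of the
# keys listed in the file "$1".
#   -F  treat each key as a fixed string, not a regular expression
#   -w  require the key to match as a whole word
#   -f  read the patterns (one key per line) from a file
filter_by_keys() {
    grep -wFf "$1"
}

# Hypothetical use with the key list generated above:
#   find .git/annex/objects -type f | filter_by_keys /tmp/keyslist > /tmp/filelist
```

Without -F, characters such as "." in a key would be interpreted as regular expression syntax, so the fixed-string match is the safer choice here.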
Then I use 'rsync -avhP --files-from=/tmp/filelist . othermachine:some/tmp/dir' to transfer the files to the other machine.
On the other machine, I then run 'git annex import some/tmp/dir' to inject the content, delete the additional symlinks it creates, and reset the index.
This speeds things up a bit. Even more speed can be had using tar/netcat.
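A rough sketch of the tar/netcat variant, untested against a real annex. The function names and port 9999 are placeholders of mine, and it assumes a tar that supports -T (GNU tar and bsdtar both do) and that /tmp/filelist holds paths relative to the current directory.

```shell
#!/bin/sh
# Write a tar stream of the files listed in "$1" (one path per line,
# relative to the current directory) to stdout.
pack_filelist() {
    tar -cf - -T "$1"
}

# Unpack a tar stream from stdin into the directory "$1".
unpack_into() {
    tar -xf - -C "$1"
}

# Hypothetical use with netcat. The listen syntax differs between netcat
# variants (some need "nc -l -p 9999"), and some variants need an extra
# option before they close the connection at end of input.
#   on othermachine:  nc -l 9999 | unpack_into some/tmp/dir
#   on the sender:    pack_filelist /tmp/filelist | nc othermachine 9999
```

Everything goes over one continuous stream, so there is no per-file pause. Note that plain netcat is unencrypted, so this only makes sense on a trusted network.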
Hope this points you in the right direction.
I'd suggest running 2 or more concurrent transfers, either by running the same "git annex get" command multiple times, or by using "git annex get -J2" if your version of git-annex is new enough to have that feature.
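If your version lacks -J, something like the following sketch would run the same command several times at once. The helper name run_concurrent and the path in the commented usage are placeholders.

```shell
#!/bin/sh
# Start the given command N times in the background, then wait for all
# copies to finish. Usage: run_concurrent N command [args...]
# Note: a bare "wait" does not report whether any copy failed.
run_concurrent() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait
}

# Hypothetical use:
#   run_concurrent 2 git annex get path/
# With a new enough git-annex, the built-in equivalent is:
#   git annex get -J2 path/
```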