I'm trying to use datalad to manage some scientific data repositories. Datalad uses git-annex under the hood. I've set up an annex for my datalad/git repository using git-annex-remote-rclone. The setup went fine, but over a Gigabit connection the transfers run on the order of 50-100 kB/s. I'm a new user of git-annex, and I'm trying to troubleshoot the issue. The repository holds about 10 GB of data.
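For completeness, the special remote was initialized along these lines; the target name, chunk size, and encryption setting below are illustrative placeholders rather than my exact values:

    # one-time setup of the rclone special remote (settings are examples)
    git annex initremote gdrive2 type=external externaltype=rclone \
        target=gdrive prefix=git-annex chunk=50MiB encryption=none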
I'm focusing on git-annex because I can sidestep datalad by using git annex copy --to=gdrive2 directly; that is just as slow as datalad push --to=gdrive2, which makes sense since the latter is a thin wrapper around git-annex-copy.
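Concretely, the two invocations I'm comparing are:

    git annex copy --to=gdrive2
    datalad push --to=gdrive2

and they crawl at about the same rate.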
Using an external drive instead of Google Drive was a little better, but it still took hours to copy the 10 GB, so I'm not sure what's going on. I'm running this on an M1 MacBook Pro. Any ideas on how to troubleshoot this?
How large are the individual files?
With small enough files, non-bandwidth-related overhead and/or TCP slow start can take more time than copying the file itself does, preventing the connection from being saturated.
It sometimes helps to use -J10 or so.
The --fast option can also speed up git-annex copy.
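Combining the two suggestions would look something like this, with gdrive2 being the remote name from the question:

    # ten concurrent transfer jobs; trust the location log instead of
    # querying the remote for each file's presence
    git annex copy --to=gdrive2 --fast -J10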
See the git-annex-copy man page for details about --fast.

Hi Joey,
This is my current effort to copy ~6.7 GB using Datalad (git annex with rclone+Gdrive in the background). There are definitely many small files in this repository, but I haven't been able to get specific advice on whether such very low speeds are expected given the bottlenecks you mention; even copying to an external drive took many hours. Any feedback is welcome.
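If concrete numbers help, git annex info reports the count and total size of annexed files, from which the average file size follows:

    # prints "annexed files in working tree" and their total size
    git annex info --fast .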