Some statistics of git-annex sync --content (but where there is no new content to sync):
$ time git-annex sync --content
commit
On branch master
Your branch is ahead of 'origin/master' by 3 commits.
(use "git push" to publish your local commits)
It took 2.15 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean
ok
pull origin
ok
________________________________________________________
Executed in 26.74 mins fish external
usr time 510.99 secs 325.00 micros 510.99 secs
sys time 129.53 secs 134.00 micros 129.53 secs
So more than 26 minutes. I have even had cases where this took several hours (also with no content to sync). (See also the comments here, although those are much less extreme, just a couple of minutes.)
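As a rough way to read the timing above: the usr and sys figures are CPU time, while "Executed in" is wall-clock time. A minimal sketch (plain sh plus awk; the function name cpu_share is made up for illustration) that puts the numbers from the output side by side:

```shell
#!/bin/sh
# Values copied from the fish `time` output above.
wall_min=26.74   # "Executed in", minutes
usr_sec=510.99   # "usr time", seconds
sys_sec=129.53   # "sys time", seconds

# cpu_share is a hypothetical helper, not part of git-annex or fish.
# Args: wall-clock minutes, user seconds, system seconds.
cpu_share() {
    awk -v w="$1" -v u="$2" -v s="$3" 'BEGIN {
        wall = w * 60          # convert minutes to seconds
        cpu  = u + s           # total CPU time
        printf "wall=%.1fs cpu=%.1fs cpu_share=%.1f%% waiting=%.1fs\n",
               wall, cpu, 100 * cpu / wall, wall - cpu
    }'
}

cpu_share "$wall_min" "$usr_sec" "$sys_sec"
```

With these numbers, CPU time accounts for only about 40% of the wall-clock time. Assuming the work was mostly single-threaded, that would suggest the rest was spent waiting, e.g. on disk I/O, but that is an inference from the figures, not something the output states directly.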
Why is that so slow? Why does it take so long?
Is this expected? This is a reasonably fast hard disk, using ZFS, on Linux. While the repo is indeed quite big with many files, I definitely would not have expected this. I would have expected something on the order of a couple of seconds (similar to rsync or git status, I think).
As you can see in this output, git status was much faster (2 secs), so just going through the files doesn't seem to be the bottleneck. Maybe git status does some clever caching. But then I would expect git-annex to do so as well.
How can I fix this so that it is fast (on the order of seconds, i.e. at least 2 orders of magnitude faster)?
Side question: I wonder about that "Your branch is ahead ..." message. Shouldn't git-annex sync solve exactly that? I have already called it multiple times.
Hi,
Are you using the latest version? There have been large performance improvements in version 8.20200720.
How many files do you have in the repository?
The problem is that git-annex-sync has to check the location log for each key/file from the git-annex branch every time, because whether a file needs to be copied, and where to, can change depending on many factors (see git-annex-preferred-content).
But there is a solution: incremental git annex sync --content --all. This is not yet implemented in git-annex itself, so I wrote a bash script to do this. To get the speedup, you have to run incremental-sync.sh --fast. Beware that in --fast mode, include= and exclude= in your preferred-content expression won't work correctly. And while the script works fine for me so far, it is still experimental.
Also, ZFS doesn't seem to perform too well with git-annex.
The "Your branch is ahead ..." message is normal if you didn't run git-annex-sync on the remote in the meantime. git-annex-sync doesn't push to master directly, but to synced/master, and when you run sync on the remote it will pick up the changes.
I'm on Ubuntu 20.04, and just have the default git-annex installed:
git-annex version: 8.20200226
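For comparison with the 8.20200720 release mentioned above, here is a minimal sketch of a version check. It assumes GNU sort (for the -V version ordering, as shipped on Ubuntu); version_lt is a hypothetical helper name, and the two version strings are simply copied from this thread:

```shell
#!/bin/sh
# version_lt is a hypothetical helper: succeeds when $1 sorts strictly
# before $2 in version order. Relies on GNU sort's -V option (assumption).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed=8.20200226   # version reported above
improved=8.20200720    # release said to contain the performance improvements

if version_lt "$installed" "$improved"; then
    echo "installed $installed is older than $improved"
else
    echo "installed $installed is at least $improved"
fi
```

So the default Ubuntu 20.04 package predates the release with the performance improvements.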
The results of that test should be taken with a grain of salt. That test used ZFS version 0.6.4 from 2015. In the following 6 years ZFS went through three major releases (0.7, 0.8, 2.0) with many improvements.
My personal experience is that ZFS RAID-Z2 on spinning disks with L2ARC is only marginally slower than LVM-based RAID6 (but plenty safer and more versatile).