Hi!
One of my annex repositories behaves very strangely: every git annex command is very slow.
I'm using Mac OS X Mavericks, and this repository is on a network drive mounted with Samba. It's using direct mode, and the filesystem is crippled. I use annex for storing huge files, for example movies. I moved some files into this directory and ran git annex add. It took a long time (as the checksums were being computed) and I thought that everything was OK. I tried git log -p, and it looked OK too:
new file mode 120000
index 0000000..e58c65a
--- /dev/null
+++ b/Movies/movie.mp4
@@ -0,0 +1 @@
+../.git/annex/objects/FK/60/SHA256E-s346858581--053dca6a842376ab8022722df306ad5
\ No newline at end of file
However, it was not. I tried to run git annex sync another_repo (where another_repo is an indirect mode repository on a local disk) and it took ages. Even git annex list takes ages, in every repository linked to this one. With ps -A, I found out that the slow process was:
git --git-dir=/Volumes/SAMBA_REMOTE/.git --work-tree=/Volumes/SAMBA_REMOTE -c core.bare=false checkout -q -B annex/direct/master
Have you ever noticed this behavior? Have I done something wrong?
Here is the output of git annex version:
git-annex version: 5.20131117-gbd514dc
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV FsEvents XMPP DNS Feeds Quvi TDFA CryptoHash
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav glacier hook
local repository version: 4
default repository version: 3
supported repository versions: 3 5
upgrade supported from repository versions: 0 1 2 4
If you run git-annex with --debug, it will print out every command it is running. This is useful because I can see the commands in context, and the output includes timestamps, which is a little bit more informative than "it took ages" (which can mean anything).
Anyway, I figured out the problem. Upgrade from v4 to v5 does what should be a one-time git checkout, but it seems that the auto upgrade code neglected to update annex.version, so it started doing it on every command run in a v4 repo. Fixed in git. You can work around the bug by running "git config annex.version 5".
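As a hedged illustration of the workaround above, the sketch below inspects and then sets annex.version with plain git config. It runs in a throwaway repository created with mktemp, which is only a stand-in for the affected repository, and the value 4 is set by hand purely to mimic the state described above.

```shell
set -eu

# Throwaway repository standing in for the affected one (assumption for this demo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Mimic a repository whose recorded version was left at 4 after the upgrade.
git config annex.version 4
echo "before: $(git config annex.version)"

# The workaround quoted above: record version 5 by hand.
git config annex.version 5
annex_version=$(git config annex.version)
echo "after: $annex_version"
```

In the real repository, only the git config annex.version 5 line should be needed, run from inside that repository.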
My problem isn't solved. I cloned the repository from my netbook to the SMB drive, and added a file in this new direct mode annex repository:
Let's try to sync to my macbookpro annex repository:
Now it's 21:41 CET, and it's been stuck for circa 1 hour.
The problem you reported is solved. You were not talking about syncing before.
If your setup is such that git commit takes a long time to run, then git annex sync is also necessarily going to take a long time to run.
Why would git commit be slow? It's only committing a file with a line (../.git/annex/...), isn't it?
I think the most likely reason for git commit to be slow on your setup is that it probably rewrites .git/index. If you have a lot of files in your repository, the index file will be large, and rewriting it will involve re-transferring it all over the network to the SMB share.
It's also possible that git commit scans the whole work tree, although I don't think it should -- it's not been told to with -a.
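One way to check whether the index is a plausible bottleneck is to look at its size and the number of entries it tracks. This is only a sketch: it uses a throwaway repository created with mktemp and a made-up file.txt, and the variable names are illustrative. In a real repository the two measurement commands would be run from the top of the work tree.

```shell
set -eu

# Throwaway repository with a single staged file (assumption for this demo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo "hello" > file.txt
git add file.txt

# Size of the index file in bytes; this is what would be rewritten over the network.
index_bytes=$(wc -c < .git/index | tr -d ' ')

# Number of entries recorded in the index.
entry_count=$(git ls-files | wc -l | tr -d ' ')

echo "index: $index_bytes bytes, $entry_count entries"
```

A small index suggests that rewriting it is not where the time goes, and that something else, such as reading work tree files, is worth investigating.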
You may be able to find what's taking a long time by running it under strace (or ltrace).
[...] git annex copy --to smbshare and git annex get --from smbshare as desired, which would probably be much more efficient.
Thanks again for your answers.
.git/index is only 30 KB. It shouldn't be an issue, even over the network.
However, I ran lsof, and strangely, the big file I added before (/Volumes/Video/Videos/Films/MyFamily.mkv) is locked.
I'm using Mac OS X, so I can only use dtrace, which I don't know very well.
The process is over, after 1h30:
Hmm. Seems to me that git commit is trying to download the whole big file from the SMB share. Perhaps just to compare it with the symlink it expects to be there?
When I try this, in a large direct mode repository with some video from my family, git commit does not open the file (verified with strace), let alone read it, and so finishes in well under 1 second.
Aha, I was able to reproduce git commit doing that with a repo on a FAT filesystem. git opens the file, and mmaps it, and I guess it proceeds to try to diff it against the symlink standin file it expected to be there.
This is particularly unfortunate, since git commit is only, I think, doing this so that it can print out a "Changes not staged for commit" message. Which git-annex sync throws away. There seems to be no git commit option that disables this behavior.
I think that this calls for making git annex sync not use git commit any longer, and instead manually build the commit using git-write-tree and git-commit-tree. Since I don't know if I will get to this before the Thanksgiving holiday, I am creating a bug: direct mode sync should avoid git commit.
Awesome! Thanks for your very efficient investigations. I'll stay tuned to the bug report.
Happy Thanksgiving!
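For readers unfamiliar with the plumbing commands mentioned above, here is a minimal sketch of building a commit with git write-tree and git commit-tree instead of git commit. It is not git-annex's actual implementation; the throwaway repository, file.txt, the identity values, and the "sync" message are all assumptions made for the demo. Both commands work from the index and the object database, which is why they are candidates for avoiding the work tree reads described in the thread.

```shell
set -eu

# Throwaway repository (assumption for this demo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.invalid"

# Stage something so the index is not empty.
echo "hello" > file.txt
git add file.txt

# Write the current index out as a tree object.
tree=$(git write-tree)

# Use the current HEAD as parent if one exists (empty in a fresh repository).
parent=$(git rev-parse -q --verify HEAD || true)
if [ -n "$parent" ]; then
    commit=$(git commit-tree "$tree" -p "$parent" -m "sync")
else
    commit=$(git commit-tree "$tree" -m "sync")
fi

# Point the current branch at the new commit.
git update-ref HEAD "$commit"
git log -1 --format='%h %s'
```

Unlike git commit, this sequence prints no status summary, so nothing needs to be computed just to be thrown away.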