Please describe the problem.
When the assistant starts it takes several hours to do the startup scan, even when there are no files to add.
The repo contains many small files, but it is configured via gitattributes to add the smaller ones directly to git rather than to the annex. In particular, 91949 files are added to the git repo and 1029 are annexed. This is my gitattributes:
* annex.largefiles=(largerthan=500kb)
annex.addunlocked is set to true
What steps will reproduce the problem?
Create a repo with ~90000 files smaller than 500k and ~1000 files larger (in my case ranging from 500k to 32M). Set addunlocked to true and annex.largefiles to largerthan=500kb. Start the assistant and let it finish adding the files. Restart the assistant.
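The steps above can be sketched as a small shell script. This is only a rough sketch under assumptions: the helper names (make_files, setup_repo), the file names, and the 1 KiB / 1 MiB sizes are made up for illustration, and random data is used so that every file has distinct content. The --version=6 option is the one offered by git-annex releases from around the time of this report.

```shell
#!/bin/sh
# Hypothetical helpers for building a test tree like the one in the report.
# Counts are parameters so a small trial run is possible before trying
# the full ~90000 small / ~1000 large case.

# make_files DIR SMALL_COUNT LARGE_COUNT
# Creates SMALL_COUNT files of 1 KiB (below the 500kb threshold) and
# LARGE_COUNT files of 1 MiB (above it) under DIR.
make_files() {
    dir=$1
    small=$2
    large=$3
    mkdir -p "$dir/small" "$dir/large"
    i=1
    while [ "$i" -le "$small" ]; do
        head -c 1024 /dev/urandom > "$dir/small/file$i.dat"
        i=$((i + 1))
    done
    i=1
    while [ "$i" -le "$large" ]; do
        head -c 1048576 /dev/urandom > "$dir/large/file$i.bin"
        i=$((i + 1))
    done
}

# setup_repo DIR
# Initializes a v6 repository in DIR with the settings from the report.
# Runs in a subshell so the caller's working directory is unchanged.
setup_repo() {
    (
        cd "$1" || exit 1
        git init &&
        git annex init --version=6 &&
        git config annex.addunlocked true &&
        echo '* annex.largefiles=(largerthan=500kb)' > .gitattributes
    )
}
```

With these defined, something like "setup_repo repo; make_files repo 90000 1000" followed by "git annex assistant" inside the repo should approximate the first run. After the files have been added, "git annex assistant --stop" and a second "git annex assistant" restart it, and the startup scan can be followed in .git/annex/daemon.log.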
What version of git-annex are you using? On what operating system?
git-annex version: 6.20160318 build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify XMPP ConcurrentOutput TorrentParser MagicMime Feeds Quvi key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external local repository version: 6
I'm running it on Arch Linux (packaged version)
Please provide any additional information below.
[2016-03-29 22:08:26.356586] main: starting assistant version 6.20160318
No known network monitor available through dbus; falling back to polling
(scanning...) [2016-03-29 22:08:41.426049] Watcher: Performing startup scan
[2016-03-29 23:05:40.533113] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 00:10:07.085051] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 01:23:29.784236] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 02:43:02.048312] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 03:37:53.273057] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 04:04:56.875573] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 04:31:14.370618] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 04:56:12.467889] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 05:21:09.021728] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 05:43:11.111616] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 06:14:38.096425] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 06:49:54.730879] Committer: Committing changes to git
(recording state in git...)
[2016-03-30 07:26:47.721929] Committer: Committing changes to git
(recording state in git...)
# End of transcript or log.
At this point I stopped the assistant that was still doing the startup scan...
Have you had any luck using git-annex before?
Sure!
Note that v6 is still an experimental feature. I have not tested the assistant with it much.
There is an issue documented on the smudge page where git can end up unnecessarily running the smudge filter after git-annex, e.g., gets a file or adds a file.
This could be related to that; after the assistant added a lot of files here, the first git status run was quite slow as it ran the clean filter on every file. Subsequent git status runs then went fast.
But I don't know why this would make the startup scan slow; it doesn't seem to use any git commands that would need to smudge files. I tested by exporting GIT_TRACE=1 and starting the assistant; the startup scan went fast and there was nothing in .git/annex/daemon.log about smudging.
Also, what are these changes that are apparently being committed to git during your startup scan? I don't see such commits here either.
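A check along those lines might look like the following. It is a sketch only: count_in_log is a hypothetical helper, and it assumes, as described above, that the trace output of the daemonized assistant ends up in .git/annex/daemon.log.

```shell
#!/bin/sh
# count_in_log LOGFILE PATTERN
# Prints the number of lines in LOGFILE containing PATTERN
# (fixed string, case-insensitive). grep exits non-zero when the
# count is 0, so that status is deliberately ignored.
count_in_log() {
    grep -ciF -- "$2" "$1" || true
}

# Intended use, run from the top of the repository:
#   export GIT_TRACE=1
#   git annex assistant
#   # ...wait for the startup scan to finish...
#   count_in_log .git/annex/daemon.log smudge
```

A count of 0 for smudge would match what was seen here. The same helper can be pointed at other strings of interest in the trace.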
You're right, after the initial run there is no smudge filter overhead. I ran htop in tree view and got this result:
I exported GIT_TRACE and discovered that the assistant is re-adding the small files to git every time I run it. Below there's a link to the trace (with filenames removed). None of the files match the annex.largefiles expression (which is expected), there are no duplicates (i.e. it's not an "I'm adding the same files in the same run" problem), and all of them were already added the first time I ran the assistant; they are not new to git or git-annex, as they are not shown in "git status" or in "git annex status".
The daemon.log is pretty brief:
You can find the trace here: https://gist.github.com/zarelit/815de89d972314f2e6495a2cdab91aca
As you can see, it takes almost ten minutes before reaching the first git-add.
My git-annex version is now 6.20160418 and my git version is 2.8.0. What version of git did you use when tracing? I can try to reproduce in a different environment (e.g. Debian stable with backports) if that would be useful.
Cheers, David