I'm working in v7 mode with annex.largefiles configured. When I add a lot of files (1000 in the example below), it takes 2 mins for the (recording state in git...) phase to complete. The same happens if I just add files directly using git add. Running git add -v will slowly list the files, so it's apparent that adding the files to git is what's causing the delay. Is this slow add performance normal when adding files to git through git-annex?
I didn't want to report this as a bug since I don't know if this is the result of a known process that git-annex is performing. However, I find it curious that adding the same files to git-annex directly (without annex.largefiles configured) is very fast.
If this is not a bug, perhaps the (recording state in git...) output could show the files being added instead, to avoid the suspicion that the add command is hanging.
I'm using git-annex version 7.20190912-gab739242a3 (with git version 2.23.0).
Here is an example I've been using to investigate the different conditions and the output from a run of the script:
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
cd "${dir}"
cleanup() {
    chmod 777 -R "${dir}"
    rm -rf "${dir}"
}
trap cleanup EXIT
# create 1000 files
for idx in {1..1000}; do
    echo ${RANDOM} > file${idx}
done
git init
set -x
time git add . > /dev/null
git rm --cached -r . > /dev/null
git annex init
time git annex add -c annex.largefiles="largerthan=1M" .
git rm --cached -r . > /dev/null
time git add .
Initialized empty Git repository in /tmp/tmp.BdmH199gbk/.git/
+ git add .
real 0m0.049s
user 0m0.010s
sys 0m0.039s
+ git rm --cached -r .
+ git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+ git annex add -c annex.largefiles=largerthan=1M .
real 2m4.617s
user 1m40.865s
sys 0m18.335s
+ git rm --cached -r .
+ git add .
real 2m13.367s
user 1m48.507s
sys 0m21.230s
+ cleanup
+ chmod 777 -R /tmp/tmp.BdmH199gbk
+ rm -rf /tmp/tmp.BdmH199gbk
This is because git add has to run git-annex once per file to smudge it. See git smudge clean interface suboptiomal.
When the annex.largefiles option is used and files are added to git (instead of annex), what does the smudge filter do to the file or its contents that requires extra time? I guess I'm not clear on what exactly happens when a file is excluded from a largefiles filter that takes longer than adding a file to git-annex. I would have expected the opposite to be true.
"what exactly happens when a file is excluded from a largefiles filter that takes longer" -- looking in .git/info/attributes, all files except dotfiles get passed through the clean/smudge filter defined in .git/config.
It would be better if git-annex would only add unlocked files to .gitattributes (the one at the repo root), and remove them when they're locked. This would also make it easier to find the unlocked files.
Right, that's understandable. Why is it faster to clean/smudge a file going into annex than it is going into git though?
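The "all files except dotfiles" rule mentioned above can be checked with git check-attr. The following is only a sketch: the two attribute lines it writes are an assumption about what git-annex init puts in .git/info/attributes in v7 mode, and the file names file1 and .hidden are hypothetical. Compare against your own repository before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: ask git which paths would be routed through a filter named "annex".
# ASSUMPTION: the two attribute lines below mirror what git-annex init writes
# in v7 mode; they are written by hand here so that only git is required.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo"
mkdir -p .git/info
printf '%s\n' '* filter=annex' '.* !filter' > .git/info/attributes

# git check-attr reports the effective attribute value for each path;
# the paths do not need to exist on disk.
regular=$(git check-attr filter -- file1)
dotfile=$(git check-attr filter -- .hidden)
echo "$regular"
echo "$dotfile"
```

With those assumed attribute lines, git should report the annex filter for file1 and report the filter attribute as unspecified for .hidden. In a real git-annex repository, git config --get-regexp '^filter\.annex\.' shows which commands the filter actually runs.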
I'm having a very similar issue. Adding files is quite slow, and it hangs for several minutes in (recording state in git...) now (that started after adding quite a few files already), and the time seems to increase (I fear that it will soon be hours, not minutes, making it basically unusable...).
I have not really configured anything (i.e. it should use all the defaults). I just did git init and git annex init, and then started to import files using git annex import. That's all.
I don't really know about this smudge thing. Is that enabled by default? If that is causing problems, should I maybe disable it?
This will avoid the overhead of the smudge filter, when all the files you're adding are ones you want stored in git, not in git-annex.
I do think it would be possible for git-annex add to use the same method whenever it adds non-large files. But it might have unwanted other effects, since the way that manages to be fast is by avoiding using git add and having git-annex hash the file and add it to git itself. Opened speed up git annex add of small files to consider this.
The only way to speed up git add is to disable the smudge filter, but then all files you git add will be stored in git, not in git-annex. And disabling the smudge filter also will prevent using unlocked annexed files. (See git smudge clean interface suboptiomal for background.)
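To make the per-file cost and the effect of disabling the filter concrete, here is a rough sketch that needs only git. It uses a stand-in filter named demo (hypothetical) that appends a line to a log file each time git runs it, in place of git-annex's real filter, which is named annex. Blanking the filter command with -c for a single invocation is the general git mechanism. The analogous call for git-annex would presumably be git -c filter.annex.clean= -c filter.annex.smudge= add, but treat that as an assumption and check your own .git/config first (newer git-annex versions may also set filter.annex.process).

```shell
#!/usr/bin/env bash
# Sketch with a stand-in filter named "demo" (hypothetical); it only logs each
# run and passes the content through unchanged.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
cd "$repo"
mkdir -p .git/info
log="$repo/.git/filter.log"
: > "$log"

# Route every file through the stand-in clean filter.
echo '* filter=demo' > .git/info/attributes
git config filter.demo.clean "echo run >> '$log'; cat"

for idx in 1 2 3 4 5; do
    echo "$idx" > "file$idx"
done

# Normal add: git starts the clean filter for each file it stages.
git add .
with_filter=$(wc -l < "$log" | tr -d ' ')

# Unstage, reset the log, then add again with the filter command blanked
# for this one invocation only.
git rm -q --cached -r .
: > "$log"
git -c filter.demo.clean= add .
without_filter=$(wc -l < "$log" | tr -d ' ')

echo "filter runs with filter enabled:  $with_filter"
echo "filter runs with filter disabled: $without_filter"
```

The first count should grow with the number of files added, while the second should stay at zero. With the real annex filter each of those runs starts a git-annex process, which is where the time goes. As noted above, bypassing the filter means everything added that way is stored in git, not in git-annex.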