Please describe the problem.
When empty files are committed to a repository, git status becomes slow because git annex smudge is run for every empty file under git annex, even when there are no changes relative to HEAD. I know git annex is meant for large files (rather than infinitely small ones), but I'm using it to manage a GRASS GIS database, which oddly uses empty files for some things :/
What steps will reproduce the problem?
# slow
touch emptyfile
git add emptyfile
git commit -a -m 'Added empty file.'
GIT_TRACE=1 git status
# fast
echo 1 > emptyfile
git commit -a -m 'Added a 1 to emptyfile.'
GIT_TRACE=1 git status
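To compare the slow and fast cases by number rather than by eye, the trace output can be filtered for filter runs. The helper below is a hypothetical sketch (count_smudge_calls is not a git or git-annex command). It assumes git-annex has configured both of git's filter commands to start with git-annex smudge (the clean filter adds --clean), so counting trace lines that mention that string gives a rough count of filter invocations. GIT_TRACE writes to stderr, hence the redirection in the usage line.

```shell
# count_smudge_calls: read GIT_TRACE output on stdin and print how many
# lines mention "git-annex smudge" (covers both the clean and smudge filters).
# Hypothetical helper; the exact trace wording varies between git versions.
count_smudge_calls() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
  # the function's exit status at 0 in that case.
  grep -c 'git-annex smudge' || true
}

# Usage: send the trace (stderr) into the pipe and discard git status's stdout.
# GIT_TRACE=1 git status 2>&1 >/dev/null | count_smudge_calls
```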
What version of git-annex are you using? On what operating system?
- git-annex version: 6.20170228-g7a32e08c4
- operating system: linux x86_64 (SLE 12.2)
- local repository version: 6
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, git with git annex has revolutionised my scientific project file organisation, and that's why I want to improve it.
I don't know why git is repeatedly smudging the empty file. (I was able to reproduce it.)
git-annex does not treat an empty file any differently than any other file. It seems that this is probably a bug in git.
Note that v6 git repos are known to be inefficient due to git's smudge interface not being very good. This is why they are not yet used by default.
I found a relatively simple solution by setting the following in .gitattributes:
Verified to still be the case with current git-annex and git 2.18.0.
I suppose one workaround would be to have git-annex default annex.largefiles to your setting. But the user might want to use git annex export, which currently exports only annexed files, not in-git files. Or there might be a workflow with git-annex metadata, which likewise can only be added to annexed files. Such a special case by default may be more trouble than it's worth.
I'm guessing that git uses a size of 0 in the index to indicate that it doesn't know the stat information for the file, or something like that. Perhaps git could be changed to use an inode of 0 instead.
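The index-size guess above can be checked directly: git ls-files --debug prints the stat data git has cached for each index entry, including its recorded size. git's racy-git documentation (Documentation/technical/racy-git.txt in git's source) describes a related mechanism in which git "smudges" an index entry by setting its cached size to 0 so that the file is examined again later. The index_stat wrapper below is a hypothetical name, not part of git; run it from inside the repository.

```shell
# index_stat PATH: show the stat data (ctime, mtime, dev, ino, uid, gid,
# size, flags) that git's index has cached for PATH.
# Hypothetical wrapper around "git ls-files --debug"; run it inside the repo.
index_stat() {
  git ls-files --debug -- "$1"
}

# Usage, after reproducing the slow case:
# index_stat emptyfile   # look at the "size:" field
```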
Since it's not clear from the above: git only behaves this way when the empty file is unlocked. So git annex lock is a good workaround too.
(Also, git annex export has since gotten the ability to export non-annexed files, for the record.)
I'm guessing that, in a v10 repo, this is less of a problem, since git-annex uses git's long-running filter-process interface then. So git can probably run git-annex filter-process once and use it to re-smudge all the empty files it (for whatever reason) wants to re-smudge.
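The lock workaround and the filter-process check can both be scripted. This is a minimal sketch assuming a POSIX shell and that the empty files are annexed and currently unlocked; list_empty_files, lock_empty_files and uses_filter_process are hypothetical helper names. lock_empty_files applies git annex lock to every empty file outside .git, and uses_filter_process checks whether the repository has a filter.annex.process setting, which git-annex is expected to set (to git-annex filter-process) in v10 repositories.

```shell
# list_empty_files [DIR]: print every empty regular file under DIR
# (default: .), skipping anything named .git. Useful as a preview of
# what lock_empty_files would touch.
list_empty_files() {
  find "${1:-.}" -name .git -prune -o -type f -empty -print
}

# lock_empty_files: apply the "git annex lock" workaround to all empty
# files in the current directory tree. Assumes they are annexed and
# unlocked; "-exec ... {} +" passes the paths safely, even with spaces.
lock_empty_files() {
  find . -name .git -prune -o -type f -empty -exec git annex lock {} +
}

# uses_filter_process [REPO]: exit 0 if git is configured to run the
# git-annex filter through the long-running filter-process interface.
uses_filter_process() {
  git -C "${1:-.}" config --get filter.annex.process >/dev/null
}
```

Using find -exec with + rather than a while-read loop keeps file names with spaces or newlines intact and batches many paths into one git annex lock call.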