I have a git annex repository that contains only photos and videos. I am using an NTFS partition on Linux (because I dual-boot), and recently did a git annex upgrade to version 6. (But the question below is general, and not about version 6.) The repo is around 80G in size.
When I run commands like status and sync, nothing happens for a really long time. I finished a git annex fsck this morning after the version upgrade, and so wanted to see what the repo now looks like:
git annex status --debug
This gets stuck at an internal call to git status -uall -z with some other options. The process was stuck there for almost an hour, and I finally gave up and cancelled it. Like I said, this is not about the recent upgrade that I finished. I have previously seen the process succeed after one or two hours.
Is it normal for status to take this long? Or is there something wrong with my repo? For example, maybe a large chunk of my files are checked into git without being annexed? I have a long history of making mistakes with git annex in this repo, so this is actually possible.
Generally git-annex takes longer the more files in the repository it needs to deal with. If a repository gets a great many files (typically hundreds of thousands to millions), various inefficiencies in git-annex will slow things down enough that it gets annoying. Splitting the files into different branches (or separate repositories) is a common way to deal with that.
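A rough way to see which range a repository falls into is to count the entries in its working tree. The following is a minimal Python sketch, not part of git-annex; the function name count_worktree_files is made up for illustration. It counts every non-directory entry (regular files and symlinks alike) and skips any directory named .git.

```python
import os


def count_worktree_files(root: str) -> int:
    """Count non-directory entries under root, skipping .git directories."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into .git.
        if ".git" in dirnames:
            dirnames.remove(".git")
        total += len(filenames)
    return total


if __name__ == "__main__":
    # Hypothetical usage: run from the top of the repository.
    print(count_worktree_files("."))
```

This counts untracked files too; running git ls-files | wc -l in the repository gives the number of tracked files instead.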
Also, running on a spinning disk tends to be a lot slower than an SSD.
Just for comparison, git annex status in a repository with 75000 files takes 0.5 seconds on my laptop's SSD. git status takes 0.2 seconds.
In your particular case, the NTFS partition and/or v6 mode seems likely to be the reason for slowdowns. Both git and git-annex record the inode numbers used for files in the repository. Those numbers are supposed to be stable, but mounting a filesystem on Windows and then Linux will make the inode numbers change. (Even remounting a FAT partition on Linux will change the inodes, although that doesn't seem to happen for NTFS in a quick test.)
When the inodes have changed, much slower code paths get activated, since git and git-annex have to then assume the contents of the files may have changed since the last time they saw them. In a v6 repository where this has happened, git status is quite likely running git-annex smudge once per file in the working tree, which is quite slow.
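To illustrate why a changed inode number forces the slow path, here is a minimal Python sketch of the general idea. It is not the actual code or on-disk format used by git or git-annex; FileStamp, take_stamp and needs_recheck are hypothetical names. The cheap check compares recorded metadata, and only when it differs does the file's content have to be read again.

```python
import os
from typing import NamedTuple


class FileStamp(NamedTuple):
    """Cheap metadata recorded the last time a file was examined."""
    inode: int
    size: int
    mtime_ns: int


def take_stamp(path: str) -> FileStamp:
    """Record inode number, size and mtime without reading the file."""
    st = os.stat(path)
    return FileStamp(st.st_ino, st.st_size, st.st_mtime_ns)


def needs_recheck(path: str, recorded: FileStamp) -> bool:
    """True when the metadata differs, so the content must be re-read."""
    return take_stamp(path) != recorded


if __name__ == "__main__":
    import tempfile
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        f.write("photo bytes")
    stamp = take_stamp(f.name)
    # Unchanged file: the cheap comparison is enough.
    print(needs_recheck(f.name, stamp))
    # Simulate a remount that hands out a different inode number.
    print(needs_recheck(f.name, stamp._replace(inode=stamp.inode + 1)))
    os.unlink(f.name)
```

If every file in an 80G working tree fails this kind of cheap comparison at once, each one has to be processed individually, which fits the hours-long status runs described in the question.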