If I understand correctly, 'git annex repair' first runs 'git fsck' and then tries to repair the repository by connecting to available remotes.
It can take a long time when 'git annex repair' runs 'git fsck' (at least 14 hours in my case). For me, it is not very convenient to have the remotes available during this entire time. It would be nice if I could connect to them only when necessary. This makes me wonder if it is possible to separate the running of git fsck from the operation of connecting to the remotes to repair the local repository. Or similarly, it would be nice to be able to run 'git annex repair' again quickly if it had just ran.
Thanks for any help on this
The command uses
git fsck
to get a list of git objects that are missing from the repository. It then tries to get those objects from remotes. So, skipping the fsck is not really possible.Also in some cases, it may need to fsck again after the initial pass.
You must have an exceptionally large git repo (and/or slow computer) for git fsck to take more than several minutes.
I wonder if passing --connectivity-only to git fsck speeds it up significantly for you? Actually corrupted blobs are probably much rarer than blobs that are just missing, so it might be that git annex repair could optionally use that to run fsck faster.