Hello,
while reading the release notes of git 2.11 I noticed a cool new feature has been merged:
"If the filter command (a string value) is defined via filter.<driver>.process then Git can process all blobs with a single filter invocation for the entire life of a single Git command."
See the git documentation.
This has been developed in the context of git-lfs (see PR 1382).
If I understand correctly how it works, this could speed up v6 repos. Looking at the history and website of git-annex, there doesn't seem to be any work on this yet, so I thought it was worth calling attention to the feature.
Thanks a lot for all the work on git-annex, it's a really amazing project! The more I study it, the more cool features I discover.
Yes, this will make smudge faster when, e.g., checking out a lot of working tree changes. I will need to add support for it.
The "filterdriver" branch implements support for these.
However, it's actually slower than the old interface, because the new interface requires git-annex to read the whole file content from git when adding a file, while the old interface let it avoid reading any content.
Since the new interface does have capabilities, a new capability could prevent schlepping the content over the pipe, and let the filter driver refer to the worktree file instead, and respond with the path of a file. This would be similar to my old patch set for the old interface.
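To make the cost concrete, here is a minimal Python sketch of the pkt-line framing that the long-running filter protocol uses to move file content over the pipe. It is only an illustration, not git-annex code: the function names (pkt_line, frame_content, read_content) are made up for this example, and it assumes the packet size limit given in the gitattributes documentation (65520 bytes per packet including the 4-byte length header).

```python
import io

# Assumption from the gitattributes docs: a packet is at most 65520 bytes,
# of which 4 bytes are the hex length header.
MAX_PAYLOAD = 65516
FLUSH = b"0000"  # flush packet, marks the end of a list or of the content


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload: 4 hex digits of total length, then the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single packet")
    return b"%04x" % (len(payload) + 4) + payload


def frame_content(content: bytes) -> bytes:
    """Split content into packets and terminate it with a flush packet."""
    out = io.BytesIO()
    for start in range(0, len(content), MAX_PAYLOAD):
        out.write(pkt_line(content[start:start + MAX_PAYLOAD]))
    out.write(FLUSH)
    return out.getvalue()


def read_content(stream) -> bytes:
    """Read packets from a buffered stream until the flush packet."""
    chunks = []
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("stream ended inside a packet header")
        size = int(header, 16)
        if size == 0:  # flush packet
            break
        if size < 4:
            raise ValueError("invalid packet length")
        chunks.append(stream.read(size - 4))
    return b"".join(chunks)
```

Every byte of the file passes through read_content on the filter side, which is the work that a path-based capability like the one suggested above would let the filter skip.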
While git add would be a lot slower when using this interface to add large files, it would make git checkout and other commands that update the work tree a lot faster. Since the smudge filter is not providing git with the file content any more, using filterdriver would avoid git running many git-annex smudge processes, greatly speeding up large checkouts.
Unfortunately, git annex smudge --update ends up running the smudge filter on all files that the clean filter earlier acted on, so even if filterdriver were used to speed up the clean filter, there would still be one process spawned per file for the smudge filter. So some interface improvement is needed before git-annex can usefully use this.