Does git-annex support git sparse checkout? Are there any known issues when using it in git-annex repos, especially in conjunction with git-worktree?
I just saw support for git sparse-checkout merged in BABS, and frankly I had never known about or used it before! Inspired by an enthusiastic Meng who ~~made a strategic mistake for her PhD progress~~ pioneered the use of git worktrees in DataLad after attending Distribits 2025, I thought I would check whether git-annex supports sparse-checkout.

In conjunction with sparse-checkout, the (already existing) support for worktrees in git-annex could make a perfect "couple" for efficient ephemeral compute, where we check out only what is really needed, e.g. following the datalad run input/output specifications.

This is just a summary of potential research/implementation, since maybe it even somehow magically all works already, given that BABS merged the sparse-checkout support and they extensively use git annex already...?

In a sparse checkout, most git commands behave as if they were run in a worktree that contains only the files in the sparse checkout, and not other files. Since git-annex uses git commands extensively when identifying files to work on, its commands skip over files not in the sparse checkout.
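
For anyone who wants to poke at the worktree plus sparse-checkout combination, here is a minimal sketch using plain git only (no git-annex involved). The repository layout, the `inputs`/`outputs` directory names, and the `job1` branch are made up for illustration; it assumes a reasonably recent git where `git sparse-checkout` is available.

```shell
#!/bin/sh
# Sketch: a sparse checkout inside a linked worktree, plain git only.
# All paths and names here are hypothetical.
set -eu

demo=$(mktemp -d)                 # throwaway location
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir inputs outputs
echo "input data"  > inputs/a.txt
echo "output data" > outputs/b.txt
git add inputs outputs
git commit -q -m "initial content"

# Linked worktree on a new branch, then restrict it to inputs/ only.
git worktree add -q -b job1 "$demo/wt"
cd "$demo/wt"
git sparse-checkout init --cone
git sparse-checkout set inputs

# Working tree of the linked worktree: only inputs/ should be present.
ls

# Index: both files are still listed by git ls-files;
# -t tags entries that are outside the sparse checkout (skip-worktree) with "S".
git ls-files -t
```

The sparse-checkout settings are stored per worktree, so the main checkout in `$demo/repo` keeps all of its files. This shows the `git ls-files` behavior mentioned in this thread: the index still knows about `outputs/b.txt` even though it is absent from the sparse worktree.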
There are exceptions; the main one seems to be git ls-files, which does list files not in the sparse checkout. However, all commands that operate on annexed files and use git ls-files to enumerate them feed the files into git cat-file --batch, which reports a file as not found when it is not part of the sparse checkout. So git-annex skips those.

The only exception I can find among git-annex commands, and possibly the only one, is that git-annex add will add files that are in a subdirectory not included in the sparse checkout. (It uses git ls-files without git cat-file.) That differs from git add, which refuses to add such files (though the --sparse option overrides this and causes them to be added).

I don't know if this git-annex add behavior would be a problem. The documentation for --sparse says that the reason git add doesn't default to it is that, after it adds such a file, the file could be removed from the worktree without warning, which would make it hard to get the file's content back if it had not been committed first.

If that were a problem, it could be fixed by making git-annex run git ls-files with the --sparse option, which is supposed to filter out files not in the sparse checkout... Except that doesn't seem to work right when I try it. Maybe a bug in git (2.51.0)?

Anyway, my impression is that this would all need playing with to determine whether it happens to meet your needs. Bear in mind that sparse checkout is itself an experimental feature (for 6+ years?) that is documented to be subject to future behavior changes.