Changing Largefile Specification for Imported Trees
If you want to change which files are treated as large or small after already importing a tree from an importtree enabled remote, well, it appears you can't.
I tried removing the imported branch via git branch -d --remote <tree>/<branch>. While this produces a new clean import commit upon running import again, it does not respect changes to .gitattributes. Instead, git-annex seems to hold onto information about which files were large/small in a given special remote. So, the only way to change what are considered large files and small files is to create a new special remote entirely.
For most people, this should not be too problematic since the history of imported trees isn't too important, but for some, diffs on an external tree may be valuable. Is there any interest in addressing this issue? For a better understanding, here is an MWE to reproduce this:
- Create an importtree enabled special remote for a fresh repo without a .gitattributes file (or at least one without annex.largefiles attributes)
- Import (e.g. gx import -f tree main) from this tree and note that all files are considered large (e.g. git log --raw tree/main -> git show <hash>)
- Modify/create a local .gitattributes file (and add it to the index) that would specify one of the tree files as small (i.e. annex.largefiles does not match)
- Attempt new import, or do git branch -d --remote tree/main and perform new import.
- Note that all files are still considered large.
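
Before attempting the new import, it may help to confirm that the attribute itself resolves as intended, which rules out a plain .gitattributes mistake. The following is a minimal sketch using only git (no git-annex needed); the throwaway repository and the file names notes.txt and data.bin are hypothetical and chosen for illustration only.

```shell
# Throwaway repository; notes.txt and data.bin are hypothetical file names.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Treat everything as large by default, but mark notes.txt as small.
# Later lines in .gitattributes override earlier ones for the same attribute.
printf '%s\n' \
    '* annex.largefiles=anything' \
    'notes.txt annex.largefiles=nothing' > .gitattributes
git add .gitattributes

# Ask git which annex.largefiles expression applies to each path.
# The paths do not need to exist for check-attr to answer.
attrs=$(git check-attr annex.largefiles -- notes.txt data.bin)
printf '%s\n' "$attrs"
```

If notes.txt reports nothing here and is still imported as an annexed file, the attribute is set correctly and the behaviour comes from the import itself, as described in this report.
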
Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above, now with the desired .gitattributes file staged, for files in this external tree to be imported as small.
done
Conclusion: Don't just delete the imported branch; update it with a commit that forces the files to be small or large as desired.
git-annex has to maintain a considerable amount of state about the content of a special remote in order to efficiently import trees from it, and this caching is what is preventing the new configuration of annex.largefiles from being used.
In particular, git-annex knows the content identifier associated with the file you imported before. And the key associated with that content identifier is present in the repository. So it uses the existing content rather than downloading it again.
While it would be possible to either remove enough information from the git-annex branch to defeat that, or modify git-annex to have a mode where it redoes expensive work, it seems to me to be easier to just treat this as a case of an annexed file that you want to change to be stored in git instead, since that is a general problem with a general solution. See largefiles, "converting annexed to git".
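
For convenience, here is a rough sketch of that conversion as I understand the largefiles tip; check the tip itself for the authoritative steps. The function name annexed_to_git and the file name in the usage comment are hypothetical, and the sketch assumes it is run inside a git-annex repository, on a branch where the imported file is checked out and its content is present.

```shell
# Hypothetical helper: convert one annexed file to a regular git file.
# Assumes the current directory is inside a git-annex repository.
annexed_to_git() {
    file=$1
    git annex unlock "$file" &&              # replace the annex pointer with the real content
    git rm --cached "$file" &&               # drop the annexed entry from the index only
    git annex add --force-small "$file" &&   # add to git, ignoring annex.largefiles
    git commit -m "Store $file in git instead of the annex" "$file"
}

# Usage (notes.txt is a placeholder):
#   annexed_to_git notes.txt
```

The steps are chained with && so that a failure (for example, content that is not present locally and cannot be unlocked) stops the conversion before anything is committed.
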