Git submodules are supported by git-annex since version 5.20150303.
Git normally makes a `.git` file in a submodule, that points to the real git repository under `.git/modules/`.
This presents problems for git-annex. So, when used in a submodule,
git-annex will automatically replace the `.git` file with a symlink
pointing at the git repository. (When the filesystem doesn't support
symlinks, an adjusted unlocked branch is used, and submodules are
supported in that setup too.)
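To see which of the two layouts a given submodule currently has, one can look at the type of its `.git` entry. The following is a minimal sketch; the helper name `dotgit_kind` is made up for illustration and is not part of git or git-annex.

```shell
# Hypothetical helper: report what kind of entry <dir>/.git is.
dotgit_kind() {
    dotgit="$1/.git"
    # Test for a symlink first: a symlink to a directory also passes -d.
    if [ -L "$dotgit" ]; then
        echo "symlink -> $(readlink "$dotgit")"
    elif [ -f "$dotgit" ]; then
        # A gitfile holds one line such as "gitdir: ../.git/modules/sub"
        echo "file: $(cat "$dotgit")"
    elif [ -d "$dotgit" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}
```

Calling `dotgit_kind path/to/my/submodule` before and after running git-annex in the submodule should show the change from a file to a symlink.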
With that taken care of, git-annex should work ok in submodules, although this is a new and somewhat experimental feature.
The conversion of .git file to .git symlink mostly won't bother git.
Known problems:
- If you want to delete a whole submodule, `git rm submodule` will refuse to delete it, complaining that the submodule "uses a .git directory". Workaround: Use `rm -rf` to delete the tree, and then `git commit`.
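The workaround above can be sketched as a small shell function. This is only a sketch, not part of git-annex: the function name `remove_submodule_tree` is hypothetical, and staging the deletion with `git add -u` before committing is an assumption about what the `git commit` step needs. Any entry for the submodule in `.gitmodules` is left untouched here.

```shell
# Hypothetical helper: delete a submodule tree that "git rm" refuses to remove.
# Run it from the top of the parent repository.
remove_submodule_tree() {
    sub="$1"
    # Delete the working tree by hand, including the .git symlink inside it.
    rm -rf -- "$sub" &&
    # Assumption: stage the now-missing submodule entry explicitly.
    git add -u -- "$sub" &&
    git commit -m "Remove submodule $sub"
}
```

For example, `remove_submodule_tree submodule` would remove the tree named `submodule` and record the removal in one commit.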
After setting up git-annex in one of my submodules, I noticed that executing `git checkout mybranch --recurse-submodules` will cause a fatal error (see error message below) and my working copy will be left in a state somewhere in between the origin and the destination branch.
As a workaround, this two-step alternative seems to work fine though: `git checkout mybranch && git submodule update`.
Everything above applies to the `git switch` command as well.
I use git version 2.27.0 and git-annex version 8.20200618.
Error Message: fatal: could not open 'path/to/my/submodule/.git' for writing: Is a directory
The submodule `.git` file having been converted to a symlink is still a problem when using `git checkout` with `--recurse-submodules` in the parent repo. The previous poster's solution to use `git checkout mybranch && git submodule update` does not work with git version 2.40.0 and git-annex version 10.20230407.
Are there other suggestions for workarounds?
The original post states that `.git` as a symlink fixes problems that exist for a `.git` file which points to the `.git/modules/` folder of the parent. What are these problems?
[...] `$repo/.git/...`, but if `.git` is a file, that doesn't work.
[...] `$repo/.git/...`? If git can figure out that the `.git` file points to the actual location of the `.git` folder (in the parent repo), then I would think that git-annex can conclude this as well. Again, I don't know how the guts of git-annex work, so there are possibly other considerations that I am missing.
The above issue is being discussed at Git checkout fails using --recurse-submodules.
DavidD's comment #2 is misleading; `git submodule update` does work fine when you've checked out a branch and want to update an existing submodule.
Where `git submodule update` does not work 100% is the case where you checked out a branch, added a submodule in that branch, and then checked out another branch that does not contain the submodule. What happens then is: [...]
And the solution is to `rm -rf sub` manually. This is essentially the same problem discussed above on this page where it talks about deleting a whole submodule.
Since git 1.8.5, `git mv projects/2023/prj_1 archives/2023/prj_1` can update the local path of a submodule. Currently, git-annex doesn't detect that a submodule path changed, and just moving parent directories breaks the git submodules thoroughly. The only way I found is to move all submodules to the other tree structure one by one using `git mv ...`.
If the parent directory name (e.g. projects -> 01_Projects) or its depth is changed by chance, all submodules inside the directory are broken.
So I cannot use submodules to handle source code in my git-annex repo. Is there an easy (cool) way for this?
Thank you for the information. I'll try datalad. I considered it for audio and transcription dataset management a few years ago, but I didn't use it because datalad uses git-annex internally and I already used git-annex for data management. It seemed redundant to me.
I think submodule path detection could still be implemented, and would be useful in `git-annex-assist` as well.
Datalad relies on git-annex's handling of submodules to work afaik, so I don't see why using it would avoid this problem.
If I understand correctly, the `.git` symlink in the submodule that git-annex sets up gets broken when the submodule is relocated. But I would have expected that moving from `a/b/` to `x/y/` would keep the symlink working, since it's at the same path depth.
I'd need more information to fix this problem, so I recommend filing a bug report with details.
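The path-depth reasoning can be tried out in a throwaway directory without git at all. This sketch assumes the symlink is relative, of the form `../../.git/modules/b`; the exact target that git-annex writes is not shown on this page, and all directory names here are illustrative.

```shell
# Throwaway demo: a fake parent repo with a submodule two levels deep.
demo=$(mktemp -d)
mkdir -p "$demo/.git/modules/b" "$demo/a/b"
ln -s ../../.git/modules/b "$demo/a/b/.git"

# True when the .git symlink in directory $1 still resolves to a directory.
symlink_ok() {
    [ -d "$1/.git/" ]
}

symlink_ok "$demo/a/b" && echo "a/b: ok"

# Same depth, different names: the relative target still resolves.
mv "$demo/a" "$demo/x"
mv "$demo/x/b" "$demo/x/y"
symlink_ok "$demo/x/y" && echo "x/y: ok"

# One level deeper: ../../ now stops short of the parent repo's top level.
mkdir -p "$demo/deep"
mv "$demo/x" "$demo/deep/x"
symlink_ok "$demo/deep/x/y" || echo "deep/x/y: broken"
```

Under that assumption, only a change in depth breaks the link, which is why details of the failing move matter for a bug report.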
Datalad handles submodules as subdatasets and adds Python code layers on top to handle datasets (e.g. dedup submodules). But it doesn't detect changed submodule paths the way git does.
So, sadly, it doesn't meet my needs.
@TTTTAAAx kindly posted a full example of their problem, which I've moved to detect and handle submodules after path changed by mv.
I do think that using `git mv` to rename directories that contain submodules is the right way to avoid that kind of problem. Note that renaming such a directory without using git followed by running `git add` on the new directory has the same behavior as running `git-annex assist` does. This is not a git-annex problem, but I think it could be considered a git problem; git could make `git add` of a moved submodule do the right thing.
I agree with you. Specifically, this is related to git, not git-annex.
I just wanted to run `git annex assist` and expect it to be fixed.
I wrote a simple Python script to fix the issue; it changes the path of the submodule. By adding the script to a pre-commit hook, I can now run `git annex assist` without fixing the submodules manually. (I used Python to manipulate text files such as `.gitmodules`. Perl or Ruby would be much more compact for this job.)
It seems to work well for my needs, even though it is just a prototype script.
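The script itself is not shown above. As a rough illustration of the `.gitmodules` part only, the path recorded for a submodule can be listed and rewritten with `git config --file`. The function names and example paths below are hypothetical, and other places where git tracks the submodule (such as the index) are not handled by this sketch.

```shell
# Hypothetical helper: list "submodule.<name>.path <path>" pairs.
list_submodule_paths() {
    git config --file .gitmodules --get-regexp '^submodule\..*\.path$'
}

# Hypothetical helper: point submodule <name> at <newpath> in .gitmodules.
set_submodule_path() {
    name="$1"
    newpath="$2"
    git config --file .gitmodules "submodule.$name.path" "$newpath"
}
```

For example, `set_submodule_path prj_1 archives/2023/prj_1` rewrites only the `path` key of the submodule named `prj_1` and leaves its other keys alone.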
After some polishing (and testing), I'll upload the script here.
Thank you.