Please describe the problem.
Consider a parent repository with a single submodule and two branches in the parent repository. When checking out a branch in the parent using --recurse-submodules, git returns the error:
fatal: could not open 'sub/.git' for writing: Is a directory
This issue was originally mentioned in this post where there is a statement that "The conversion of .git file to .git symlink mostly won't bother git." It is precisely this change that git has a problem with when changing to a new branch. The original poster said there was a solution to use "git checkout mybranch && git submodule update", but this does not work with git version 2.40.0 and git-annex version 10.20230407. In this case, rather than giving a fatal error, git is unable to remove the submodule directory and leaves it untouched. However, this then leaves the new branch in a dirty state (because the submodule is an untracked or modified file).
The end goal of using --recurse-submodules is that either
1) a submodule will exist only in a single branch and/or
2) different branches in the parent will automatically point to different branches in the submodule
What steps will reproduce the problem?
Here is a sequence of steps to reproduce the issue.
# create git repo to use as a submodule
SUB_DIR=/tmp/submodule-source
if [ -d $SUB_DIR ]; then
sudo rm -rf $SUB_DIR
fi
mkdir -p $SUB_DIR
cd $SUB_DIR
git init
git annex init
# add file to main
touch sub_file1.txt
git annex add .
git commit -m "Add sub_file1.txt"
# setup parent dataset
PRNT_DIR=/tmp/submodule-annex
if [ -d $PRNT_DIR ]; then
sudo rm -rf $PRNT_DIR
fi
mkdir -p $PRNT_DIR
cd $PRNT_DIR
git init
git annex init
# add file to main
touch file1.txt
git annex add .
git commit -m "Create file1.txt"
# Add branch 1 to parent dataset
cd $PRNT_DIR
git checkout -b branch_1 main
# add submodule 1 to branch 1 of parent dataset
git -c protocol.file.allow=always submodule add $SUB_DIR/.git sub
git commit -m "Add submodule 1"
# add new file to submodule annex
cd sub
touch sub_file2.txt
git annex add .
git commit -m "Create sub_file2.txt"
# register changes in the parent
cd ..
git add .
git commit -m "Register sub changes"
# return to main branch, this fails
git checkout main --recurse-submodules
# alternative: no fatal error, but sub/ is left behind and main ends up dirty
# git checkout main && git submodule update
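Since the error complains that sub/.git is a directory, it can help to check what kind of entry sub/.git actually is once the script stops. The following is only a sketch for inspecting that; the function name describe_dot_git is made up for illustration and is not part of git or git-annex.

```shell
# Hypothetical helper: report what kind of entry <dir>/.git is.
describe_dot_git() {
    dot_git="$1/.git"
    if [ -L "$dot_git" ]; then
        # Test for a symlink first, because [ -d ] would follow it.
        echo "symlink -> $(readlink "$dot_git")"
    elif [ -d "$dot_git" ]; then
        echo "directory"
    elif [ -f "$dot_git" ]; then
        # A plain submodule normally has a small "gitdir: ..." file here.
        echo "gitfile"
    else
        echo "missing"
    fi
}
```

Run from the parent repository as "describe_dot_git sub", followed by "git status --short" to see whether the branch was left dirty.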
What version of git-annex are you using? On what operating system?
git version 2.40.0
git-annex version: 10.20230407
operating system: darwin aarch64
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! Been familiarizing myself with it on and off over the last year and am very excited about its possibilities.
Notice that your script is doing something a bit unusual. You check out a branch, add the submodule in that branch, and then check out the original branch, which is before the submodule was added.
So, git needs to delete the submodule. Deleting the submodule is mentioned in submodules as a particular case where git-annex's hack to support submodules does not work very well.
If that's all this bug is about, its description is over-broad. It doesn't seem to prevent using submodules with git-annex in situations where you are not deleting a submodule, but are updating submodules.
Yes, git checkout main --recurse-submodules will fail in those situations, but the workaround in this comment of using git checkout main && git submodule update will work.
And yes, I've verified that does still work, with git 2.40.0. I modified the test case to add the submodule before checking out the branch, then add sub_file2.txt, and commit that to the branch. At that point, git checkout main && git submodule update worked fine.
So, nothing changed in git, and git-annex's approach for submodules does broadly work, except git has issues deleting submodules that have a .git directory in them. The solution is to rm -rf the submodule after running git checkout.
I don't think it's possible to improve git-annex's behavior here much.
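The manual cleanup mentioned above (a plain checkout, then removing the leftover submodule directory by hand) could be wrapped up roughly like this. It is a sketch, not something git or git-annex provides: the function name checkout_then_clean is hypothetical, and it assumes the submodule path is passed explicitly and that a path with no index entry on the target branch is safe to delete.

```shell
# Hypothetical helper: check out a branch without --recurse-submodules,
# then either update the submodule or remove its leftover directory.
checkout_then_clean() {
    branch=$1
    sub_path=$2

    git checkout "$branch" || return 1

    # How does the checked-out branch track this path, if at all?
    entry=$(git ls-files --stage -- "$sub_path")
    case $entry in
        160000\ *)
            # Mode 160000 is a gitlink: still a submodule on this branch.
            git submodule update -- "$sub_path"
            ;;
        "")
            # Not tracked on this branch: remove what the checkout left behind.
            if [ -e "$sub_path" ] || [ -L "$sub_path" ]; then
                rm -rf -- "$sub_path"
            fi
            ;;
    esac
}
```

With the test case from the report, this would be called as "checkout_then_clean main sub". Paths that the target branch tracks as ordinary files are deliberately left alone.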
git-annex could avoid converting .git file to a directory, but then the git-annex symlinks would point to the wrong place. It could, when in a submodule, make the git-annex symlinks point up to ../.git/modules/sub/, but then the links would not work when the submodule was cloned by itself, or when the submodule was located at a different directory level.
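To make the directory-level problem concrete, here is a small demonstration using plain symlinks only, with no git or git-annex involved. The layout is a simplified stand-in: KEY is a placeholder for an annex object, real object paths are longer, and the function name is made up for illustration.

```shell
# Hypothetical demo: the same relative link resolves only at the right depth.
demo_relative_annex_link() {
    demo=$(mktemp -d) || return 1

    # Simplified stand-in for a parent repository with a submodule at sub/.
    mkdir -p "$demo/parent/.git/modules/sub/annex/objects" "$demo/parent/sub"
    echo content > "$demo/parent/.git/modules/sub/annex/objects/KEY"
    ln -s ../.git/modules/sub/annex/objects/KEY "$demo/parent/sub/file"

    # Stand-in for the submodule cloned by itself: same link, different place.
    mkdir -p "$demo/standalone"
    cp -P "$demo/parent/sub/file" "$demo/standalone/file"

    for link in "$demo/parent/sub/file" "$demo/standalone/file"; do
        # [ -e ] follows the link, so it is false for a dangling one.
        if [ -e "$link" ]; then
            echo "${link#"$demo"/}: resolves"
        else
            echo "${link#"$demo"/}: dangling"
        fi
    done

    rm -rf "$demo"
}

demo_relative_annex_link
```

This prints "parent/sub/file: resolves" and "standalone/file: dangling": the link that works inside the parent breaks as soon as the same content sits somewhere without a matching ../.git/modules/sub/.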
The only other thing git-annex can do is avoid using symlinks at all, e.g. adjusted unlocked branches. I don't think it's a good tradeoff to do that. On the one hand, there is this minor issue with submodule deletion, and the need to avoid using --recurse-submodules and instead do git submodule update. On the other hand, unlocked annexed files use 2x as much disk space.
Well, there is git-annex adjust --fix now. So in theory, git-annex in a submodule could leave .git a file and enter an adjusted fixed branch, and when generating that branch, make the annex object symlinks point to ../.git/modules/sub/.
Then git would commit the ref of the adjusted branch as the submodule. So checking it out elsewhere would need to either replicate that same commit sha with git-annex adjust --fix again (doable but tricky), or that ref would need to be pushed.
The user would also need to use git-annex sync in the submodule after making changes if they want to propagate them out of the adjusted branch to the main branch.