Nesting git repositories inside a git repository is not possible, and thus this also applies to git-annex. However, here is a good workaround that I found:
Rename the `.git` directory of the nested repo to `dotgit` (or similar), `git annex add` it, and then create a symbolic link from `.git` to `dotgit`. It's important that the link is created only after the nested repo has been `git annex add`ed. Also, the link needs to be created manually on each clone. Finally, you'll need to hide the `dotgit` directory from the nested repo itself by adding `/dotgit` to `dotgit/info/exclude`.
```bash
mv nested/.git nested/dotgit; echo "/dotgit" >>nested/dotgit/info/exclude  # hide dotgit from the nested repo itself
git annex add nested; git commit -m "add nested"
cd nested && ln -s dotgit .git  # only after annexing; needs to be done on every clone
```
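Since the link has to be recreated by hand on every clone, that step can be wrapped in a small function. This is a sketch only: `relink_nested_repo` is a hypothetical name, and it assumes the nested repo's git directory was renamed to `dotgit` as described above and that the annexed `dotgit` content is already present in this clone (for instance after `git annex get nested`).

```bash
# Hypothetical helper for the manual post-clone step: recreate <dir>/.git -> dotgit.
# Assumes the git directory was renamed to "dotgit" and its annexed content
# has already been fetched into this clone.
relink_nested_repo() {
    dir=$1
    if [ ! -d "$dir/dotgit" ]; then
        echo "relink_nested_repo: $dir/dotgit not found" >&2
        return 1
    fi
    # Never clobber an existing .git (file, directory or symlink).
    if [ -e "$dir/.git" ] || [ -L "$dir/.git" ]; then
        echo "relink_nested_repo: $dir/.git already exists, leaving it alone" >&2
        return 0
    fi
    # Relative target, so the link stays valid wherever the clone lives.
    ln -s dotgit "$dir/.git"
}
```

Usage, from the top of the outer clone: `relink_nested_repo nested`. Using a relative link target means moving the outer clone does not break the nested repo.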
Nested git repositories are quite possible: if they are checked in, they are called submodules; otherwise they just sit there, unadded.
Apart from some odd quirks you never run into in normal operation, submodules also work fine with git-annex.
I agree, submodules are the usual way to nest git repositories, and will more or less just work with git-annex.
I think that the author of this tip wants to version control the contents of `.git` itself, e.g. to version control `.git/config` and `.git/hooks/`.
One problem with this approach is that when the outer repository has `dotgit/annex/objects/` files added to it, running `git-annex drop` inside the nested git repository will drop the content, but the outer repository will still contain a copy too. You would have to use `git-annex unused` to eventually clean up those copies. And using it this way stores two copies of every annexed file.
On a similar topic, I also have multiple git repositories that I want to back up (multiple copies...). These repositories belong to a parent repository that is properly set up with git-annex and the necessary remotes. I want to be able to recover the entire state of the parent folder (including these child repositories) at any given time.
I am quite unfamiliar with submodules, so feel free to correct me, but based on my experiments, using them makes each child repository/submodule independent/invisible to the parent one. If all these child repositories were submodules, I wouldn't be able to use the parent config to back them up, and I would have to repeat the same configuration on each submodule.
If I leave the repositories as they are, the files they contain seem to be annexed by the parent repository as I want, but the `.git` directory is ignored. To achieve my goal, I can imagine a solution where every child `.git` folder is zipped and annexed alongside, maybe in a pre-commit hook, to be restored when needed.
Is my understanding of the issue reasonable? Is there any other option?
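The zip-and-annex idea could look roughly like the following sketch. It swaps zip for tar plus gzip; the `<child>.gitdir.tar.gz` naming and both function names are hypothetical, and it only looks at child repositories sitting directly inside the parent directory.

```bash
# Hypothetical helpers for "archive each child .git and annex it".
# Run from the parent repository's top level.

# Archive the .git directory of every immediate child repository
# into <child>.gitdir.tar.gz next to the child directory.
archive_child_gitdirs() {
    for d in */; do
        d=${d%/}
        [ -d "$d/.git" ] || continue
        # gzip -n leaves the name and timestamp out of the gzip header, so an
        # unchanged .git tends to give an identical archive on the next run.
        tar -cf - -C "$d" .git | gzip -n > "$d.gitdir.tar.gz"
    done
}

# Restore one child's .git directory from its archive.
restore_child_gitdir() {
    d=$1
    mkdir -p "$d"
    tar -xzf "$d.gitdir.tar.gz" -C "$d"
}
```

The archives would then be added in the parent repository with `git annex add` and committed as usual. If this runs from a pre-commit hook, the hook would presumably also have to stage the new archives itself.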
Submodules can feel a bit clunky, that's right. They're 'invisible to the parent repo' in that they indeed have separate configs (remotes, etc.), so one needs to set them up again manually when replicating.
DataLad embraces this and provides e.g. `datalad save`, which commits recursively into all submodules, and it also initializes and handles submodules a bit more automatically. But it lacks the fully bidirectional `git annex sync|assist`.
I use submodules extensively and am not entirely happy with them because of the fragile manual configuration they require.
Auto-enabled special remotes can help out a bit (those will be configured upon first submodule creation by git annex).
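Part of the repeated per-submodule setup can at least be scripted with `git submodule foreach`. A minimal sketch, assuming every submodule should get a remote with the same name whose URL is a common base plus the submodule's path; the function name and this URL layout are hypothetical:

```bash
# Hypothetical helper: give every (recursively nested) submodule the same
# named remote, at "<base>/<submodule path>", instead of configuring each by hand.
add_backup_remote_to_submodules() {
    remote_name=$1
    backup_base=$2
    # foreach runs the command inside each checked-out submodule and exports
    # $displaypath (the submodule's path relative to where we started).
    BACKUP_BASE=$backup_base REMOTE_NAME=$remote_name \
    git submodule --quiet foreach --recursive \
        'git remote add "$REMOTE_NAME" "$BACKUP_BASE/$displaypath" 2>/dev/null ||
         git remote set-url "$REMOTE_NAME" "$BACKUP_BASE/$displaypath"'
}
```

For example, `add_backup_remote_to_submodules backup /mnt/backup` run from the superproject's top level would give a submodule at `lib/foo` a remote named `backup` pointing at `/mnt/backup/lib/foo`. It only covers plain git remotes; git-annex settings such as special remotes still need their own handling.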
It seems that git has gotten smarter and now actively prevents you from adding a `.git` folder (I did this many years ago, before I learned about submodules). I'd like to do something like the following:
```bash
git init --separate-git-dir=.gitannex .
git --git-dir=.gitannex annex init
git clone some_repo  # A repo I'm pulling from GitHub/wherever; I don't want it as a submodule since it's not my personal project
git --git-dir=.gitannex add some_repo
```
Essentially, I can override the `.git` folder name, but git still checks for other `.git` folders; is there a way to remove this check?