Just started looking at git-annex and it's very interesting to me, as I manage a large tree of different types of projects, spread across two computers and some hard drives (some backups, some offline resources and archives).
But one question is bothering me. A lot of my projects are already git repos. Is it possible to use git-annex with this tree, or will the gits fight unless I explicitly make them submodules?
cheers
Phil
git-annex is first git, then some additional very neat functionality. Being git, it refuses to handle git repositories as that would create all kinds of potential confusion and inefficiency.
Submodules would be a way to go, if you like how they work. Or you could just ignore the git repos, as you probably have them replicated somewhere else.
You might also use http://myrepos.branchable.com/ as a somewhat more flexible alternative to submodules.
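For anyone who wants to see what the submodule route looks like, here is a minimal sketch using plain git. The scratch directory and the names `project` and `tree` are made up for illustration, and the `protocol.file.allow` override is only there because this demo clones from a local path.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch layout; none of these paths come from the thread.
tmp=$(mktemp -d)

# An existing project that is already a git repo.
git init -q "$tmp/project"
git -C "$tmp/project" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# The big tree that would hold the annex.
git init -q "$tmp/tree"

# Recent git versions refuse file-based submodule clones by default,
# so allow the file transport for this one local command only.
git -C "$tmp/tree" -c protocol.file.allow=always \
    submodule add "$tmp/project" projects/project >/dev/null 2>&1

# The outer repo records the project as a single gitlink entry
# (mode 160000), not as the contents of its .git directory.
submodule_mode=$(git -C "$tmp/tree" ls-files -s projects/project | cut -d' ' -f1)
echo "$submodule_mode"
```

The outer repository then tracks only which commit of the project is checked out, and the project keeps its own history.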
It's worth thinking about what would happen if you were able to check a git repository into a git (annex) repository. A git repository contains files like .git/index that are git internals, and binary files. Now what happens if you have two checkouts of that nested git-in-git repository, and git writes two different versions of the .git/index file? You'd get a merge conflict that you have no way of resolving, involving two versions of an internal-use binary file. This is a lot worse than a merge conflict involving some regular binary file like a jpeg, because at least with jpegs you can look at the two versions of the file and pick the better one.

While git prevents checking in .git directories, you could technically work around it, if you really wanted to, by e.g. using GIT_DIR to rename the .git directory to something else. But it's just setting yourself up for unresolvable merge conflicts and pain.

It's likewise not good to check in other version control system directories, like .svn, .bzr, or .hg, into git repositories or vice-versa.

Sometimes people complain that the git-annex assistant should support syncing nested git repositories, because after all other directory syncing tools like dropbox support that. But a little-known fact about dropbox is that it too can end up with a conflicted merge type situation, and when this happens it will do something to auto-resolve it. That something almost certainly does not include leaving the git repository that was stored in dropbox in an ideal state. So, while people put git repos into dropbox and get away with it, they are just being lucky to not run into the edge cases where doing that blows up.
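The point about git refusing to check in .git directories can be observed with a small experiment. This is only a sketch in a throwaway directory; the names `outer`, `inner` and `notes.txt` are invented for the demo.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory for the experiment.
tmp=$(mktemp -d)
outer="$tmp/outer"

git init -q "$outer"

# A nested repository with one committed file.
git init -q "$outer/inner"
echo "some notes" > "$outer/inner/notes.txt"
git -C "$outer/inner" add notes.txt
git -C "$outer/inner" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# git warns about an embedded repository and stages only a gitlink.
git -C "$outer" add inner 2>/dev/null

# Neither inner/.git/... nor inner/notes.txt shows up in the outer index.
tracked=$(git -C "$outer" ls-files)
echo "$tracked"
```

The outer index ends up with one entry for `inner` and nothing from inside it, so the nested repository's internals never enter the outer history.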
It's interesting to find this discussion, as I just finished implementing this on my system.
I've been storing bare repos in annex in order to back them up with the same system. I have really huge files in some of my git repos and I wanted to get those files into the annex system but still keep a record of their changes (the git history).
Today I removed the core.bare = true setting on the repos and instead set core.worktree = projectdir, and ran git checkout in projectdir. I have the index file in .gitignore, so there won't be that weird unresolvable conflict. Now all my bloated git history is stored in the annex, and I can still work with it in the annexed checkout.
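The post does not give the exact commands, so the following is only a guess at what that conversion might look like. It assumes a bare repository named `project.git` and a checkout directory named `projectdir`, both created here in a scratch directory, and it uses `--git-dir` because `projectdir` has no `.git` of its own.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch setup standing in for a backed-up bare repo.
tmp=$(mktemp -d)
git init -q "$tmp/src"
echo "hello" > "$tmp/src/README"
git -C "$tmp/src" add README
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
git clone -q --bare "$tmp/src" "$tmp/project.git"

mkdir "$tmp/projectdir"

# Turn off bare mode first; git rejects core.worktree on a bare repo.
git --git-dir="$tmp/project.git" config core.bare false
git --git-dir="$tmp/project.git" config core.worktree "$tmp/projectdir"

# Populate the work tree from HEAD; this also creates project.git/index.
git --git-dir="$tmp/project.git" checkout -q -f

ls "$tmp/projectdir"
```

The checkout writes an `index` file into the former bare repository, which is presumably the file the post keeps in .gitignore so that it never gets committed to the annex.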
I was thinking I'd do this again with the root of the repository when the annex grows too large, to back up the old history in a connected way.