Hello. I am a kinda-new git-annex user. I think git-annex is very difficult to understand. Thus, in this topic, I would like to write out my explanation of how git-annex works and how I can work with it, so that you can tell me if I am misunderstanding something.
I am a machine learning engineer. I use git-annex to store code together with data (large images) and artefacts (large images and model weights). Basically, I am trying to use git-annex as normal git, but with the ability to store large binary files. There is a central bare repository and a bunch of non-bare repositories cloned from it on different computers. Usually, the non-bare repositories connect only to the central bare repository, but sometimes I push/fetch between non-bare repositories directly.
I can fetch/push/merge/pull the normal branches I work in just as I do with normal git (without annex). However, the git-annexed files aren't fetched/pushed this way because the branch actually contains only symlinks, not the files themselves. I have disabled all the automatic merging, pulling, etc. functionality of git because I like to have total control. So, I merge or rebase everything manually when there is a need for it. And I never use pull.
In each repository, there is a branch called git-annex. It contains some metadata that git-annex uses. While in a repository repo1, I can run
git annex sync --only-annex --no-content --no-commit --no-pull --no-push --no-resolvemerge repo2
and git-annex will use some magic (consisting of pushing, pulling, and some kind of automatic merging) to sync the git-annex branch of repo1 and repo2, i.e. it will make them contain the same metadata. The options --no-commit --no-pull --no-push --no-resolvemerge are needed to disable the dark magic that is useful for casual users but not for software developers who use git-annex as a git addon. The option --only-annex prevents git-annex from creating "synced" branches which are, as far as I understand, another piece of dark magic useful for casual users but not for software developers. If I want, I can remove the --no-content flag and git-annex will also download and/or upload the annexed data (does it affect only the data available in the current branch? Or is it all data? I'm not sure). This is the only command I need to know to sync the git-annex branch. Supposedly, it's possible to do this via normal git fetches, pushes, merges, and maybe pulls, but I don't know how to do that.
The actual annexed data is stored somewhere in the .git directory. I don't need to worry where. What I need to know is that I can use git annex copy, git annex get, and git annex sync --only-annex --no-commit --no-pull --no-push --no-resolvemerge otherrepo with appropriate paths to copy the annexed data between repositories.
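As a rough illustration of the copy and get commands mentioned above, here is a minimal shell sketch. The function names, the remote name repo2, and the path data/images are hypothetical placeholders, not anything from the setup described in this post. Setting DRY_RUN=1 prints each command instead of running it, which is a convenience added for this sketch only.

```shell
#!/bin/sh
# Hypothetical helpers; "repo2" and "data/images" used below are placeholders.

# annex_cmd: print the git-annex command when DRY_RUN=1, run it otherwise.
annex_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "git annex $*"
    else
        git annex "$@"
    fi
}

# Download the content of annexed files under a path from a remote that has it.
fetch_content() { annex_cmd get "$1"; }

# Upload the content of annexed files under a path to a named remote.
send_content() { annex_cmd copy --to="$1" "$2"; }

# Download the content of annexed files under a path from one specific remote.
receive_content() { annex_cmd copy --from="$1" "$2"; }
```

For example, send_content repo2 data/images would run git annex copy --to=repo2 data/images inside the current repository.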
Ok, so I can use git annex sync with a bunch of flags to sync the git-annex branch, I can use git annex sync, git annex get, and git annex copy to copy the data around, and I don't need the synced branches. Is my understanding correct?
Just git pull the git-annex branch from remotes yourself like any other git branch. git-annex will automatically merge those pulled git-annex branches into its own local git-annex branch the next time you run it.
Then you can git push the git-annex branch to any remotes you want to publish updated git-annex information to.
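The steps above can be sketched as a small shell function. This is only a sketch under a few assumptions: the function name and the default remote name origin are hypothetical, it uses git fetch rather than git pull (so the remote's git-annex branch arrives as a remote-tracking ref without touching the checked-out branch), and it calls git annex merge to trigger the merge explicitly instead of waiting for the next git-annex command. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: exchange git-annex branch metadata with one remote.
# The remote name defaults to "origin" (an assumption; pass your own).
sync_annex_branch() {
    remote="${1:-origin}"
    # Fetch the remote's branches, including its git-annex branch.
    run git fetch "$remote" &&
    # Merge fetched git-annex branches into the local git-annex branch.
    run git annex merge &&
    # Publish the merged git-annex branch back to the remote.
    run git push "$remote" git-annex
}
```

Running sync_annex_branch repo2 from inside repo1 would then exchange the git-annex branch with repo2 using plain git transport, with no annexed content transferred.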