The git annex sync command provides an easy way to keep several repositories in sync.

Often git is used in a centralized fashion, with a central bare repository that changes are pushed to and pulled from using normal git commands. That works fine if you don't mind having a central repository.

But it can be harder to use git in a fully decentralized fashion, with no central repository, and still keep repositories in sync with one another. You have to remember to pull from each remote and merge the appropriate branch after pulling. Pushing to a remote is difficult too, since by default git refuses pushes into the currently checked out branch of a non-bare repository.
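The push problem can be reproduced with plain git in a throwaway directory. This is only a sketch: the repository names a and b are made up, and receive.denyCurrentBranch is set explicitly to its default value so the demo behaves the same on every machine.

```shell
set -e
demo=$(mktemp -d)

# Throwaway identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Repository "a": non-bare, with master checked out.
git init -q "$demo/a"
git -C "$demo/a" symbolic-ref HEAD refs/heads/master
# "refuse" is git's default for non-bare repositories; set here for determinism.
git -C "$demo/a" config receive.denyCurrentBranch refuse
echo one > "$demo/a/f"
git -C "$demo/a" add f
git -C "$demo/a" commit -q -m first

# Repository "b": a clone that makes a change and tries to push it back.
git clone -q "$demo/a" "$demo/b"
echo two >> "$demo/b/f"
git -C "$demo/b" commit -q -a -m second

if git -C "$demo/b" push -q origin master 2>/dev/null; then
  push_result=accepted
else
  push_result=refused
fi
echo "push into checked-out master: $push_result"
```

The temporary directory is left in place so it can be inspected afterwards.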

git annex sync makes it easier using a scheme devised by Joachim Breitner. The idea is to have a branch synced/master (actually, synced/$currentbranch), that is never directly checked out, and serves as a drop-point for other repositories to use to push changes.

When you run git annex sync, it merges the synced/master branch into master, receiving anything that's been pushed to it. (If there is a conflict in this merge, automatic conflict resolution is used to resolve it.) Then it fetches from each remote, and merges in any changes that have been made to the remotes too. Finally, it updates synced/master to reflect the new state of master, and pushes it out to each of the remotes.
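One pass of this scheme can be approximated with plain git commands. The sketch below is not git-annex's actual implementation: it handles a single remote, skips automatic conflict resolution and the git-annex branch, and the repository names alice and bob and the helper sync_once are made up for illustration.

```shell
set -e
work=$(mktemp -d)

# Throwaway identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# sync_once REPO REMOTE: emulate one sync pass of REPO against REMOTE.
sync_once() {
  repo=$1
  remote=$2
  # 1. Merge whatever other repositories pushed to our drop-point branch.
  if git -C "$repo" rev-parse -q --verify refs/heads/synced/master >/dev/null; then
    git -C "$repo" merge -q --no-edit synced/master
  fi
  # 2. Fetch from the remote and merge the branches it has.
  git -C "$repo" fetch -q "$remote"
  for b in master synced/master; do
    if git -C "$repo" rev-parse -q --verify "refs/remotes/$remote/$b" >/dev/null; then
      git -C "$repo" merge -q --no-edit "$remote/$b"
    fi
  done
  # 3. Point synced/master at the new master and push it to the remote.
  git -C "$repo" branch -f synced/master master
  git -C "$repo" push -q "$remote" synced/master
}

# Two non-bare peers, each with master checked out.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/master
echo "hello" > "$work/alice/file.txt"
git -C "$work/alice" add file.txt
git -C "$work/alice" commit -q -m "initial commit"
git clone -q "$work/alice" "$work/bob"
git -C "$work/alice" remote add bob "$work/bob"

# bob makes a change and syncs; the change lands in alice's synced/master.
echo "change from bob" >> "$work/bob/file.txt"
git -C "$work/bob" commit -q -a -m "edit in bob"
sync_once "$work/bob" origin

# alice syncs later and picks the change up from her drop-point branch.
sync_once "$work/alice" bob
cat "$work/alice/file.txt"
```

Neither repository ever pushes to the other's checked-out master; only synced/master is written remotely, which is why the pushes are accepted.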

This way, changes propagate between repositories as git annex sync is run on each of them. Not every repository needs to be able to talk to every other repository; as long as the graph of repositories is connected, and git annex sync is run from time to time on each, a given change, made anywhere, will eventually reach every other repository.

The workflow for using git annex sync is simple:

  • Make some changes to files in the repository, using git-annex, or anything else.
  • Run git annex sync to save the changes.
  • Next time you're working on a different clone of that repository, run git annex sync to update it.
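As a rough sketch, the first two steps could be wrapped in a small shell helper. The function name save_and_sync is hypothetical; it assumes git-annex is installed and that it is called from inside an already initialised git-annex repository.

```shell
# Hypothetical helper: record local changes, then sync with all remotes.
save_and_sync() {
  git annex add .   # add new or changed files to the annex
  git annex sync    # commit, then exchange git branches with the remotes
}
```

On another clone, running git annex sync on its own is enough to pick up the changes.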

Note that by default, git annex sync only synchronises the git repositories, but does not transfer the content of annexed files. If you want to fully synchronise the content of two repositories, you can use git annex sync --content. You can also configure preferred content settings so that only some content is synced.
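A minimal sketch of both variants, again as shell helpers with made-up names. It assumes git-annex is installed and that the functions are called from inside a git-annex repository; the preferred content expression is passed in by the caller.

```shell
# Hypothetical helper: sync git branches and the content of annexed files.
sync_all_content() {
  git annex sync --content
}

# Hypothetical helper: set this repository's preferred content expression,
# then sync so that only matching content is wanted here.
sync_wanted_content() {
  git annex wanted here "$1"
  git annex sync --content
}
```

For example, sync_wanted_content "include=*.jpg" would ask for only JPEG files to be kept in the local repository.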

Here's a way to get from a starting point of two or more peer directory trees not tracked by git or git-annex, to the point where they can be synced in the manner described above: syncing non-git trees with git-annex
Comment by Adam Sat Feb 25 15:02:18 2012

I came upon git-annex a few months ago. I saw immediately how it could help with some frustrations I've been having. One in particular is keeping my vimrc in sync across multiple locations and platforms. I finally took the time to give it a try after hitting my boiling point this morning. I went through the walkthrough and now I have an annex everywhere I need it. git annex sync and my vimrc is up-to-date, simply grand!

Thanks so much for making git-annex, Daniel Wozniak

Comment by Daniel Fri Jan 4 14:45:35 2013
Good for syncing indexes, but if I want to synchronise all data files too (specifically pushing to a remote bare repository), how do I do that?
Comment by Diggory Fri Jan 11 16:52:38 2013
Yes, sync only syncs the git branches, not git-annex data. To sync the data, you can run a command such as git annex copy --to bareremote. You could run that in cron. Or, the assistant can be run as a daemon, and automatically syncs git-annex data.
Comment by joeyh.name Fri Jan 11 18:18:07 2013
Sure, the assistant can sync git-annex data across remotes. But how do I tell a repo to sync git-annex data without having to work out manually exactly what needs to be copied from/to where?
Comment by chocolate.camera Fri Oct 11 09:58:12 2013

By default, git annex sync will sync to all remotes, unless you specify a remote. So, I have to specify, e.g., git annex sync origin. I can simplify this with aliases, I suppose, but I do a lot of teaching non-programmer scientists... so it'd be nice to be able to configure this (so beginning users don't have to keep track of as many things).

Is there (or will there be) a way to do this?

Comment by Dav Sun Nov 24 17:48:22 2013
I feel that syncing with all remotes by default is the right thing for git annex sync to do.
Comment by joeyh.name Tue Nov 26 20:08:33 2013

Just in case you haven't considered such a scenario - maybe you have suggestions for how to collaborate more effectively with git annex (and avoid warning messages):

I'm trying to teach beginning scientist programmers (mostly graduate students), and a common scenario is to fork some scientific code. I'd like forking on github to be mundane, and not trigger warnings, and generally have as little for folks to explicitly keep track of as possible (this seems to be a common concern we share, which leads you to prefer syncing to all remotes without the option to configure the default behavior!).

However, I am currently working with students on forking and fixing up scientific code where the upstream maintainer doesn't want to allow pushes upstream, except via pull request. So, part of our approach is to set up some common shared datasets in git annex (and these just end up in our fork). If we have an "upstream" remote, git annex will try to sync with it, and report an error.

So - that's why I'd like to be able to configure the deactivation of syncing to a defined remote (e.g., "upstream"). However, if you have other suggestions to smooth the workflow, I would also like to hear those!

Comment by Dav Sun Dec 8 19:20:26 2013

@Dav what kind of url does the upstream remote have? Perhaps it would be sufficient to make sync skip trying to push to git:// and http[s]:// remotes. Both are unlikely to accept pushes and in the cases where they do accept pushes it would be fine to need a manual git push.

Anyway, you can already configure which remotes get synced with. From the man page:

       remote.<name>.annex-sync
              If set to false, prevents  git-annex  sync  (and  the  git-annex
              assistant) from syncing with this remote.

So git config remote.upstream.annex-sync false
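A quick way to check the setting, sketched in a throwaway repository with plain git. The remote name upstream comes from the discussion above; its URL is a placeholder and is never contacted.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/fork"

# Placeholder URL standing in for a read-only upstream.
git -C "$scratch/fork" remote add upstream https://example.com/upstream.git

# Key and value are separate arguments; "key=value" is rejected by git config.
git -C "$scratch/fork" config remote.upstream.annex-sync false

annex_sync_setting=$(git -C "$scratch/fork" config --get --bool remote.upstream.annex-sync)
echo "remote.upstream.annex-sync = $annex_sync_setting"
```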

Comment by joeyh.name Thu Dec 12 17:54:55 2013
The URLs in question in this case were read-only github https URLs. In any case, my problems are solved by what you've already suggested. I think a less error-sounding response to read-only https repos sounds nice!
Comment by Dav Sun Jan 26 22:51:28 2014