The git annex sync command provides an easy way to keep several repositories in sync.

Often git is used in a centralized fashion, with a central bare repository that changes are pulled from and pushed to using normal git commands. That works fine if you don't mind having a central repository.

But it can be harder to use git in a fully decentralized fashion, with no central repository, and still keep repositories in sync with one another. You have to remember to pull from each remote, and merge the appropriate branch after pulling. It's difficult to push to a remote, since by default git refuses pushes into the currently checked out branch of a non-bare repository.

git annex sync makes it easier using a scheme devised by Joachim Breitner. The idea is to have a branch synced/master (actually, synced/$currentbranch), that is never directly checked out, and serves as a drop-point for other repositories to use to push changes.

When you run git annex sync, it merges the synced/master branch into master, receiving anything that's been pushed to it. (If there is a conflict in this merge, automatic conflict resolution is used to resolve it.) Then it fetches from each remote, and merges in any changes that have been made to the remotes too. Finally, it updates synced/master to reflect the new state of master, and pushes it out to each of the remotes.
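The drop-point idea can be sketched with plain git alone. This is only a rough approximation of the scheme described above, not what git annex sync literally runs; the repository names alice and bob, the file names, and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch of the synced/<branch> drop-point scheme using plain git only.
# "alice" and "bob" are hypothetical repositories created in a temp dir.
set -e
work=$(mktemp -d)
cd "$work"

# alice: a normal (non-bare) repository with master checked out.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name alice
git config user.email alice@example.com
echo hello > file.txt
git add file.txt
git commit -q -m "add file"
cd ..

# bob: a clone of alice that makes a change of its own.
git clone -q alice bob
cd bob
git config user.name bob
git config user.email bob@example.com
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "bob's change"

# With git's default settings, pushing into alice's checked out master is refused...
git push -q origin master 2>/dev/null || echo "push to checked-out master refused"
# ...but pushing to the never-checked-out drop-point branch is accepted.
git push -q origin master:refs/heads/synced/master

# Back in alice: merge whatever was dropped off, then bring the
# drop-point branch up to date with the new state of master.
cd ../alice
git merge -q synced/master
git branch -f synced/master master
ls
```

After this runs, alice's working tree contains bob.txt even though bob never pushed to alice's master directly.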

This way, changes propagate between repositories as git annex sync is run on each of them. Not every repository needs to be able to talk to every other repository; as long as the graph of repositories is connected, and git annex sync is run from time to time on each, a given change, made anywhere, will eventually reach every other repository.

The workflow for using git annex sync is simple:

  • Make some changes to files in the repository, using git-annex, or anything else.
  • Run git annex sync to save the changes.
  • Next time you're working on a different clone of that repository, run git annex sync to update it.

Note that by default, git annex sync only synchronises the git repositories, but does not transfer the content of annexed files. If you want to fully synchronise the content of two repositories, you can use git annex sync --content. You can also configure preferred content settings so that only some content is synced.
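A minimal sketch of the workflow and of the difference between the two forms of the command, assuming git-annex is installed. The repository names laptop and desktop, the file data.bin, and the demo identity are made up for illustration, and the script skips itself when git-annex is not available.

```shell
#!/bin/sh
# Sketch only: "laptop", "desktop" and data.bin are hypothetical names.
set -e
if ! command -v git-annex >/dev/null 2>&1; then
    echo "git-annex is not installed; skipping"
    result=skipped
else
    work=$(mktemp -d)
    cd "$work"
    # Identity used for the commits that sync makes in this throwaway demo.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

    git init -q laptop
    cd laptop
    git annex init "laptop"
    echo "some data" > data.bin
    git annex add data.bin
    git annex sync              # saves the change; no remotes to sync with yet
    cd ..

    git clone -q laptop desktop
    cd desktop
    git annex init "desktop"
    git annex sync              # syncs git branches only; content is not transferred
    git annex sync --content    # also transfers the content of annexed files
    result=synced
fi
```

With no preferred content configured, the --content run is expected to fetch every annexed file the local repository lacks.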

Here's a way to get from a starting point of two or more peer directory trees not tracked by git or git-annex, to the point where they can be synced in the manner described above: syncing non-git trees with git-annex
Comment by Adam Sat Feb 25 15:02:18 2012

I came upon git-annex a few months ago. I saw immediately how it could help with some frustrations I've been having. One in particular is keeping my vimrc in sync across multiple locations and platforms. I finally took the time to give it a try after I hit my boiling point this morning. I went through the walkthrough and now I have an annex everywhere I need it. git annex sync and my vimrc is up-to-date, simply grand!

Thanks so much for making git-annex, Daniel Wozniak

Comment by Daniel Fri Jan 4 14:45:35 2013
Good for syncing indexes, but if I want to synchronise all data files too (specifically pushing to a remote bare repository), how do I do that?
Comment by Diggory Fri Jan 11 16:52:38 2013
Yes, sync only syncs the git branches, not git-annex data. To sync the data, you can run a command such as git annex copy --to bareremote. You could run that in cron. Or, the assistant can be run as a daemon, and automatically syncs git-annex data.
Comment by Fri Jan 11 18:18:07 2013
Sure, the assistant can sync git-annex data across remotes. But how do I tell a repo to sync git-annex data without having to work out by hand exactly what needs to be copied from/to where?
Comment by Fri Oct 11 09:58:12 2013

By default, git annex sync will sync to all remotes, unless you specify a remote. So, I have to specify, e.g., git annex sync origin. I can simplify this with aliases, I suppose, but I do a lot of teaching non-programmer scientists... so it'd be nice to be able to configure this (so beginning users don't have to keep track of as many things).

Is there (or will there be) a way to do this?

Comment by Dav Sun Nov 24 17:48:22 2013
I feel that syncing with all remotes by default is the right thing for git annex sync to do.
Comment by Tue Nov 26 20:08:33 2013

Just in case you haven't considered such a scenario - maybe you have suggestions for how to collaborate more effectively with git annex (and avoid warning messages):

I'm trying to teach beginning scientist programmers (mostly graduate students), and a common scenario is to fork some scientific code. I'd like forking on github to be mundane, and not trigger warnings, and generally have as little for folks to explicitly keep track of as possible (this seems to be a common concern we share, which leads you to prefer syncing to all remotes without the option to configure the default behavior!).

However, I am currently working with students on forking and fixing up scientific code where the upstream maintainer doesn't want to allow pushes upstream, except via pull request. So, part of our approach is to set up some common shared datasets in git annex (and these just end up in our fork). If we have an "upstream" remote, git annex will try to sync with it, and report an error.

So - that's why I'd like to be able to configure the deactivation of syncing to a defined remote (e.g., "upstream"). However, if you have other suggestions to smooth the workflow, I would also like to hear those!

Comment by Dav Sun Dec 8 19:20:26 2013

@Dav what kind of url does the upstream remote have? Perhaps it would be sufficient to make sync skip trying to push to git:// and http[s]:// remotes. Both are unlikely to accept pushes and in the cases where they do accept pushes it would be fine to need a manual git push.

Anyway, you can already configure which remotes get synced with. From the man page:

              If set to false, prevents  git-annex  sync  (and  the  git-annex
              assistant) from syncing with this remote.

So git config remote.upstream.annex-sync false
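
A minimal sketch of that setting in use, assuming a remote named upstream. The URL is a placeholder and the temporary repository only exists to keep the example self-contained; in practice you would run just the git config line inside your own clone.

```shell
#!/bin/sh
# Sketch only: the repository and the upstream URL are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git remote add upstream https://example.com/upstream.git

# git config takes the key and the value as separate arguments.
git config remote.upstream.annex-sync false

# Read the setting back to confirm it was stored.
value=$(git config --get remote.upstream.annex-sync)
echo "remote.upstream.annex-sync = $value"
```

The setting lives in the clone's own .git/config, so each collaborator has to set it once per clone.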

Comment by Thu Dec 12 17:54:55 2013
The URLs in question in this case were read-only github https URLs. In any case, my problems are solved by what you've already suggested. I think a less error-sounding response to read-only https repos sounds nice!
Comment by Dav Sun Jan 26 22:51:28 2014

I noticed that in a test with 2 local repositories and around 2'000 files "git annex sync" is still very fast, but "git annex sync --content" takes multiple seconds. Is this avoidable?

I have a central repo and client repos. I want to copy all content to the central repo after a commit. Right now, I use "git annex group central backup", "git annex wanted central standard", and a hook that triggers "git annex sync --content" after each commit. Maybe there is a more efficient way to do this? Thanks for sharing thoughts.

Comment by Matthias Tue Apr 22 20:37:05 2014

I too feel that syncing all remotes by default is the right thing to do, but I think it should be limited to the 'master' and 'git-annex' branch. I often create branches that I want to keep local and do not want them to be synced. But I want 'master' and 'git-annex' branches to be synced with all remotes.

So it would be nice to be able to set an option to sync all branches or just 'master' and 'git-annex', or to be able to ignore some branches during git annex sync.


Comment by mshri Fri Apr 25 15:37:53 2014

I agree with mshri. It’s confusing to have every local branch wind up on every remote (and it hinders «git annex unused»).

I tried working around this by just including relevant branches in the «fetch» refspec, but this will only work until another remote pushes the branches again.

Comment by zardoz Thu May 15 08:28:09 2014
We seem to have some rumor going around that git annex sync pushes all branches. It does not. It pushes only the git-annex branch and the currently checked out branch.
Comment by Thu May 15 19:53:16 2014
@Matthias, git annex sync --content has to check each file to see if any other repository wants it. This is necessarily going to get slow when there are a lot of files. The assistant does a similar syncing but uses some tricks to avoid scanning all the files too often, while still managing to keep them all in sync -- it can do this since it's a long-running daemon and is aware when files have changed.
Comment by Thu May 15 19:54:54 2014

git sync … >> fetches from each remote

Well, I have two git annex-ed repositories where "git remote -v" properly lists the other repo, and "git annex sync foo" manages to pull from foo, but "git annex sync" without a remote name simply does a local sync. Also, neither command pushes anything anywhere.

So, where does "git annex" get its list of remotes from? What could prevent it from accessing them?

Comment by Matthias Thu Jan 22 22:04:09 2015

If a remote has "remote.<name>.annex-sync" set to false in the git config, git-annex sync will skip that remote unless you specify the name. That's probably what's going on in your case.

Comment by joey Wed Feb 4 19:12:23 2015