The Distribits conference was a wonderful chance to meet with many scientific users of git-annex and learn about amazing things they are doing with it.

After giving my talk, titled "git annex is complete, right?", it turned out (spoilers) to not be complete. Indeed, the very next talk gave me a big idea that I have been working on for the past several weeks and have merged into master today. Michael Hanke described his git-remote-datalad-annex which lets git push, pull, and even clone from a git-annex special remote.

I immediately saw that this would be better implemented in git-annex, which would let it use its internals rather than some of the hacky workarounds Michael needed. Also, I saw that git bundles were a much better data format to use, which would allow cheap incremental git pushes.

At the Distribits hackathon, I got together with Michael and Timothy Sanders, and we thought through the data format to store on special remotes. We ended up with a quite simple data design, which can be used without git-annex if necessary. (See git-remote-annex.)

Getting git bundles to work right, especially incremental bundles, and dealing with all the quirks of git's gitremote-helper interface turned out to be more challanging than I thought. I ended up spending a week implementing a prototype in shell script to work through all the details. Then I had to reimplement it all in Haskell, ending up with over 1000 lines of code.

The result is the git-remote-annex, which will ship in the next git-annex release. It should work with most types of special remotes, including exporttree=yes ones (but not yet importtree=yes). But I've only tested it on the directory special remote so far. It's really neat to be able to git clone a repository from so many places, as well as incrementally push changes.

There is a git-remote-annex todo page, of which a lot of the remainder is various race conditions when two people are pushing different things to the same special remote at the same time. At least some of those will be dealt with, for now I recommend only using git-remote-annex when you know you're the only one pushing to a special remote.

This work was sponsored by Mark Reidenbach, Jake Vosloo, Lawrence Brogan, Graham Spencer, unqueued, and Erik Bjäreholt on Patreon