The Borg special remote is indeed an exciting next step for this amazing project. I’m currently testing it with a small team. For remote collaboration, each of us has provisioned:
- one workstation with a git-annex repo (accessible only to the workstation user)
- one server (NAS) with a Borg repo (accessible to all team members)
The idea was to implement a basic distributed topology by adding all the Borg repos as special remotes to all the git-annex repos. With such an implementation, a typical workflow would be:
1. A team member:
   (a) commits a change to the git-annex repo on the member’s own workstation,
   (b) creates an archive of the updated git-annex repo in each of the Borg repos, and
   (c) informs the member’s own git-annex repo of the new Borg archives by running git annex sync.
2. All other team members inform their respective git-annex repos of the new Borg archives by simply running git annex sync.
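For readers following along, Step 1 of this workflow might look roughly like the sketch below. The function names, paths, archive name, and commit message are hypothetical placeholders, and the sketch assumes the changes are already staged (for example with git annex add). Setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of Step 1 for one team member. All names and paths are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# publish_change WORKTREE ARCHIVE_NAME BORGREPO...
publish_change() {
    worktree=$1
    archive=$2
    shift 2
    # Step 1(a): commit the (already staged) change.
    run git -C "$worktree" commit -m "Update annexed files"
    # Step 1(b): one borg create per Borg repo.
    for repo in "$@"; do
        run borg create "$repo::$archive" "$worktree"
    done
    # Step 1(c): let git-annex learn about the new archives.
    run git -C "$worktree" annex sync
}
```

Calling `DRY_RUN=1; publish_change ~/annex snap1 nas1:/srv/borg nas2:/srv/borg` would print one borg create line per server, which makes the per-server cost of Step 1(b) visible. Step 2 for everyone else is just git annex sync in their own repo.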
Step 1 worked, but Step 2 did not, so as a workaround, we added a bare git-annex repo alongside the Borg repo on each server, then added all the new bare repos as remotes to each of our respective workstation repos. It works but is less than ideal.
My simplified question: can two git-annex repos share a Borg special remote?
Joey commented back in December: “The Remote interface recently got importKey, which gets us unexpectedly a lot closer to making git-annex import --from borg a reality!” However, I’ve struggled to find any other clues.
Potentially related issues (all of which appear to have been addressed):
- sync --content with borg does not get content
- borg sync tree not grafted
- use same vector clock for content identifier updates in import
Thanks!
This should be possible to set up. This is just a special remote that is accessible from several clones of the repository, and git-annex is able to get files that were sent from another clone to the special remote, once the git-annex branch gets synced between the two clones.
You have to use git annex initremote a single time to set up the borg special remote, and then use git annex enableremote on every other clone to access it. Since initremote borg needs the borgrepo= parameter, which then gets used to access it from the other clones, it would make sense to use ssh user@host:path syntax, with a shared user account, or perhaps host:path if the permissions on the server allow multiple users to access the borg repo.
Leave it to the Grand Wizard himself
With a few tweaks based on your explanation, this appears to be working smoothly. I think our issue was caused by attempting to connect repos that were individually initialized (i.e., with git init, git annex init, and git annex initremote on each workstation); by performing this initialization routine only on a single workstation, then following through with git clone and git annex enableremote on each additional workstation, the syncing works as expected.
Thank you for your work and guidance! This is very exciting.
The next thing to figure out is Borg repo mirroring to alleviate the overhead caused by Step 1(b) in the procedure above. Currently, the number of borg create operations each workstation must perform is multiplied by the number of Borg special remotes, which obviously doesn’t scale well. Ideally, a workstation could create an archive on a single server (say, the nearest available), offloading to the server the burden of creating archives on the remaining Borg repos. It sounds good in my head, but I struggle to find prior art for something like Borg-based swarms for eventual consistency.
Yeah, that's the pattern for any special remote: Initialize once and enableremote everywhere else. Otherwise you have a bunch of different special remotes that happened to be initialized more or less the same but git-annex doesn't know you consider them all to be the same place.
I'd be keen to improve whatever docs might have led to the multiple initremote mistake. The man page for initremote does say to use enableremote in other clones. But maybe you were following a page like “using borg for efficient storage of old annexed files” and just assumed you'd follow that same procedure in each clone?
Indeed, we had been following that page. Considering that a git-annex repo is stored in its entirety within a Borg archive, the explanation that “git-annex sync scans the borg repository to find out what annexed files are stored in it” likely led to the mistaken assumption that simply adding the special remote would be enough for git-annex to know how to handle it. (We had also tried specifying subdir to tell it exactly where to look but clearly were cargo culting by that point.)
In retrospect, the outline of our intended implementation reveals a predisposition to make such an assumption: that the git-annex repo on each workstation would be “accessible only to the workstation user” suggests that there would be no provision for cloning in the first place. (Instead, there simply would be a bunch of repos that happened to be privately initialized more or less the same that all share a bunch of special remotes that also happened to be initialized more or less the same. Highly technical, I know!)
Also, I had read the man pages, but due to Borg being an “unusual kind of remote” (a special special remote, if you will), I was unsure how much of the information applied. Thus, the difference between initremote and enableremote in this case was not immediately clear.