This special remote type stores file contents in a bup repository. By using git-annex in the front-end, and bup as a remote, you get an easy git-style interface to large files, and easy backups of the file contents using git.

This is particularly well suited to collaboration on projects involving large files, since both the git-annex and bup repositories can be accessed like any other git repository.

See using bup for usage examples.

Each individual key is stored in a bup remote using bup split, with a git branch named the same as the key name. Content is retrieved from bup using bup join. All other bup operations are up to you -- consider running bup fsck --generate in a cron job to generate recovery blocks, for example; or clone bup's git repository to further back it up.
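
As a rough illustration of the storage scheme described above, the following shell sketch stores and retrieves one key by hand. The helper names store_key and retrieve_key, and the key name and paths in the usage comment, are hypothetical. The exact options git-annex passes to bup may differ from what is shown here.

```shell
# Hypothetical helpers mirroring how a key maps onto bup commands.
# Assumes bup is installed and the bup repository already exists.

# store_key REPO KEY FILE: save FILE in REPO under a branch named KEY.
store_key() {
    bup split -r "$1" -n "$2" < "$3"
}

# retrieve_key REPO KEY FILE: write the content stored as KEY to FILE.
retrieve_key() {
    bup join -r "$1" "$2" > "$3"
}

# Example (placeholder key name and paths):
#   store_key example.com:/big/mybup MYKEY ./bigfile
#   retrieve_key example.com:/big/mybup MYKEY ./bigfile.copy
```

Because each key becomes its own branch, a repository holding many keys will also hold many branches.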

configuration

These parameters can be passed to git annex initremote to configure bup:

  • buprepo - Required. This is passed to bup as the --remote to use to store data. To create the repository, bup init will be run. Example: "buprepo=example.com:/big/mybup" or "buprepo=/big/mybup". (To use the default ~/.bup repository on the local host, specify "buprepo=".)

  • encryption - One of "none", "hybrid", "shared", or "pubkey". See encryption. Note that using encryption will prevent de-duplication of content stored in the buprepo.

  • keyid - Specifies the gpg key to use for encryption.
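
The parameters above could be combined as in the following sketch. The wrapper name init_bup_remote and the remote names in the usage comment are hypothetical, and the default of "none" for encryption is a choice made for this example only.

```shell
# init_bup_remote NAME BUPREPO [ENCRYPTION]
# Hypothetical wrapper; run it inside an existing git-annex repository.
init_bup_remote() {
    name=$1
    buprepo=$2
    encryption=${3:-none}   # "none" is this sketch's default, not git-annex's
    git annex initremote "$name" type=bup \
        encryption="$encryption" buprepo="$buprepo"
}

# Example (placeholder remote names):
#   init_bup_remote mybup example.com:/big/mybup
#   init_bup_remote localbup /big/mybup shared
```

Encryption modes that rely on a gpg key would additionally need the keyid parameter, which this sketch leaves out.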

Options to pass to bup split when sending content to bup can also be specified, by using git config annex.bup-split-options. This can be used to, for example, limit its bandwidth.
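
For instance, a bandwidth limit might be set as in this sketch. The helper name limit_bup_bandwidth is hypothetical, and it assumes the installed bup split accepts a --bwlimit option taking bytes per second.

```shell
# limit_bup_bandwidth BYTES_PER_SEC
# Hypothetical helper; run it inside the git-annex repository.
limit_bup_bandwidth() {
    # Assumption: bup split understands --bwlimit=<bytes/sec>.
    git config annex.bup-split-options "--bwlimit=$1"
}

# Example: cap transfers at roughly 200 kB/s
#   limit_bup_bandwidth 200000
```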

notes

git-annex-shell does not support bup, due to the wacky way that bup starts its server. So, to use bup, you need full shell access to the server.

Hello,

I get this error when trying to use git-annex with bup and gnupg:

move importable_pilot_surveys.tar (gpg) (checking localaseebup...) (to localaseebup...) 
Traceback (most recent call last):
  File "/usr/lib/bup/cmd/bup-split", line 133, in <module>
    progress=prog)
  File "/usr/lib/bup/bup/hashsplit.py", line 167, in split_to_shalist
    for (sha,size,bits) in sl:
  File "/usr/lib/bup/bup/hashsplit.py", line 118, in _split_to_blobs
    for (blob, bits) in hashsplit_iter(files, keep_boundaries, progress):
  File "/usr/lib/bup/bup/hashsplit.py", line 86, in _hashsplit_iter
    bnew = next(fi)
  File "/usr/lib/bup/bup/helpers.py", line 86, in next
    return it.next()
  File "/usr/lib/bup/bup/hashsplit.py", line 49, in blobiter
    for filenum,f in enumerate(files):
  File "/usr/lib/bup/cmd/bup-split", line 128, in <module>
    files = extra and (open(fn) for fn in extra) or [sys.stdin]
IOError: [Errno 2] No such file or directory: '-'

I was able to work around this issue by altering /usr/lib/bup/cmd/bup-split (though I don't think it's a bup bug) to just pull from stdin:

files = [sys.stdin]

on ~ line 128.

Any ideas? Also, do you think that bup's data-deduplication does anything when gnupg is enabled, i.e. is it just as well to use a directory remote with gnupg?

Thanks! Git annex rules!

Albert

Comment by Albert Mon Oct 22 20:56:56 2012

@Albert, thanks for reporting this bug (but put them in bugs in future please).

This is specific to using the bup special remote with encryption. Without encryption it works. And no, it won't manage to deduplicate anything that's encrypted, as far as I know.

I think bup-split must have used - for stdin in the past, but now, it just reads from stdin when no file is specified, so I've updated git-annex.

Comment by joeyh.name Tue Oct 23 20:01:43 2012

Hi,

Is the bup remote available via the Assistant user interface?

Unrelated question:

If you are syncing files between two bup repos on local usb drives, does it use git to sync the changes or does it use "bup split" to re-add the file? (Basically, is the syncing as efficient as possible using git-annex or would I have to go to a lower level)

Many Thanks, Sek

Comment by sekenre Wed Mar 13 12:54:56 2013

I don't currently plan to support creating bup special remotes in the assistant. Of course, the assistant can use bup special remotes that you set up yourself.

Your two bup repos would be synced using bup-split.

Comment by joey Wed Mar 13 16:05:50 2013

I've run into problems storing a huge number of files in the bup repo. It seems that thousands of branches are a problem. I don't know if it's a problem of git-annex, bup, or the filesystem.

How about adding an option to store tree/commit ids in git-annex instead of using branches in bup?

Comment by Tobias Sun Mar 31 21:05:32 2013

bup-split uses a git branch to name the objects stored in the bup repository. So it will be limited by any scalability issues affecting large numbers of git branches. I don't know what those are.

Yes, it would be possible to make git-annex store this in the git-annex branch instead.

Comment by joey Tue Apr 2 21:24:06 2013

Tobias/joey,

I think there are at least two scaling issues that may be causing you trouble. One is that bup writes pack+idx files rather than bare objects, and if you send 1 file per call to bup-split, you end up with a pair of pack and idx files for each such call. When you later try to retrieve a blob, bup currently just calls git, and git will have to traverse all these tiny idx files looking for the right hash (bare objects you could at least find by name). You can probably ameliorate the pain by calling git repack (look at the -a and --max-pack-size switches) on your bup repository. The other is the "thousands of branches" issue, and I think "git pack-refs --all" (that's again on your bup repository) might help a little bit.
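
The two maintenance commands suggested above could be run as in the following sketch. The helper name compact_bup_repo and the 1g default pack size are assumptions made for illustration. It has not been checked against bup's own midx and bloom files, which may need rebuilding afterwards, so trying it on a copy of the repository first seems prudent.

```shell
# compact_bup_repo BUPREPO_PATH [MAX_PACK_SIZE]
# Hypothetical helper; BUPREPO_PATH is the local path of the bup repository.
compact_bup_repo() {
    repo=$1
    max_pack_size=${2:-1g}   # arbitrary default chosen for this sketch
    # Pack everything referenced into new, larger packs
    # (the old packs are kept unless -d is added).
    git --git-dir="$repo" repack -a --max-pack-size="$max_pack_size"
    # Store all branch refs in a single packed-refs file.
    git --git-dir="$repo" pack-refs --all
}

# Example (placeholder path):
#   compact_bup_repo /big/mybup
```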

It would certainly help performance if you could store blob/tree ids in git-annex instead of branch names. For small files, all bup would need to store is a blob, but currently you end up storing a blob, a tree, and a commit (and looking up all of those, plus the ref too, on calling bup-join). (You might want to patch bup-split so that it lets you ask for "--blob-or-tree": currently, if you pass it -b for blob ids, you get a series of ids for bigger files, whereas you'd be much better off with a tree id there.)

Comment by Yung-Chin Fri May 3 14:57:51 2013

Thinking about this some more, a very elegant way to make a bup remote could actually be to just pass the whole .git/annex tree into bup-index/save (you could avoid sending some files by only bup-indexing select subtrees, or by using --exclude-*'s, but you'd run bup-save over the whole .git/annex tree). You could then use bup-restore to retrieve files or whole subtrees, and you'd refer to the files you're retrieving by their actual pathname under which they live in .git/annex (if that doesn't make sense it's because I've misunderstood how git-annex is organised!), so something like "bup restore branch_name/latest/.git/annex/aa/bb/sha-of-some-sort" would work - that's cute, right? And you'd only have 1 branch.

However... somebody who is good with lazy-evaluation would need to rework bup.vfs: currently, if you'd call bup-restore on a path like that, it would instantiate a lot of vfs-nodes you don't need - to begin with, it would make a node for every commit you ever made (on any branch!) - on a big repository you'd wait ages for it to just find the commit objects...

Comment by Yung-Chin Fri May 3 16:34:05 2013

Hi All,

I managed to answer my questions above about copying changes between local bup repositories efficiently.

You run the following commands

git annex copy . --to bup_repo_1                      # Uses bup split in the background (slow)
rsync -av /mnt/repodisk1/repo/ /mnt/repodisk2/repo/ \
--exclude=config --exclude='*.bloom' --exclude='*.midx'   # rsync without bup-specific indices (speed depends on delta between repositories)
BUP_DIR=/mnt/repodisk2/repo/ bup midx -a && BUP_DIR=/mnt/repodisk2/repo/ bup bloom # rebuild bup-specific indices on the target (this is extremely fast); BUP_DIR must be set for both commands
git annex copy . --to bup_repo_2                      # Records file is now available in repo2 (also extremely fast)

Now git annex whereis will show the correct location and git annex get <file> --from bup_repo_2 will work.

So far in my testing I haven't found any problems...

Comment by sekenre Tue May 7 16:46:34 2013

I set up 2 servers running git annex assistant, both with a ~/annex dir and an additional ~/annex-bup bup repo. There is no additional cloud repository. As a test, I added my /etc dir, which uploaded correctly from server1 but never arrived on server2.

bup@bup1:~/annex/etc$ git annex whereis updatedb.conf
whereis updatedb.conf (3 copies) 
    687d3a7f-4798-4dbe-8774-1785b8ab6b7d -- here (bup@bup1:~/annex)
    adfc1307-771f-40e9-b794-bae2e1f21b8b -- bup2-annex-bup
    e4e0ac0b-992a-4312-a4ac-fc8d3d9f7c0f -- bup1-annex-bup
ok

bup@bup2:~/annex/etc$ git annex whereis updatedb.conf
whereis updatedb.conf (1 copy) 
    687d3a7f-4798-4dbe-8774-1785b8ab6b7d -- bup1 (bup@bup1:~/annex)
ok

As you can see, server 2 just doesn't know the data is already on its own disk in its local bup repo. Is there a reason this data does not get synced? Should I set up a transfer repo?

Comment by Tim Wed May 15 15:08:54 2013

Sorry, looks like I did initremote twice on the same folder, instead of enableremote the second time...

Comment by Tim Wed May 15 15:39:31 2013

How can I restore the previous commit from bup archives created with bup-split? Yes, I know I can use git notation, as in bup-join local-arch~1, but I would like to use a name taken from the bup-ls local-arch results, as in bup-join local-arch/2014-12-03-235617. That method does not work...

Comment by Sergiusz Thu Dec 4 19:38:07 2014