This special remote type stores file contents in a bup repository. By using git-annex in the front-end, and bup as a remote, you get an easy git-style interface to large files, and easy backups of the file contents using git.
This is particularly well suited to collaboration on projects involving large files, since both the git-annex and bup repositories can be accessed like any other git repository.
See using bup for usage examples.
Each individual key is stored in a bup remote using `bup split`, with
a git branch named the same as the key name. Content is retrieved from
bup using `bup join`. All other bup operations are up to you -- consider
running `bup fsck --generate` in a cron job to generate recovery blocks,
for example; or clone bup's git repository to further back it up.
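For illustration, here is roughly what that looks like on the bup side. This is only a sketch, assuming a local bup repository at /big/mybup; the key name is a made-up example.

    # point bup and git at the repository used by the special remote
    export BUP_DIR=/big/mybup

    # bup split creates one git branch per annexed key
    git --git-dir="$BUP_DIR" branch | head

    # reassemble the content of one key onto stdout
    bup join SHA256E-s1048576--0123abcd > /tmp/content

    # run from cron to generate par2 recovery blocks
    bup fsck --generate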
configuration
These parameters can be passed to `git annex initremote` to configure bup:

* `buprepo` - Required. This is passed to `bup` as the `--remote` to use to
  store data. To create the repository, `bup init` will be run.
  Example: "buprepo=example.com:/big/mybup" or "buprepo=/big/mybup"
  (To use the default `~/.bup` repository on the local host, specify "buprepo=".)
* `encryption` - One of "none", "hybrid", "shared", or "pubkey". See encryption.
  Note that using encryption will prevent de-duplication of content stored in
  the buprepo.
* `keyid` - Specifies the gpg key to use for encryption.
Options to pass to `bup split` when sending content to bup can also
be specified, by using `git config annex.bup-split-options`. This
can be used to, for example, limit its bandwidth.
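For example, a minimal setup sketch; the remote name "mybup" and the bandwidth limit are just placeholders, and `--bwlimit` is a `bup split` option (check your bup version):

    # create the special remote; bup init is run for you
    git annex initremote mybup type=bup encryption=none buprepo=example.com:/big/mybup

    # optionally pass extra options to bup split, e.g. to limit upload bandwidth
    git config annex.bup-split-options "--bwlimit=100k"

    # then use it like any other remote
    git annex copy somefile --to mybup
    git annex drop somefile
    git annex get somefile --from mybup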
notes
git-annex-shell does not support bup, due to the wacky way that bup starts its server. So, to use bup, you need full shell access to the server.
Hello,
I get this error when trying to use git-annex with bup and gnupg:
I was able to work around this issue by altering /usr/lib/bup/cmd/bup-split (though I don't think it's a bup bug) to just pull from stdin:

    files = [sys.stdin]

on ~ line 128.
Any ideas? Also, do you think that bup's data-deduplication does anything when gnupg is enabled, i.e. is it just as well to use a directory remote with gnupg?
Thanks! Git annex rules!
Albert
@Albert, thanks for reporting this bug (but put them in bugs in future please).
This is specific to using the bup special remote with encryption. Without encryption it works. And no, it won't manage to deduplicate anything that's encrypted, as far as I know.
I think bup-split must have used - for stdin in the past, but now, it just reads from stdin when no file is specified, so I've updated git-annex.
Hi,
Is the bup remote available via the Assistant user interface?
Unrelated question: if you are syncing files between two bup repos on local usb drives, does it use git to sync the changes, or does it use "bup split" to re-add the file? (Basically, is the syncing as efficient as possible using git-annex, or would I have to go to a lower level?)
Many Thanks, Sek
I don't plan to support creating bup special remotes in the assistant, currently. Of course the assistant can use bup special remotes you set up.
Your two bup repos would be synced using bup-split.
I've run into problems storing a huge number of files in the bup repo. It seems that thousands of branches are a problem. I don't know if it's a problem of git-annex, bup, or the filesystem.
How about adding an option to store tree/commit ids in git-annex instead of using branches in bup?
`bup split` uses a git branch to name the objects stored in the bup repository. So it will be limited by any scalability issues affecting large numbers of git branches. I don't know what those are.

Yes, it would be possible to make git-annex store this in the git-annex branch instead.
Tobias/joey,
I think there are at least two scaling issues that may be causing you trouble. One is that bup writes pack+idx files rather than bare objects, and if you send 1 file per call to bup-split, you end up with a pair of pack and idx files for each such call. When you later try to retrieve a blob, bup currently just calls git, and git will have to traverse all these tiny idx files looking for the right hash (bare objects you could at least find by name). You can probably ameliorate the pain by calling git repack (look at the -a and --max-pack-size switches) on your bup repository. The other is the "thousands of branches" issue, and I think "git pack-refs --all" (that's again on your bup repository) might help a little bit.
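A sketch of those maintenance commands, assuming a local bup repository at /big/mybup (adjust the pack size to taste):

    # count the per-key branches, to get an idea of the scale of the problem
    git --git-dir=/big/mybup for-each-ref refs/heads | wc -l

    # consolidate the many small pack/idx files into fewer, larger packs
    git --git-dir=/big/mybup repack -a -d --max-pack-size=1g

    # fold the per-key branches into a single packed-refs file
    git --git-dir=/big/mybup pack-refs --all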
It would certainly help performance if you could store blob/tree ids in git-annex instead of branch names. For small files, all bup would need to store is a blob, but currently you end up storing a blob, a tree, and a commit (and looking up all of those, plus the ref too, on calling bup-join). (You might want to patch bup-split so it would allow you to ask it for "--blob-or-tree", because currently if you pass it -b for blob ids, then for bigger files you get a series of ids, whereas you'd be much better off with a tree id there.)
Thinking about this some more, a very elegant way to make a bup remote could actually be to just pass the whole .git/annex tree into bup-index/save (you could avoid sending some files by only bup-indexing select subtrees, or by using --exclude-*'s, but you'd run bup-save over the whole .git/annex tree). You could then use bup-restore to retrieve files or whole subtrees, and you'd refer to the files you're retrieving by their actual pathname under which they live in .git/annex (if that doesn't make sense it's because I've misunderstood how git-annex is organised!), so something like "bup restore branch_name/latest/.git/annex/aa/bb/sha-of-some-sort" would work - that's cute, right? And you'd only have 1 branch.
However... somebody who is good with lazy-evaluation would need to rework bup.vfs: currently, if you'd call bup-restore on a path like that, it would instantiate a lot of vfs-nodes you don't need - to begin with, it would make a node for every commit you ever made (on any branch!) - on a big repository you'd wait ages for it to just find the commit objects...
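A rough sketch of that idea, for what it's worth; the branch name and paths are hypothetical, and this is not how the bup special remote actually works:

    # index and save the whole annex object tree under a single branch
    bup index ~/repo/.git/annex/objects
    bup save -n annex-backup ~/repo/.git/annex/objects

    # later, restore one object; the vfs path mirrors the absolute path
    # that was saved, so the exact form depends on where the repo lives
    bup restore "annex-backup/latest$HOME/repo/.git/annex/objects/aa/bb/sha-of-some-sort"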
Hi All,
I managed to answer my questions above about copying changes between local bup repositories efficiently.
You run the following commands
Now `git annex whereis` will show the correct location and `git annex get <file> --from bup_repo_2` will work.

So far in my testing I haven't found any problems...
I set up 2 servers running git annex assistant, both with a ~/annex dir and an additional ~/annex-bup bup repo. There is no additional cloud repository. As a test, I added my /etc dir, which uploaded correctly from server1 but never arrived on server2.

As you can see, server 2 just doesn't know the data is already on its own disk in its local bup repo. Is there a reason this data does not get synced? Should I set up a transfer repo?
How can I restore the previous commit from bup archives created with bup-split? Yes, I know I can use the git notation `bup join local-arch~1`, but I would like to use `bup join local-arch/2014-12-03-235617` (from the `bup ls local-arch` results)... but this method does not work...
s.
The buprepo parameter doesn't seem to work properly, at least for local repos. For example, I added a bup remote with buprepo=/media/hdd/bup, yet when I try to move files onto it, it still tries to use /home/foo/.bup (the default path). I suppose git-annex should be setting the envvar BUP_DIR before calling bup?
I can do it manually, like `BUP_DIR=/media/hdd/bup git annex move --to hdd foo`, but that almost seems to defeat the purpose...
@darkfeline, setting buprepo= causes git-annex to run bup with -r. You can verify this by using the --debug switch.
IIRC, bup still creates ~/.bup when used this way, but doesn't store the contents of annexed files there. It uses it only to store some small index files, which are also stored in the repo specified with -r. This seems weird, but I don't think this is a bug on bup's part; it seems to intentionally do that, using path names in ~/.bup that are constructed to not conflict when -r is used with different repositories. I suppose bup has a good reason to do this, though I don't know what the reason is.
I can blow ~/.bup away, run "bup init" to make a fresh clean ~/.bup, and then git-annex can still get the content of files from the buprepo= repository. So, it seems that buprepo= is working ok.
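For example, a quick check along the lines described above (the remote and file names are placeholders):

    # watch the bup invocations; bup should be run with -r pointing at the buprepo
    git annex --debug copy somefile --to mybup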