This special remote stores file contents using Tahoe-LAFS. There are a number of commercial providers, or you can build your own tahoe storage grid.
Since Tahoe-LAFS encrypts all data stored in it, git-annex does not do any additional encryption of its own.
Note that data stored in a tahoe remote cannot be dropped from it, as Tahoe-LAFS does not support removing data once it is stored in the Tahoe grid. This, along with Tahoe's ability to recover data when some nodes fail, makes a tahoe special remote an excellent choice for storing backups.
Typically you will have an account on a Tahoe-LAFS storage grid, which
is represented by an "introducer furl". You need to supply this to
git-annex in the TAHOE_FURL environment variable when initializing the
remote. git-annex will then generate a tahoe configuration directory for
the remote under ~/.tahoe/git-annex/, and automatically start the tahoe
daemon as needed.
configuration
These parameters can be passed to git annex initremote to configure
the tahoe remote.
shared-convergence-secret
- Optional. Can be useful to set, to allow tahoe to deduplicate information. By default, a new shared-convergence-secret is created for each tahoe remote.
embedcreds
- Optional. Set to "yes" to embed the tahoe credentials (specifically the introducer-furl and shared-convergence-secret) inside the git repository, which allows other clones to also use them in order to access the tahoe grid.
Think carefully about who can access your git repository, and whether you want to give them access to your tahoe system before using embedcreds!
Setup example:
# TAHOE_FURL=... git annex initremote tahoe type=tahoe embedcreds=yes
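As a variation on the example above, the following sketch initializes the remote without embedcreds, so the furl stays out of the git repository, and optionally passes a shared-convergence-secret. The helper name init_tahoe_remote and its argument order are made up for illustration; only TAHOE_FURL, type=tahoe and shared-convergence-secret= come from this page.

```shell
# Hypothetical helper (not part of git-annex): initialize a tahoe special
# remote without embedding credentials in the git repository.
# Usage: init_tahoe_remote <remote-name> [shared-convergence-secret]
init_tahoe_remote() {
    remote_name=$1
    secret=${2:-}   # optional; tahoe generates a new one when omitted

    # The introducer furl must be supplied through the environment.
    if [ -z "${TAHOE_FURL:-}" ]; then
        echo "TAHOE_FURL is not set" >&2
        return 1
    fi

    if [ -n "$secret" ]; then
        git annex initremote "$remote_name" type=tahoe \
            "shared-convergence-secret=$secret"
    else
        git annex initremote "$remote_name" type=tahoe
    fi
}
```

Because embedcreds is left out, each clone that wants to use the remote has to be given the furl (and the secret, if one is shared) separately.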
old version of tahoe special remote
An older implementation of tahoe for git-annex used the hook special remote. It is not compatible with this newer implementation. See tahoe-lafs.
Hi,
I would like to use git-annex in combination with Tahoe-LAFS. The grid will consist of private servers connected through slow DSL lines. Thus I would like to use the Tahoe-LAFS helper feature (like a Tahoe-LAFS upload proxy):
https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/helper.rst
This will result in a different FURL for each location pointing to the same Tahoe-LAFS grid.
How can I set up two git-annex clients to use two different FURLs for the same remote (the same Tahoe-LAFS grid)?
Thank you very much for your help!
Oliver
@junk, I think the thing for you to do is avoid setting embedcreds=yes. So, git-annex won't store the furl (or the shared-convergence-secret) in the git repository.
Instead, you can set TAHOE_FURL in the environment each time you use git annex initremote to set up your tahoe remote in a git-annex repository. (Also pass shared-convergence-secret= if you have one.)
Or, you could bypass git-annex's automated setup of tahoe, and set it up yourself, however works for you. The automated setup is targeted at users who want to get tahoe working with git-annex but don't have complicated needs, so maybe it's best for you to just bypass it. The way to do that is to set the git config remote.<name>.tahoe to point to a tahoe configuration directory. You can do this before running git annex initremote, and it should just use that directory. I have not tested this, and I may have gone too far in automating tahoe setup, making a manual tahoe setup too hard to use. Please file a bug report if that doesn't work.
Hi Joey,
thank you very much for your reply!
I've continued to wrap my head around this because it does not seem to be the intended way to do it in git-annex and also the helper is still in development (it only works for uploads so far). Right now I've figured out that I could also add two rsync remotes with the same uuid and have csync2 sync the two servers. This way I could use whichever remote is closest to upload and download my files, and csync2 will make sure that the files end up on all remotes. So in my case this would achieve the same as the Tahoe-LAFS solution but use more of git-annex's features than of Tahoe-LAFS. Also I will be able to easily delete files from the git.
I'll keep you posted on how it went.
Best regards,
Oliver
@sanket you need to get your git repositories on the two systems connected in some way (ie, push one to github, pull in the other one). Once the git repositories are in sync, git annex get can be run in the second repository and will know how to download files that were uploaded to tahoe from the first repository.
Nothing tahoe specific here, that's how any git-annex special remote works.
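A rough sketch of that workflow, assuming both repositories already share a git remote and that the special remote is named tahoe as in the setup example. The function names and the file argument are hypothetical, and git annex sync is used here as one convenient way to exchange the git data; a plain push and pull that includes the git-annex branch should work too. The second clone probably also needs a one-time git annex enableremote tahoe (with TAHOE_FURL set unless embedcreds=yes was used).

```shell
# Run in the first repository: upload the file's content to tahoe, then
# publish the updated location tracking information through git.
publish_to_tahoe() {
    file=$1
    git annex copy "$file" --to tahoe &&
    git annex sync
}

# Run in the second repository: pull in the first repository's changes,
# then ask git-annex to download the content from wherever it is stored.
fetch_from_tahoe() {
    file=$1
    git annex sync &&
    git annex get "$file"
}
```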
@JOEY
Where is the tahoe URI (the file cap in tahoe) stored when we use git annex and tahoe?
I am guessing: 1) in the meta data file in git annex 2) in ./git folder
Where is the mapping maintained that says xyz.txt is this particular tahoe URI?
The Tahoe file cap is stored in the git-annex remote state log, which is checked into the git-annex branch of the git-annex repository. It can be displayed using git-annex whereis.
@joey
I tried your solution but git annex whereis displays the remotes and the uuid of the remotes and not the URI of the file on tahoe. For example, the plain text file is named "hello.txt" and contains some data, and its URI on tahoe is "URI:CHK:wvyj4ah75mh77oehnwv236jogi:a2d4nx7c7jtyllfle573fgkdvfykci2o2glzknv54vhyo23qb2ya:1:1:110".
So I want to find out how and where the plain text URI of the file is stored. Is it stored in some encrypted form? Where and how can I retrieve the plaintext URI?
Question 2: you mentioned the git-annex branch. Where is the git-annex branch? Where is the state log?
Hmm, seems I was wrong about git annex whereis displaying the tahoe capability. That was not implemented. I have just made a commit that will implement it though.
The git-annex branch is part of your git repository when you're using git-annex. See internals.
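For anyone who wants to look at the branch directly, here is a small sketch using ordinary git commands. It assumes the remote state logs are the files ending in .log.rmt on the git-annex branch; that suffix and the helper names are assumptions made for illustration, so check the internals page for the actual layout.

```shell
# List candidate remote state log files on the git-annex branch.
# Assumption: remote state logs are the files ending in ".log.rmt".
list_remote_state_logs() {
    git ls-tree -r --name-only git-annex | grep '\.log\.rmt$'
}

# Print one of those log files, given a path from the list above.
show_remote_state() {
    git show "git-annex:$1"
}
```

If the local git-annex branch does not exist yet in a fresh clone, the same commands can be pointed at the remote-tracking branch instead.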
On Debian buster, copying to or from tahoe seems to always cause an attempt to start a new tahoe process, even if one is already running, resulting in error messages like this on second or subsequent attempts:
Not a massive problem as everything seems to work.