tips: special_remotes/hook with tahoe-lafs is a good start, but Zooko points out that using Tahoe's directory translation layer incurs O(N²) overhead as the number of objects grows. Also, making hash subdirectories in Tahoe is expensive. Instead it would be better to use Tahoe as a key/value store directly. The catch is that doing so involves sending the content to Tahoe and getting back a key identifier.
This would be fairly easy to do as a backend, which can assign its own key names (although a backend typically assigns the key name before the data is stored), but a tahoe-lafs special remote would be more flexible.
To support a special remote, a mapping is needed from git-annex keys to Tahoe keys, stored in the git-annex branch.
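The mapping idea can be sketched in miniature. Everything below is illustrative only, not git-annex's actual code: the class name, the fabricated capability strings, and the git-annex key are all hypothetical stand-ins for what a real Tahoe grid and the git-annex branch would hold.

```python
class TahoeRemote:
    """Simulates a Tahoe-LAFS grid used as a pure key/value store."""

    def __init__(self):
        self._grid = {}
        self._counter = 0

    def put(self, content: bytes) -> str:
        # A real grid would return an immutable read capability such as
        # "URI:CHK:..."; here we fabricate one for illustration.
        self._counter += 1
        cap = f"URI:CHK:example{self._counter}"
        self._grid[cap] = content
        return cap

    def get(self, cap: str) -> bytes:
        return self._grid[cap]


# Per-remote state recorded in the git-annex branch:
# a mapping from git-annex key to the Tahoe capability returned by put.
mapping = {}
remote = TahoeRemote()

annex_key = "SHA256E-s3--aaaa"  # hypothetical git-annex key
cap = remote.put(b"foo")
mapping[annex_key] = cap

# Retrieval later looks up the capability by git-annex key.
assert remote.get(mapping[annex_key]) == b"foo"
```

The point of the sketch is that Tahoe, not git-annex, picks the storage key, so the mapping has to be persisted somewhere both sides of a clone can see it, which is what the git-annex branch provides.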
This is now done; however, there are 3 known problems:
- `tahoe start` run unnecessarily https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2149
- `web.port` can conflict https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2147
- Nothing renews leases, which is a problem on grids that expire. https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2212
--Joey
Hm... O(N²)? I think it just takes O(N). To read an entry out of a directory you have to download the entire directory (and store it in RAM and parse it). The constants are basically "too big to be good but not big enough to be prohibitive", I think. jctang has reported that his special remote hook performs well enough to use, but it would be nice if it were faster.
The Tahoe-LAFS folks are working on speeding up mutable files, by the way, after which we would be able to speed up directories.
Whoops! You'd only told me O(N) twice before...
So this is not too high priority. I think I would like to get the per-remote storage sorted out anyway, since it is probably what's needed to convert the URL backend into a special remote, which would then allow ripping out the otherwise unused pluggable backend infrastructure.
Update: Per-remote storage is now sorted out, so this could be implemented if it actually made sense to do so.