Earlier this week I had the opportunity to sit in on a workshop at MIT where students were taught how to use git-annex as part of a stack of tools for reproducible scientific data research. That was great!
One thing we noticed there is that it can be hard to distribute files to such a
class; having every student download them individually wastes network bandwidth.
Today, I added git annex multicast, which uses uftp
to multicast files to other clones of a repository on a LAN.
An "easy" 500 lines of code and 7 hour job.
There is encryption and authentication, but the key management for this turned out to be simple, since the public key fingerprints can be stored on the git-annex branch, and easily synced around that way. So, I expect this should not be hard to use in a classroom setting such as the one I was in earlier this week.
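A rough sketch of how a classroom session might go, assuming every clone can already sync its git repository with the others by some other means. The step, setup_clone, receive_files and send_files helpers are hypothetical names made up for this illustration, and the ordering (everyone generates an address and syncs, receivers start listening, then the sender syncs and sends) is an assumption about typical use rather than a fixed recipe. By default the script only prints the commands; set RUN=1 to actually execute them, which needs git-annex and uftp installed.

```shell
#!/bin/sh
# Hypothetical helper: print each command; only execute it when RUN=1 is set
# (executing requires git-annex and uftp to be installed).
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Every clone, sender and receivers alike: generate a multicast key and
# publish its fingerprint through the git-annex branch.
setup_clone() {
    step git annex multicast --gen-address
    step git annex sync
}

# Each receiving clone: wait for incoming files (runs until interrupted).
receive_files() {
    step git annex multicast --receive
}

# The sending clone: pick up the receivers' fingerprints, then send the
# annexed files in the working tree.
send_files() {
    step git annex sync
    step git annex multicast --send
}

# Usage (assumed script name): sh multicast-demo.sh setup|receive|send
case "${1:-}" in
    setup)   setup_clone ;;
    receive) receive_files ;;
    send)    send_files ;;
esac
```

Running it with setup on every machine, then receive on the students' clones and send on the teacher's, prints the commands each role would run.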
I wonder: could this be used to send blobs to arbitrary hosts as well? I've been looking at the problem of sharing blobs without sharing the git repository (in semi-synchronized remotes), and while we have the tor/wormhole and SSH remotes that can be (ab)used for this purpose, uftp seems like a much better fit, feature-wise.
When you run git annex multicast --gen-address; git annex sync, does the sync command exchange git refs over multicast?
When you run git annex multicast --send, will that work for all git-annex blobs, or just the ones that are in the local checkout?
Looking at the server usage, I see that multicast is the default, but that you can also unicast to specific hosts. Could this be used to send to arbitrary, non-local hosts on the internet?
And how about tor support? Could this simplify the encrypted setup?
This amazing feature just brings up more questions for me, but I'm really glad to see this come. It certainly covers some BitTorrent-like features, and is probably the first remote that supports bandwidth optimizations for multiple downloads, or rather, in this case, uploads.
Thanks so much for your hard work! The feature set of git-annex never ceases to amaze me - new features just keep on coming, this is great!
Multicast is only being used to send git-annex objects around, not git objects. There's assumed to be some way to sync git repositories, which is how the encryption keys for uftp are distributed.
git annex multicast --send operates on files in the working tree. It would be possible to make it support --all.
I'm not sure if uftp can send outside the local LAN.
It would certainly be possible to have a special remote backed by uftp that only sends to a single host. Since multicast does not send to any particular remote, it did not make sense to implement it as a special remote.