**EDIT: Mistakenly posted this thread in the forum. I created a new post in todo.**
Do you think it would be possible to have bittorrent-like transfers between remotes, so that no single remote gets pegged too hard with transfers? It would be great if you've distributed your files across multiple bandwidth-capped remotes and want fast download speeds. Obviously this isn't a simple task, but the protocol is already there; it just needs to be adapted for the purpose (and re-written in Haskell...). Maybe some day in the future, after the more important stuff gets taken care of? It could be an enticing stretch goal.
PS: still working on getting BTC, will be donating soon!
I agree. I mentioned it briefly here:
http://git-annex.branchable.com/design/assistant/polls/what_is_preventing_me_from_using_git-annex_assistant/#comment-5a5d46967b826f602c423d7f72ac6f5e
I did briefly look into torrents and the like, but couldn't work out a way to lock it down so only authorized users could use either the tracker or connect to pull the data down...
The security aspect is the kicker in my mind.
Okay, I've got some ideas, but they need refining. I should be posting more tomorrow.
One thing I thought of, though: there could be two types of torrent support:
1. You use a torrent file or magnet link as a special remote URL. This remote is read-only, naturally, and you'd only be able to leech (and seed, I suppose) the file in the torrent. Neat feature, maybe. I wasn't previously thinking of this.
2. This isn't really bittorrent, just inspired by it. You use a bittorrent-type protocol to transfer between your private remotes. You could download from multiple remotes at once to get a file (like leeching, but without a tracker, just pre-defined peers... aka remotes), or you could superseed to remotes to push a file and have the remotes then share the parts of the file amongst themselves until each has a complete copy. Remote groups could possibly be built upon, or special groups could be coded in, to define the swarm. You update (add/remove) the swarm members locally and push the changes to each of the old swarm members so they can add/remove other remotes (or themselves) from the swarm. Possibly you could just push the updated swarm list to one remote already in the swarm; it adopts the new list and shares it with the rest of the swarm, and each remote does the same. Maybe the swarm list should be GPG-signed and verified before being adopted and passed forward? (A rough sketch of what that might look like follows below.)
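To make that last idea a bit more concrete, here's a very rough Haskell sketch of what a signed swarm list might look like. All the names (and the `gpgVerify` stand-in) are made up by me, not anything git-annex actually has:

```haskell
-- Hypothetical sketch: a swarm membership list plus a detached GPG
-- signature. A remote only adopts and re-shares a list whose signature
-- verifies against a key it trusts.
newtype RemoteUUID = RemoteUUID String
  deriving (Show, Eq)

data SwarmList = SwarmList
  { swarmMembers :: [RemoteUUID]  -- remotes currently in the swarm
  , swarmVersion :: Integer       -- bumped on every change; newest wins
  } deriving (Show)

data SignedSwarmList = SignedSwarmList
  { swarmList :: SwarmList
  , swarmSig  :: String           -- armored, detached GPG signature
  } deriving (Show)

-- gpgVerify is a stand-in for however the signature would really be
-- checked (e.g. by shelling out to gpg --verify).
adoptIfValid :: (String -> SwarmList -> IO Bool)
             -> SignedSwarmList -> IO (Maybe SwarmList)
adoptIfValid gpgVerify (SignedSwarmList list sig) = do
  ok <- gpgVerify sig list
  return (if ok then Just list else Nothing)
```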
The more I think about it, the more awesome I think it could be, but holy shit it would be a LOT of work.
Personally, I think the easiest implementation to make would be smart transfers.
You have a repo on your laptop and a repo at work, and you have (say) 5 special remotes in between them that they need to use to transfer data.
The laptop could upload 5 different files to those remotes at once. It wouldn't quite be bittorrent transfer (not even close), but it would probably be much simpler to implement. And when syncing a lot of files (instead of just one really big one), the speed gain should be about the same as if all the special remotes used some kind of bittorrent thing.
Just a thought.
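If it helps, here's a tiny Haskell sketch of how I picture that scheduling: just deal the queued files out round-robin so every special remote is busy with a different file. `Remote`, `Key`, and `uploadTo` are stand-ins (and it leans on the async library), so take it as a doodle, not a design:

```haskell
import Control.Concurrent.Async (forConcurrently_)

type Remote = String
type Key    = String

-- Deal the queued files out round-robin: file 0 to remote 0, file 1 to
-- remote 1, ... wrapping around (assumes at least one remote).
planUploads :: [Remote] -> [Key] -> [(Remote, Key)]
planUploads remotes keys = zip (cycle remotes) keys

-- Each remote works through its share of the plan in parallel with the
-- others. uploadTo is a stand-in for whatever really copies a key out.
runPlan :: (Remote -> Key -> IO ()) -> [Remote] -> [Key] -> IO ()
runPlan uploadTo remotes keys =
  forConcurrently_ remotes $ \r ->
    mapM_ (uploadTo r) [ k | (r', k) <- planUploads remotes keys, r' == r ]
```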
Disclaimer: I'm thinking out loud about what could make git-annex even more awesome. I don't expect this to be implemented any time soon. Please pardon any dumbassery.
That would be much easier to implement, but having your remotes (optionally!) act like a swarm would be an awesome feature to have, because it brings in a lot of new ways to optimize storage, bandwidth, and overall traffic usage. It would be a lot more manageable if it were implemented in small steps, each of which adds a nifty feature. The best part is, each of these steps could be implemented by itself, and they're all features that would be really useful on their own.
Step 1. Concurrent downloads of a file from multiple remotes.
This would make sense to have: it spreads the upload traffic across your remotes, and you also get faster download speeds on the receiving end.
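Something like this is what I have in mind, as a hand-wavy Haskell sketch (it assumes a hypothetical `fetchRange` ranged read, which special remotes don't necessarily offer, plus the async and bytestring libraries): split the file's byte range across the remotes that have it, fetch the slices in parallel, and glue them back together.

```haskell
import Control.Concurrent.Async (forConcurrently)
import qualified Data.ByteString as B

type Remote = String

-- Carve a file of the given size into one (offset, length) slice per
-- remote; lengths are clamped so trailing slices can be empty.
slices :: Integer -> Int -> [(Integer, Integer)]
slices size n =
  [ (off, len)
  | i <- [0 .. n - 1]
  , let off = fromIntegral i * chunk
  , let len = max 0 (min chunk (size - off))
  ]
  where chunk = (size + fromIntegral n - 1) `div` fromIntegral n

-- Fetch each slice from a different remote at the same time and stitch
-- the results back together. fetchRange is the hypothetical ranged read.
downloadFromSwarm :: (Remote -> (Integer, Integer) -> IO B.ByteString)
                  -> [Remote] -> Integer -> IO B.ByteString
downloadFromSwarm fetchRange remotes size = do
  parts <- forConcurrently (zip remotes (slices size (length remotes)))
                           (\(r, range) -> fetchRange r range)
  return (B.concat parts)
```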
Step 2. Implementing part of the super-seeding capabilities.
You upload pieces of a file to different remotes from your laptop, and on your desktop you can download all those pieces and put them back together to get the complete file. If you really wanted to get fancy, you could build in redundancy (à la RAID), so if a remote or two gets lost you don't lose the entire file. This would be a very efficient use of storage if you have a bunch of free cloud storage accounts (~1GB each) and some big files you want to back up.
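Here's a toy Haskell illustration of the redundancy part, assuming nothing about how git-annex would actually do it: cut the file into fixed-size pieces (one per small cloud account) and keep one extra XOR parity piece, RAID-4 style, so any single lost remote can be rebuilt from the rest. Proper erasure coding would be the grown-up version of this.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)

pieceSize :: Int
pieceSize = 1024 * 1024  -- 1 MiB pieces, purely illustrative

-- Cut the file into fixed-size pieces, one per remote.
splitPieces :: B.ByteString -> [B.ByteString]
splitPieces bs
  | B.null bs = []
  | otherwise = let (p, rest) = B.splitAt pieceSize bs
                in p : splitPieces rest

-- XOR all the pieces together (conceptually zero-padding the short
-- ones) to get a parity piece. To rebuild a lost piece, XOR the parity
-- with every surviving piece and trim to the lost piece's length.
parity :: [B.ByteString] -> B.ByteString
parity = foldr xorPad B.empty
  where
    xorPad a b =
      let n     = max (B.length a) (B.length b)
          pad s = s `B.append` B.replicate (n - B.length s) 0
      in B.pack (B.zipWith xor (pad a) (pad b))
```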
Step 3. Setting it up so that those remotes could talk to one another and share those pieces.
This is where it gets more like bittorrent. It's useful because you upload one copy and, in a few hours, have, say, 5 complete copies spread across your remotes. You could add or remove remotes from a swarm locally and push those changes to the remotes, which then adapt themselves to suit the new rules and share them with the other remotes in the swarm (the rules should be GPG-signed as a safety precaution). Also, if/when deltas get implemented, you could push a delta to the swarm and have all the remotes adopt it; this is cooler than regular bittorrent because the shared file can be updated. As another safety precaution, the delta could be GPG-signed so a corrupt file doesn't contaminate the entire swarm. Each remote could have bandwidth/storage limits set in a dotfile.
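To put some shape on the "remotes talk to one another" part, these are the kinds of messages I imagine swarm members exchanging, sketched as Haskell types. Pure speculation on my part; none of it exists in git-annex today:

```haskell
-- Speculative sketch of the messages swarm members might pass around.
type Key        = String  -- the annexed content's key
type PieceIndex = Int

data SwarmMessage
  = Have  Key [PieceIndex]         -- "I hold these pieces of this key"
  | Want  Key [PieceIndex]         -- "please send me these pieces"
  | Piece Key PieceIndex FilePath  -- one piece's payload (path stand-in)
  | Rules SignedRules              -- updated, GPG-signed swarm rules
  deriving (Show)

-- Membership plus per-remote bandwidth/storage caps (the dotfile),
-- signed so a remote can verify before adopting and forwarding them.
data SignedRules = SignedRules
  { rulesBody :: String  -- e.g. member list and caps
  , rulesSig  :: String  -- armored GPG signature over rulesBody
  } deriving (Show)
```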
This is a high-level idea of how it might work, and it's also a HUGE set of features to add, but if implemented, you'd be saving a ton of resources, adding new use cases, and making git-annex more flexible.
Obviously, Step 3 would only work on remotes where you can run your own processes, but if those remotes were given login credentials for cloud storage (potentially dangerous!), they could read/write to something like Dropbox or an rsync remote on your behalf.
One more thing: this would be completely trackerless. You just use remote groups (or create swarm definitions) and share those with your remotes. It's completely decentralized!