Christophe-Marie Duquesne has just announced Sharebox, a FUSE filesystem relying on git-annex:
What are your goals? Seamless synchronization "à la dropbox". Ability to use with big binary files such as mp3/movies. Entirely decentralized. Don't use unnecessary space. Keep it simple: avoid special VCS commands and keep a filesystem interface as much as possible.
While still alpha, this is promising. --Joey
This is what the assistant should aim to be... "Like Dropbox", but even better. I would guess many people who pledged were thinking of this.
@Yaroslav: I made one of these while I was messing with FUSE but found I didn't use it much.
If I can find it, I'll post it somewhere or if you really want it, I can just write a (much) better one!
What could I say to a "much better one" offer, besides "GO AHEAD" and "Thank you in advance"!
I wonder, though, what Joey thinks about the possible utility of a basic FUSE wrapper for annex, and possibly shipping it along?
My primary use case would be testing. For example, if I want to run a (sub)collection of tests (e.g. on Travis) that rely on having some data from annex available, I currently need to provide some project- or language-specific wrapper that checks whether a file is available and fetches it if not. With FUSE I thought I could do that transparently, without requiring any per-project coding or setup. A similar use case would be analysis of large datasets, once again without pre-fetching them in their entirety or fetching them piece by piece. Another possible use case or mode would be to expose only the available files under FUSE. If it were easy to trigger, it would help provide the "lean" view I was blurbing about (https://github.com/datalad/datalad/issues/25), although it would be quite a suboptimal workaround (if a directory is heavily loaded with broken links, it would take a while for the FUSE handler to traverse the tree first anyway).
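For reference, the kind of per-project wrapper described above might look roughly like the sketch below. It assumes a symlink-based annex checkout, where a file whose content is missing shows up as a broken symlink; the helper names (`is_present`, `annex_get`, `ensure_available`) are made up for illustration and are not part of git-annex or datalad.

```python
import os
import subprocess


def is_present(path):
    """Return True if the file's content is available locally.

    os.path.exists() follows symlinks, so a broken annex symlink
    (content not fetched yet) yields False.
    """
    return os.path.exists(path)


def annex_get(path):
    """Fetch content with `git annex get`, run from the file's directory."""
    directory, name = os.path.split(os.path.abspath(path))
    subprocess.run(["git", "annex", "get", name], cwd=directory, check=True)


def ensure_available(path, fetch=annex_get):
    """Fetch the file if its content is missing; report whether it is present.

    `fetch` is injectable so that tests can avoid touching the network.
    """
    if not is_present(path):
        fetch(path)
    return is_present(path)
```

A test suite would call `ensure_available("data/sample.dat")` before opening the file (the path here is a placeholder). This is exactly the per-project boilerplate a FUSE layer would make unnecessary.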
Having the lean view would be easy to implement either as an option you pass when mounting or something you can toggle by touching a file ($MNT/.config/lean/{on,off}).
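As a rough illustration of the lean view, the directory-listing side could filter out entries whose content is missing. This is only a sketch of the filtering logic in plain Python, not an actual FUSE handler. It assumes missing content appears as broken symlinks, and it assumes that the existence of `$MNT/.config/lean/on` means the lean view is enabled; the function names are hypothetical.

```python
import os


def lean_enabled(mount_point):
    """Assumed convention: the lean view is on if .config/lean/on exists."""
    return os.path.exists(os.path.join(mount_point, ".config", "lean", "on"))


def lean_listing(directory):
    """List only entries whose content is present.

    os.path.exists() follows symlinks, so broken annex symlinks
    are dropped from the result.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.exists(os.path.join(directory, name))
    )
```

Note that this checks every entry on each listing, which is the traversal cost mentioned earlier for directories with many broken links.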
Regarding fetching of files, how would you like it to behave? My previous one would return EBUSY while downloading a file and ENODATA if it wasn't available and couldn't be fetched. I could, for example, make unavailable files appear as normal files (containing text regarding the download state) until they are available, then they become symlinks. What would work best for you?
For my use cases the best would be if FUSE simply didn't return until the file becomes available. An option to return immediately with EBUSY/ENODATA could also be generally useful, but not in my case. I wonder whether any timeout would kick in in some use cases if it takes too long?
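The two behaviours discussed here (return immediately with EBUSY/ENODATA, or block until the file is available) can be sketched as one decision function. This is a minimal sketch under assumptions, not the earlier implementation: `open_status`, the `is_downloading` callback, and returning EBUSY when the optional timeout expires are all choices made for illustration. A real FUSE handler would translate the nonzero result into an error for the caller.

```python
import errno
import os
import time


def open_status(path, is_downloading, block=False, timeout=None, poll=0.1):
    """Return 0 if the content is present, otherwise an errno value.

    is_downloading: callable returning True while a fetch is in progress.
    block=False:    answer immediately with EBUSY or ENODATA.
    block=True:     poll until the content appears, the fetch stops,
                    or the optional timeout (seconds) expires.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # Read the download state first, so a fetch that finishes between
        # the two checks is still seen as "present" rather than "no data".
        busy = is_downloading()
        if os.path.exists(path):
            return 0
        if not busy:
            return errno.ENODATA  # not present and nothing is fetching it
        if not block:
            return errno.EBUSY  # fetch in progress, caller did not want to wait
        if deadline is not None and time.monotonic() >= deadline:
            return errno.EBUSY  # assumed: a timed-out wait reports "busy"
        time.sleep(poll)
```

With `block=True` and no timeout, the call simply does not return until the content shows up or the download gives up, which matches the batch use case.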
Okie dokie, I'll see what I can do.
Can you give me an idea of the annex file properties (file size, count, files per directory, directory count) etc. please?
Thanks for doing it and asking for detail!!! Repositories will vary quite a bit. I am currently testing how big we could actually make them (see https://github.com/datalad/datalad/issues/17)
Meanwhile, here are a few samples available for git clone/testing:
https://github.com/datalad/nih--videocast : a good collection of heavyish video files
http://psydata.ovgu.de/forrest_gump/.git/ : a good single dataset with probably a somewhat typical amount of data
http://data.pymvpa.org/datasets/haxby2001/.git/ : a relatively small dataset with typical data sizes
My concern about using FUSE has always been that I don't much like it when open() hangs indefinitely, with no progress indication, and is either downloading some large file from the network or .. just hung.
That doesn't strike me as a nice user interface in general, which is why I avoided using FUSE for the assistant.
It might make sense in the batch use cases Yaroslav gave. If something nice is developed, I would not be against including it in git-annex. (Bonus if it's implemented in Haskell.)