Hello,
In order to save space/bandwidth/..., I would like to create a way to describe a file in terms of another. You could see this as a kind of very special "remote" (which is local :P) that says "To produce file XXX, take file YYY, and run command CCC with args YYY".
You may ask why this is useful. I have several use cases:
1) My first use case is that I would like to be able to generate thumbnails for my pictures in order to speed up display. A thumbnail can easily be created from a picture (for example with the convert command), but when you don't need the thumbnails, you may prefer to remove them locally to save space.
2) Similarly, I have some RAW photo files and a script to turn them into .JPG files. Or even better, I could have several scripts to convert my initial RAW files into several .JPG files, with different parameters/looks. Keeping both the RAW and the developed JPGs can be heavy, so this kind of tool would let me remove the .JPG file(s) when I don't need them anymore. I don't mind dropping a .JPG file as long as the RAW still exists (but if the RAW does not exist anymore, I shouldn't be able to remove the .JPG, of course).
3) I also have some compressed files on my desktop (.iso for example, or old projects). Most of the time I don't really need to keep the uncompressed .iso, but from time to time I may need it. For now I manually uncompress them, use them, and delete them... But it would be cool to let git-annex deal with them automatically.
Does git-annex provide such functionality? If not, do you think it could be implemented?
Thanks!
I still think it's doable and worth doing. I don't have the bandwidth right now to implement it, but I can help brainstorm. If you're interested in working on it, post in the GitHub thread, and maybe we can refine the design.
The key issues are: (1) you can't just git-annex-copy a file to this remote; you'll need to use git-annex-setpresentkey and git-annex-registerurl to record that the contents with a given key can be obtained by running a given command, and (2) the result of running a command depends not just on the command line and the input file(s), but also on the environment in which the command is run, so to get bit-for-bit reconstruction of the contents you'd need to use Docker, or at least something like conda. But even then, sometimes the exact output file depends on the current time or the name of some intermediate tempfile. So unless the command is 100% deterministic, re-running the command might produce contents that do not match the git-annex key.
For local use, you could make a simple webserver that handles URLs like http://localhost:3000/cgi-bin/make_thumbnail.sh?orig_file_key=MD5-xxxxxx, and have the CGI script run git-annex-get --key to get the file contents and then extract the thumbnail and return that. Then you can use git-annex-addurl to store the file in git-annex.
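To make the local-webserver idea a bit more concrete, here is a rough Python sketch rather than a tested implementation. It assumes the server is started from inside the git-annex repository, that ImageMagick's convert is on the PATH, and that a 200x200 thumbnail is wanted. The function names (thumbnail_url, parse_orig_key, registration_commands, make_thumbnail, serve) and the thumbnail size are made up for illustration; only the URL shape comes from the suggestion above. It uses Python's http.server instead of a real CGI script, which is a substitution on my part.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse

# UUID that git-annex uses for its built-in web special remote.
WEB_REMOTE_UUID = "00000000-0000-0000-0000-000000000001"
# URL shape taken from the post; host and port are assumptions.
BASE_URL = "http://localhost:3000/cgi-bin/make_thumbnail.sh"


def thumbnail_url(orig_key):
    """URL at which the thumbnail derived from orig_key would be served."""
    return BASE_URL + "?" + urlencode({"orig_file_key": orig_key})


def parse_orig_key(path):
    """Return the orig_file_key query parameter, or None if missing/suspicious."""
    values = parse_qs(urlparse(path).query).get("orig_file_key")
    if not values or values[0].startswith("-"):
        return None  # refuse values that could be mistaken for an option
    return values[0]


def registration_commands(thumb_key, orig_key):
    """Commands recording that thumb_key can be fetched from the thumbnail URL."""
    url = thumbnail_url(orig_key)
    return [
        ["git", "annex", "registerurl", thumb_key, url],
        ["git", "annex", "setpresentkey", thumb_key, WEB_REMOTE_UUID, "1"],
    ]


def make_thumbnail(orig_key, repo="."):
    """Fetch the original by key and return JPEG thumbnail bytes."""
    subprocess.run(["git", "annex", "get", f"--key={orig_key}"],
                   cwd=repo, check=True)
    # contentlocation prints where the key's content is stored locally;
    # the path is assumed to be usable from the same working directory.
    location = subprocess.run(
        ["git", "annex", "contentlocation", orig_key],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Output is only reproducible if convert is deterministic in this environment.
    return subprocess.run(
        ["convert", location, "-thumbnail", "200x200", "jpg:-"],
        cwd=repo, check=True, capture_output=True,
    ).stdout


class ThumbnailHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = parse_orig_key(self.path)
        if key is None:
            self.send_error(400, "missing or invalid orig_file_key")
            return
        try:
            data = make_thumbnail(key)
        except (subprocess.CalledProcessError, OSError):
            self.send_error(500, "could not build thumbnail")
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=3000):
    """Serve thumbnails on localhost only; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ThumbnailHandler).serve_forever()
```

With serve() running, git-annex-addurl on thumbnail_url("MD5-...") should download and store the thumbnail like any other URL. The registration_commands helper only builds the argument lists for the registerurl/setpresentkey route and assumes you already know the thumbnail's key. Either way, the caveat above still applies: if convert does not produce bit-identical output on a later run, the regenerated content will not match the recorded key.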