While other remotes store the contents of annexed files somewhere, this special remote uses a program to compute the contents of annexed files.
To add a file to a compute special remote, use the git-annex-addcomputed command. Once a file has been added to a compute special remote, commands like git-annex get will use it to compute the content of the file.
To enable an instance of this special remote:
# git-annex initremote myremote type=compute program=git-annex-compute-foo
The program parameter is the only required parameter. It is the name of the program to use to compute the contents of annexed files. It must start with "git-annex-compute-". The program needs to be installed somewhere in the PATH.
Any program can be passed to git-annex initremote. However, when enabling a compute special remote later with git-annex enableremote or due to "autoenable=true", the program must be listed in the git config annex.security.allowed-compute-programs.
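As a sketch of that second case, a user enabling the remote in a fresh clone might first allow the program in their git config. This assumes the config value is simply the program name; git-annex-compute-foo and myremote are the placeholder names from the example above. Check the git-annex man page for the exact value format before relying on it.

```shell
# git config annex.security.allowed-compute-programs git-annex-compute-foo
# git-annex enableremote myremote
```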
All other "field=value" parameters passed to initremote will be passed to the program when running git-annex-addcomputed. Note that when the program takes a dashed option, it can be provided after "--":
# git-annex initremote myremote type=compute program=git-annex-compute-foo -- --level=9
See computing annexed files for more documentation.
compute programs
To write programs used by the compute special remote, see the compute special remote interface.
Have you written a generally useful (and secure) compute program? List it here with an example!
git-annex-compute-imageconvert
Uses ImageMagick to convert between image formats.
# git-annex addcomputed --to=imageconvert foo.jpeg foo.gif
git-annex-compute-singularity
Uses Singularity to run a container, which is checked into the git-annex repository, to compute other files in the repository. Among other things, this can run other compute programs inside a Singularity container. Examples here
git-annex-compute-wasmedge
Uses WasmEdge to run WASM programs that are checked into the git-annex repository, to compute other files in the repository. Examples here
git-annex does know what both the input and the output files are. It learns this by running the compute program and seeing what INPUT and OUTPUT lines it emits.
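To make that exchange concrete, here is a minimal sketch of a compute program written as a shell function. It assumes the protocol described on the compute special remote interface page: the program prints an INPUT line, reads back a path to that file's content on stdin, prints an OUTPUT line, and then writes the output file in its current directory. The name compute_upcase and the two-argument command line are hypothetical, chosen only for illustration; the interface page is the authoritative reference.

```shell
# Hypothetical compute program body: upper-cases one input file.
# $1 is the input file name, $2 is the output file name.
compute_upcase() {
    input_name=$1
    output_name=$2

    # Declare the input. git-annex is assumed to answer on stdin with
    # the path to that file's content.
    printf 'INPUT %s\n' "$input_name"
    IFS= read -r input_path

    # Declare the output, then create it in the current directory.
    printf 'OUTPUT %s\n' "$output_name"

    # Assumption: an empty path means the content is not available
    # (for example in a --fast run), so the computation is skipped.
    if [ -n "$input_path" ]; then
        tr '[:lower:]' '[:upper:]' < "$input_path" > "$output_name"
    fi
}
```

Saved as an executable named, say, git-annex-compute-upcase, the script would end with compute_upcase "$@", and git-annex addcomputed --to=myremote foo.txt foo.upper.txt would then hand it the two file names.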
I considered having some --input= option, but decided that it was more flexible to have a more freeform command line, which the compute program parses.
Agree. And there could be some generic "helper" (or a number of them) which would then provide the desired CLI interfacing over an arbitrary command, something like (mimicking the datalad-run interface here):
as long as we can pass options like that or after --, e.g.
which would then
- ensure no stdout from convert
- follow the compute special remote interface to let git-annex know what the inputs/outputs were
Absolutely!
You do need to use "--" before your own custom dashed options.
And bear in mind that "field=value" parameters passed to initremote will be passed on to the program. So you can have a generic helper that is instantiated with a parameter like --command=, which then gets used automatically when running addcomputed:
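A hypothetical sketch of such a helper's option handling follows. It assumes that the options given after "--" to initremote and those given after "--" to addcomputed all reach the program as ordinary command line arguments. The option names --command=, --input= and --output=, the program name git-annex-compute-generic, and the function name are made up for illustration and do not describe an existing program.

```shell
# Hypothetical option parser for a generic compute helper.
# Assumed invocation (illustrative names only):
#   git-annex initremote myremote type=compute program=git-annex-compute-generic -- --command=convert
#   git-annex addcomputed --to=myremote -- --input=foo.jpeg --output=foo.gif
parse_helper_args() {
    helper_command=
    helper_input=
    helper_output=
    for arg in "$@"; do
        case $arg in
            --command=*) helper_command=${arg#--command=} ;;
            --input=*)   helper_input=${arg#--input=} ;;
            --output=*)  helper_output=${arg#--output=} ;;
            *) echo "unknown option: $arg" >&2; return 1 ;;
        esac
    done
    # Succeed only if all three options were supplied.
    [ -n "$helper_command" ] && [ -n "$helper_input" ] && [ -n "$helper_output" ]
}
```

After parsing, such a helper would emit INPUT and OUTPUT lines for the named files and run the configured command, keeping that command's own output off stdout.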
I'm getting acquainted with this special remote. I cannot praise it enough. It is brilliant.
This is my first cut git-annex-compute-stripexif:
Along the way, I've learnt that EXIF metadata isn't the only metadata stored in a jpeg, so the name is now a bit of a misnomer. Also, as it was more of a proof of concept, the target name and location are not well thought out, and there's no preservation of the file extension. It's indicative for now.
The aim is to aid (only) in identifying two copies of the same jpeg, where only the metadata has been changed (e.g. either by adjustments I made by script eons ago, or by apps like Microsoft photoviewer where orientation changes were made via metadata). I say aid only, because it's not going to help if the image is resized, etc., and I understand that.
To that end, I do have some questions. The first is... is it wise (or possible) to try to set metadata on the source files whilst in the script? (since writing this, I have come to understand that the compute script is not run within the working directory, and the implication is that you're not meant to run any git-annex commands)
Obviously, the idea would be to tag the source file with the computed key. I have already verified that if two copies of a jpeg differ only by metadata, the computed file and key will be the same.
But what I found is, if I don't have that option to set metadata, then respectfully, git-annex-findcomputed may have some deficiencies.
From what I can gather, git-annex-findcomputed will not list a subsequent input file that, when added, computes the same file. It lists only the first one.
So trying to post process the computed files to perform the setting of metadata on the source files would likely not work.
Also, I was curious about what happens if the input file moves within the archive? I haven't tried, but from what I can see, you wouldn't be able to backtrack from the computed file, because you won't know the key of the input file to go searching for it (e.g. with git-annex-whereused).
Is my use case way off base as to why you should use the compute remote?
I've realised that... I'm overlooking that the input filename itself is metadata. I have a methodology that I like now.
As per git-annex addcomputed --to=imageconvert foo.jpeg foo.gif, where foo. is the linking metadata, I can just generate a filename (and, as I've learnt, a path) that links back to the source by retaining it.
I also see now that there is no need to avoid duplication of pointer files to the same computed file by key.
The uncomplicated existing approach is more than sufficient.