compute

While other remotes store the contents of annexed files somewhere, this special remote uses a program to compute the contents of annexed files.

To add a file to a compute special remote, use the git-annex-addcomputed command. Once a file has been added to a compute special remote, commands like git-annex get will use it to compute the content of the file.

To enable an instance of this special remote:

# git-annex initremote myremote type=compute program=git-annex-compute-foo

The program parameter is the only required parameter. It is the name of the program to use to compute the contents of annexed files. It must start with "git-annex-compute-". The program needs to be installed somewhere in the PATH.
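
The two requirements above (the name prefix and being in PATH) can be checked before running initremote. The following is only a sketch: `check_compute_program` is a hypothetical helper written for illustration and is not part of git-annex.

```shell
# Hypothetical helper (not part of git-annex): checks the two requirements
# stated above for the program= parameter.
check_compute_program() {
    # Requirement 1: the name must start with "git-annex-compute-".
    case "$1" in
        git-annex-compute-*) ;;
        *)
            echo "$1: name does not start with git-annex-compute-" >&2
            return 1
            ;;
    esac
    # Requirement 2: the program must be found somewhere in PATH.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not found in PATH" >&2
        return 2
    fi
}
```

For example, `check_compute_program git-annex-compute-foo` returns 0 only when both conditions hold, so it can guard an initremote call in a setup script.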

Any program can be passed to git-annex initremote. However, when enabling a compute special remote later with git-annex enableremote or due to "autoenable=true", the program must be listed in the git config annex.security.allowed-compute-programs.

All other "field=value" parameters passed to initremote will be passed to the program when running git-annex-addcomputed. Note that when the program takes a dashed option, it can be provided after "--":

# git-annex initremote myremote type=compute program=git-annex-compute-foo -- --level=9

See computing annexed files for more documentation.

compute programs

To write programs used by the compute special remote, see the compute special remote interface.
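
As a rough illustration only: the comments on this page mention that a compute program emits INPUT and OUTPUT lines on stdout and reads a reply from stdin after each one. The sketch below follows that pattern. It assumes that the reply to INPUT is a path to the input's content (possibly empty), and that the reply to OUTPUT is the path the program should write to; check the compute special remote interface before relying on either assumption. The function name `compute_uppercase` and the uppercase transformation are made up for the example.

```shell
# Sketch of a compute program body, written as a function so it can be
# tried without git-annex by feeding the replies on stdin.
# stdout carries the INPUT/OUTPUT lines, so diagnostics must go to stderr.
compute_uppercase() {
    # Declare the input file, then read the reply
    # (assumed: a path to the input's content, or an empty line).
    echo "INPUT $1"
    read -r input_path

    # Declare the output file, then read the reply
    # (assumed: the path this program should write to).
    echo "OUTPUT $2"
    read -r output_path

    # Only compute when the input content was made available.
    if [ -n "$input_path" ]; then
        tr '[:lower:]' '[:upper:]' < "$input_path" > "$output_path"
    fi
}
```

A real program named something like git-annex-compute-uppercase would call `compute_uppercase "$1" "$2"` at the end of the script, with git-annex supplying the replies.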

Have you written a generally useful (and secure) compute program? List it here with an example!

git-annex-compute-imageconvert
Uses imagemagick to convert between image formats

git-annex addcomputed --to=imageconvert foo.jpeg foo.gif
git-annex-compute-singularity
Uses Singularity to run a container, which is checked into the git-annex repository, to compute other files in the repository. Among other things, this can run other compute programs inside a singularity container. Examples here
git-annex-compute-wasmedge
Uses WasmEdge to run WASM programs that are checked into the git-annex repository, to compute other files in the repository. Examples here

Any way to annotate what are input files?

I don't see an option to specify which annexed files are input files, so that annex could get them for the computation that produces the output file. That's what we do in datalad run, and it is very handy since it allows us to not worry about figuring out what to get first.

Comment by yarikoptic — Sat Mar 8 14:51:20 2025

Re: Any way to annotate what are input files?

git-annex does know what both the input and the output files are. It learns this by running the compute program and seeing what INPUT and OUTPUT lines it emits.

I considered having some --input= option, but decided that it was more flexible to have a more freeform command line, which the compute program parses.

Comment by joey — Mon Mar 10 20:42:26 2025

comment 3

Thank you for the clarification -- I have missed that there is an "entire" compute special remote interface. Cool!

Comment by yarikoptic — Tue Mar 11 15:09:20 2025

just thinking out loud

it was more flexible to have a more freeform command line, which the compute program parses

agree. And there could be some generic "helper" (or a number of them) which would then provide the desired CLI over an arbitrary command, something like (mimicking the datalad-run interface here):

git-annex addcomputed --to=runcmd -i foo.jpeg -o foo.gif

as long as we can pass options like that or after --, e.g.

git-annex addcomputed --to=runcmd -- -i foo.jpeg -o foo.gif -- convert {inputs} {outputs}

which would then
- ensure no stdout from convert
- follow the compute special remote interface to let git-annex know what inputs/outputs were

Comment by yarikoptic — Tue Mar 11 15:15:15 2025

Re: just thinking out loud

And there could be some generic "helper" (or a number of them) which would then provide desired CLI interfacing over arbitrary command

Absolutely!

You do need to use "--" before your own custom dashed options.

And bear in mind that "field=value" parameters passed to initremote will be passed on to the program. So you can have a generic helper that is instantiated with a parameter like --command=, which then gets used automatically when running addcomputed:

git-annex initremote foo type=compute program=git-annex-compute-generic-helper -- --command='convert {inputs} {outputs}'
git-annex addcomputed --to=foo -- -i foo.jpeg -o foo.gif
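
A rough sketch of the placeholder expansion such a helper might perform on the --command= template. Everything here is hypothetical: git-annex-compute-generic-helper does not exist as far as this page says, and `replace_first` and `expand_command` are names made up for illustration. How the helper actually receives its parameters is not shown.

```shell
# Replace the first occurrence of a literal placeholder in a template.
# Arguments: template, placeholder, value.
replace_first() {
    case "$1" in
        *"$2"*)
            # Text before the placeholder, the value, then the text after it.
            printf '%s%s%s\n' "${1%%"$2"*}" "$3" "${1#*"$2"}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Expand {inputs} and {outputs} in a command template.
# Arguments: template, input file, output file.
expand_command() {
    step=$(replace_first "$1" '{inputs}' "$2")
    replace_first "$step" '{outputs}' "$3"
}
```

With the template from the initremote line above, `expand_command 'convert {inputs} {outputs}' foo.jpeg foo.gif` prints `convert foo.jpeg foo.gif`. A real helper would also have to run the result without letting unusual file names be interpreted by the shell, which this sketch does not attempt.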

Comment by joey — Tue Mar 11 16:42:46 2025

comment 6

I'm getting acquainted with this special remote. I cannot praise it enough. It is brilliant.

This is my first cut git-annex-compute-stripexif:

#!/bin/bash

set -e

if [ -z "$1" ]; then
        echo "Specify the input image file, followed by the output image file." >&2
        echo "Example: foo.jpg foo.gif" >&2
        exit 1
fi

echo REPRODUCIBLE
echo "INPUT $1"
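read -r input

# The output name is derived from the stripped content, so an empty reply
# (no path to the input's content) leaves nothing to compute.
if [ -z "$input" ]; then
        echo "Content of $1 is not available, cannot compute." >&2
        exit 1
fi

# Strip the metadata from a temporary copy, then name the output after its key.
tf=$(mktemp)
cp "$input" "$tf" >&2
exiftool -overwrite_original -ALL= "$tf" >&2
outfile="SANSEXIF-$(git-annex calckey "$tf")"

echo "OUTPUT $outfile"
read -r output

cp -v "$tf" "$outfile" >&2
rm -v "$tf" >&2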

Along the way, I've learnt that EXIF metadata isn't the only metadata stored in a jpeg, so the name is now a bit of a misnomer. Also, as it was more of a proof-of-concept, the target name and location are not well thought out, and there's no preservation of the file extension. It's indicative for now.

The aim is to aid (only) in identifying two copies of the same jpeg, where only the metadata has been changed (eg. either by adjustments I made by script eons ago, or by apps like Microsoft photoviewer where orientation changes were made via metadata). I say aid only, because it's not going to help if the image is resized, etc. and I understand that.

To that end, I do have some questions. The first is... is it wise (or possible) to try to set metadata on the source files whilst in the script? (since writing this, I have come to understand that the compute script is not run within the working directory, and the implication is that you're not meant to run any git-annex commands)

Obviously, the idea would be to tag the source file with the computed key. I have already verified that if two copies of a jpeg differ only by metadata, the computed file and key will be the same.

But what I found is, if I don't have that option to set metadata, then respectfully, git-annex-findcomputed may have some deficiencies.

From what I can gather, git-annex-findcomputed will not list a subsequent input file that, when added, computes the same file. Only the first one.

So trying to post process the computed files to perform the setting of metadata on the source files would likely not work.

Also, I was curious about what happens if the input file moves within the archive? I haven't tried... but from what I can see, you wouldn't be able to backtrack from the computed file, because you won't know the key of the input file, in turn to go searching for it (eg. git-annex-whereused).

Is my use case way off base as to why you should use the compute remote?

Comment by beryllium — Mon Jun 9 13:09:25 2025

comment 7

I've realised that... I'm overlooking that the input filename itself is metadata. I have a methodology that I like now.

As per: git-annex addcomputed --to=imageconvert foo.jpeg foo.gif, where foo. is linking metadata, I can just generate a filename (and as I've learnt, path), that links back to the source by retaining it.

I also see now that there is no need to avoid duplication of pointer files to the same computed file by key.

The uncomplicated existing approach is more than sufficient.

Comment by beryllium — Tue Jun 10 12:01:14 2025
