draft

The compute special remote uses this interface to run compute programs.

When a compute special remote is set up with git-annex initremote, a program is specified:

git-annex initremote myremote type=compute program=git-annex-compute-foo

To add an annexed file whose content is computed by the program, the user runs a command like this:

git-annex addcomputed --to myremote \
    --input raw=file.raw --value passes=10 \
    --output photo=file.jpeg

Both that command and a later git-annex get of the computed file run the program in the same way.

The program is passed inputs to the computation via environment variables, which are all prefixed with "ANNEX_COMPUTE_".

In the example above, the program will be passed this environment:

ANNEX_COMPUTE_INPUT_raw=/path/.git/annex/objects/..
ANNEX_COMPUTE_VALUE_passes=10
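As a minimal sketch, assuming a POSIX shell program and the `raw` input Id from the example, a program might check that its required input was actually provided before doing any work. The helper name `check_inputs` is hypothetical, not part of git-annex.

```shell
# check_inputs (hypothetical helper): make sure the "raw" input from the
# example was passed and points at a readable file. Expansions are always
# quoted so the path is treated as data, never as shell syntax.
check_inputs() {
    if [ -z "${ANNEX_COMPUTE_INPUT_raw:-}" ]; then
        echo "missing input: raw" >&2
        return 1
    fi
    if [ ! -r "$ANNEX_COMPUTE_INPUT_raw" ]; then
        echo "cannot read input: raw" >&2
        return 1
    fi
    return 0
}
```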

Default values that are provided to git-annex initremote will also be set in the environment. For example, git-annex initremote myremote type=compute program=foo passes=9 will set ANNEX_COMPUTE_VALUE_passes=9 by default.
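A program can also apply its own fallback when no value arrives at all, neither from addcomputed nor from an initremote default. This sketch assumes a fallback of 1 and a hypothetical helper named `get_passes`; the `${var:-default}` expansion is standard POSIX shell.

```shell
# get_passes (hypothetical helper): print the number of passes to use.
# git-annex sets ANNEX_COMPUTE_VALUE_passes from addcomputed --value or
# from an initremote default; if neither was given (or it is empty),
# fall back to 1.
get_passes() {
    printf '%s\n' "${ANNEX_COMPUTE_VALUE_passes:-1}"
}
```

Because `:-` treats an empty value like an unset one, this behaves the same as the `[ -z ... ]` check in the example script at the end of this page.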

For security, the program should not pass values from ANNEX_COMPUTE_* variables to the shell unquoted, nor otherwise execute them.
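One way to follow this advice in a shell program, offered as a sketch rather than a complete defense: check that a value has the expected shape before using it, and always pass it as a single quoted argument instead of building a command string for `eval`. The `validate_count` helper is hypothetical, and `frobnicate` is the placeholder program from the example script at the end of this page.

```shell
# validate_count (hypothetical helper): succeed only for a non-empty
# string of decimal digits, so the value cannot carry shell syntax.
validate_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Unsafe: the value is re-parsed by the shell and could run commands.
#   eval "frobnicate --passes=$ANNEX_COMPUTE_VALUE_passes"
# Safer: validate first, then pass it as one quoted argument.
#   validate_count "$ANNEX_COMPUTE_VALUE_passes" || exit 1
#   frobnicate --passes="$ANNEX_COMPUTE_VALUE_passes"
```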

The program will also inherit other environment variables that were set when git-annex was run, like PATH. (ANNEX_COMPUTE_* environment variables are not inherited.)

The program is run in a temporary directory, which will be cleaned up after the program exits. The program writes the files that it computes to that directory.

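Before starting the main computation, the program must output a list of the files that it will compute, one per line, in the form "COMPUTING Id filename". Here "Id" is a short identifier for a particular output file, which the user specifies when running git-annex addcomputed (for example "photo" in --output photo=file.jpeg), and "filename" is the name of the file the program will write in its temporary directory.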

In the example above, the program is expected to output something like:

COMPUTING photo out.jpeg
COMPUTING sidecar otherfile
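Announcing the outputs is just a matter of printing lines to stdout. A minimal sketch, with a hypothetical `announce_outputs` helper that takes Id and filename pairs; the filenames are relative to the temporary directory the program runs in.

```shell
# announce_outputs ID FILE [ID FILE...] (hypothetical helper): print one
# "COMPUTING Id filename" line on stdout for each pair of arguments.
announce_outputs() {
    while [ "$#" -ge 2 ]; do
        printf 'COMPUTING %s %s\n' "$1" "$2"
        shift 2
    done
}

# Usage, before the main computation starts:
#   announce_outputs photo out.jpeg sidecar otherfile
```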

If possible, the program should write the content of each file it computes directly to the filename listed in COMPUTING, rather than writing it somewhere else and renaming it at the end. However, if the program writes a file out of order (not sequentially from start to end), it should instead write it somewhere else and rename it at the end.
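For the out-of-order case, the write-then-rename pattern could look like the sketch below. It assumes the real computation is a command that takes its output path as its last argument; `compute_into` and the `.partial` suffix are illustrative choices, not anything git-annex requires. Writing the scratch file in the same temporary directory keeps the final `mv` a simple rename.

```shell
# compute_into TARGET CMD [ARGS...] (hypothetical helper): runs
# CMD ARGS... TARGET.partial, where CMD may write its output in any order,
# then renames the finished file to TARGET. If CMD fails, the partial file
# is removed and the helper returns 1.
# POSIX sh has no "local", so target and partial are global variables.
compute_into() {
    target=$1
    shift
    partial="$target.partial"
    if "$@" "$partial"; then
        mv -f "$partial" "$target"
    else
        rm -f "$partial"
        return 1
    fi
}
```

An in-order writer needs none of this and can simply redirect into the announced filename, as the example script at the end of this page does with `>out.jpeg`.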

If git-annex sees that the file corresponding to the key it asked to be computed is growing, it uses that file's size to display progress to the user.

The program can also output lines to stdout to indicate its current progress:

PROGRESS 50%
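A program that works in discrete steps can derive a percentage from the number of steps finished. This sketch assumes the work is split into passes, as with the `passes` value in the example; `run_passes` is a hypothetical helper and `do_one_pass` is a placeholder to replace with the real work.

```shell
# do_one_pass (placeholder): replace with one pass of the real computation.
# It should keep its own output off stdout, which carries the COMPUTING
# and PROGRESS lines.
do_one_pass() {
    :
}

# run_passes TOTAL (hypothetical helper): run TOTAL passes, printing an
# integer "PROGRESS N%" line on stdout after each one.
run_passes() {
    total=$1
    i=0
    while [ "$i" -lt "$total" ]; do
        i=$((i + 1))
        do_one_pass "$i"
        printf 'PROGRESS %d%%\n' $((i * 100 / total))
    done
}
```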

Anything that the program outputs to stderr will be displayed to the user. Use stderr for error messages, and possibly for computation output, but not for progress displays.

If the program exits nonzero, nothing it computed will be stored in the git-annex repository.
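Error handling can then follow the usual shell convention: print a message to stderr and exit nonzero. `die` is a conventional helper name in shell scripts, not something git-annex provides; this is a sketch.

```shell
# die MESSAGE...: print "program-name: MESSAGE" on stderr (which git-annex
# shows to the user) and exit nonzero (so nothing computed is stored).
die() {
    printf '%s: %s\n' "${0##*/}" "$*" >&2
    exit 1
}

# Usage, for example:
#   [ -r "$ANNEX_COMPUTE_INPUT_raw" ] || die "cannot read raw input"
```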

When run with the "interface" parameter, the program must describe its interface. This is a list of the inputs and outputs that it supports. This allows git-annex addcomputed and git-annex initremote to list inputs and outputs, and also lets them reject invalid inputs and outputs.

The output is lines, in the form:

INPUT[?] Id Description
VALUE[?] Id Description
OUTPUT Id Description

Use "INPUT" when a file is an input to the computation, and "VALUE" for all other input values. Use "INPUT?" and "VALUE?" for optional inputs and values.
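To illustrate the line format, here is a sketch of how the interface description might be read back, for example by a test script checking a program's interface. Both helpers are hypothetical and are not how git-annex itself parses the output; they only show that the "?" suffix distinguishes optional inputs and values from required ones.

```shell
# required_ids (hypothetical helper): read interface lines on stdin and
# print the Ids of required INPUT and VALUE lines. "INPUT?" and "VALUE?"
# do not match these literal patterns, so optional entries are skipped.
required_ids() {
    while read -r kind id _desc; do
        case "$kind" in
            INPUT|VALUE) printf '%s\n' "$id" ;;
        esac
    done
}

# declares_output ID (hypothetical helper): succeed if the interface read
# from stdin contains an "OUTPUT ID ..." line.
declares_output() {
    while read -r kind id _desc; do
        if [ "$kind" = OUTPUT ] && [ "$id" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```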

The interface can also optionally include a "REPRODUCIBLE" line. This indicates that the results of the program's computations are expected to be bit-for-bit reproducible, and makes git-annex addcomputed behave as if the --reproducible option were set.

An example git-annex-compute-foo shell script follows:

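#!/bin/sh
set -e
# Describe the supported inputs and outputs when git-annex asks.
if [ "$1" = interface ]; then
    echo "INPUT raw A photo in RAW format"
    echo "VALUE? passes Number of passes"
    echo "OUTPUT photo Computed JPEG"
    echo "REPRODUCIBLE"
    exit 0
fi
# "passes" is optional, so fall back to 1 when it was not given.
if [ -z "$ANNEX_COMPUTE_VALUE_passes" ]; then
    ANNEX_COMPUTE_VALUE_passes=1
fi
# Announce the output file before starting the computation.
echo "COMPUTING photo out.jpeg"
# frobnicate is a placeholder for the real conversion program.
# Values stay quoted so the shell never interprets them.
frobnicate --passes="$ANNEX_COMPUTE_VALUE_passes" \
    <"$ANNEX_COMPUTE_INPUT_raw" >out.jpeg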