add --json-progress support in push and pull

The pull and push commands do not have --json-progress support. Please add.

wontfix but see my comment at the end for alternatives. --Joey

comment 1

They don't have --json, which would be a necessary first step.

This was considered in another todo, when adding --json to many commands, and the thinking for not adding it was:

git-annex-sync (while it would be pretty easy to support, it outputs different types of messages depending on what remotes it syncs with and what needs to be done. Eg, copy to remote, or export to remote, or import from remote. Each would be a different format of json message, which violates the principle that all git-annex json output should be discoverable by simply running the command. And of course, everything it does can be done by other commands, which can support json without having that problem.)

sync had not been split into pull and push at that point. Being split does reduce the space of different things, but it's still multiple things, so still a problem for json output discoverability.

(Also push and pull can drop from a remote or locally, and of course there are the git operations they do, which would probably have to become silent in json mode.)

Comment by joey — Mon Feb 5 17:57:49 2024

comment 2

To me, the purpose of any progress message is twofold: 1) let the user know the program is not stuck and 2) provide information about what was happening if it does get stuck. Push and pull do this now without JSON.

I agree the same JSON progress object format should be used for all commands. Currently the action progress information includes: command, file, input, byte-progress, total-size, and percent-progress.

The calling application is able to "know" which command it issued to cause the progress reports.

A separate, optional field could be added for the particular remote involved. For example, messages like "Copy file A to remote-one X% complete" and "Export file B to remote-two X% complete" should be possible.
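
To make the proposed optional remote field concrete, here is a minimal Python sketch of a consumer turning one progress record into a message like the ones above. It assumes the fields listed earlier, with command and file nested under an "action" object. That nesting, the sample values, and the "remote" key (which git-annex does not emit today) are all assumptions for illustration.

```python
import json

# Illustrative record only: the "action" nesting and the values are assumed,
# and "remote" is the proposed optional field, not something git-annex emits.
SAMPLE = json.loads(
    '{"action": {"command": "copy", "file": "A", "input": ["A"]},'
    ' "byte-progress": 512, "total-size": 1024, "percent-progress": "50%",'
    ' "remote": "remote-one"}'
)


def describe(record: dict) -> str:
    """Render one progress record as a short status line."""
    action = record.get("action", {})
    command = action.get("command", "?").capitalize()
    filename = action.get("file", "?")
    total = record.get("total-size")
    if total:
        # Prefer the byte counts when the total size is known.
        percent = f"{100 * record.get('byte-progress', 0) // total}%"
    else:
        percent = record.get("percent-progress", "?%")
    remote = record.get("remote")  # present only when a remote is involved
    target = f" to {remote}" if remote else ""
    return f"{command} file {filename}{target} {percent} complete"


print(describe(SAMPLE))
```

The fixed "to" is a simplification; a get or pull record would read better with "from".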

I'm in favor of documenting data structures. Forcing developers to reverse-engineer the structure is inefficient at best. While the principle that all git-annex json output should be discoverable by simply running the command sounds good, I find it leaves much to be desired. Just to find out what's available, someone must set up and run the command. I suggest updating the requirement to "All git-annex JSON output objects are documented."

Users will get impatient and grow frustrated without some indication that work is in progress.

I'm OK with leaving Git operations silent for now.

Please improve the end-user experience by devising a way to inform the user git-annex is making progress with JSON.

Comment by jstritch — Tue Feb 6 16:10:56 2024

comment 3

Additional clarification of my posts above:

1) JSON allows optional fields. That is, fields which are only present when needed. I am specifically saying not to include all fields every time with null values as needed.

2) Occasionally tacking the remote name onto the end of a string named command is not discoverable. The command key would have to be renamed to something like commandAndSometimesRemote to be discoverable. jstritch's naming rule #5: If a name needs a conjunction to accurately describe it, the design can be improved.

3) The optional field name is more easily observed than the optional string content.

4) Consider the application code consuming the JSON. A test for the presence of the remote name is required either way. Do you want those applications writing the test "if the command value contains one space" or "if the remote key is present"? Code is written once and read hundreds of times. The second test conveys the intent, reducing maintenance cost.

5) The application code to deal with splitting the string and handling each part becomes unnecessary with the optional field.

6) The documentation of the JSON could include a matrix showing the key name and its data type versus the commands, similar to a feature comparison table.

7) Documenting the JSON does not make it less discoverable.
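
A hypothetical Python sketch of the consumer-side difference described in points 2), 4) and 5). Neither encoding exists in git-annex; both the remote suffix on "command" and the separate "remote" key are illustrative.

```python
import json
from typing import Optional

# Two hypothetical encodings of the same event; neither exists in git-annex.
SUFFIXED = json.loads('{"command": "copy remote-one", "file": "A"}')
OPTIONAL = json.loads('{"command": "copy", "remote": "remote-one", "file": "A"}')


def remote_from_suffix(record: dict) -> Optional[str]:
    """Remote tacked onto "command": the test is "does it contain a space?"."""
    parts = record["command"].split(" ", 1)
    return parts[1] if len(parts) == 2 else None


def remote_from_key(record: dict) -> Optional[str]:
    """Remote as an optional key: the test is "is the key present?"."""
    return record.get("remote")  # None when the key is absent


print(remote_from_suffix(SUFFIXED), remote_from_key(OPTIONAL))
```

The first helper silently relies on the rule that nothing else in the command string follows a space; the second states its intent directly, which is the maintenance point made in 4).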

I hope you find this information helpful to improve the end-user experience. Let me know if you have any questions.

Comment by jstritch — Wed Feb 7 15:52:18 2024

comment 4

Rather than using git-annex push/pull/sync with a complex json format, complications of knowing what remote it's acting on, etc, a program can simply use git-annex get/copy/drop/import/export to do the same operations, all of which already support json.

I'm OK with leaving Git operations silent for now.

In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.

Please improve the end-user experience by devising a way to inform the user git-annex is making progress with JSON.

I hope you find this information helpful to improve the end-user experience. Let me know if you have any questions.

Yikes, that almost triggered my ChatGPT detector.

Comment by joey — Tue Feb 27 17:22:52 2024

comment 5

datalad push currently does not use git-annex push, and it would be good if it could, in order to avoid some surprising behavior with its current implementation.

But, it parses the git push output to display its own progress messages. Since git-annex push interleaves that with whatever else it outputs, adapting to parsing it would be difficult.

In order for it to use git-annex push, it seems it would need --json-progress support, and either git-annex parsing the git push output and feeding it through to --json-progress, or some form of machine-readable delimiters in stdout and stderr around the git push output.

Comment by joey — Tue May 5 14:37:42 2026

comment 6

In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.

git pull and git push over ssh prompt for the password (to /dev/tty) before outputting anything else. So I suppose it is acceptable.

Comment by joey — Wed May 6 12:42:20 2026

comment 7

git pull outputs its progress to stderr. So --json could leave that alone, and a program wanting to parse it could just consume stderr. Delimiters could be added to stderr around the git pull (with a separate option) to make it easier for a program to find and parse it.
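
A sketch of how a consumer might pull delimited git pull stderr out of the combined stream. The BEGIN and END marker lines are invented placeholders; git-annex has no such option, and real delimiters, if added, could look quite different.

```python
from typing import List, Optional

# Invented placeholder markers; git-annex has no such delimiter option.
BEGIN = "[git-annex: begin git pull stderr]"
END = "[git-annex: end git pull stderr]"


def delimited_sections(stderr_text: str) -> List[str]:
    """Return the text between each BEGIN/END pair, in order."""
    sections: List[str] = []
    current: Optional[List[str]] = None  # None means "outside a section"
    # splitlines() also breaks on the "\r" that git uses for progress updates.
    for line in stderr_text.splitlines():
        if line == BEGIN:
            current = []
        elif line == END and current is not None:
            sections.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return sections


sample = "\n".join([
    "some other git-annex message",
    BEGIN,
    "From example.com:repo",
    END,
])
print(delimited_sections(sample))
```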

git pull also outputs some things to stdout. In particular, that includes the git merge output when the merge is successful. It seems to me that could be put in the json object, eg:

{"command":"pull","output":["Updating 8a433d0..9d47770" ...

That will buffer it until the pull is complete, which seems ok: git pull displays it after the usually more expensive network operation, so buffering it briefly wouldn't be too noticeable if a json consumer chooses to show it to the user.

Note that git-annex pull will pull from the remote a second time after transferring content to/from it. So the json will have 2 "command":"pull" records. And stderr may contain 2 delimited git pull stderrs. The --json consumer may find that surprising, and it doesn't always happen, which gets back to the original problem of the --json not being discoverable.
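
If the proposed "output" array were added, a consumer should collect every "pull" record rather than expect exactly one. This Python sketch does that. The record shape and the sample stream are hypothetical.

```python
import json
from typing import List


def pull_outputs(json_stream: str) -> List[List[str]]:
    """Collect the buffered git output of every "pull" record in a --json stream.

    Hypothetical: git-annex emits no "output" field today. git-annex pull may
    pull twice, so zero, one or two records can match.
    """
    outputs = []
    for line in json_stream.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("command") == "pull" and "output" in record:
            outputs.append(record["output"])
    return outputs


# Illustrative stream: two pull records around a content transfer record.
stream = "\n".join([
    json.dumps({"command": "pull", "output": ["Updating 8a433d0..9d47770"]}),
    json.dumps({"command": "get", "file": "A", "success": True}),
    json.dumps({"command": "pull", "output": ["Already up to date."]}),
])
for lines in pull_outputs(stream):
    print("\n".join(lines))
```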

Comment by joey — Wed May 6 12:56:10 2026

comment 8

datalad push wants to use the same git push operations as git-annex push does, which is nontrivial to reimplement, especially in its handling of the git-annex branch. See the long comment on pushBranch explaining the order of operations.

This is one place where git-annex push can't be emulated using other git-annex commands that do support --json.

But, git-annex push --no-content doesn't do much besides run pushBranch. So datalad push could use it when run in a git-annex repository. There's no need for it to support --json either; the regular git push output goes to stderr, so it can parse the git push progress out of stderr as before.

It may want to pass --quiet to avoid the usual git-annex output to stdout. AFAICS, git push does not itself output to stdout.
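
A minimal Python sketch of that approach: run git-annex push --no-content --quiet, discard stdout, and pick git's progress lines out of stderr. The regex targets git's usual "Phase: NN% (done/total)" lines and is an assumption, not a documented format. One caveat: git only reports progress when stderr is a terminal unless --progress is passed, so whether anything shows up through a pipe depends on how git-annex invokes git push, which this thread doesn't establish.

```python
import re
import subprocess
from typing import Iterator, Optional, Tuple

# git's usual progress line shape, e.g. "Writing objects:  50% (1/2)".
# This pattern is an assumption about that format, not a documented API.
PROGRESS = re.compile(
    r"^(?P<phase>[A-Za-z ]+):\s+(?P<pct>\d+)% \((?P<done>\d+)/(?P<total>\d+)\)"
)


def parse_progress(line: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (phase, percent, done, total) for a progress line, else None."""
    m = PROGRESS.match(line.strip())
    if m is None:
        return None
    return m["phase"], int(m["pct"]), int(m["done"]), int(m["total"])


def push_progress(remote: str) -> Iterator[Tuple[str, int, int, int]]:
    """Run git-annex push --no-content --quiet and yield git's progress."""
    proc = subprocess.Popen(
        ["git-annex", "push", "--no-content", "--quiet", remote],
        stdout=subprocess.DEVNULL,  # the usual git-annex stdout is not needed
        stderr=subprocess.PIPE,
        text=True,  # universal newlines: git's "\r" updates become separate lines
    )
    assert proc.stderr is not None
    for line in proc.stderr:
        parsed = parse_progress(line)
        if parsed is not None:
            yield parsed
    if proc.wait() != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

Lines that don't look like progress (for example "To example.com:repo.git") are simply skipped, so the parser tolerates whatever else git prints.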

The only other thing that command does besides pushBranch is updateBranches, which updates view branches and adjusted branches when run in one.

Comment by joey — Wed May 6 13:12:21 2026

comment 9

FWIW, I've split updateBranches between pull and push now.

On git-annex push, all it does is propagate adjusted branch changes back to the original branch.

On git-annex pull it handles updating the view branch and/or propagating changes from the original branch to the adjusted branch.

Also, git-annex push was fixed to not merge synced/master into master and to not update the adjusted branch when the original branch has changed.

Comment by joey — Fri May 29 13:49:46 2026

comment 10

I have done some work adjacent to this todo, implementing a --wanted option and a git-annex put command.

Now, if someone wants the equivalent of git-annex pull --json $someremote, they can run:

git-annex pull --no-content $someremote
git-annex get --wanted --json --from $someremote
git-annex drop --wanted --json

The git-annex pull above does not have json output, but outputs the usual git pull messages for the user to deal with as they see fit.

And, if someone wants the equivalent of git-annex push --json $someremote, they can run:

git-annex copy --wanted --json --to $someremote
git-annex drop --wanted --json --from $someremote
git-annex push --no-content $someremote

The git-annex push above does not have json output, but outputs the usual git push messages for the user to deal with as they see fit.

Similarly, the equivalent of git-annex pull --json with no remote specified:

git-annex pull --no-content
git-annex get --wanted --json
git-annex drop --wanted --json

And, the equivalent of git-annex push without a remote specified:

git-annex put --wanted --json
git-annex drop --wanted --json
git-annex push --no-content
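
The four sequences above can be captured in a small Python wrapper. This is only a sketch: equivalent_commands and run_sequence are hypothetical helpers, it assumes the --wanted option and put command described above are available, and it relies on --json printing one JSON object per line with a "success" field.

```python
import json
import subprocess
from typing import List, Optional


def equivalent_commands(direction: str, remote: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: the argv lists for the sequences above."""
    if direction == "pull":
        target = [remote] if remote else []
        source = ["--from", remote] if remote else []
        return [
            ["git-annex", "pull", "--no-content", *target],
            ["git-annex", "get", "--wanted", "--json", *source],
            ["git-annex", "drop", "--wanted", "--json"],
        ]
    if direction == "push":
        if remote:
            return [
                ["git-annex", "copy", "--wanted", "--json", "--to", remote],
                ["git-annex", "drop", "--wanted", "--json", "--from", remote],
                ["git-annex", "push", "--no-content", remote],
            ]
        return [
            ["git-annex", "put", "--wanted", "--json"],
            ["git-annex", "drop", "--wanted", "--json"],
            ["git-annex", "push", "--no-content"],
        ]
    raise ValueError(f"unknown direction: {direction}")


def run_sequence(commands: List[List[str]]) -> List[dict]:
    """Hypothetical helper: run each step, return --json records that failed."""
    failures = []
    for argv in commands:
        if "--json" not in argv:
            # pull/push --no-content: let git's usual messages reach the user.
            subprocess.run(argv, check=True)
            continue
        # --json steps print one object per line on stdout; a non-zero exit
        # status is expected when some files fail, so check=True is not used.
        proc = subprocess.run(argv, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout.splitlines():
            if line.strip():
                record = json.loads(line)
                if not record.get("success", True):
                    failures.append(record)
    return failures


if __name__ == "__main__":
    for argv in equivalent_commands("pull", "origin"):
        print(" ".join(argv))
```

Keeping the argv construction separate from running it makes the sequences easy to inspect or log before anything touches the repository.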

So, the argument for adding --json to pull/push now seems to be reduced. Here are all the arguments I can think of for still doing that:

1) These command sequences won't behave completely identically to pull/push in all configurations, e.g. they don't look at the remote.<name>.annex-pull and remote.<name>.annex-push configs.
2) A single git-annex push or pull (or sync) does less work than several git-annex commands. In the command sequences above, git-annex has to traverse the tree twice. Most of the time, though, that is a pretty small difference in overhead.
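
The first gap could be narrowed in a wrapper by checking those configs before running a sequence. This sketch assumes that a value of false for remote.<name>.annex-pull or remote.<name>.annex-push means the step should be skipped. git-annex's own handling may involve more than this, so the sketch only approximates the real behavior.

```python
import subprocess
from typing import Callable, Optional


def remote_config_bool(remote: str, key: str) -> Optional[bool]:
    """Read remote.<remote>.<key> as a boolean, or None if it is unset."""
    proc = subprocess.run(
        ["git", "config", "--bool", "--get", f"remote.{remote}.{key}"],
        stdout=subprocess.PIPE,
        text=True,
    )
    if proc.returncode == 1:  # git config --get exits 1 when the key is unset
        return None
    proc.check_returncode()  # anything else (e.g. a non-boolean value) is an error
    return proc.stdout.strip() == "true"


def should_skip(
    direction: str,
    remote: str,
    lookup: Callable[[str, str], Optional[bool]] = remote_config_bool,
) -> bool:
    """Skip a pull/push sequence when annex-pull/annex-push is false.

    Assumption: only an explicit false disables the step; unset means enabled.
    """
    return lookup(remote, f"annex-{direction}") is False
```

A wrapper could call should_skip("pull", remote) before running the pull sequence for that remote.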

Comment by joey — Thu Jun 4 13:59:25 2026
