They don't have --json, which would be a necessary first step.
This was considered in this_todo when adding --json to many commands, and the thinking for not adding it was:
git-annex-sync (while it would be pretty easy to support, it outputs
different types of messages depending on what remotes it syncs with and
what needs to be done. Eg, copy to remote, or export to remote, or import
from remote. Each would be a different format of json message, which
violates the principle that all git-annex json output should be
discoverable by simply running the command. And of course, everything it
does can be done by other commands, which can support json without having
that problem.)
sync had not been split into pull and push at that point. Being split does
reduce the space of different things, but it's still multiple things, so
still a problem for json output discoverability.
(Also push and pull can drop from a remote or locally, and of course there are
the git operations they do, which would probably have to become silent in
json mode.)
To me, the purpose of any progress message is twofold: 1) let the user know the program is not stuck and 2) provide information about what was happening if it does get stuck. Push and pull do this now without JSON.
I agree the same JSON progress object format should be used for all commands. Currently the action progress information includes: command, file, input, byte-progress, total-size, and percent-progress.
The calling application is able to "know" which command it issued to cause the progress reports.
A separate, optional field could be added for the particular remote involved. For example, messages like "Copy file A to remote-one X% complete" and "Export file B to remote-two X% complete" should be possible.
I'm in favor of documenting data structures. Forcing developers to reverse-engineer the structure is inefficient at best. While the principle that all git-annex json output should be discoverable by simply running the command sounds good, I find it leaves much to be desired. Just to find out what's available, someone must set up and run the command. I suggest updating the requirement to "All git-annex JSON output objects are documented."
Users will get impatient and grow frustrated without some indication that work is in progress.
I'm OK with leaving Git operations silent for now.
Please improve the end-user experience by devising a way to inform the user git-annex is making progress with JSON.
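A minimal sketch of what a documented progress object with an optional remote field could look like, written as Python TypedDicts. The layout is an assumption: the fields listed above are placed in an "action" object here, percent-progress is assumed to be a string such as "50%", and "remote" is the hypothetical optional key proposed above, not an existing git-annex field.

```python
from typing import List, TypedDict


class _ActionRequired(TypedDict):
    command: str  # e.g. "copy" or "export"
    file: str


class Action(_ActionRequired, total=False):
    # Optional keys are omitted when not needed, never sent as null.
    input: List[str]
    remote: str  # hypothetical: only present when a remote is involved


# Hyphenated keys need the functional TypedDict syntax.
Progress = TypedDict(
    "Progress",
    {
        "action": Action,
        "byte-progress": int,
        "total-size": int,
        "percent-progress": str,  # assumed to look like "50%"
    },
)


def describe(record: Progress) -> str:
    """Turn a progress record into a line like
    "Copy file A to remote-one 50% complete"."""
    action = record["action"]
    text = f"{action['command'].capitalize()} file {action['file']}"
    if "remote" in action:  # test for the key's presence, not for null
        text += f" to {action['remote']}"
    return f"{text} {record['percent-progress']} complete"
```

The preposition is simplified to "to"; a get from a remote would read better with "from".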
Additional clarification of my posts above:
1) JSON allows optional fields. That is, fields which are only present when needed. I am specifically saying not to include all fields every time with null values as needed.
2) Occasionally tacking the remote name onto the end of a string named command is not discoverable. The command key would have to be renamed to something like commandAndSometimesRemote to be discoverable. jstritch's naming rule #5: If a name needs a conjunction to accurately describe it, the design can be improved.
3) The optional field name is more easily observed than the optional string content.
4) Consider the application code consuming the JSON. A test for the presence of the remote name is required either way. Do you want those applications writing the test "if the command value contains one space" or "if the remote key is present"? Code is written once and read hundreds of times. The second test conveys the intent, reducing maintenance cost.
5) The application code to deal with splitting the string and handling each part becomes unnecessary with the optional field.
6) The documentation of the JSON could include a matrix showing the key name and its data type versus the commands, similar to a feature comparison table.
7) Documenting the JSON does not make it less discoverable.
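Points 4 and 5 as a small Python sketch of the consuming side. Both record shapes are hypothetical: the first function handles a remote name tacked onto the end of the command string, the second a separate optional "remote" key.

```python
from typing import Optional, Tuple


def parse_suffix_style(record: dict) -> Tuple[str, Optional[str]]:
    # Remote appended to "command", e.g. "copy remote-one": the consumer
    # has to know to split on the first space.
    command, _, remote = record["command"].partition(" ")
    return command, remote or None


def parse_key_style(record: dict) -> Tuple[str, Optional[str]]:
    # Remote in its own optional key: the lookup states the intent.
    return record["command"], record.get("remote")
```

Both return the same result; only the second says what it is looking for.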
I hope you find this information helpful to improve the end-user experience. Let me know if you have any questions.
Rather than using git-annex push/pull/sync with a complex json format,
with the complications of knowing what remote it's acting on, etc., a program can
simply use git-annex get/copy/drop/import/export to do the same operations,
all of which already support json.
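For a program taking that route, consuming --json output mostly means reading one JSON object per line from stdout. A minimal Python sketch, assuming that line-per-object format; the function name json_records is illustrative.

```python
import json
import subprocess
from typing import Iterator, List


def json_records(argv: List[str]) -> Iterator[dict]:
    """Run argv and yield each JSON object it prints on stdout, one per line."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip anything that is not a JSON object line
    # The exit status is not checked: a failing item is reported in its
    # record's "success" field, and the remaining records are still useful.


# Example:
# for rec in json_records(["git-annex", "get", "--json", "."]):
#     print(rec.get("file"), rec.get("success"))
```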
I'm OK with leaving Git operations silent for now.
In the case where the git operation needs to prompt for a password, this
would leave the user with a password prompt with no prior indication of what is
being done. I don't think that's acceptable.
Please improve the end-user experience by devising a way to inform the
user git-annex is making progress with JSON.
I hope you find this information helpful to improve the end-user experience.
Let me know if you have any questions.
datalad push currently does not use git-annex push, and it would be good
if it could, in order to avoid some surprising behavior with its current
implementation.
But, it parses the git push output to display its own
progress messages. Since git-annex push interleaves that with whatever
else it outputs, adapting to parsing it would be difficult.
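A rough Python sketch of that kind of parsing, assuming git's usual progress format ("Writing objects:  40% (2/5)") with updates separated by carriage returns. This is an illustration, not datalad's actual code; the names are hypothetical.

```python
import re
from typing import Iterator, NamedTuple

# Matches lines such as "Writing objects:  40% (2/5)" and, for progress
# relayed from the remote side, "remote: Resolving deltas: 100% (1/1)".
_PROGRESS = re.compile(
    r"^(?:remote: )?(?P<phase>[A-Za-z][A-Za-z ]*):\s+"
    r"(?P<percent>\d+)% \((?P<done>\d+)/(?P<total>\d+)\)"
)


class GitProgress(NamedTuple):
    phase: str
    percent: int
    done: int
    total: int


def parse_git_progress(stderr_text: str) -> Iterator[GitProgress]:
    """Yield each progress update found in captured git push stderr."""
    # git redraws a progress line with a carriage return and finishes it
    # with a newline, so split on both to see every update.
    for chunk in re.split(r"[\r\n]+", stderr_text):
        m = _PROGRESS.match(chunk)
        if m:
            yield GitProgress(
                m["phase"], int(m["percent"]), int(m["done"]), int(m["total"])
            )
```

When git-annex push interleaves its own messages with these lines, the regex still picks out the git progress, but it cannot tell which git push a line belongs to.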
In order for it to use git-annex push, it seems it
would need --json-progress support, and either parsing of the git push
output in git-annex that feeds through to the --json-progress, or some form of
machine-readable delimiters in stdout and stderr around the git push
output.
In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.
git pull and git push over ssh prompt for the password (to /dev/tty)
before outputting anything else. So I suppose it is acceptable.
git pull outputs its progress to stderr. So --json could leave that alone
and a program wanting to parse it just consume stderr. Delimiters could
be added to stderr around the git pull (with a separate option)
to make it easier for a program to find and parse it.
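If such delimiters existed, extracting the git pull stderr would be simple. A Python sketch; the marker strings are hypothetical placeholders, since git-annex has no such option. It returns a list because more than one delimited section may appear in a single run.

```python
from typing import List

# Hypothetical markers standing in for whatever a separate option might print.
BEGIN = "[git-annex: begin git pull]"
END = "[git-annex: end git pull]"


def delimited_sections(stderr_text: str) -> List[str]:
    """Return the text between each BEGIN/END marker pair, in order."""
    sections: List[str] = []
    current: List[str] = []
    inside = False
    # splitlines() also splits on carriage returns, so each progress
    # redraw ends up on its own line.
    for line in stderr_text.splitlines():
        if line == BEGIN:
            inside, current = True, []
        elif line == END and inside:
            sections.append("\n".join(current))
            inside = False
        elif inside:
            current.append(line)
    return sections
```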
git pull also outputs some things to stdout.
In particular, that includes the git merge output when the merge is
successful. It seems to me that could be put in the json object, eg:
That will buffer it until the pull is complete, which seems ok;
it's displayed by git pull after the usually more expensive
network operation, so buffering it briefly wouldn't be too noticeable if
a json consumer chooses to show it to the user.
Note that git-annex pull will pull from the remote a second time after
transferring content to/from it. So the json will have 2 "command":"pull"
records. And stderr may contain 2 delimited git pull stderrs.
The --json consumer may find that surprising, and it doesn't always happen,
which gets back to the original problem of the --json not being discoverable.
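A consumer can avoid being surprised by not assuming one record per command. A Python sketch over hypothetical pull records (pull has no --json today), with "command" at the top level as in the records described above; count_commands is an illustrative name.

```python
import json
from collections import Counter
from typing import Iterable


def count_commands(lines: Iterable[str]) -> Counter:
    """Count JSON records per "command" value instead of assuming one each."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line).get("command")] += 1
    return counts
```

A program built this way treats a second pull record as normal rather than as an error.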
datalad push wants to use the same git push operations as
git-annex push does, which is nontrivial to reimplement,
especially in its handling of the git-annex branch.
See the long comment on pushBranch explaining the order of operations.
This is one place where git-annex push can't be emulated using other
git-annex commands that do support --json.
But, git-annex push --no-content doesn't do much besides run pushBranch.
So datalad push could use it when run in a git-annex repository.
There's no need for it to support --json either; the regular git push
output goes to stderr, so it can parse the git push progress out of
stderr as before.
It may want to pass --quiet to avoid the usual git-annex output to
stdout. AFAICS, git push does not itself output to stdout.
The only other thing that command does besides pushBranch is
updateBranches, which updates view branches and adjusted branches when
run in one.
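A Python sketch of reading the stderr of git-annex push --no-content --quiet as it arrives, splitting on both carriage returns and newlines so each progress redraw is seen. One caveat worth checking: git push normally prints progress only when stderr is a terminal (git push --progress forces it), so whether progress shows up through a pipe here is not guaranteed. The function name is illustrative.

```python
import re
import subprocess
from typing import Iterator, List


def stderr_updates(argv: List[str]) -> Iterator[str]:
    """Run argv, discard stdout, and yield stderr updates as they arrive.

    A carriage return or a newline ends an update.
    """
    with subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
    ) as proc:
        assert proc.stderr is not None
        pending = b""
        while True:
            chunk = proc.stderr.read1(4096)  # returns once any data is available
            if not chunk:
                break
            pending += chunk
            # Everything before the last separator is complete; keep the rest.
            *complete, pending = re.split(rb"[\r\n]", pending)
            for part in complete:
                if part:
                    yield part.decode("utf-8", "replace")
        if pending:
            yield pending.decode("utf-8", "replace")


# Example (remote name "origin" is a placeholder):
# for update in stderr_updates(
#         ["git-annex", "push", "--no-content", "--quiet", "origin"]):
#     print(update)
```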
The equivalent of git-annex pull --json with no remote
specified:
git-annex pull --no-content
git-annex get --wanted --json
git-annex drop --wanted --json
And, the equivalent of git-annex push without a remote specified:
git-annex put --wanted --json
git-annex drop --wanted --json
git-annex push --no-content
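The sequences above could be driven from a program roughly like this Python sketch: records from the --json steps are collected, while the pull/push steps keep the terminal so the usual git messages still reach the user. The constant and function names are illustrative.

```python
import json
import subprocess
from typing import List

# The two sequences quoted above, with no remote specified.
PULL_STEPS = [
    ["git-annex", "pull", "--no-content"],
    ["git-annex", "get", "--wanted", "--json"],
    ["git-annex", "drop", "--wanted", "--json"],
]
PUSH_STEPS = [
    ["git-annex", "put", "--wanted", "--json"],
    ["git-annex", "drop", "--wanted", "--json"],
    ["git-annex", "push", "--no-content"],
]


def run_steps(steps: List[List[str]]) -> List[dict]:
    """Run steps in order and return the JSON records from the --json steps."""
    records: List[dict] = []
    for argv in steps:
        if "--json" in argv:
            out = subprocess.run(argv, stdout=subprocess.PIPE, text=True).stdout
            records.extend(json.loads(line) for line in out.splitlines() if line.strip())
        else:
            # No JSON: let git's usual messages go straight to the user.
            subprocess.run(argv)
    # Exit statuses are not checked; each record's "success" field says
    # whether that item worked.
    return records
```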
So, the argument for adding --json to pull/push now seems to be reduced.
Here are all the arguments I can think of for still doing that:
These command sequences won't behave completely identically to pull/push
in all configurations, eg they don't look at remote.<name>.annex-pull
and remote.<name>.annex-push configs.
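A wrapper running those sequences could consult the settings itself. A Python sketch that reads git config --get-regexp output (one "key value" pair per line) and lists the remotes a plain pull or push would skip, assuming, as git-annex's documentation for these settings describes, that setting remote.<name>.annex-pull or remote.<name>.annex-push to false excludes that remote. The function name is illustrative.

```python
import re
from typing import Iterable, List

_KEY = re.compile(r"^remote\.(?P<name>.+)\.annex-(?P<op>pull|push)$")
_FALSE = {"false", "no", "off", "0"}  # git's spellings of boolean false


def excluded_remotes(config_lines: Iterable[str], op: str) -> List[str]:
    """Remotes whose remote.<name>.annex-<op> is set to false.

    op is "pull" or "push".
    """
    # config_lines would come from:
    #   git config --get-regexp '^remote\..*\.annex-pu(ll|sh)$'
    excluded = []
    for line in config_lines:
        key, _, value = line.strip().partition(" ")
        m = _KEY.match(key)
        if m and m["op"] == op and value.strip().lower() in _FALSE:
            excluded.append(m["name"])
    return excluded
```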
A single git-annex push or pull (or sync) does less work than several
git-annex commands. In the command sequences above, git-annex has to
traverse the tree twice. That is a pretty small difference in overhead
most of the time, though.
Yikes, that almost triggered my ChatGPT detector.
FWIW, I've split updateBranches between pull and push now.
On git-annex push, all it does is propagate adjusted branch changes back to the original branch.
On git-annex pull, it handles updating the view branch and/or propagating changes from the original branch to the adjusted branch.
Also, git-annex push was fixed to not merge synced/master into master and to not update the adjusted branch when the original branch has changed.
I have done some work adjacent to this todo, implementing a --wanted option and a git-annex put command.
Now, if someone wants the equivalent of git-annex pull --json $someremote, they can run:
The git-annex pull above does not have json output, but outputs the usual git pull messages for the user to deal with as they see fit.
And, if someone wants the equivalent of git-annex push --json $someremote, they can run:
The git-annex push above does not have json output, but outputs the usual git push messages for the user to deal with as they see fit.