todo/api for telling when nonexistant or non git files passedgit-annexhttp://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/git-annexikiwiki2023-04-26T16:53:44Zcomment 1http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_1_672457817d5dca898f5aefbe97da9b23/yarikoptic2022-02-28T18:48:51Z2022-02-28T18:48:50Z
<blockquote><p>This does risk breaking things that parse the existing JSON and fall over on the new record, but I think git-annex should be free to add new records and fields to its JSON output in general, and it has probably at least added new fields before.</p></blockquote>
<p>Although sounds logical, FWIW, not sure if DataLad would be robust to such an assumption since I don't think it was exercised before and indeed all returned records were typically associated with some particular <code>"path"</code> so our code might rely on that. Also it makes it a bit unclear on what datalad code should do with "unrecognized" outputs -- are they sign of a problem, or could be safely ignored?</p>
<p>Ideally, IMHO and IIRC, DataLad should not concern itself with exit codes of <code>git</code> commands which <code>git-annex</code> ran -- it should be up to <code>git-annex</code> to decide on what any particular underlying <code>git</code> invocation supposed to do etc. I guess only in the cases of some faulty, unknown to git-annex how to handle, behavior where git-annex itself reports an error, might be worth including details on underlying failed <code>git</code> invocation in the output, so datalad also could channel it to the user.</p>
comment 2http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_2_b6a76bfc063dcba439b2ead004caabc9/joey2022-02-28T19:42:43Z2022-02-28T19:40:13Z
<p>But datalad is apparently concerning itself with just that, since
there was a bug when git-annex's "not found" stderr changed.</p>
<p>I do think this needs to have a good motivation before it's implemented,
and that's currently all I have, that datalad for some reason cares about
it. Maybe there is not a good reason for datalad to care though..</p>
<p>You may well be right about adding new JSON objects breaking parsers.</p>
comment 3http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_3_3f0872f13160a77a82bd5a95fc3f397d/joey2023-04-25T22:46:17Z2023-04-25T16:57:58Z
<p>I see that datalad parses the error messages to get the filename that
caused it, and so an exit status is not enough information.</p>
<p>(See <code>./datalad/support/annex_utils.py</code>
<code>_get_non_existing_from_annex_output</code>)</p>
<p>So the additional json would need to include the filename that didn't
exist.</p>
<p><a href="https://git-annex.branchable.com/projects/datalad/bugs-done/copy_does_not_reflect_some_failed_copies_in_--json_output/">this closed todo</a>
argues persuasively that --json-error-messages cannot be extended to
include these errors as-is. In particular, putting them in the current
json would mean outputting a record without a "key" field in situations
where that field is always present, so json consumers might
misbehave without it.</p>
<p>bpoldrack makes a good point
<a href="https://github.com/datalad/datalad/pull/6510#issuecomment-1057320621">here</a>:</p>
<blockquote><p>Ideally, there'd be some indication of the problem that isn't meant to
be a message to the user and not (so easily) subject to change - pretty
much something that's akin to an exception name. That's because there
are possible errors that we can deal with from within datalad if we can
"understand" the problem - not everything needs to be passed onto the
user.</p></blockquote>
<p>So eg:</p>
<pre><code>{"exception":"UNIQUEID", "file":"foo", "error-messages":["foo not found"]}
</code></pre>
<p>That seems about right to me, and it future proofs git-annex to be able to
report other exceptions in json output later on.</p>
comment 4http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_4_0217c8acd2a9eb3bba1fc497856cd96a/joey2023-04-25T18:30:33Z2023-04-25T17:49:23Z
<p>What should enable that new json output for exceptions,
--json-error-messages or a new option like --json-exceptions?</p>
<p>The risk is that, since this is a new object type, it breaks a json
consumer that expects to find a <code>{"command":...}</code> object.</p>
<p>I do think that git-annex should be free to add new fields to existing json
objects. But adding a new object type seems considerably more risky.
git-annex has never done that before (eg --json-progress outputs the
"command" object with updated values).</p>
<p>I took a look at datalad's parsing of json from git-annex, to see what it might
do if it got a new object type unexpectedly. I'm not convinced it wouldn't
misbehave. For example, in <code>datalad/interface/results.py</code> <code>annexjson2result</code>:</p>
<pre><code>res['status'] = 'ok' if d.get('success', False) is True else 'error'
</code></pre>
<p>Without a "status" field this will yield "error" which could go on to
cause unexpected behavior</p>
<p>So, I think this would need to be a new --json-exceptions option.</p>
comment 5http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_5_7501eed2e6cb732966277ef2f1ff28e9/yarikoptic2023-04-25T18:31:20Z2023-04-25T18:31:20Z
<p>Yes -- having some UNIQUEIDs for errors might be quite great!</p>
<p>As for specific field to identify what that error applicable to: the other changed behavior which broke our code/tests was the change to behavior of <code>annex info</code> to no longer list unknown to annex path in <code>"file"</code> and rather list it in <code>"input"</code> for which the <a href="https://github.com/datalad/datalad/pull/7372/files">PR</a> was cooked up already. So may be here it also just should be <code>"input"</code> field if <code>annex</code> desires to not make claims on either it is a <code>"file"</code> or may be some "pathspec" (well -- in DataLad nobody was brave enough to open that -- path-vs-pathspec -- pandora box <a href="https://github.com/datalad/datalad/issues/6933">see issues/6933</a>)?</p>
comment 6http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_6_7a5b3d2694a48e4e44436f6bf7ed6642/yarikoptic2023-04-25T18:45:58Z2023-04-25T18:45:58Z
<p>Just for completeness -- do we really need a new type of record for this? In immediate use case which is <a href="https://git-annex.branchable.com/bugs/started_to_escape_characters_in_the_output/">in the heart of the path escaping change discussion</a> let's consider other commands. E.g. <code>addurl</code> already reports <code>--json-errors</code> just fine in a similar case</p>
<pre><code class="shell">❯ git annex addurl --json --json-error-messages '\e[31mfo\o\e[0m'
{"command":"addurl","error-messages":["git-annex: bad url \\e[31mfo\\o\\e[0m"],"file":null,"input":["\\e[31mfo\\o\\e[0m"],"success":false}
addurl: 1 failed
</code></pre>
<p>so why just not to make <code>add</code>'s</p>
<pre><code class="shell">❯ git annex add --json --json-error-messages '\e[31mfo\o\e[0m'
git-annex: \e[31mfo\o\e[0m not found
add: 1 failed
</code></pre>
<p>into</p>
<pre><code class="shell">{"command":"add","error-messages":["git-annex: \\e[31mfo\\o\\e[0m not found"],"file":null,"input":["\\e[31mfo\\o\\e[0m"],"success":false}
add: 1 failed
</code></pre>
<p>or if you want to expand with unambigous ID:</p>
<pre><code>{"command":"add","error-messages":["git-annex: \\e[31mfo\\o\\e[0m not found"],"file":null,"input":["\\e[31mfo\\o\\e[0m"],"success":false, "errorid": "NOT_FOUND"}
add: 1 failed
</code></pre>
<p>or alike without requiring some new CLI option?</p>
Re: comment 6http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_8_f34293aa30b8b9abc71fac4e713aded3/joey2023-04-25T22:46:17Z2023-04-25T21:10:21Z
<p>Oh well spotted yarikoptic! I wish I had noticed your comment 2 hours ago,
but I was head down implementing --json-exceptions.</p>
<p>Ok, so <code>addurl</code> does <code>giveup "bad url"</code> and that does indeed result in json
output that lacks a <code>key</code> and has <code>"file":null</code> as well.</p>
<p>I'm sure that somewhere in <code>git-annex add</code>, it's possible for it to
<code>giveup</code> with an error too. Oh of course... a device file!</p>
<pre><code>joey@darkstar:/home/joey/tmp/xxx>git-annex add --force-small null --json --json-error-messages
{"command":"add","error-messages":["git-annex: null is not a regular file"],"file":"null","input":["null"],"note":"adding content to git repository","success":false}
add: 1 failed
</code></pre>
<p>Other perhaps more likely cases where add can <code>giveup</code> include when it's unable
to remove all write perms due to an xattr, and probably some permissions
problems.</p>
<p>So, json consumers of add already have to deal with the <code>key</code> being missing.</p>
<p>Now, it might be that some git-annex commands don't <code>giveup</code>, so this would be
a new complication for consumers of their json. But so would adding a <code>giveup</code>
for any reason to a command, and I don't worry about that.</p>
<p>While I've implemented --json-exceptions, I don't like the complexity, so this new
information inclines me to rip it back out and instead handle the case of the
nonexistant file like:</p>
<pre><code>{"command":"add","exception":"UNIQUEID","file":"foo","input":["foo"],"error-messages":["foo not found"],"success":false}
</code></pre>
comment 8http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_8_2550e1760dcfb90c9c2ca1ee145adcf1/joey2023-04-25T23:26:29Z2023-04-25T22:45:24Z
<p>Ok, implemented the simple alternative. Here's how it looks:</p>
<pre><code>joey@darkstar:~/tmp/xxx>git-annex add 'dne' --json --json-error-messages
{"command":"add","error-messages":["git-annex: dne not found"],"errorid":"FileNotFound","file":"dne","input":["dne"],"success":false}
add: 1 failed
</code></pre>
<p>The errorid will remain stable. I can add those to other error messages
now, on request BTW.</p>
<p>Note that when git-annex relies on <code>git ls-files --error-unmatch</code> to
complain about nonexistant or non-git files, the error messages from
git will still be displayed to stderr, not this nice json. So
datalad will need to keep its parser for that part.</p>
comment 9http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_9_d7e879ae1e7ae5e27e15dab9d5b99f16/yarikoptic2023-04-26T16:38:26Z2023-04-26T16:38:26Z
quick/small one -- given that there is <code>"error-messages"</code>, would it may be make sense to name new one <code>"error-id"</code> and not <code>"errorid"</code> for consistency?
comment 10http://git-annex.branchable.com/todo/api_for_telling_when_nonexistant_or_non_git_files_passed/comment_10_32ebb1ea1278b891b58dd3c4a8546453/joey2023-04-26T16:53:44Z2023-04-26T16:42:24Z
<p>I decided to rename it "message-id" since it could go on warning or other
messages too conceivably.</p>