todo/CHECKPRESENT-MULTI (git-annex): http://git-annex.branchable.com/todo/CHECKPRESENT-MULTI/
batch presence checking, by Ilya_Shlyakhter, 2020-02-27: http://git-annex.branchable.com/todo/CHECKPRESENT-MULTI/comment_1_181a908a9f77a4c51d1d09f1aff526b7/
Beyond individual large files, maybe this could speed up checking that a whole directory is present in an export remote, by using a command like <a href="https://cloud.google.com/storage/docs/gsutil/commands/ls"><code>gsutil ls</code></a> to list the present files, instead of calling <a href="https://cloud.google.com/storage/docs/gsutil/commands/stat"><code>gsutil stat</code></a> on each file individually.
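A minimal sketch of the listing idea. It assumes the remote can return every object name under a directory prefix in one call, as `gsutil ls` does. The function name `missing_after_listing` is hypothetical, and the listing is passed in as plain data, so no cloud API is called here.

```python
from typing import Iterable


def missing_after_listing(expected: Iterable[str], listed: Iterable[str]) -> list[str]:
    """Return the expected paths that are absent from one directory listing.

    A single listing call plus set lookups replaces one stat call per file.
    """
    present = set(listed)  # built once from the listing output
    return [path for path in expected if path not in present]


# Hypothetical listing output for an exported directory.
listing = ["dir/a.bin", "dir/b.bin", "dir/d.bin"]
print(missing_after_listing(["dir/a.bin", "dir/b.bin", "dir/c.bin"], listing))
# prints ['dir/c.bin']
```

The trade-off is that the full set of listed names has to be held in memory at once.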
comment 2, by joey, 2020-02-28: http://git-annex.branchable.com/todo/CHECKPRESENT-MULTI/comment_2_87857aeaf45927846cde8cea70f9e6f4/
<p>Export remotes don't use chunks, and they use CHECKPRESENTEXPORT rather than
CHECKPRESENT. git-annex also tries not to buffer the whole worktree in
memory; it streams through it so that it can support very large worktrees.
So this idea, which I do think is a good one, seems limited to checking
chunks.</p>
<p>(I'd probably want the chunk-handling code to include at most about 1
million chunk keys in a request, again to avoid using too much memory.
1 million chunk keys need at most 160 MB of RAM, and typically 80 MB or less.)</p>
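<p>The cap can be sketched as splitting the chunk keys into batches of at most <code>MAX_KEYS_PER_REQUEST</code> keys. That constant and the helper names are hypothetical; the 1 million figure and the 160 bytes per key worst case are taken from this comment (1,000,000 × 160 bytes = 160,000,000 bytes, about 160 MB).</p>

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical cap, from the "1 million or so" keys per request figure.
MAX_KEYS_PER_REQUEST = 1_000_000
# Worst case per key implied by "160 MB for 1 million keys".
WORST_CASE_BYTES_PER_KEY = 160


def batched_keys(keys: Iterable[str], limit: int = MAX_KEYS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield successive lists of at most `limit` keys, consuming `keys` lazily."""
    it = iter(keys)
    while True:
        batch = list(islice(it, limit))
        if not batch:
            return
        yield batch


def worst_case_mb(n_keys: int) -> float:
    """Worst-case memory for holding n_keys keys, in MB (10**6 bytes)."""
    return n_keys * WORST_CASE_BYTES_PER_KEY / 1_000_000


print([len(b) for b in batched_keys((f"k{i}" for i in range(5)), limit=2)])  # [2, 2, 1]
print(worst_case_mb(MAX_KEYS_PER_REQUEST))  # 160.0
```

<p>Because the keys are consumed lazily, only one batch needs to be in memory at a time.</p>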
<p>At least for chunks, the reply to CHECKPRESENT-MULTI only needs
to say whether all the keys are present. If even one chunk is missing, the object
as a whole is not present in the remote. That seems like a useful simplification.</p>
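<p>Under this simplification, one boolean answers the question for the whole object. A minimal sketch, where the chunk key names and the <code>present_chunks</code> set are hypothetical stand-ins for what the remote actually stores:</p>

```python
from typing import Iterable, Set


def object_present(chunk_keys: Iterable[str], present_chunks: Set[str]) -> bool:
    """True only if every chunk key is present.

    One missing chunk means the object as a whole is not present;
    all() stops at the first missing key.
    """
    return all(key in present_chunks for key in chunk_keys)


chunks = ["key-chunk1", "key-chunk2", "key-chunk3"]  # hypothetical key names
print(object_present(chunks, {"key-chunk1", "key-chunk2", "key-chunk3"}))  # True
print(object_present(chunks, {"key-chunk1", "key-chunk3"}))                # False
```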
<p>Internally, Remote.checkPresent should probably change to take a <code>[Key]</code>
list. That is simpler than adding a whole other method for this.</p>
<p>Remote.External could use CHECKPRESENT when there is one key in the list,
and CHECKPRESENT-MULTI when there are several, falling back to CHECKPRESENT
on an UNSUPPORTED-REQUEST reply. But I think it ought to be an
extension to the protocol, to avoid that extra round trip.</p>
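<p>A sketch of that dispatch, assuming the remote advertises a hypothetical <code>CHECKPRESENT-MULTI</code> extension during protocol setup, so git-annex knows up front whether it may send the batched request. <code>check_present</code> takes a list of keys, mirroring the proposed <code>[Key]</code> signature, and expects the all-or-nothing reply described in this comment. CHECKPRESENT-SUCCESS, CHECKPRESENT-FAILURE and CHECKPRESENT-UNKNOWN are existing protocol replies; the transport function and the in-memory fake remote are illustrative stand-ins, not git-annex's Remote.External code.</p>

```python
from typing import Callable, Sequence

# Hypothetical model of the protocol pipe: send one request line, get one reply line.
Transport = Callable[[str], str]


def check_present(keys: Sequence[str], send: Transport, extensions: set[str]) -> bool:
    """Return True only if every key is present on the remote.

    CHECKPRESENT-MULTI is sent only when the remote advertised it (a
    hypothetical extension name), so no round trip is spent on an
    UNSUPPORTED-REQUEST reply. Otherwise each key gets its own CHECKPRESENT.
    """
    if len(keys) > 1 and "CHECKPRESENT-MULTI" in extensions:
        reply = send("CHECKPRESENT-MULTI " + " ".join(keys))
        if reply == "CHECKPRESENT-MULTI-SUCCESS":
            return True
        if reply == "CHECKPRESENT-MULTI-FAILURE":
            return False
        raise RuntimeError(f"unexpected reply: {reply}")
    for key in keys:
        reply = send(f"CHECKPRESENT {key}")
        if reply == f"CHECKPRESENT-SUCCESS {key}":
            continue
        if reply == f"CHECKPRESENT-FAILURE {key}":
            return False  # one missing chunk: the object is not present
        # e.g. CHECKPRESENT-UNKNOWN: presence cannot be determined
        raise RuntimeError(f"cannot check {key}: {reply}")
    return True


def fake_remote(stored: set[str], log: list[str]) -> Transport:
    """In-memory stand-in for an external special remote, for illustration only."""
    def send(line: str) -> str:
        log.append(line)
        cmd, *args = line.split()
        if cmd == "CHECKPRESENT":
            status = "SUCCESS" if args[0] in stored else "FAILURE"
            return f"CHECKPRESENT-{status} {args[0]}"
        if cmd == "CHECKPRESENT-MULTI":
            ok = all(k in stored for k in args)
            return "CHECKPRESENT-MULTI-SUCCESS" if ok else "CHECKPRESENT-MULTI-FAILURE"
        return "UNSUPPORTED-REQUEST"
    return send


log: list[str] = []
remote = fake_remote({"k1", "k2", "k3"}, log)
print(check_present(["k1", "k2", "k3"], remote, {"CHECKPRESENT-MULTI"}), len(log))  # True 1
log.clear()
print(check_present(["k1", "k2", "k3"], remote, set()), len(log))  # True 3
```

<p>With the extension advertised, checking three keys costs one request instead of three, and a remote that lacks the extension never receives a request it would reject.</p>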
comment 4, by lykos, 2020-09-29: http://git-annex.branchable.com/todo/CHECKPRESENT-MULTI/comment_4_5228016c35ca0b13545b82bbd3b9455e/
<p>I agree about the simplification. However, when resuming an upload of, say, 400 chunks where only 10 are missing, a CHECKPRESENT-MULTI-FAILURE reply would leave us sending CHECKPRESENT for another 390 keys before we can continue. The remote could cache the replies, but another idea would be for the remote to reply with the last key of the leading run of present keys, that is, the key just before the first missing one.</p>
<p>Example:</p>
<pre><code>$ CHECKPRESENT-MULTI a b c d e # git-annex sends CHECKPRESENT-MULTI with an ordered list; possible replies follow
CHECKPRESENT-MULTI-SUCCESS # all keys are present
CHECKPRESENT-MULTI-FAILURE # the first key, a, is missing; b to e could be present or missing
CHECKPRESENT-MULTI-FAILURE c # a, b and c are present, d is missing; e could be present or missing
</code></pre>
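<p>A sketch of both sides of this proposed reply, assuming the remote checks keys in the order given and stops at the first missing one. <code>multi_reply</code> and <code>resume_index</code> are hypothetical names, and the set of stored keys stands in for the remote's real storage.</p>

```python
from typing import Optional, Sequence, Set


def multi_reply(keys: Sequence[str], stored: Set[str]) -> str:
    """Remote side: report the last key of the leading run of present keys.

    Checking stops at the first missing key, so keys after it stay unknown.
    """
    last_present: Optional[str] = None
    for key in keys:
        if key not in stored:
            if last_present is None:
                return "CHECKPRESENT-MULTI-FAILURE"
            return f"CHECKPRESENT-MULTI-FAILURE {last_present}"
        last_present = key
    return "CHECKPRESENT-MULTI-SUCCESS"


def resume_index(keys: Sequence[str], reply: str) -> int:
    """git-annex side: index of the first chunk that still needs handling."""
    if reply == "CHECKPRESENT-MULTI-SUCCESS":
        return len(keys)
    parts = reply.split()
    if parts[0] != "CHECKPRESENT-MULTI-FAILURE":
        raise ValueError(f"unexpected reply: {reply}")
    if len(parts) == 1:
        return 0  # even the first key is missing
    return keys.index(parts[1]) + 1


keys = ["a", "b", "c", "d", "e"]
reply = multi_reply(keys, {"a", "b", "c", "e"})
print(reply)                      # CHECKPRESENT-MULTI-FAILURE c
print(resume_index(keys, reply))  # 3, i.e. continue at "d"
```

<p>If the 10 missing chunks in the 400-chunk example are the last 10, as after an interrupted upload, a single reply would tell git-annex to resume at the 391st chunk instead of sending another 390 CHECKPRESENT requests.</p>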