Recent comments posted to this site:

I agree about the simplification. However, when resuming an upload with, say, 400 chunks where only 10 are missing, after CHECKPRESENT-MULTI-FAILURE, we'd need to CHECKPRESENT another 390 keys until we can continue. Sure, the remote could cache the replies, but another idea would be for the remote to reply with the last key in the list that is present.

Example:

```
$ CHECKPRESENT-MULTI a b c d e  # git annex calls CHECKPRESENT-MULTI with an ordered list
CHECKPRESENT-MULTI-SUCCESS      # all keys are present
CHECKPRESENT-MULTI-FAILURE      # all keys are missing
CHECKPRESENT-MULTI-FAILURE c    # Everything up to c is present, d is missing. e could be present or missing
```
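To make the proposed reply concrete, here is a minimal Python sketch of both sides of the exchange. It assumes the hypothetical CHECKPRESENT-MULTI extension exactly as proposed in this comment (it is not part of the documented protocol), and the names `checkpresent_multi_reply`, `keys_to_resume`, and the `is_present` callback are illustrative only. The remote stops at the first missing key, so it never checks the keys after it. Sending a bare FAILURE whenever the first key is missing (even if later keys happen to be present) is an assumption, since the proposal leaves that case open.

```python
from typing import Callable, List, Optional, Sequence


def checkpresent_multi_reply(keys: Sequence[str],
                             is_present: Callable[[str], bool]) -> str:
    """Remote side (hypothetical): reply to CHECKPRESENT-MULTI.

    Scans the ordered keys and stops at the first missing one, so keys
    after it are never checked ("could be present or missing").
    """
    last_present: Optional[str] = None
    for key in keys:
        if not is_present(key):
            break
        last_present = key
    else:
        # The loop ran to the end without a break: every key is present.
        return "CHECKPRESENT-MULTI-SUCCESS"
    if last_present is None:
        # The very first key is missing (covers "all keys are missing").
        return "CHECKPRESENT-MULTI-FAILURE"
    return f"CHECKPRESENT-MULTI-FAILURE {last_present}"


def keys_to_resume(keys: Sequence[str], reply: str) -> List[str]:
    """git-annex side (hypothetical): keys still to handle after the reply."""
    parts = reply.split()
    if parts[0] == "CHECKPRESENT-MULTI-SUCCESS":
        return []
    if parts[0] != "CHECKPRESENT-MULTI-FAILURE":
        raise ValueError(f"unexpected reply: {reply!r}")
    if len(parts) == 1:
        # No present prefix: start from the first key.
        return list(keys)
    # Everything up to and including the named key is present.
    return list(keys[keys.index(parts[1]) + 1:])
```

For the resumed 400-chunk upload above, if the 10 missing chunks are the last ones (as when an ordered upload was interrupted), one round trip names chunk 390 and `keys_to_resume` returns only the 10 chunks still to send, without 390 further CHECKPRESENT calls or any reply caching on the remote.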
Comment by lykos Tue Sep 29 08:36:18 2020

Switching SourceTree to look at /usr/local/git/bin fixed the problem.

This can be CLOSED.

Comment by mark Mon Sep 28 21:40:21 2020
And BTW, we really value git-annex and use it for keeping all of our golden reference outputs for regression testing in place and available for others who need to run tests.
Comment by mark Mon Sep 28 21:10:56 2020

Sorry for being anal with my questions and not just trying it out... but by "The async extension to the protocol guarantees only a single process will be run.", which of the following scenarios do you mean?

  • a. there will be no parallel workers if --jobs N is specified (thus a single process for an external remote)
  • b. there are parallel workers, and a single instance of an external special remote will be run, and either
    • b.1. only one worker would be able to talk to it
    • b.2. parallel workers will talk to the same async external special remote

or maybe some other c. or b.3?

Comment by yarikoptic Mon Sep 28 19:44:05 2020

The async extension to the protocol guarantees only a single process will be run. The remote might be asked to start several operations concurrently, but if it wants to queue them sequentially, that should be fine.

While it might be a bit of a roundabout way to get this functionality, since the extension complicates the protocol a bit, I'm inclined to feel it's enough, and not to add this other extension, at least without a more compelling use case.
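A remote that is handed several operations at once but wants to run them one after another can put them in a FIFO queue served by a single worker. The Python sketch below assumes the async extension tags each request and reply with a job number as `J <n> <message>`; check the async appendix of the external special remote protocol documentation for the exact framing. `SequentialRemote`, `parse_job`, and the `handle_request`/`send` callbacks are hypothetical names, and reading requests from stdin is left out.

```python
import queue
import threading
from typing import Callable, Optional, Tuple


def parse_job(line: str) -> Tuple[str, str]:
    """Split an (assumed) 'J <n> <request>' line into (job id, request)."""
    tag, job, request = line.split(" ", 2)
    if tag != "J":
        raise ValueError(f"not a job-tagged line: {line!r}")
    return job, request


class SequentialRemote:
    """Accepts many concurrent jobs but runs them one at a time, in order."""

    def __init__(self, handle_request: Callable[[str], str],
                 send: Callable[[str], None]) -> None:
        self._handle = handle_request
        self._send = send
        self._jobs: queue.Queue = queue.Queue()
        # A single worker thread guarantees sequential execution.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, line: str) -> None:
        # Called by the protocol reader; only enqueues, never does the work.
        self._jobs.put(parse_job(line))

    def close(self) -> None:
        self._jobs.put(None)  # sentinel: stop once the queue is drained
        self._worker.join()

    def _run(self) -> None:
        while True:
            item: Optional[Tuple[str, str]] = self._jobs.get()
            if item is None:
                return
            job, request = item
            # Tag the reply with the same job number as its request.
            self._send(f"J {job} {self._handle(request)}")
```

Because there is only one worker, replies come back in arrival order even though several jobs may be outstanding at once.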

Comment by joey Mon Sep 28 14:00:00 2020

Eh, true. Our version is a year old because it is the one that doesn't cause regressions in datalad. We are still fighting some remaining regressions. See, e.g., one of the latest attempts: https://github.com/datalad/datalad/pull/4915.

We do have newer versions, unannounced, in NeuroDebian's debian-devel/ (a complement to the regular debian/ line in your apt listing), served from the main NeuroDebian website (not the mirrors). That is where we upload those versions for testing.

Also, there is now a daily-built git-annex .deb (and .dmg for OS X) package, available as an artifact of a GitHub workflow, e.g. https://github.com/datalad/datalad-extensions/actions/runs/274613689. I guess we had better upload them automatically somewhere for easier fetching/deployment.

Comment by yarikoptic Sun Sep 27 13:42:15 2020
The comment about NeuroDebian providing more up-to-date builds is out of date; their version is now more than a year old. An alternative would be welcome, something between the official Ubuntu package and compiling from source.
Comment by mhauru Sun Sep 27 12:40:54 2020
Works great, Thanks!
Comment by Lukey Fri Sep 25 16:33:40 2020

One side effect of this optimisation is that, while sync --all used to display the filenames it was getting or dropping when operating on files in the working tree, with the optimisation enabled it will only display the keys. So its behavior in two different repos might seem inconsistent to a user who doesn't know about all these gory two-pass details.

I think, if that became a problem, the best fix would be to only display the keys, and never the worktree filenames, even when running the first pass. But I'll wait and see if that needs to be done, I suppose.
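The display choice discussed above can be modeled as a two-pass loop: the first pass handles keys used by files in the working tree, and the second handles the remaining keys that --all covers. This is a hypothetical Python model, not git-annex's actual code; `sync_all`, `worktree`, `all_keys`, and the `keys_only` flag are illustrative names. With `keys_only=True` both passes report keys, which is the consistent behavior suggested as the fix.

```python
from typing import Dict, Iterable, List, Set


def sync_all(worktree: Dict[str, str], all_keys: Iterable[str],
             keys_only: bool) -> List[str]:
    """Return the label displayed for each key handled, in order.

    worktree maps filename -> key for annexed files in the working tree
    (a simplified, assumed model of the first pass).
    """
    shown: List[str] = []
    seen: Set[str] = set()
    # Pass 1: keys used by files in the working tree.
    for filename, key in worktree.items():
        if key in seen:
            continue  # another file already pointed at this key
        seen.add(key)
        shown.append(key if keys_only else filename)
    # Pass 2: every remaining key that --all covers; there is no filename.
    for key in all_keys:
        if key not in seen:
            seen.add(key)
            shown.append(key)
    return shown
```

With `keys_only=False` the output mixes filenames and keys, which is exactly the inconsistency a user could notice between repos.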

Comment by joey Thu Sep 24 19:04:32 2020

Indeed -- I think having disableremote (to complement initremote and enableremote) would make sense. E.g. I failed to "disable" it via config:

```shell
(git)lena:/tmp/ds000001[master] $> git config remote.datalad.annex-ignore true

$> time SCHEMES="datalad-archives" PATH=~datalad/trash/speedyannex2:$PATH git annex fsck --from datalad --fast --debug sub-01/anat/sub-01_inplaneT2.nii.gz
[2020-09-24 14:45:48.868954] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-09-24 14:45:48.876348] process done ExitSuccess
[2020-09-24 14:45:48.877109] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-09-24 14:45:48.892383] process done ExitSuccess
[2020-09-24 14:45:48.893294] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..cdad4332bce8deff94e89ae35e6378ab1b87e4df","--pretty=%H","-n1"]
[2020-09-24 14:45:48.899769] process done ExitSuccess
[2020-09-24 14:45:48.901763] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-24 14:45:48.928262] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
git-annex: there is no available git remote named "datalad"
SCHEMES="datalad-archives" PATH=~datalad/trash/speedyannex2:$PATH git annex 0.03s user 0.08s system 87% cpu 0.126 total

$> git annex version | head
git-annex version: 8.20200309-g07fcace
```

Comment by yarikoptic Thu Sep 24 18:49:02 2020