Recent comments posted to this site:

comment 5
FWIW -- it never returned, so I am now trying on another box. Here is a complete command: git clone https://github.com/dandisets/000026.git && cd 000026 && git annex init && git annex find --not --in here --fast | head -n 1 -- it stalled as well. I wonder if it replicates for you. There too, with 2>/dev/null it returned right away, but on a rerun without it -- stalled again.
Comment by yarikoptic
comment 3

I'm curious if you also ran git-annex with FD 2 closed?

How do I discover that? It worked fine when I redirected stderr:

$> git annex find --in here 2>/dev/null | head -n 1
derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat

$> echo $?
0

and damn it -- after that it started to work in that shell:

$> git annex find --in here | head -n 1   
derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat
git-annex: <stdout>: hFlush: resource vanished (Broken pipe)

:-/ I think it relates to dataset size: if I go to other, smaller datasets in that shell now, it returns fine as expected, but in the huge /mnt/btrfs/datasets/datalad/crawl/dandi/dandisets/000026 it is still running (likely because more stuff is spit out to stdout), even with

(git)smaug:/mnt/btrfs/datasets/datalad/crawl/dandi/dandisets/000026[draft]git
$> git annex find --not --in here --fast 2>/dev/null | head -n 1
derivatives/Depth_normalized_OCT_volume/I38/sub-I38_ses-OCT_sample-BrocaAreaS01_depth-normed_OCT.ome.tiff

I will keep it going to see if it ever returns; I will check in an hour. FWIW -- that dataset is here

Comment by yarikoptic
comment 3

Turns out that a git-annex that does not use sanitizeTopLevelExceptionMessages will not have this behavior.

Without that, git-annex find | head -n1 does not display the broken pipe exception, and so the crash loop never happens when stderr is closed.

That is relatively new; it was added in git-annex 10.20230407.

And apparently ghc's default exception display mechanism actually avoids displaying anything for a broken pipe exception on stdout, which leads to the more typical unix behavior of silently stopping on SIGPIPE.
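That typical unix behavior is easy to see with an ordinary pipeline (a plain shell illustration, not git-annex itself):

```shell
# "yes" is killed by SIGPIPE as soon as "head" exits: nothing is
# printed to stderr, and the pipeline's exit status (head's) is 0.
yes | head -n 1
```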

See also https://mail.haskell.org/pipermail/haskell-cafe/2023-May/136194.html

Comment by joey
comment 2

I was able to reproduce this, but only when I ran git-annex with FD 2 closed.

perl -e 'close(STDERR); system("git-annex find --not --in here --fast | head -n 1")'

I'm curious if you also ran git-annex with FD 2 closed?

I suspect what must be happening is that the exception handler crashes when outputting to stderr, which causes an exception to be thrown, leading to a crash loop.
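That failure mode can be sketched at the shell level (an illustration of the general mechanism, not of git-annex's own handler): once FD 2 is closed, the very act of reporting an error fails too.

```shell
# With FD 2 closed, writing an error message fails (bad file
# descriptor), so nothing is shown and the write returns nonzero --
# a handler that reports its own failure to stderr can only fail again.
( echo "something went wrong" >&2 ) 2>&-
echo "status of the error write: $?"
```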

If so, it would seem likely to be a bug in ghc/base, and I'd not be surprised to find such a bug somewhere in its bug tracker. In fact, I almost remember finding this same behavior before, which may have helped me guess this was due to FD 2 being closed.

With that said, I've not been able to reproduce the behavior yet with a simpler haskell program like this one:

import System.IO

main = do
    print "foo"
    hFlush stdout
    print "bar"
    hFlush stdout
    main

But that simple program also doesn't throw "hFlush: resource vanished (Broken pipe)" when piped to head -n1, so it's not quite replicating what git-annex does.

Comment by joey
comment 1

It's arguably not git-annex's fault if it was pointed at an API endpoint that behaves enough like S3 to make it seem like it has stored data, but does not actually store the data.

With that said, for there to be data loss here, the file would need to be dropped from the local repository, relying on the copy "stored" in S3. So the S3 special remote's checkPresent could be improved to prevent such a bad endpoint from being treated as containing the content of an object.

For S3, checkPresent does pass in the VersionID when git-annex knows one. (Though that doesn't help if the API endpoint ignores that header.) What it does not do is check the response from S3 for a VersionID or ETag. Improving that seems like a possible way to avoid this kind of problem.
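A minimal sketch of what checking the response could look like, using a canned HEAD response with made-up header values (git-annex itself is Haskell; this only illustrates the idea):

```shell
# A faithful S3 endpoint includes an ETag (and a VersionId on
# versioned buckets) in its HEAD/PUT responses; a fake endpoint may
# return a bare 200. Only trust the object if the header is present.
response='HTTP/1.1 200 OK
ETag: "9b2cf535f27731c974343645a3985328"'

if printf '%s\n' "$response" | grep -qi '^etag:'; then
    echo "ETag present: endpoint plausibly stored the object"
else
    echo "no ETag: do not treat a bare 200 as content present"
fi
```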

It does check the ETag when the S3 remote is configured with exporttree=true.

As for the idea of verifying the annex-uuid write by reading the file back, the difficulty is that S3 DEEP_ARCHIVE and similar storage classes can have an hours-long delay before a file that is already stored in the bucket can be retrieved. Also, the annex-uuid file is not used for exporttree=yes remotes.

Comment by joey
comment 2
FWIW, I think this might be useful for openneuro, as I think we run into cases (e.g. ds000113) where we have records on some elderly keys (such as SHA1--771e0eea7ceb32216a5a06c89c50d1f02bc79d6d) for which we no longer have any commit in the history pointing to them, or, even if we do, no such key exported in the bucket (I think).
Comment by yarikoptic
comment 12
git-annex disableremote is complete, and I hope to wrap this up soon.
Comment by joey
others stall too -- workaround <()

well -- head also stalls

smaug:/tmp/ds000113
$> git annex list | head
here
|origin
||s3-PRIVATE
|||s3-PUBLIC
||||web
|||||bittorrent
||||||
___X__ derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat
___X__ derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-02_bold.mat
___X__ derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-03_bold.mat

workaround

$> head -n1 <(git-annex find --not --in here --fast)
derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat
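The shape of that workaround, shown with a stand-in producer instead of git-annex (process substitution <(...) requires bash or zsh):

```shell
# head reads the producer's output through a named pipe passed as a
# file argument; printf stands in for the long-running git-annex find.
head -n 1 <(printf 'first\nsecond\nthird\n')
```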
Comment by yarikoptic