projects/datalad/bugs-done: get -J8 resource exhausted (reported by yoh)

comment 1 (joey, 2020-03-16)
<p>The obvious question to ask, which I can't really imagine making any
progress without an answer to: What files did git-annex have open?</p>
<p>I did notice that of the two git-annex logs, one got 19 files before
failing, while the other got 27. It seems unlikely that, if git-annex, or
an external remote, or git, or whatever is somehow leaking file handles,
it would leak different numbers at different times. Which leads to the
second question: What else on the system has files open and how many?</p>
<p>OSX has a global limit of 12k open files, and a per-process limit of 10k.</p>
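<p>Those limits can be inspected directly; the <code>kern.*</code> sysctls below are OSX-specific (that line prints nothing on linux), while <code>ulimit -n</code> shows the per-process soft limit anywhere:</p>

```shell
# global and per-process open-file limits on OSX
sysctl kern.maxfiles kern.maxfilesperproc 2>/dev/null || true
# per-process soft limit as seen by the current shell (portable)
ulimit -n
```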
<p><code>git-annex get</code> on linux needs to open around 16 files per file it
downloads. So if git-annex were somehow leaking every single open FD,
it would successfully download over 600 files before hitting the
per-process limit. If every subprocess git-annex forks also leaked every
open FD, it would of course vary by remote, but with a regular git clone
on the local filesystem, the number of files opened per get is still only
62, so still over an order of magnitude less.</p>
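<p>The back-of-envelope ceiling implied by those numbers (16 FDs per file on linux, 62 when every subprocess FD is counted, against the 10k per-process limit) can be checked directly:</p>

```shell
# how many files could be downloaded before exhausting the 10k per-process
# limit, if every single FD leaked (numbers from the comment above)
echo $(( 10000 / 16 ))   # -> 625 gets if git-annex leaked every FD it opens
echo $(( 10000 / 62 ))   # -> 161 gets if every subprocess FD leaked too
```

Both are far above the 19 and 27 files the two logs actually got before failing.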
<p>Seems much more likely that the system is unhappy for some other reason.</p>
comment 2: it is many more "open files" in reality (yarikoptic, 2020-04-16)
<p>Michael has reported <a href="https://github.com/datalad/datalad/issues/4404">a similar issue on Linux</a>. I was initially also "skeptical". But the reason really is that each git-annex process takes HUNDREDS of open files (dynamic libraries etc), and parallel execution of <code>get</code> adds a good number of pipes on top (I counted ~3000 for a <code>get -J 8</code> process). I had meant to investigate more before reporting, and then randomly ran into this not-so-old report from myself ;)</p>
<p>A quick demo:</p>
<pre><code class="shell">$> echo BEFORE; lsof | grep annex | nl | tail -n 2; git clone http://datasets.datalad.org/allen-brain-observatory/visual-coding-neuropixels/ecephys-cache/.git && cd ecephys-cache && git annex get -J5 * >/dev/null & p=$! && sleep 3 && echo DURING && lsof | grep annex | nl | tail -n 2; kill %1
BEFORE
Cloning into 'ecephys-cache'...
remote: Counting objects: 5875, done.
remote: Compressing objects: 100% (3046/3046), done.
remote: Total 5875 (delta 2335), reused 4599 (delta 1424)
Receiving objects: 100% (5875/5875), 73.55 MiB | 30.39 MiB/s, done.
Resolving deltas: 100% (2335/2335), done.
Checking out files: 100% (573/573), done.
[1] 17335
DURING
 2242 git 17424 yoh 67w REG 9,1 40018173 131020 /tmp/ecephys-cache/.git/annex/tmp/SHA512E-s665395296--8327d0715923b88a2b6b179d02a40acb1630e420a73a16a3422b6b245e9c0e57e21529919136492ab2c746256f99831200c36b7e071ea24f25abb37efc28de13.h5
 2243 git 17424 yoh 68w REG 9,1 67095741 131021 /tmp/ecephys-cache/.git/annex/tmp/SHA512E-s166348896--3bb739a0df1acd478eb84545a7c22c31933458fcd44ce211d4dd555bc979170bef11126064fed730e3b289d41999cc1c6fb0b6c35870bb996a4faa2e34a75403.h5
</code></pre>
<p>So, 3 seconds after the initial call with <code>-J5</code>, I get over 2k open files used by annex (according to grep; a few may have escaped the matching).</p>
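<p>An alternative tally counts each descriptor once per process via /proc instead of once per <code>lsof</code> row (a sketch assuming linux and <code>pgrep</code>; <code>fd_tally</code> is a made-up helper name, not a real tool):</p>

```shell
# sum open FDs of a pid and its direct children by listing /proc/<pid>/fd
fd_tally() {
  total=0
  for pid in "$1" $(pgrep -P "$1" 2>/dev/null); do
    n=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    total=$(( total + n ))
  done
  echo "$total"
}

fd_tally "$$"   # demo on the current shell; pass a git-annex pid while get -J5 runs
```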
comment 3 (yarikoptic, 2020-04-16)
<p>Sorry, forgot to mention: I do not think I had spotted any file descriptor leaking. The number of used file descriptors (according to <code>lsof</code>) fluctuated as the process kept going, but was not really steadily growing.</p>
comment 4 (joey, 2020-04-17)
<p>I'm seeing a lot of git cat-file processes, not a lot of any other process.</p>
<p>Each -J increment adds 3 threads for the different command stages
(start, perform, cleanup). Each thread might need a git cat-file
run with either of two different parameters, and on either of two different
index files. (Both are needed for unlocked files, only one for locked
files.)</p>
<p>So, 5x3x2x2=60 copies of git cat-file max for -J5.
And experimentally, that's exactly how many I see in the worst-case
repo where all files are unlocked. (Plus 4, which I think are owned by
the controlling thread or something.) Using your test case, I am seeing 44.
So I don't think there's a subprocess leak here.</p>
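<p>The worst-case multiplication above, spelled out (all numbers from this comment):</p>

```shell
jobs=5          # -J5
stages=3        # start, perform, cleanup threads per -J increment
batch_modes=2   # the two different cat-file parameter sets
index_files=2   # the two different index files
echo $(( jobs * stages * batch_modes * index_files ))   # -> 60
```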
<p>IIUC, what you show is lsof output for files open by git-annex and by any git
processes that happen to have a file with "annex" in its name open, totalling around 3000.</p>
<p>For one thing, lsof shows a file that two different threads have
open as being opened twice.</p>
<pre><code>git-annex 1459862 1459863 ghc_ticke joey mem REG 8,1 169720 9175285 /lib/x86_64-linux-gnu/ld-2.30.so
git-annex 1459862 1459873 git-annex joey mem REG 8,1 169720 9175285 /lib/x86_64-linux-gnu/ld-2.30.so
</code></pre>
<p>That is different threads of the same process, which has certainly not
opened ld.so repeatedly.</p>
<p>So, you should be using <code>lsof -Ki</code> or something. With that, I see around
1019 files open, between git-annex and git. git-annex by itself has only
246.</p>
<p>(Interestingly, the majority of those seem to be sqlite. I'm unsure
why sqlite is opening the same database 30 times. A single thread often
has the same database opened repeatedly. Might be that the sqlite database
layer has a too large connection pool. There are also a lot of FIFO's,
which I think also belong to sqlite, unless they're something internal to
the ghc runtime.)</p>
<p>Looking at Michael's bug report, looks like they were running with -J8.
I don't see that exceeding the default ulimit of 1024. If they were really
running at -J32, it would. It's not clear to me either how datalad's --jobs
interacts with git-annex's -J, does it pass through or do you run multiple
git-annex processes? People in that bug report are referring to multiple
git-annex processes, which git-annex -J does not result in.</p>
<p>All these -J5 etc values seem a bit high. I doubt that more than -J2
makes a lot of sense given the command stages optimisation, which makes
it use 6 threads and balances the work better than it used to. The only
time it really would help is if you're getting from several different
remotes that each bottleneck on a different resource.</p>
comment 5: quick follow up (yarikoptic, 2020-04-17)
<blockquote><p> It's not clear to me either how datalad's --jobs interacts with git-annex's -J, does it pass through or do you run multiple git-annex processes?</p></blockquote>
<p>ATM we just run a single <code>annex get</code> with the <code>-J</code> option (FWIW, in <code>--batch</code> mode, IIRC). Things might change in the future to balance across different submodules.</p>
<blockquote><p>All these -J5 etc values seem a bit high. I doubt that more than -J2 makes a lot of sense given the command stages optimisation, that makes it use 6 threads and balance the work better than it used to.</p></blockquote>
<p>I could do some timing later on, but I did see benefits: I could not go over 40-60MBps in a single download process (e.g. from S3), but parallel ones (even as many as 8 or 10) could easily carry that throughput each, thus scaling up quite nicely. If interested, you could experiment on smaug, to which you have access, to possibly observe similar effects.</p>
comment 5 (joey, 2020-04-17)
<p>The sqlite open files are a red herring: that happened only when
using a remote in a local directory. Anyway, I've fixed that.</p>
<p>The open files I'm seeing now in my artificial
test case (two local repos with 1000 unlocked files, git-annex get between them, lsof
-Ki run after that's moved 500 files, while the git-annex process is suspended):</p>
<pre><code>no -J 48
-J2 104
-J5 185
-J32 964
</code></pre>
<p>Which seems fine: about 28 file handles per -J increment.</p>
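<p>The per-increment figure falls out of the two largest data points in the table above:</p>

```shell
# (open files at -J32 - open files at -J5) / (32 - 5)
echo $(( (964 - 185) / (32 - 5) ))   # -> 28 file handles per -J increment
```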
<p>If you have something worse than that, show me the lsof.</p>
comment 7 (yarikoptic, 2020-04-18)
<blockquote><p> 5x3x2x2=60 copies of git cat-file max for -J5.</p></blockquote>
<p>So there are now up to 60 <code>git</code> processes, each with about 20 open files, totaling up to 1200 open files... so we are getting into the thousands.</p>
<details>
<summary>
In my current attempt on the laptop, here is a <code>pstree</code> with open-file counts per process and a total at the bottom: 883 open files for a -J5 invocation, with each <code>git cat-file</code> holding between 14 and 29:
</summary>
<pre><code class="shell">$> total=0; pstree -l -a --compact-not -T -p `pgrep datalad` | sed -e 's,--library-path /[^ ]*,,g' -e 's,/usr/lib/git-annex.linux/shimmed/git/,,g' -e 's,--git-dir=.git --work-tree=. --literal-pathspecs -c annex.dotfiles=true,,g' | nl | while read l; do pid=$(echo "$l" | sed -e 's/.*,\([0-9][0-9]*\).*/\1/g'); of=$(lsof -Ki -p $pid 2>/dev/null|grep -v COMMAND | wc -l); echo "$l" | sed -e "s/,$pid/,$pid = $of/g"; total=$(($total + $of)) ; done; echo "Total: $total open files across all processes"
1 datalad,2807826 = 54 /home/yoh/proj/datalad/datalad-maint/venvs/dev3/bin/datalad -l debug install -J 5 -g ///labs/haxby/raiders
2 `-git-annex,2808614 = 149 /usr/lib/git-annex.linux/shimmed/git-annex/git-annex get -c annex.dotfiles=true --json --json-error-messages --json-progress -J5 -- .
3 |-git,2808649 = 16 git cat-file --batch
4 |-git,2808650 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
5 |-(git,2808653 = 0)
6 |-git,2808654 = 16 git cat-file --batch
7 |-git,2808655 = 16 git cat-file --batch
8 |-git,2808656 = 16 git cat-file --batch
9 |-git,2808657 = 16 git cat-file --batch
10 |-git,2808658 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
11 |-git,2808659 = 16 git cat-file --batch
12 |-git,2808660 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
13 |-git,2808661 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
14 |-git,2808662 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
15 |-git,2808663 = 16 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
16 |-git,2808669 = 16 git cat-file --batch
17 |-git,2808670 = 16 git cat-file --batch
18 |-git,2808671 = 16 git cat-file --batch
19 |-git,2808672 = 16 git cat-file --batch
20 |-git,2808673 = 16 git cat-file --batch
21 |-git,2808674 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
22 |-git,2808675 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
23 |-git,2808676 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
24 |-git,2808677 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
25 |-git,2808678 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
26 |-git,2808679 = 17 git cat-file --batch
27 |-git,2808680 = 14 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
28 |-git,2808682 = 25 git cat-file --batch
29 |-git,2808683 = 23 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
30 |-git,2808685 = 26 git cat-file --batch
31 |-git,2808686 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
32 |-git,2808688 = 27 git cat-file --batch
33 |-git,2808689 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
34 |-git,2808690 = 28 git cat-file --batch
35 |-git,2808691 = 26 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
36 |-git,2808693 = 29 git cat-file --batch
37 |-git,2808694 = 27 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
38 |-git,2809036 = 26 git cat-file --batch
39 `-git,2809037 = 24 git cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
Total: 883 open files across all processes
</code></pre>
</details>
<details>
<summary>looking at the one with 29 open files:</summary>
<pre><code class="shell">$> lsof -Ki -p 2808691
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
git 2808691 yoh cwd DIR 259,5 4096 16654129 /tmp/raiders
git 2808691 yoh rtd DIR 259,5 4096 2 /
git 2808691 yoh txt REG 259,5 165632 8395234 /usr/lib/git-annex.linux/lib64/ld-linux-x86-64.so.2
git 2808691 yoh mem REG 259,5 337024 11806232 /usr/lib/locale/aa_DJ.utf8/LC_CTYPE
git 2808691 yoh mem REG 259,5 3284 11807992 /usr/lib/locale/en_US.utf8/LC_TIME
git 2808691 yoh mem REG 259,5 1824496 8394627 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libc.so.6
git 2808691 yoh mem REG 259,5 35808 8395214 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/librt.so.1
git 2808691 yoh mem REG 259,5 114128 8395210 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libpthread.so.0
git 2808691 yoh mem REG 259,5 121280 8395232 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libz.so.1
git 2808691 yoh mem REG 259,5 539304 8395937 /usr/lib/git-annex.linux/usr/lib/x86_64-linux-gnu/libpcre2-8.so.0
git 2808691 yoh mem REG 259,5 3008120 8395330 /usr/lib/git-annex.linux/shimmed/git/git
git 2808691 yoh 0r FIFO 0,13 0t0 99400741 pipe
git 2808691 yoh 1w FIFO 0,13 0t0 99400742 pipe
git 2808691 yoh 2w REG 0,48 0 35023711 /home/yoh/.tmp/datalad_temp__runneroutput__mq55kiau
git 2808691 yoh 77u IPv4 99400731 0t0 TCP lena:38384->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 82u IPv4 99392984 0t0 TCP lena:38386->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 83u IPv4 99392024 0t0 TCP lena:38380->falkor.dartmouth.edu:http (ESTABLISHED)
git 2808691 yoh 84u IPv4 99400730 0t0 TCP lena:38382->falkor.dartmouth.edu:http (ESTABLISHED)
git 2808691 yoh 85u IPv4 99377953 0t0 TCP lena:38388->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 86w REG 259,5 68071663 16654712 /tmp/raiders/.git/annex/tmp/MD5E-s6658782008--8def61aac5f6742194027447390405ff.hdf5.gz
git 2808691 yoh 87w REG 259,5 53047218 16654713 /tmp/raiders/.git/annex/tmp/MD5E-s121517414--f83afa4a5dff04b5a1467afd30e74632.nii.gz
git 2808691 yoh 89w REG 259,5 23378713 16654715 /tmp/raiders/.git/annex/tmp/MD5E-s23378713--7a238410b5c496a29e7967da21332c03.nii.gz
git 2808691 yoh 94u IPv4 99398310 0t0 TCP lena:38390->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 102u IPv4 99394953 0t0 TCP lena:38400->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 106u IPv4 99399711 0t0 TCP lena:38402->falkor.dartmouth.edu:http (CLOSE_WAIT)
git 2808691 yoh 107w REG 259,5 545747 16654726 /tmp/raiders/.git/annex/objects/2X/Kz/MD5E-s545747--e2c2cd58ad55da46bf778d223e01389e.nii.gz/MD5E-s545747--e2c2cd58ad55da46bf778d223e01389e.nii.gz
</code></pre>
</details>
<details>
<summary>and one with only 16:</summary>
<pre><code class="shell">$> lsof -Ki -p 2808669
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
git 2808669 yoh cwd DIR 259,5 4096 16654129 /tmp/raiders
git 2808669 yoh rtd DIR 259,5 4096 2 /
git 2808669 yoh txt REG 259,5 165632 8395234 /usr/lib/git-annex.linux/lib64/ld-linux-x86-64.so.2
git 2808669 yoh mem REG 259,5 1322694 16654180 /tmp/raiders/.git/objects/pack/pack-3ed86065dacf772445fec4258d6e60ebe21baf77.pack
git 2808669 yoh mem REG 259,5 505212 16654181 /tmp/raiders/.git/objects/pack/pack-3ed86065dacf772445fec4258d6e60ebe21baf77.idx
git 2808669 yoh mem REG 259,5 337024 11806232 /usr/lib/locale/aa_DJ.utf8/LC_CTYPE
git 2808669 yoh mem REG 259,5 3284 11807992 /usr/lib/locale/en_US.utf8/LC_TIME
git 2808669 yoh mem REG 259,5 1824496 8394627 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libc.so.6
git 2808669 yoh mem REG 259,5 35808 8395214 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/librt.so.1
git 2808669 yoh mem REG 259,5 114128 8395210 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libpthread.so.0
git 2808669 yoh mem REG 259,5 121280 8395232 /usr/lib/git-annex.linux/lib/x86_64-linux-gnu/libz.so.1
git 2808669 yoh mem REG 259,5 539304 8395937 /usr/lib/git-annex.linux/usr/lib/x86_64-linux-gnu/libpcre2-8.so.0
git 2808669 yoh mem REG 259,5 3008120 8395330 /usr/lib/git-annex.linux/shimmed/git/git
git 2808669 yoh 0r FIFO 0,13 0t0 99388817 pipe
git 2808669 yoh 1w FIFO 0,13 0t0 99388818 pipe
git 2808669 yoh 2w REG 0,48 0 35023711 /home/yoh/.tmp/datalad_temp__runneroutput__mq55kiau
</code></pre>
</details>
<p>Now we can see why the count fluctuates, although I have no clue why those are open in <code>git cat-file</code>: connections to the remote (why??), and files under .git/annex/tmp?</p>
<p>But the overall problem, it seems to me, is this heavy growth of external processes, due to multiple external <code>git</code> invocations per <code>annex get</code> thread, with each process consuming a small number, but still tens, of open files.</p>
comment 8 (yarikoptic, 2020-04-18)
<p>FWIW, I kept running it a bit more; the number of <code>git cat-file</code> processes grew a bit (to 42), with 1034 total open files, but seems to be stable. That is one more piece of support that there is no leaking of processes or file descriptors, just sheer growth of subprocesses leading to a large number of open files.</p>
comment 9 (joey, 2020-04-20)
<p>Thinking about this over the weekend, I had two ideas:</p>
<ul>
<li><p>The worker pool has an AnnexState for each thread. If those could be
partitioned so that e.g. the perform stage is always run by the same threads,
then when only one stage needs cat-file, the overall number of cat-file
processes would be reduced by 1/3rd.</p>
<p>This might be the least resource intensive approach. But, as threads
transition between stages, their AnnexState necessarily does too,
and the cleanup stage might need some state change made in the perform
stage, so swapping out the perform AnnexState for a cleanup one
seems hard to accomplish.</p></li>
<li><p>Could have a pool of cat-files, and just have worker threads block until
one is available. This would let it be pinned to the -J number, or
even to a smaller number.</p>
<p>Seems likely that only 2 or 3 in the cat-file pool will
maximise concurrency, because it's not a major bottleneck most of the
time, and when it is, the actual bottleneck is probably disk IO, which
won't be helped by more (more would likely only increase unnecessary seeks).</p></li>
</ul>
comment 10 (joey, 2020-04-20)
<p>Implemented the cat-file pool. Capped at 2 cat-files of each distinct type,
so it will start a max of 8 no matter the -J level.</p>
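<p>The cap works out from the distinct cat-file types (assuming, per the earlier count, 2 parameter sets across 2 index files):</p>

```shell
per_type=2                    # pool cap per distinct cat-file type
types=$(( 2 * 2 ))            # parameter sets x index files
echo $(( per_type * types ))  # -> 8, regardless of -J
```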
<p>(Although cat-file can also be run in those remote repositories, so there will be more in that case.)</p>
<p>While testing, I noticed git-annex drop -Jn starts n git check-attr
processes, so the same thing ought to be done with them. Leaving this bug open
for that, but I do think that the problem you reported should be fixed now.</p>
comment 11 (joey, 2020-04-21)
<p>check-attr and check-ignore have also been converted to resource pools.</p>