ATM the .git/annex/journal is a flat dump of files. In our DANDI use case, where we create git-annex repos with possibly hundreds of thousands of entries (without content being downloaded), the process(es) (we run a number in parallel) seem quite slow. This could be attributed to a slow-ish filesystem not tuned for that purpose (will try resolving by moving to SSD; issue in dandisets), but it may also be because .git/annex/journal accumulates those hundreds of thousands of files in a flat listing:
```
(base) dandi@drogon:/proc/1006503/cwd/.git$ ls -l /mnt/backup/dandi/partial-zarrs/c870fffc-0631-403c-884c-8e72e0ea0ff6/.git/annex/journal | nl | tail
142544 -rw-r--r-- 1 dandi dandi 58 Jul 13 10:07 fff_b65_MD5E-s1508088--cb779895508c24d25eef3aafc846ba7c.log
142545 -rw-r--r-- 1 dandi dandi 273 Jul 13 10:07 fff_b65_MD5E-s1508088--cb779895508c24d25eef3aafc846ba7c.log.web
142546 -rw-r--r-- 1 dandi dandi 58 Jul 13 11:47 fff_b93_MD5E-s1581690--45d0e91e6641c1d0a27bcaf59e92cb5b.log
142547 -rw-r--r-- 1 dandi dandi 273 Jul 13 11:47 fff_b93_MD5E-s1581690--45d0e91e6641c1d0a27bcaf59e92cb5b.log.web
142548 -rw-r--r-- 1 dandi dandi 57 Jul 13 10:20 fff_bdf_MD5E-s869056--7d2874f3997f3625f37bee0ea01daaf0.log
142549 -rw-r--r-- 1 dandi dandi 277 Jul 13 10:20 fff_bdf_MD5E-s869056--7d2874f3997f3625f37bee0ea01daaf0.log.web
142550 -rw-r--r-- 1 dandi dandi 58 Jul 13 10:45 fff_c6e_MD5E-s2258--8758b6c839dd5f77e187a692093321f1.log
142551 -rw-r--r-- 1 dandi dandi 273 Jul 13 10:45 fff_c6e_MD5E-s2258--8758b6c839dd5f77e187a692093321f1.log.web
142552 -rw-r--r-- 1 dandi dandi 57 Jul 13 11:09 fff_d36_MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log
142553 -rw-r--r-- 1 dandi dandi 271 Jul 13 11:09 fff_d36_MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log.web
```
so I wondered if there might be some benefit from making the journal directory also (optionally is ok) carry those leading directories, so that `fff_d36_MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log.web` would become `fff/d36/MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log.web` (see the sketch below). I am not sure if it would be of any benefit; it might even be the opposite, since it would need even more inodes, and then logic to clean up the directories whenever any given file is removed.
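For concreteness, a minimal sketch in shell of the proposed re-layout; splitting on the first two underscore-delimited components is just my reading of the filenames above, not anything git-annex does today:

```sh
# Hypothetical re-layout of one flat journal entry into leading directories.
f='fff_d36_MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log.web'
IFS=_ read -r d1 d2 rest <<<"$f"   # d1=fff, d2=d36, rest=MD5E-s...log.web
mkdir -p "$d1/$d2"
mv "$f" "$d1/$d2/$rest"
```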
Or maybe there could be some mode which "forces" the journal to be committed whenever it reaches some specific size (e.g. 10k entries)?
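No such mode exists now as far as I know; the idea amounts to wrapper logic like this sketch, where the 10k threshold is arbitrary and `git annex merge` is assumed only as a cheap command that ends by committing the journal (per the observation below that every command commits it once):

```sh
# Hypothetical: flush the journal into the git-annex branch once it grows
# past a threshold, instead of letting it accumulate across many batches.
n=$(ls -f .git/annex/journal | wc -l)
if [ "$n" -ge 10000 ]; then
    git annex merge
fi
```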
FWIW, here is the list of processes involved:
```
(base) dandi@drogon:/proc/1006503/cwd/.git$ fuser -v /mnt/backup/dandi/partial-zarrs/c870fffc-0631-403c-884c-8e72e0ea0ff6/ 2>&1 | awk '/\.\.c/{print $2;}' | while read p; do ps -p $p -u | grep -v USER; done
dandi 2540282 0.2 0.0 1074052896 23268 pts/14 Sl+ 10:07 0:17 git-annex examinekey --batch --migrate-to-backend=MD5E
dandi 2540289 0.4 0.0 1074053052 40428 pts/14 Sl+ 10:07 0:33 git-annex whereis --batch-keys --json --json-error-messages
dandi 2540299 0.1 0.0 10640 4660 pts/14 S+ 10:07 0:09 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
dandi 2540300 0.7 0.0 1074053052 53652 pts/14 Sl+ 10:07 0:59 git-annex fromkey --force --batch --json --json-error-messages
dandi 2540310 0.0 0.0 10640 4332 pts/14 S+ 10:07 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
dandi 2540313 0.2 0.0 10684 4296 pts/14 S+ 10:07 0:19 git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
dandi 2540314 18.7 0.1 1074126764 76184 pts/14 Sl+ 10:07 23:35 git-annex registerurl --batch --json --json-error-messages
dandi 2540322 0.3 0.0 10640 4336 pts/14 S+ 10:07 0:30 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
dandi 2540323 0.3 0.0 6888 3388 pts/14 S+ 10:07 0:26 /bin/bash /usr/bin/git-annex-remote-rclone
```
Is this the same machine that is 60x slower than my laptop at `git-annex init`?

Is catting a single file from the journal measurably slower when the directory has so many files in it vs a small number? 200k files in a directory does not produce a measurable slowdown when reading a random file for me:
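A sketch of that kind of test (illustrative paths and counts, not the original transcript):

```sh
# Create 200k small journal-like files in one flat directory, then time
# reading one of them back and listing the directory.
mkdir /tmp/journal-test && cd /tmp/journal-test
seq 200000 | sed 's/$/.log/' | xargs touch
echo 'some log content' > 123456.log
time cat 123456.log          # reading a random file back
time ls -f > /dev/null       # unsorted listing of the whole directory
```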
At 2 million files, that cat still runs in 0s here. Writing to a new file and overwriting an existing file are also still very fast.
It does take around 3 seconds to list the contents of the directory (37 seconds with 2 million files), but the only time git-annex does that is when it commits the journal, which is done only once per command.
Committing the journal more frequently to keep the number of files down would mean more commits to the git-annex branch. Also, committing has its own overhead, probably more than listing a directory with a lot of files in it.
What might be worth considering is periodically staging the journal to the annex index file without immediately committing it (still making one commit at the end). That seems like it would probably be safe to do, although I have not fully analysed it. But I'm not convinced it would be an optimisation; it would mean repeatedly re-writing the index file, which gets expensive.
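In git plumbing terms, that idea is roughly the following sketch. It is an illustration only: git-annex's actual implementation is Haskell, `.git/annex/index` is assumed to be its separate index for the git-annex branch, and the branch-side path is assumed to be the journal filename with underscores mapped back to directory separators, as in the proposal above. Do not run this against a live repo.

```sh
# Stage one journal file into a separate index without committing yet.
export GIT_INDEX_FILE=.git/annex/index
blob=$(git hash-object -w .git/annex/journal/fff_d36_MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log)
git update-index --add --cacheinfo "100644,$blob,fff/d36/MD5E-s133521--d7856d85429f13faca9cbc26f0b391f4.log"

# ...repeat for further batches of journal files, then once at the end:
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -p refs/heads/git-annex -m update)
git update-ref refs/heads/git-annex "$commit"
```

The repeated index re-writes mentioned above are the `git update-index` steps: each batch rewrites the whole index file, which is where the cost would accumulate.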
FWIW: most likely yes (IIRC I have reported timing results from that machine, drogon, and another one, falkor, in the past few days across different issues/etc).
I'll need to see some timings like the ones I gave, but a lot slower, to take this suggestion seriously.
Here are some timings, although I think you are right that it should not have any major effect; most likely my slow system is due to another issue with registerurl. They show that it is OK (23ms) to cat a 1MB file, and that listing those 173k files takes 17 seconds.
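Roughly the commands behind those numbers (the actual file name is substituted with an illustrative one):

```sh
cd /mnt/backup/dandi/partial-zarrs/c870fffc-0631-403c-884c-8e72e0ea0ff6/.git/annex/journal
time cat some-1MB-file > /dev/null   # ~23ms
time ls -l | wc -l                   # ~17s for ~173k entries
```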
So if registerurl and the others indeed do not list that folder until the point when the journal is committed, it should be all good.