Please describe the problem.
I spotted that our backup job, running git-annex 10.20220322-g7b64dea, had not exited for a number of days. In htop I see:

```
381205 dandi 20 0 169M 4276 4272 S 0.0 0.0 0:05.15 │ └─ python -m tools.backups2datalad -l DEBUG -J 5 --target /mnt/backup/dandi/dandisets populate dandi-dandisets-dropbox
389485 dandi 20 0 1026G 1140M 16208 S 171. 1.8 155h │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389486 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 17:33.20 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389488 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 22:07.69 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389490 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 24:49.46 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389494 dandi 20 0 10636 1968 1964 S 0.0 0.0 0:00.00 │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs -c annex.retry=3 cat-file --batch
389496 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 13:41.26 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389497 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 10:47.33 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389498 dandi 20 0 1026G 1140M 16208 S 43.1 1.8 36h53:20 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389499 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 27:15.88 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389500 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 32:34.61 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389501 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 25:05.81 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389506 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 26:42.46 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389507 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 28:43.57 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389510 dandi 20 0 10636 1976 1972 S 0.0 0.0 0:00.02 │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs -c annex.retry=3 cat-file --batch
389511 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:59.64 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389512 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 25:06.37 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389513 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 33:52.81 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389514 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 29:43.56 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389516 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 23:44.52 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389517 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 17:07.65 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389519 dandi 20 0 1026G 1140M 16208 S 43.1 1.8 36h16:00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389520 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 18:49.32 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389521 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.02 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389522 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 18:45.09 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389524 dandi 20 0 1026G 1140M 16208 S 41.8 1.8 36h32:16 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389528 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:00.35 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389556 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 24:22.72 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389759 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.04 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
390392 dandi 20 0 1026G 1140M 16208 S 42.5 1.8 37h16:15 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
391588 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:03.40 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
394739 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
441349 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 9:46.65 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
444725 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 7:42.72 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
1910907 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
2277116 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
```
Relevant open files in that git repo:

```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000233$ lsof -p 389485 | grep 000
lsof: WARNING: can't stat() btrfs file system /mnt/backup/docker/btrfs
Output information may be incomplete.
git-annex 389485 dandi cwd DIR 0,40 266 88559549 /mnt/backup/dandi/dandisets/000233
git-annex 389485 dandi mem REG 9,0 93000 3146348 /lib/x86_64-linux-gnu/libresolv-2.31.so
git-annex 389485 dandi mem-r REG 0,35 118071211 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-shm (path dev=0,40)
git-annex 389485 dandi 29u REG 0,40 139264 88559612 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db
git-annex 389485 dandi 30u REG 0,40 0 117504617 /mnt/backup/dandi/dandisets/000233/.git/annex/transfer/download/00000000-0000-0000-0000-000000000001/lck.SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
git-annex 389485 dandi 32u REG 0,40 0 118071210 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-wal
git-annex 389485 dandi 34u REG 0,40 32768 118071211 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-shm
git-annex 389485 dandi 50w REG 0,40 47012344309 117504625 /mnt/backup/dandi/dandisets/000233/.git/annex/tmp/SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
```
That key is available from:
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000233$ git annex whereis --key SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
whereis SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: https://api.dandiarchive.org/api/assets/5304ce04-3a87-400c-a484-aa889156a505/download/
web: https://dandiarchive.s3.amazonaws.com/blobs/c84/640/c846401f-10bf-452f-b26a-82e3d905556c?versionId=naDNt0WozdY8oi6M9L6MGhfsvnRuW6RJ
ok
```
where the first URL redirects to the second one. And I think it has not progressed from that 47012344309 size in quite a while (I will check again in an hour).
I would have expected it to maybe fail (and possibly retry, if annex.retry is set), but not to really get stuck.
Is there any diagnostic information I should collect to help troubleshoot the issue, or could I just kill/restart that process?
git-annex has stall detection, but it's not enabled by default, because how slow a "stall" is can vary quite a bit. Setting annex.stalldetection would probably help.
Plenty of http clients can stall out if there's a network problem. For example, wget that URL and pull the ethernet cable; you'll see wget hang for at least as long as it took me to write this comment. Since it's just waiting for a read, it will block until the TCP connection closes, and some situations can cause dead TCP connections to not close for a long time.
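For comparison, a rough sketch of how such stalls are usually bounded in standalone HTTP clients (the URL is just a placeholder):

```
# wget: give up if no data arrives on the socket for 60 seconds, retry a few times
wget --read-timeout=60 --tries=3 https://example.com/big.file

# curl: abort if the average transfer rate stays below 1 KB/s for 120 seconds
curl --speed-limit 1024 --speed-time 120 -O https://example.com/big.file
```

annex.stalldetection gives git-annex a comparable escape hatch for transfers that stop making progress.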
FWIW, I have added
```
(base) dandi@drogon:/mnt/backup/dandi/dandisets$ git config annex.stalldetection 1KB/120s
```
and I see a process 4 days old (now it is `Tue 24 May 2022 04:34:33 PM EDT`), which smells like it is stuck... Looking at open files, I see that stderr is redirected, and looking into it:

and I do not think that file has seen any updates (`tail -f` is silent).

NB: I got a bit surprised that we dump to such a possibly shared stderr file, but it is done within `tools/backups2datalad-cron`, which we guard with flock so that no more than one instance runs, so it is ok.

Would you recommend looking at anything specific, or maybe `thread blocked indefinitely in an MVar operation` is giving ideas? (FTR: I believe the annex is `10.20220322-g7b64dea`.)

I should have looked further up in that stderr, where the interesting part begins with possibly relevant extra complaints:

`strace -f -p 1683786` ends with

I will now interrupt that process to "proceed"; knowing what to do next time would be helpful.
Concurrent writes to sqlite (in WAL mode) can fail, but git-annex deals with that by retrying 100 times (once every 0.1 seconds). So apparently other writes kept making it fail, or the database is somehow stuck locked.
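The underlying sqlite behaviour can be reproduced outside of git-annex; a rough two-terminal sketch (database path and timeout values are made up):

```
# terminal 1: create a WAL-mode database, then hold the write lock in an interactive session
sqlite3 /tmp/busy-demo.db 'PRAGMA journal_mode=WAL; CREATE TABLE IF NOT EXISTS t (v INTEGER);'
sqlite3 /tmp/busy-demo.db
#   sqlite> BEGIN IMMEDIATE;           -- takes the write lock
#   sqlite> INSERT INTO t VALUES (1);  -- leave the transaction open

# terminal 2: a second writer waits for its busy timeout and then gives up
sqlite3 /tmp/busy-demo.db 'PRAGMA busy_timeout = 2000; INSERT INTO t VALUES (2);'
#   after ~2 seconds this fails with a "database is locked" (SQLITE_BUSY) error,
#   which is what surfaces in git-annex as ErrorBusy and triggers the retries
```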
I think that the MVar blocking happens as a consequence of the sqlite write failing and is not worth investigating separately, probably.
What filesystem is this happening on?
BTRFS over LVM over mdadm RAID6 arrays.
The next one stalled at the end at:

```
get sub-dt15/sub-dt15_ses-dt15_6705um_200213_091923_ecephys.nwb (from web...) ok
get sub-bruce/sub-bruce_ses-bruce_0um_210115_155615_ecephys.nwb (from web...) ok
get sub-bruce/sub-bruce_ses-bruce_0um_210110_171041_ecephys.nwb (from web...) ok
get sub-ros/sub-ros_ses-ros_210320_091833_ecephys.nwb (from web...) ok (recording state in git...)
get sub-CGM49/sub-CGM49_ses-CGM49_512um_210227_093748_ecephys.nwb (from web...) failed to commit changes to sqlite database: Just SQLite3 returned ErrorBusy while attempting to perform step: database is locked
CallStack (from HasCallStack):
  error, called at ./Database/Handle.hs:91:26 in main:Database.Handle
(Delaying 1s before retrying....)
44%   9.58 GiB   57 MiB/s   3m36s
get sub-CGM52/sub-CGM52_ses-CGM52_0um_210216_144320_ecephys.nwb (from web...)
```
The strace end is:
I have now killed that `get` process and restarted our backup beastie (it should now get to `copy` those `get`ed files into the backup).

It seems that sqlite on BTRFS is somewhat problematic performance-wise. See this blog post from Colin Walters, where Firefox was periodically freezing due to sqlite and the database file turned out to be highly fragmented: https://blog.verbum.org/2020/07/14/on-btrfs/
Mounting with nodatacow or autodefrag may help, or `chattr +C .git/annex/keysdb/db`.
Possibly being slow could explain the ErrorBusy: if there are somehow two processes writing to the database and one is being slow, then the other one could get ErrorBusy repeatedly for the 10 seconds it takes git-annex to give up.
Hi Joey -- thanks for digging all that up! On my laptop I also use BTRFS (on top of a dm-encrypted volume) and always thought to check why `notmuch new`, running at the end of offlineimap, at times holds the entire laptop "stalled" for a few seconds. I think you gave me the answer! Following this article, I defragmented the `.glass` files under `~/Maildir/.notmuch/xapian`. Let's see now, in the long run, if I encounter similar stalling.

But I am not yet entirely sure if it really answers the "total stall" situation -- slow should still work.
FWIW -- I have checked the two repositories mentioned in this issue. In one (000233) the `keysdb/db` was ok -- just 30 refs, but 000026 (a heavy beast ;)) had 2994:

FWIW I defragged it down to 1 ref, and then did the `chattr +C` on all repos I have there so far. Let's see if that helps at all.

PS: I have been using BTRFS on that server and on smaug for almost 10 years now, with thousands of git-annex repos... I guess these particular ones are pushing some limits.
Interesting! For some (only some) repos chattr didn't work, and that is, I guess, because they already have it set:

and I don't recall setting that attr for those. Does `git-annex` do that, either recently or at some point in the past?

PS: `chattr +C` seems to have no effect on `keysdb/db`:

but it does add it for e.g. `000246/.git/annex/keysdb.lck`... heh heh, odd.

You can set NOCOW only on empty files, e.g.:
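For what it's worth, a sketch of the usual workaround (only do this while no git-annex process has the database open; paths are illustrative):

```
cd /mnt/backup/dandi/dandisets/000233/.git/annex

# option 1: mark the directory, so files created inside it later inherit NOCOW
chattr +C keysdb

# option 2: recreate the existing db as an empty NOCOW file and copy the data back
mv keysdb/db keysdb/db.orig
touch keysdb/db
chattr +C keysdb/db
cat keysdb/db.orig > keysdb/db
rm keysdb/db.orig
```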
I tried modifying git-annex to throw ErrorBusy whenever it wrote to the sqlite database, to see how much of the behavior that could reproduce.
That led to 10 second delays due to retrying, and the "failed to commit changes to sqlite database: Just SQLite3 returned ErrorBusy" message.
But, it did not cause the "thread blocked indefinitely in an MVar operation". Which makes me wonder if that is caused by sqlite use elsewhere, perhaps in a read operation rather than a write.
I did try making reads throw ErrorBusy, and it didn't reproduce the MVar problem either; instead it produced "sqlite query crashed: SQLite3 returned ErrorBusy".
FWIW, maybe just as a defender of BTRFS, I decided to share a "terminal shot" of screens from drogon -- you can see how busy it is (`top`
and `dstat`
at the bottom): downloading (>= 100MBps), reading (70MBps), and saving (>200MBps), while there is a good number of parallel processes, each with a number of parallel git-annex processes: screenshot.

And I don't know how well any other file system would have held up.
OK, none of those 3 jobs returned; I have a good number (>40) of git*-related processes "running", see screenshot. I will keep it in this "state" for possible introspection for another hour, and if there are no immediate ideas on what to look at, I will kill/update git-annex (from that `10.20220322-g7b64dea`) and restart the jobs.

FWIW, I have `git config annex.stalldetection 1KB/120s` set.
To hopefully find out what MVar operation is blocking, I have added lock debugging instrumentation to all database calls. This will need a special build of git-annex with the DebugLocks build flag enabled.
I have made a standalone tarball built that way available here: https://downloads.kitenet.net/git-annex/linux/debuglocks/
It should display a backtrace on stderr when the MVar deadlock happens.
Nodatacow is not an option. It disables most safety guarantees of btrfs (also w.r.t. RAID, I believe) and is essentially a hack. As pointed out, it's also difficult to enable retroactively.
Autodefrag is fundamentally broken and might result in overall worse performance. Also not an option.
If a file is highly fragmented (which it might not be; the number of extents isn't an accurate representation of fragmentation on btrfs), the correct measure is to `btrfs filesystem defragment` it.

If git-annex db fragmentation really is a common problem on btrfs (again, you'd have to do actual performance tests; I'm not aware of other means of measuring fragmentation), perhaps an automatic defrag similar to git's automatic gc would be appropriate. Keys DB files range in the megabytes IME, so re-writing them every now and then shouldn't be a huge load.
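A sketch of doing that check and defrag by hand for a single keys db (run from the repository root while git-annex is idle):

```
# extent count before
filefrag .git/annex/keysdb/db

# rewrite the file contiguously; -f flushes the result to disk
btrfs filesystem defragment -f .git/annex/keysdb/db

# extent count after
filefrag .git/annex/keysdb/db
```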
A file's fragmentation on-disk also shouldn't cause issues on DB writes because btrfs is CoW; new writes don't go where the old data was. It can only impact read performance.
Free space fragmentation might impact write performance, but that's not something a userspace program can or should solve. @yarikoptic try `btrfs balance start -dusage=10`.

I'm not sure I understand the situation 100%, but poor DB write performance with hangs and stalls can also be caused by long filesystem commits.
Writing a ton of file data to the filesystem only puts it in the cache, and it gets written later. When that "later" occurs (usually after 30s), as I understand it, all of that data (potentially megabytes to gigabytes) needs to be written to disk before a DB's synchronous write returns.
This is essentially what git-annex does when getting data from a remote; it "writes" a file's data while downloading (without committing, so no disk activity whatsoever) and then does a sync at the end (lots of disk activity) before starting the next download. A DB commit at the end can then take as long as (up to) 30s worth of transferred data takes to write. With a fast source/pipe, that could potentially be a lot of data.
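On Linux that "later" is governed by the vm.dirty_* writeback settings; a way to inspect them and watch the effect (read-only here, tuning is left to taste):

```
# how long dirty data may sit in the page cache before it must be written back
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# how much memory may be dirty before writers get throttled / background flushing starts
sysctl vm.dirty_ratio vm.dirty_background_ratio

# watch dirty data pile up during a download and drain at the commit
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
```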
Btrfs itself also has issues with insane commit times under some conditions (dedup especially), compounding the issue.
It might be worth looking into the timings of `git annex get`'s write operations using `iostat --human 1`.
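For instance, something along these lines while a `git annex get` is running:

```
# extended per-device statistics, refreshed every second
iostat --human -x 1
# watch the write columns (w/s, wkB/s): long stretches near zero followed by
# large bursts would match the "buffer, then flush at the deadline" picture above
```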
FWIW, for Joey: I have tried that version. Placing a similar load, I just got some `move`s fail (our issue + the idea to just call it multiple times) and some other oddities (not new, yet to figure out), but overall -- I have not spotted similar MVar messages (filing a comment to "jinx" it, rerunning again -- seeing some `get`s going ;)).

For Atemu:
Isn't https://github.com/kilobyte/compsize the one reporting the number of fragments, and thus the degree of fragmentation? As I summarized in the comment above, I have used it + `btrfs fi defrag`.

Well -- it confirms that there is "write silence" for a while followed by a burst, e.g.:
Jinxing helped... though I thought to complain that I don't see any traceback, but apparently that is due to `-J5` (I guess) and lines being overwritten; I managed to `Ctrl-s` at a point showing:

which, when I let it go, became something like:

and I reconfirmed those in a screenlog I quickly collected:
So it is in flushDbQueue, which waits for all queued writes to complete. When the write failed with an exception, a previous flushDbQueue would have left the queue's MVar empty.
So, now I understand how the original problem can lead to this MVar problem. And I've fixed that part of it. Now a write failing this way will refill the queue with what it failed to write. So it will try to write it again later.
This flushDbQueue probably also explains the hang at "recording state in git" since there is also a final flushDbQueue at that point, and I guess it fails to detect a deadlock at that point so just hangs forever. So my fix should also avoid that.
None of which means this is fixed, really, just the fallout from the write timeout problem will be less severe now.
It does occur to me that one "solution" might be to have flushDbQueue not rethrow the exception, and just re-enqueue the write. Then, if a write fails in the middle of a long git-annex command, it will just queue up sqlite changes until later. And hopefully whatever is preventing writing to the database at that time would later resolve itself. At shutdown, it would need to make sure to flush the remaining queue and not ignore a write failure then.
That has the potential problem that, if the write keeps failing, it might end up buffering a large number of writes in memory, and since the queue would be full, it would be trying to write every time. So it would both use an ever-growing amount of memory, and be slowed down by the write attempts.
Still, it does give it something better to do while the write is failing than sleeping and retrying, eg to do the rest of the work it's been asked to do.
(Update: Reads from a database first call flushDbQueue, and it would not be safe for that to return without actually writing to the database, since the read would then see possibly stale information. It turns out that `git-annex get` does do a database read per file (getAssociatedFiles). So it seems this approach will not work.)

@Atemu, that seems like at least a plausible theory, that disk write pressure (eg from getting annex objects) is causing sqlite writes to take a long time. Increasing the wait time from the current 10 seconds would make sense if that's what's happening.
The somewhat weird thing is that, with a single git-annex process, no matter the number of jobs, it only has a single database handle open, and so it should not be possible for one write to still be happening when a second write happens. Unless, of course, sqlite is doing something behind my back, like finishing the write in a separate thread? Or, of course, there might be a second process that has somehow not been visible..
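One quick way to check for a second, less visible writer from the shell (run in the affected repository):

```
# every process that currently has the keys database or its WAL/SHM files open
lsof .git/annex/keysdb/db .git/annex/keysdb/db-wal .git/annex/keysdb/db-shm 2>/dev/null
```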
So, I've added some debugging of when it commits to the database. @yarik, please get an updated build with commit 5da1a785086910c32559a46f9110779d765af0cd, and try after setting:
This should tell us if there are somehow multiple writers at the same time, and also will tell us the timing of how long it is taking to commit to the db.
It doesn't need the custom build with DebugLocks, right? Then I will "wait" for the daily build of datalad/git-annex to happen and test then... Meanwhile I am also running

```
git clone https://github.com/dandizarrs/4ea5a47f-10d6-48a7-8c1e-ce6ce511f9a7; cd 4ea5a47f-10d6-48a7-8c1e-ce6ce511f9a7; git annex get -J25 .
```

on smaug to see if it reproduces there (but it is not as busy IO-wise, so most likely it would finish without whining; boosted to -J25 to "make it harder"). smaug is also BTRFS, although the underlying storage is set up differently (no LVM).

It is reporting the number of extents, just like `filefrag` does.

Non-CoW filesystems overwrite data in-place, so there is never a need to create new fragments for (pre-allocated) in-file writes. New extents are only created when the free space at the file's location doesn't allow for further expansion and fragmentation is required.
# of extents is therefore indicative of fragmentation in non-CoW filesystems but this does not hold true in CoW filesystems like btrfs. If you wrote 1M to the first half of a 2M file and then 1M to the latter, you'll have 2 1M extents that might be right behind each other on-disk due to CoW.
With transparent compression in the mix, this gets even worse because extent size caps out at 128K. A 1M incompressible file will have 8 extents/"fragments" at minimum despite its data possibly being laid out fully sequentially.
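A small illustration of the difference (file names are placeholders):

```
# per-extent layout as reported by the filesystem; with btrfs compression even a
# fully sequential 1M file shows up as 8 x 128K extents
filefrag -v path/to/file

# compsize reports compressed vs. uncompressed sizes and extent counts for a subtree
compsize .git/annex/keysdb/
```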
Does this burst occur at the same time the DB times out?
If you take a look at `top`/`iotop`, which threads are using CPU time and writing? Look out for btrfs-transacti and btrfs-endio-wri.

The problem isn't exactly that. Sure, write activity will make DB operations take longer (larger queue -> higher latency), but this effect should stay within reasonable bounds. I think that the actual big problem is that the object writes are buffered and not written until they hit a deadline, after which they must be written ASAP. I believe that in such a situation any other write operation must wait until those dirty pages are committed, and a commit can take quite a while in btrfs due to integrity guarantees.
I think it'd be better to avoid these dirty page deadline commits, start writing out data sooner (ideally immediately) and block the source if the target can't keep up.
IMO there's not much point in letting the target seem to "write" at a higher speed than it can actually take in a case of sequential writes like git-annex' object transfer. It'll only lead to wasted memory and erratic IO. (I guess a small buffer (something that could be flushed in a few seconds) could be helpful in smoothing over a little variance.)
Revisiting this, I don't understand why the htop at the top of this page lists so many `git-annex get` processes. They all seem to be children of a single parent `git-annex get` process. But `git-annex get` does not fork children like those. I wonder if perhaps those child processes were forked() to do something else, but somehow hung before they could exec?!

By comment #2, annex.stalldetection is enabled, so `git-annex get` runs 5 `git-annex transferrer` processes. Each of those can write to the database, so concurrent sqlite writes can happen. So, my "re: comment 16" comment was off-base in thinking there was a single git-annex process. And so, I don't think the debug info requested in that comment is needed.

Also, it turns out that the database queue is being flushed after every file it gets, which is causing a sqlite write per file. So there are a lot of sqlite writes happening, which probably makes this issue much more likely to occur on systems with slow enough disk IO that it does occur. Especially if the files are relatively small.
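This is visible from the outside while a get is running, e.g.:

```
# with annex.stalldetection enabled, the parent `git-annex get` hands transfers to
# separate transferrer processes, and each of them writes to the keys db itself
pgrep -af 'git-annex transferrer'
```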
The reason for the queue flush is partly that Annex.run forces a queue flush after every action. That could, I think, be avoided; it was only done to make sure the queue is flushed before the program exits, which should be able to be handled in a different way. But also, the queue has to be flushed before reading from the database in order for the read to see current information. In the `git-annex get` case, it queues a change to the inode cache, and then reads the associated files. To avoid that, it would need to keep track of the two different tables in the keys db, and flush the queue only when querying a table that a write had been queued to. That would be worth doing just to generally speed up `git-annex get`. A quick benchmark shows that a get of 1000 small files that takes 17s will only take 12s once that's done. And that's on a fast SSD; probably much more on a hard drive!

So I don't have a full solution, but speeding git-annex up significantly, and also making whatever the problem in this bug is probably much less likely to occur, is a good next step.
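A rough way to reproduce that kind of benchmark locally (numbers and paths are purely illustrative):

```
mkdir /tmp/bench && cd /tmp/bench

# a source repository with 1000 small annexed files
git init src
( cd src && git annex init && \
  for i in $(seq 1000); do echo "$i" > "file$i"; done && \
  git annex add . && git commit -m 'add 1000 small files' )

# clone it and time getting all the content back
git clone /tmp/bench/src dst
cd dst && git annex init
time git annex get .
```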
I've avoided the excessive sqlite database writes. Which doubled the speed of git-annex in some circumstances, wow!
@yoh see if it still happens once you upgrade to a git-annex with 6fbd337e34cd62881ed5e21616e9c00d7bf35378.
It would be possible for git-annex to do its own locking around writes to the sqlite database. That would surely avoid any problem that sqlite might have that would cause ErrorBusy. I want to think some about withExclusiveLock blocking issue first.
I've made it retry as long as necessary on ErrorBusy, while also noticing when another process is suspended and has the sqlite database locked, and avoiding retrying forever in that situation.
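From the shell, a quick way to spot that situation (the PID is a placeholder):

```
# find who holds the keys db open...
fuser -v .git/annex/keysdb/db

# ...and check its state: "T" in the STAT column means the process is stopped
# (e.g. by Ctrl-Z or SIGSTOP) while still holding the lock
ps -o pid,stat,cmd -p <PID>
```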
This seems to be as far as I can take this bug report; I don't know 100% for sure if I've fixed it, but git-annex's behavior should certainly be improved.