Please describe the problem.
I spotted that our backup job, running git-annex 10.20220322-g7b64dea, had not exited for a number of days. In htop I see:

```
381205 dandi 20 0 169M 4276 4272 S 0.0 0.0 0:05.15 │ └─ python -m tools.backups2datalad -l DEBUG -J 5 --target /mnt/backup/dandi/dandisets populate dandi-dandisets-dropbox
389485 dandi 20 0 1026G 1140M 16208 S 171. 1.8 155h │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389486 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 17:33.20 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389488 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 22:07.69 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389490 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 24:49.46 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389494 dandi 20 0 10636 1968 1964 S 0.0 0.0 0:00.00 │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs -c annex.retry=3 cat-file --batch
389496 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 13:41.26 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389497 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 10:47.33 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389498 dandi 20 0 1026G 1140M 16208 S 43.1 1.8 36h53:20 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389499 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 27:15.88 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389500 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 32:34.61 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389501 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 25:05.81 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389506 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 26:42.46 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389507 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 28:43.57 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389510 dandi 20 0 10636 1976 1972 S 0.0 0.0 0:00.02 │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs -c annex.retry=3 cat-file --batch
389511 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:59.64 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389512 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 25:06.37 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389513 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 33:52.81 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389514 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 29:43.56 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389516 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 23:44.52 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389517 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 17:07.65 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389519 dandi 20 0 1026G 1140M 16208 S 43.1 1.8 36h16:00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389520 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 18:49.32 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389521 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.02 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389522 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 18:45.09 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389524 dandi 20 0 1026G 1140M 16208 S 41.8 1.8 36h32:16 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389528 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:00.35 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389556 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 24:22.72 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
389759 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.04 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
390392 dandi 20 0 1026G 1140M 16208 S 42.5 1.8 37h16:15 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
391588 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 20:03.40 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
394739 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
441349 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 9:46.65 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
444725 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 7:42.72 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
1910907 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
2277116 dandi 20 0 1026G 1140M 16208 S 0.0 1.8 0:00.00 │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
```
Relevant open files in that git repo:

```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000233$ lsof -p 389485 | grep 000
lsof: WARNING: can't stat() btrfs file system /mnt/backup/docker/btrfs
Output information may be incomplete.
git-annex 389485 dandi cwd DIR 0,40 266 88559549 /mnt/backup/dandi/dandisets/000233
git-annex 389485 dandi mem REG 9,0 93000 3146348 /lib/x86_64-linux-gnu/libresolv-2.31.so
git-annex 389485 dandi mem-r REG 0,35 118071211 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-shm (path dev=0,40)
git-annex 389485 dandi 29u REG 0,40 139264 88559612 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db
git-annex 389485 dandi 30u REG 0,40 0 117504617 /mnt/backup/dandi/dandisets/000233/.git/annex/transfer/download/00000000-0000-0000-0000-000000000001/lck.SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
git-annex 389485 dandi 32u REG 0,40 0 118071210 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-wal
git-annex 389485 dandi 34u REG 0,40 32768 118071211 /mnt/backup/dandi/dandisets/000233/.git/annex/keysdb/db-shm
git-annex 389485 dandi 50w REG 0,40 47012344309 117504625 /mnt/backup/dandi/dandisets/000233/.git/annex/tmp/SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
```
That key is available from:
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000233$ git annex whereis --key SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb
whereis SHA256E-s124312933165--d769cd7cbec2012515085d63eee594299d7411500c0bb9c13d10c0cd68d5ddce.nwb (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: https://api.dandiarchive.org/api/assets/5304ce04-3a87-400c-a484-aa889156a505/download/
web: https://dandiarchive.s3.amazonaws.com/blobs/c84/640/c846401f-10bf-452f-b26a-82e3d905556c?versionId=naDNt0WozdY8oi6M9L6MGhfsvnRuW6RJ
ok
```
where the first URL redirects to the second one. And I think it has not progressed from that 47012344309 size in quite a while (I will check again in an hour).
I would have expected it to maybe fail (and possibly retry, if annex.retry is set), but not to really get stuck.
Is there any diagnostic information I should collect to help troubleshoot the issue, or could I just kill/restart that process?
git-annex has stall detection, but it's not enabled by default, because how slow a "stall" is can vary quite a bit. Setting annex.stalldetection would probably help.
Plenty of http clients can stall out if there's a network problem. For example, wget that URL and pull the ethernet cable; you'll see wget hang for at least as long as it took me to write this comment. Since it's just waiting for a read, it will block until the TCP connection closes, and some situations can cause dead TCP connections to not close for a long time.
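For comparison, a rough sketch of how such stalls are usually bounded in standalone HTTP clients (the URL is just a placeholder):

```
# wget: give up if no data arrives on the socket for 60 seconds, retry a few times
wget --read-timeout=60 --tries=3 https://example.com/big.file

# curl: abort if the average transfer rate stays below 1 KB/s for 120 seconds
curl --speed-limit 1024 --speed-time 120 -O https://example.com/big.file
```

annex.stalldetection gives git-annex a comparable escape hatch for transfers that stop making progress.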
FWIW, I have added
```
(base) dandi@drogon:/mnt/backup/dandi/dandisets$ git config annex.stalldetection 1KB/120s
```
and I see a process 4 days old (now it is `Tue 24 May 2022 04:34:33 PM EDT`), which smells like it is stuck... Looking at open files, I see that stderr is redirected, and looking into it:

and I do not think that file has seen any updates (`tail -f` is silent).

NB: I got a bit surprised that we dump to such a possibly shared stderr file, but it is done within `tools/backups2datalad-cron`, which we guard with flock so that no more than one instance runs, so it is ok.

Would you recommend looking at anything specific, or maybe `thread blocked indefinitely in an MVar operation` is giving ideas? (FTR: I believe the annex is `10.20220322-g7b64dea`.)

I should have looked further up in that stderr, where the interesting part begins with possibly relevant extra complaints:

`strace -f -p 1683786` ends with

I will now interrupt that process to "proceed"; knowing what to do next time would be helpful.
Concurrent writes to sqlite (in WAL mode) can fail, but git-annex deals with that by retrying 100 times (once every 0.1 seconds). So apparently other writes kept making it fail, or the database is somehow stuck locked.
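The underlying sqlite behaviour can be reproduced outside of git-annex; a rough two-terminal sketch (database path and timeout values are made up):

```
# terminal 1: create a WAL-mode database, then hold the write lock in an interactive session
sqlite3 /tmp/busy-demo.db 'PRAGMA journal_mode=WAL; CREATE TABLE IF NOT EXISTS t (v INTEGER);'
sqlite3 /tmp/busy-demo.db
#   sqlite> BEGIN IMMEDIATE;           -- takes the write lock
#   sqlite> INSERT INTO t VALUES (1);  -- leave the transaction open

# terminal 2: a second writer waits for its busy timeout and then gives up
sqlite3 /tmp/busy-demo.db 'PRAGMA busy_timeout = 2000; INSERT INTO t VALUES (2);'
#   after ~2 seconds this fails with a "database is locked" (SQLITE_BUSY) error,
#   which is what surfaces in git-annex as ErrorBusy and triggers the retries
```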
I think that the MVar blocking happens as a consequence of the sqlite write failing and is not worth investigating separately, probably.
What filesystem is this happening on?
BTRFS over LVM over mdadm RAID6 arrays.
The next one stalled at the end at:

```
get sub-dt15/sub-dt15_ses-dt15_6705um_200213_091923_ecephys.nwb (from web...) ok
get sub-bruce/sub-bruce_ses-bruce_0um_210115_155615_ecephys.nwb (from web...) ok
get sub-bruce/sub-bruce_ses-bruce_0um_210110_171041_ecephys.nwb (from web...) ok
get sub-ros/sub-ros_ses-ros_210320_091833_ecephys.nwb (from web...) ok (recording state in git...)
get sub-CGM49/sub-CGM49_ses-CGM49_512um_210227_093748_ecephys.nwb (from web...) failed to commit changes to sqlite database: Just SQLite3 returned ErrorBusy while attempting to perform step: database is locked
CallStack (from HasCallStack):
  error, called at ./Database/Handle.hs:91:26 in main:Database.Handle
(Delaying 1s before retrying....)
44%   9.58 GiB   57 MiB/s   3m36s
get sub-CGM52/sub-CGM52_ses-CGM52_0um_210216_144320_ecephys.nwb (from web...)
```
The strace end is:
I have now killed that `get` process and restarted our backup beastie (it should now get to `copy` those `get`ed files into the backup).

It seems that sqlite on BTRFS is somewhat problematic performance-wise. See this blog post from Colin Walters, where Firefox was periodically freezing due to sqlite and the database file turned out to be highly fragmented: https://blog.verbum.org/2020/07/14/on-btrfs/
Mounting with nodatacow or autodefrag may help, or `chattr +C .git/annex/keysdb/db`.
Possibly being slow could explain the ErrorBusy: if there are somehow two processes writing to the database and one is being slow, then the other one could get ErrorBusy repeatedly for the 10 seconds it takes git-annex to give up.
Hi Joey -- thanks for digging all that up! On my laptop I also use BTRFS (on top of a dm-encrypted volume) and always thought to check why `notmuch new`, running at the end of offlineimap, at times holds the entire laptop "stalled" for a few seconds. I think you gave me the answer! Following this article, I defragmented the `.glass` files under `~/Maildir/.notmuch/xapian`. Let's see now, in the long run, if I encounter similar stalling.

But I am not yet entirely sure if it really answers the "total stall" situation -- slow should still work.
FWIW -- I have checked the two repositories mentioned in this issue. In one (000233) the `keysdb/db` was ok -- just 30 refs, but 000026 (a heavy beast ;)) had 2994:

FWIW I defragged it down to 1 ref, and then did the `chattr +C` on all repos I have there so far. Let's see if that helps at all.

PS: I have been using BTRFS on that server and on smaug for almost 10 years now, with thousands of git-annex repos... I guess these particular ones are pushing some limits.
Interesting! For some (only some) repos chattr didn't work, and that is, I guess, because they already have it set:

and I don't recall setting that attr for those. Does `git-annex` do that, either recently or at some point in the past?

PS: `chattr +C` seems to have no effect on `keysdb/db`:

but it does add it for e.g. `000246/.git/annex/keysdb.lck`... heh heh, odd.

You can set NOCOW only on empty files, e.g.:
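For what it's worth, a sketch of the usual workaround (only do this while no git-annex process has the database open; paths are illustrative):

```
cd /mnt/backup/dandi/dandisets/000233/.git/annex

# option 1: mark the directory, so files created inside it later inherit NOCOW
chattr +C keysdb

# option 2: recreate the existing db as an empty NOCOW file and copy the data back
mv keysdb/db keysdb/db.orig
touch keysdb/db
chattr +C keysdb/db
cat keysdb/db.orig > keysdb/db
rm keysdb/db.orig
```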
I tried modifying git-annex to throw ErrorBusy whenever it wrote to the sqlite database, to see how much of the behavior that could reproduce.
That led to 10 second delays due to retrying, and the "failed to commit changes to sqlite database: Just SQLite3 returned ErrorBusy" message.
But, it did not cause the "thread blocked indefinitely in an MVar operation". Which makes me wonder if that is caused by sqlite use elsewhere, perhaps in a read operation rather than a write.
I did try making reads throw ErrorBusy, and it didn't reproduce the MVar problem either; instead it produced "sqlite query crashed: SQLite3 returned ErrorBusy".
FWIW, maybe just as a defender of BTRFS, I decided to share a "terminal shot" of screens from drogon -- you can see how busy it is (`top`
and `dstat`
at the bottom): downloading (>= 100MBps), reading (70MBps), and saving (>200MBps), while there is a good number of parallel processes, each with a number of parallel git-annex processes: screenshot.

And I don't know how well any other file system would have held up.
OK, none of those 3 jobs returned; I have a good number (>40) of git*-related processes "running", see screenshot. I will keep it in this "state" for possible introspection for another hour, and if there are no immediate ideas on what to look at, I will kill/update git-annex (from that `10.20220322-g7b64dea`) and restart the jobs.

FWIW, I have `git config annex.stalldetection 1KB/120s` set.
To hopefully find out what MVar operation is blocking, I have added lock debugging instrumentation to all database calls. This will need a special build of git-annex with the DebugLocks build flag enabled.
I have made a standalone tarball built that way available here: https://downloads.kitenet.net/git-annex/linux/debuglocks/
It should display a backtrace on stderr when the MVar deadlock happens.
Nodatacow is not an option. It disables most safety guarantees of btrfs (also w.r.t. RAID, I believe) and is essentially a hack. As pointed out, it's also difficult to enable retroactively.
Autodefrag is fundamentally broken and might result in overall worse performance. Also not an option.
If a file is highly fragmented (which it might not be; the number of extents isn't an accurate representation of fragmentation on btrfs), the correct measure is to `btrfs filesystem defragment` it.

If git-annex db fragmentation really is a common problem on btrfs (again, you'd have to do actual performance tests; I'm not aware of other means of measuring fragmentation), perhaps an automatic defrag similar to git's automatic gc would be appropriate. Keys DB files range in the megabytes IME, so re-writing them every now and then shouldn't be a huge load.
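A sketch of doing that check and defrag by hand for a single keys db (run from the repository root while git-annex is idle):

```
# extent count before
filefrag .git/annex/keysdb/db

# rewrite the file contiguously; -f flushes the result to disk
btrfs filesystem defragment -f .git/annex/keysdb/db

# extent count after
filefrag .git/annex/keysdb/db
```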
A file's fragmentation on-disk also shouldn't cause issues on DB writes because btrfs is CoW; new writes don't go where the old data was. It can only impact read performance.
Free space fragmentation might impact write performance, but that's not something a userspace program can or should solve. @yarikoptic try `btrfs balance start -dusage=10`.

I'm not sure I understand the situation 100%, but poor DB write performance with hangs and stalls can also be caused by long filesystem commits.
Writing a ton of file data to the filesystem only puts it in the cache, and it gets written later. When that "later" occurs (usually after 30s), as I understand it, all of that data (potentially megabytes to gigabytes) needs to be written to disk before a DB's synchronous write returns.
This is essentially what git-annex does when getting data from a remote; it "writes" a file's data while downloading (without committing, so no disk activity whatsoever) and then does a sync at the end (lots of disk activity) before starting the next download. A DB commit at the end can then take as long as (up to) 30s worth of transferred data takes to write. With a fast source/pipe, that could potentially be a lot of data.
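On Linux that "later" is governed by the vm.dirty_* writeback settings; a way to inspect them and watch the effect (read-only here, tuning is left to taste):

```
# how long dirty data may sit in the page cache before it must be written back
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# how much memory may be dirty before writers get throttled / background flushing starts
sysctl vm.dirty_ratio vm.dirty_background_ratio

# watch dirty data pile up during a download and drain at the commit
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
```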
Btrfs itself also has issues with insane commit times under some conditions (dedup especially), compounding the issue.
It might be worth looking into the timings of `git annex get`'s write operations using `iostat --human 1`.
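For instance, something along these lines while a `git annex get` is running:

```
# extended per-device statistics, refreshed every second
iostat --human -x 1
# watch the write columns (w/s, wkB/s): long stretches near zero followed by
# large bursts would match the "buffer, then flush at the deadline" picture above
```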
FWIW, for Joey: I have tried that version. Placing a similar load, I just got some `move`s fail (our issue + the idea to just call it multiple times) and some other oddities (not new, yet to figure out), but overall -- I have not spotted similar MVar messages (filing a comment to "jinx" it, rerunning again -- seeing some `get`s going ;)).

For Atemu:
Isn't https://github.com/kilobyte/compsize the one reporting the number of fragments, and thus the degree of fragmentation? As I summarized in the comment above, I have used it + `btrfs fi defrag`.

Well -- it confirms that there is "write silence" for a while followed by a burst, e.g.:
Jinxing helped... though I thought to complain that I don't see any traceback, but apparently that is due to `-J5` (I guess) and lines being overwritten; I managed to `Ctrl-s` at a point showing:

which, when I let it go, became something like:

and I reconfirmed those in a screenlog I quickly collected:
So it is in flushDbQueue, which waits for all queued writes to complete. When the write failed with an exception, a previous flushDbQueue would have left the queue's MVar empty.
So, now I understand how the original problem can lead to this MVar problem. And I've fixed that part of it. Now a write failing this way will refill the queue with what it failed to write. So it will try to write it again later.
This flushDbQueue probably also explains the hang at "recording state in git" since there is also a final flushDbQueue at that point, and I guess it fails to detect a deadlock at that point so just hangs forever. So my fix should also avoid that.
None of which means this is fixed, really, just the fallout from the write timeout problem will be less severe now.
It does occur to me that one "solution" might be to have flushDbQueue not rethrow the exception, and just re-enqueue the write. Then, if a write fails in the middle of a long git-annex command, it will just queue up sqlite changes until later. And hopefully whatever is preventing writing to the database at that time would later resolve itself. At shutdown, it would need to make sure to flush the remaining queue and not ignore a write failure then.
That has the potential problem that, if the write keeps failing, it might end up buffering a large number of writes in memory, and since the queue would be full, it would be trying to write every time. So it would both use an ever-growing amount of memory, and be slowed down by the write attempts.
Still, it does give it something better to do while the write is failing than sleeping and retrying, eg to do the rest of the work it's been asked to do.
(Update: Reads from a database first call flushDbQueue, and it would not be safe for that to return without actually writing to the database, since the read would then see possibly stale information. It turns out that `git-annex get` does do a database read per file (getAssociatedFiles). So it seems this approach will not work.)

@Atemu, that seems like at least a plausible theory, that disk write pressure (eg from getting annex objects) is causing sqlite writes to take a long time. Increasing the wait time from the current 10 seconds would make sense if that's what's happening.
The somewhat weird thing is that, with a single git-annex process, no matter the number of jobs, it only has a single database handle open, and so it should not be possible for one write to still be happening when a second write happens. Unless, of course, sqlite is doing something behind my back, like finishing the write in a separate thread? Or, of course, there might be a second process that has somehow not been visible..
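One quick way to check for a second, less visible writer from the shell (run in the affected repository):

```
# every process that currently has the keys database or its WAL/SHM files open
lsof .git/annex/keysdb/db .git/annex/keysdb/db-wal .git/annex/keysdb/db-shm 2>/dev/null
```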
So, I've added some debugging of when it commits to the database. @yarik, please get an updated build with commit 5da1a785086910c32559a46f9110779d765af0cd, and try after setting:
This should tell us if there are somehow multiple writers at the same time, and also will tell us the timing of how long it is taking to commit to the db.
It doesn't need the custom build with DebugLocks, right? Then I will "wait" for the daily build of datalad/git-annex to happen and test then... Meanwhile I am also running

```
git clone https://github.com/dandizarrs/4ea5a47f-10d6-48a7-8c1e-ce6ce511f9a7; cd 4ea5a47f-10d6-48a7-8c1e-ce6ce511f9a7; git annex get -J25 .
```

on smaug to see if it reproduces there (but it is not as busy IO-wise, so most likely it would finish without whining; boosted to -J25 to "make it harder"). smaug is also BTRFS, although the underlying storage is set up differently (no LVM).

It is reporting the number of extents, just like `filefrag` does.

Non-CoW filesystems overwrite data in-place, so there is never a need to create new fragments for (pre-allocated) in-file writes. New extents are only created when the free space at the file's location doesn't allow for further expansion and fragmentation is required.
# of extents is therefore indicative of fragmentation in non-CoW filesystems but this does not hold true in CoW filesystems like btrfs. If you wrote 1M to the first half of a 2M file and then 1M to the latter, you'll have 2 1M extents that might be right behind each other on-disk due to CoW.
With transparent compression in the mix, this gets even worse because extent size caps out at 128K. A 1M incompressible file will have 8 extents/"fragments" at minimum despite its data possibly being laid out fully sequentially.
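A small illustration of the difference (file names are placeholders):

```
# per-extent layout as reported by the filesystem; with btrfs compression even a
# fully sequential 1M file shows up as 8 x 128K extents
filefrag -v path/to/file

# compsize reports compressed vs. uncompressed sizes and extent counts for a subtree
compsize .git/annex/keysdb/
```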
Does this burst occur at the same time the DB times out?
If you take a look at `top`/`iotop`, which threads are using CPU time and writing? Look out for btrfs-transacti and btrfs-endio-wri.

The problem isn't exactly that. Sure, write activity will make DB operations take longer (larger queue -> higher latency), but this effect should stay within reasonable bounds. I think that the actual big problem is that the object writes are buffered and not written until they hit a deadline, after which they must be written ASAP. I believe that in such a situation any other write operation must wait until those dirty pages are committed, and a commit can take quite a while in btrfs due to integrity guarantees.
I think it'd be better to avoid these dirty page deadline commits, start writing out data sooner (ideally immediately) and block the source if the target can't keep up.
IMO there's not much point in letting the target seem to "write" at a higher speed than it can actually take in a case of sequential writes like git-annex' object transfer. It'll only lead to wasted memory and erratic IO. (I guess a small buffer (something that could be flushed in a few seconds) could be helpful in smoothing over a little variance.)
Revisiting this, I don't understand why the htop at the top of this page lists so many `git-annex get` processes. They all seem to be children of a single parent `git-annex get` process. But `git-annex get` does not fork children like those. I wonder if perhaps those child processes were forked() to do something else, but somehow hung before they could exec?!

By comment #2, annex.stalldetection is enabled, so `git-annex get` runs 5 `git-annex transferrer` processes. Each of those can write to the database, so concurrent sqlite writes can happen. So, my "re: comment 16" comment was off-base in thinking there was a single git-annex process. And so, I don't think the debug info requested in that comment is needed.

Also, it turns out that the database queue is being flushed after every file it gets, which is causing a sqlite write per file. So there are a lot of sqlite writes happening, which probably makes this issue much more likely to occur on systems with slow enough disk IO that it does occur. Especially if the files are relatively small.
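This is visible from the outside while a get is running, e.g.:

```
# with annex.stalldetection enabled, the parent `git-annex get` hands transfers to
# separate transferrer processes, and each of them writes to the keys db itself
pgrep -af 'git-annex transferrer'
```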
The reason for the queue flush is partly that Annex.run forces a queue flush after every action. That could, I think, be avoided; it was only done to make sure the queue is flushed before the program exits, which should be able to be handled in a different way. But also, the queue has to be flushed before reading from the database in order for the read to see current information. In the `git-annex get` case, it queues a change to the inode cache, and then reads the associated files. To avoid that, it would need to keep track of the two different tables in the keys db, and flush the queue only when querying a table that a write had been queued to. That would be worth doing just to generally speed up `git-annex get`. A quick benchmark shows that a get of 1000 small files that takes 17s will only take 12s once that's done. And that's on a fast SSD; probably much more on a hard drive!

So I don't have a full solution, but speeding git-annex up significantly, and also making whatever the problem in this bug is probably much less likely to occur, is a good next step.
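A rough way to reproduce that kind of benchmark locally (numbers and paths are purely illustrative):

```
mkdir /tmp/bench && cd /tmp/bench

# a source repository with 1000 small annexed files
git init src
( cd src && git annex init && \
  for i in $(seq 1000); do echo "$i" > "file$i"; done && \
  git annex add . && git commit -m 'add 1000 small files' )

# clone it and time getting all the content back
git clone /tmp/bench/src dst
cd dst && git annex init
time git annex get .
```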
I've avoided the excessive sqlite database writes. Which doubled the speed of git-annex in some circumstances, wow!
@yoh see if it still happens once you upgrade to a git-annex with 6fbd337e34cd62881ed5e21616e9c00d7bf35378.
It would be possible for git-annex to do its own locking around writes to the sqlite database. That would surely avoid any problem that sqlite might have that would cause ErrorBusy. I want to think some about withExclusiveLock blocking issue first.
I've made it retry as long as necessary on ErrorBusy, while also noticing when another process is suspended and has the sqlite database locked, and avoiding retrying forever in that situation.
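From the shell, a quick way to spot that situation (the PID is a placeholder):

```
# find who holds the keys db open...
fuser -v .git/annex/keysdb/db

# ...and check its state: "T" in the STAT column means the process is stopped
# (e.g. by Ctrl-Z or SIGSTOP) while still holding the lock
ps -o pid,stat,cmd -p <PID>
```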
This seems to be as far as I can take this bug report; I don't know 100% for sure if I've fixed it, but git-annex's behavior should certainly be improved.