Please describe the problem.
I'm unable to use adjusted branches (--unlock) on WSL because of sqlite crashes. smudge tends to fail with output like:
sqlite worker thread crashed: user error (SQLite3 returned ErrorProtocol while attempting to perform prepare "SELECT null from content limit 1": locking protocol(while opening database connection))
git-annex: sqlite query crashed
CallStack (from HasCallStack):
error, called at ./Database/Handle.hs:98:42 in main:Database.Handle
git-annex: smudge: 1 failed
error: external filter 'git-annex smudge --clean -- %f' failed 1
error: external filter 'git-annex smudge --clean -- %f' failed
What steps will reproduce the problem?
git init test
cd test
git annex init test
git annex upgrade
echo asdf > asdf
git annex add asdf # fails for some reason on DrvFs
git annex add asdf # works when executed second time
git annex sync
git annex --debug adjust --unlock
What version of git-annex are you using? On what operating system?
Windows 10 Pro v1903 build 18362.207 - WSL (Arch)
git-annex version: 7.20190615-g0bd9e8c0e2
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.3 feed-1.1.0.0 ghc-8.6.5 http-client-0.6.4 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
operating system: linux x86_64
supported repository versions: 5 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
Terminal log
$ git init test
Initialized empty Git repository in /mnt/d/debug/test/.git/
$ cd test
$ git annex init test
init test
Detected a filesystem without fifo support.
Disabling ssh connection caching.
ok
(recording state in git...)
$ git annex upgrade
upgrade (v5 to v6...) (v6 to v7...) ok
$ echo asdf > asdf
$ git annex add asdf
add asdf
git-annex: .git/annex/othertmp/asdf.0/asdf: rename: permission denied (Permission denied)
failed
git-annex: add: 1 failed
zsh: exit 1 git annex add asdf
$ git annex add asdf
add asdf ok
(recording state in git...)
$ git annex sync
commit
[master (root-commit) 54e120b] git-annex in test
1 file changed, 1 insertion(+)
create mode 120000 asdf
ok
$ git annex --debug adjust --unlock
[2019-07-06 11:23:39.4031313] read: git ["--version"]
[2019-07-06 11:23:39.4128108] process done ExitSuccess
adjust [2019-07-06 11:23:39.4137008] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-07-06 11:23:39.4273287] process done ExitSuccess
[2019-07-06 11:23:39.4276698] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-07-06 11:23:39.4436791] process done ExitSuccess
[2019-07-06 11:23:39.4440342] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--verify","-q","refs/heads/adjusted/master(unlocked)"]
[2019-07-06 11:23:39.4571275] process done ExitFailure 1
[2019-07-06 11:23:39.4579141] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/basis/adjusted/master(unlocked)","refs/heads/master"]
[2019-07-06 11:23:39.4740989] process done ExitSuccess
[2019-07-06 11:23:39.4745965] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","mktree","--batch","-z"]
[2019-07-06 11:23:39.4770107] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-tree","--full-tree","-z","-r","-t","--","refs/basis/adjusted/master(unlocked)"]
[2019-07-06 11:23:39.4940556] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-07-06 11:23:39.4960848] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2019-07-06 11:23:39.5412121] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","diff.external=","-c","filter.annex.smudge=","-c","filter.annex.clean=","diff","--cached","--raw","-z","--abbrev=40","-G^/annex/objects/","--diff-filter=AMUT","--no-renames","--ignore-submodules=all","--no-ext-diff"]
[2019-07-06 11:23:39.5743181] process done ExitSuccess
[2019-07-06 11:23:39.5754474] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2019-07-06 11:23:39.5981937] process done ExitSuccess
[2019-07-06 11:23:39.5990927] process done ExitSuccess
[2019-07-06 11:23:39.6010222] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","69b3149d64ac2eb225c6f5336fa15c518dcd4d83","--no-gpg-sign","-p","refs/basis/adjusted/master(unlocked)"]
[2019-07-06 11:23:39.6194551] process done ExitSuccess
[2019-07-06 11:23:39.6199068] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","-m","entering adjusted branch","refs/heads/adjusted/master(unlocked)","c4e772ae22f9b1146b798404d85268ea3ca40e03"]
[2019-07-06 11:23:39.6402592] process done ExitSuccess
[2019-07-06 11:23:39.6409432] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","checkout","adjusted/master(unlocked)","--progress"]
sqlite worker thread crashed: user error (SQLite3 returned ErrorProtocol while attempting to perform prepare "SELECT null from content limit 1": locking protocol(while opening database connection))
git-annex: thread blocked indefinitely in an MVar operation
error: external filter 'git-annex smudge -- %f' failed 1
error: external filter 'git-annex smudge -- %f' failed
Switched to branch 'adjusted/master(unlocked)'
sqlite worker thread crashed: user error (SQLite3 returned ErrorProtocol while attempting to perform prepare "SELECT null from content limit 1": locking protocol(while opening database connection))
git-annex: sqlite query crashed
CallStack (from HasCallStack):
error, called at ./Database/Handle.hs:98:42 in main:Database.Handle
git-annex: smudge: 1 failed
error: external filter 'git-annex smudge --clean -- %f' failed 1
error: external filter 'git-annex smudge --clean -- %f' failed
[2019-07-06 11:24:03.3626765] process done ExitSuccess
ok
[2019-07-06 11:24:03.372672] process done ExitSuccess
[2019-07-06 11:24:03.3742067] process done ExitSuccess
[2019-07-06 11:24:03.3756292] process done ExitSuccess
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, on macOS & HFS+
I have seen several sqlite-related problems in WSL, they seem to come and go for me.
I guess this is WSL not emulating Linux very well and so breaking sqlite. There is not much that git-annex can do about it.
(Here's stack having to work around a similar problem https://github.com/commercialhaskell/stack/issues/4876 and the error message looks very similar. But git-annex cannot afford to disable WAL mode as that would break concurrent operation.)
I hear they've given up on emulating Linux syscalls and the next version of WSL will just use the Linux kernel. Which should avoid this problem.
I reproduced with git-annex from debian stable on WSL in MS Edge preview fall 2019. Also with a current git-annex autobuild.
Interestingly, using v7 with unlocked files not in an adjusted branch seems to work ok. Which seems to indicate that sqlite is generally working.
May be somehow caused by there being 2 git-annex processes running when it fails. They could be contending over the database in some way that works less well in WSL.
Backing up that theory, .git/annex/keys/db does get created and is populated with the right tables and data.
git annex add on DrvFs. I don't think you can fix it, as it is apparently a WSL problem, but I think it's good to keep track of it and warn potential users.
I tried using Windows 10 build 19041 from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/ That is new enough to support WSL2.
At first, this bug reproduced. Turns out that still has WSL1 by default. https://docs.microsoft.com/en-us/windows/wsl/install-win10 explains how to enable WSL2.
Unfortunately, those instructions failed at the final hurdle,
wsl --set-version Ubuntu 2
said Virtual Machine Platform needed to be enabled, or virtualization enabled in the BIOS. I had already done the former in an earlier step, so I guess VirtualBox is not enabling it in the BIOS.
Probably it needs nested VT-x. VirtualBox has that option greyed out for me. https://github.com/microsoft/WSL/issues/5030 says this needs VirtualBox to use Hyper-V, which needs a fix that landed 2 weeks ago, not yet in a released version. Or use another emulator, or maybe try it on real hardware?
Since WSL2 has terrible performance with the NTFS volumes already mounted in Windows, consumes more memory, and has higher hardware requirements, I'm still interested in using WSL1. I applied the patch to disable WAL from this comment, however now I get a different sqlite error that happens more often as well.
Without this patch, unlocked files generally do work in WSL1, apart from adjusted branches. A sqlite error may occur at the end of commands such as git annex get/drop; it can be fixed by manually removing .git/index.lock and then running git annex add or git reset.
@asakurareiko it makes sense it would fail that way with WAL disabled, since the sqlite database cannot support multiple writers then. And there are probably several situations where multiple git-annex processes end up using the database, even when you are only running a single git-annex command at a time.
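The role WAL plays in concurrent access can be illustrated outside git-annex. The following is a minimal sketch using Python's standard sqlite3 module, not git-annex's own code; the helper name commit_while_reader_active and the one-column content table are made up for illustration. It checks one concrete difference between the two journal modes: whether a writer can commit while another connection still holds a read transaction open.

```python
import os
import sqlite3
import tempfile


def commit_while_reader_active(journal_mode):
    """Return True if a writer can commit while a second connection
    holds an open read transaction (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db")

        # Create the database in the requested journal mode.
        setup = sqlite3.connect(path, isolation_level=None)
        setup.execute(f"PRAGMA journal_mode={journal_mode}")
        setup.execute("CREATE TABLE content (key TEXT)")
        setup.close()

        # Short busy timeout so a blocked writer fails quickly.
        reader = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        writer = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        try:
            # The reader keeps its transaction open after the query.
            reader.execute("BEGIN")
            reader.execute("SELECT null FROM content LIMIT 1").fetchall()
            try:
                # Autocommit insert: must commit while the reader is active.
                writer.execute("INSERT INTO content VALUES ('k')")
                return True
            except sqlite3.OperationalError:
                return False
        finally:
            reader.close()
            writer.close()


if __name__ == "__main__":
    for mode in ("WAL", "DELETE"):
        print(mode, commit_while_reader_active(mode))
```

On a filesystem with working locking, the WAL case is expected to succeed and the rollback-journal (DELETE) case to fail with a "database is locked" error. Running it on a DrvFs path under WSL1 might show whether the locking primitives behave differently there, though that is untested here.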
Sounds like restagePointerFile, which tends to run at the end of such an operation to handle all the files that have been updated. That runs git update-index, which then runs git-annex smudge. So both the parent and child git-annex process can have the database open for write, which WAL mode normally supports, but something in WSL prevents it from working right.
Following this theory, I've made restagePointerFile close the database first. Perhaps that will avoid the problem, at least in those cases. Your testing is appreciated.
I tested 0f38ad9a6 with the test case below as well as with the repo I use and sqlite errors no longer occur. Adjusted branches still do not work but everything else with unlocked files seems to be ok now. Thank you Joey.
Setup:
Test:
@asakurareiko oh that's encouraging that I seem to be on the right track.
Although I was not aware that this test case in your comment #8 failed before?
I noticed that git-annex opened a second connection to the database for writes, in addition to the connection it used for reads. That seems likely to be involved in whatever locking problem there is on WSL.
Commit d0ef8303cf8c4f40a1d17bd134af961fd9917ca4 eliminates that second connection. But there's some chance I'll have to revert it.
If you test, please include git-annex version output so I can make sure you have a version with that change.
The crash shows that runSqliteRobustly called rethrow "while opening database connection", and I think it was in the "| otherwise" branch because the error is not Sqlite.ErrorIO.
So, it may also possibly help to handle Sqlite.ErrorProtocol, which seems like what the actual error is from the message. Handling it the same as Sqlite.ErrorBusy would make opening the db be retried until whatever else had it open closes it, or finishes the operation that is causing the problem. On the other hand, that might make git-annex hang until another git-annex process exits, which would not be helpful. So perhaps it would be better to handle it like Sqlite.ErrorIO is handled, waiting for up to 1/10th of a second. But perhaps that would not be enough of a wait.
Anyway, this is a note to myself: If all else fails, try catching Sqlite.ErrorProtocol and experiment with different ways to handle it.
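The bounded-wait option described above can be sketched in a few lines. This is an illustration in Python, not the Haskell in Database/Handle.hs; open_with_retries and its parameters are hypothetical names, and sqlite3.OperationalError merely stands in for Sqlite.ErrorProtocol. The defaults of 10 retries at 10 ms each are an assumption chosen to match the "up to 1/10th of a second" figure.

```python
import sqlite3
import time


def open_with_retries(open_db, retryable, attempts=10, delay=0.01,
                      sleep=time.sleep):
    """Call open_db(); on a retryable error, wait `delay` seconds and
    try again, at most `attempts` times, then make one final attempt
    whose error (if any) propagates to the caller."""
    for _ in range(attempts):
        try:
            return open_db()
        except retryable:
            sleep(delay)
    return open_db()


if __name__ == "__main__":
    # Example: open an in-memory database, retrying on OperationalError.
    conn = open_with_retries(lambda: sqlite3.connect(":memory:"),
                             sqlite3.OperationalError)
    print(conn.execute("SELECT 1").fetchone())
    conn.close()
```

Capping the number of retries avoids the hang that an unbounded retry (the ErrorBusy-style handling) could cause, at the cost of still failing if the other process holds the database for longer than the total wait.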
The error I got previously (before 0f38ad9a6) with my test case is
With d0ef8303c, the test case still works, but adjusted branches still have the same error.
produces
About git-annex version, I'm using make install-home to do an incremental build but the version does not update.
I found a new type of failure which occurs when there are new unlocked files in the index.
Something is happening to the files already in the index and the error is triggered once per file in the index.
Forgot to add in the previous comment: the index looks fine afterwards.
@asakurareiko could you try this patch, and see if it fixes some/all of the remaining problems?
If that doesn't work, it's possible this version might somehow work better. At least it would be worth a try as well:
This second one might cause git-annex to get stuck and retry forever though, if it doesn't work.