Please describe the problem.
After updating my Synology DS216+ NAS from 6.2-23739-1 to 6.2-23739-2, running git-annex with any non-trivial command, either locally on the NAS or remotely against it, results in:
git-annex: timer_create: Bad address
I'm guessing that this means a library function the standalone binary needs is no longer present, but I'm unsure why. The -1 to -2 update seems to contain only two security fixes (SA 18-36 and SA 18-01, the latter addressing the Spectre/Meltdown bugs).
Do you know if timer_create is a kernel system call, or a libc (etc) library function?
What steps will reproduce the problem?
Update the Synology NAS to 6.2-23739-2, then run git annex sync, git annex version, or similar (remotely or locally).
What version of git-annex are you using? On what operating system?
The x86-32 standalone build for ancient kernels, as the 64-bit standalone build no longer seemed to work due to locale issues (see the note added at the end). I believe it is 6.20180626, but git annex version currently also fails...
Please provide any additional information below.
ewen@nas01:/volume1/music/podcasts$ hostname --fqdn
nas01
ewen@nas01:/volume1/music/podcasts$ git annex sync
git-annex: timer_create: Bad address
ewen@nas01:/volume1/music/podcasts$
ewen@nas01:/volume1/music/podcasts$ git annex version
git-annex: timer_create: Bad address
ewen@nas01:/volume1/music/podcasts$
ewen@nas01:/volume1/music/podcasts$ uname -mr
3.10.105 x86_64
ewen@nas01:/volume1/music/podcasts$
Synology released a 6.2.1 firmware for the NAS a couple of days ago, but it does not yet seem to be visible to my NAS. I can try that one if it'd help.
I can also try switching back to one of the more modern x86-64 / x86-32 standalone builds if that'd help. But then I'd need some assistance working around this error, which I was seeing on those builds:
sh: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
error: git-annex died of signal 6
(From other work elsewhere, it feels a lot like the LC_TIME structure changed in size in some libc definition in the last 6-12 months; in another project we had a similar breakage on Ubuntu 18.04 which didn't affect Ubuntu 16.04 and earlier.)
In case it helps, out of the box defaults give:
ewen@nas01:/volume1/music/podcasts$ echo $LANG
en_US.utf8
ewen@nas01:/volume1/music/podcasts$ ls /volume1/thirdparty/git-annex.linux/locales/en_US.utf8/
LC_ADDRESS LC_IDENTIFICATION LC_MONETARY LC_PAPER
LC_COLLATE LC_MEASUREMENT LC_NAME LC_TELEPHONE
LC_CTYPE LC_MESSAGES LC_NUMERIC LC_TIME
ewen@nas01:/volume1/music/podcasts$
(although those files are currently generated with the now broken x86-32 legacy standalone build).
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Definitely. It was working until I updated the NAS firmware image this afternoon... for this repository holding years of podcasts and many others.
From some digging around, it looks like timer_create is a Linux system call, so presumably the issue with the Synology NAS is caused by the kernel being updated.
Of note, it appears that Windows Subsystem for Linux does not support timer_create either, and that system call is apparently required by Haskell binaries. Apparently (from this thread) other people have hit this issue trying to run git-annex too.
There's a suggestion of some workaround flags, but they do not seem to work with the git-annex standalone binary as it is currently built:
Possibly it'll be necessary to persuade Synology to re-enable the timer_create system call. But my guess is that they turned it off as a Spectre/Meltdown fix, so it may be non-trivial to persuade them to change it back.
Ewen
FWIW, I've created a Synology account in order to file a Synology support request (#2082132) about the missing timer_create system call on DSM 6.2-23739-2. I'm not holding out a lot of hope, given it's a third-party application running on the NAS and not even one in their app store, but maybe there's a chance it'll get the system call turned back on. That support request includes a link to this bug report.
Ewen
PS: I expect that support request will only be visible if you're logged in with my Synology account. But at least someone else reporting the problem can reference the same issue to Synology Support.
The Synology reply was "our developers suspected that this is related to some library linking problem in the kernel", and they asked for remote access to my device -- which I wasn't very keen on so I did some experimentation of my own.
With the latest standalone x86-64 build, I do not get the timer_create error. So it does seem like there's a linking issue around accessing timer_create from the "ancient" builds on the current Synology DS216+ kernel/libc.
However, the standalone x86-64 builds do fail with LC_TIME issues by default:
(and have for a while; it was why I switched to trying the ancient 32-bit build earlier this year). It appears that at some point the LC_TIME structure changed in size, which means that mixing statically linked binaries with locale information that is older/newer causes problems.
Fortunately that problem is now common enough (etc), due to people encountering it with, eg, Ubuntu 18.04 LTS, that it's widely known, and there's a pretty easy workaround:
(which overrides the Synology shell default of LC_ALL=en_US.utf8). Just setting LC_TIME does not seem to be sufficient if either LC_ALL or LANG is set to another value (which the Synology shell does by default).
That LC_ALL=C override works for me, so I've tweaked my git-annex-shell, git-annex-wrapper, and git-annex shell scripts to set it before carrying on to call the rest of the git-annex tools, and everything seems to work again. Possibly the standalone builds should do that automatically?
Ewen
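Setting LC_TIME alone cannot win while LC_ALL is set, because of how each locale category is resolved: LC_ALL overrides everything, then the category's own variable (here LC_TIME) applies, and LANG only supplies the default. A minimal POSIX sh sketch of that lookup; the function name effective_locale is hypothetical, and it only models the environment-variable precedence, not whether the named locale data exists or loads cleanly.

```sh
# Print the locale name selected for one category (e.g. LC_TIME),
# following the POSIX precedence: LC_ALL first, then the category's own
# variable, then LANG. Empty values count as unset; if nothing is set, "C".
effective_locale() {
    category=$1
    # Read the value of the variable whose name is stored in $category.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}
```

With the Synology default LC_ALL=en_US.utf8 still in place, an extra LC_TIME=C does not change what LC_TIME resolves to; overriding LC_ALL=C makes every category resolve to C, which is why that override sidesteps the incompatible LC_TIME file.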
IIRC the Haskell runtime stopped using timer_create after GHC 7 or something like that. These days it uses poll.
That said, timer_create is part of POSIX and part of the Linux API, and that is unlikely to change. There might be something in the libc/kernel interface that has changed in newer kernels; if so, it may be that the i386-ancient build is not going to work with new kernels. But it's there to support old kernels anyway.
As for the setlocale failure, while LC_ALL=C avoids it, that's not something I want to set by default, because people do use git-annex with unicode filenames and it can affect those, as well as of course preventing display of any translated messages by anything git-annex runs (though git-annex itself is not translated). It would be good to get to the bottom of the setlocale failure.
It does very much look like something has removed timer_create from the Synology NAS. Eg, if I statically compile the example timer_create program from the man page on Debian Jessie/Stretch with:
and try to run it on the Synology NAS now, I get effectively the same symptoms:
(Unfortunately I have no easy way to check it with a previous Synology NAS version.)
The same man page notes that it can be configured out of the kernel: "Since Linux 4.10, support for POSIX timers is a configurable option that is enabled by default. Kernel support can be disabled via the CONFIG_POSIX_TIMERS option." The patch adding that option notes that "Some embedded systems have no use for them", and the Synology NAS is obviously an embedded kernel situation. Given that the tripping point was the release where Synology introduced their Meltdown/Spectre fixes, I imagine that they backported a lot of related fixes from later kernels, and it seems likely that they deliberately turned off CONFIG_POSIX_TIMERS, even if their 1st/2nd level helpdesk did not know about it: timing RAM accesses is pretty much key to the common Meltdown/Spectre exploits, which is why various projects removed "high resolution" timers. (Unfortunately there's no /proc/config.gz or similar that I can see, and while I can find some Synology open source bits, I got bored digging for a kernel config file.)
The fact that later Haskell versions switched to poll probably explains why the later build works. At least that seems to be a viable path forward for using git-annex on my NAS for now.
Ewen
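If a kernel config file does turn up (for example in Synology's open source kernel bits), checking it for the option quoted from the man page is straightforward. A minimal sketch, assuming the standard kernel .config format, where a disabled option is recorded as a "# CONFIG_... is not set" comment; the helper name posix_timers_status is hypothetical. CONFIG_POSIX_TIMERS only exists from Linux 4.10 onwards, so on a 3.10-based kernel like this NAS's it would normally be absent unless it was backported.

```sh
# Report whether a kernel config file enables POSIX timers.
# Accepts a plain or gzip-compressed config (e.g. /proc/config.gz, where the
# kernel exposes one, or /boot/config-$(uname -r) on many distributions).
# Prints: enabled, disabled, not-present, or no-config (unreadable file).
posix_timers_status() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "no-config"
        return 1
    fi
    case $config in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # $reader is intentionally unquoted so "gzip -dc" splits into two words.
    if $reader "$config" | grep -q '^CONFIG_POSIX_TIMERS=y$'; then
        echo "enabled"
    elif $reader "$config" | grep -q '^# CONFIG_POSIX_TIMERS is not set$'; then
        echo "disabled"
    else
        # Option absent: either a pre-4.10 kernel (timers always built in)
        # or a config that simply does not mention it.
        echo "not-present"
    fi
}
```

For example, posix_timers_status /proc/config.gz on a kernel that exposes its config; on this NAS the expected answer is no-config, since no such file is visible.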
From some hunting around, I can find reports of that loadlocale.c issue with LC_TIME throughout 2018. It seems to be more commonly encountered now that things like Ubuntu 18.04 have been released with the newer glibc. (There's also an LC_COLLATE change somewhere around mid/late 2015, which seems to be the previous change in data format that impacted people in this way.)
The generally understood wisdom seems to be that it's caused by locale data compiled by older (glibc) tool versions than the ones used to load it (eg, statically linked into the binaries). This is at least the second context in which I've come across it myself in the last month, both caused by statically linked binaries built on Linux versions older/newer than the one on which they were being run, approximately across the 2017/2018 calendar boundary.
On the Synology NAS this seems to be the locale tools version:
which is a tiny bit newer than Debian Jessie:
and somewhat older than Debian Stretch (2.24), Debian Unstable (2.27), or Ubuntu 18.04 (also 2.27). My guess from the timing is that the size of the lc_time structure changed somewhere between about glibc 2.24 and 2.27, maybe in late 2017 or early 2018.
Digging in the glibc git tree turns up commit f301e5334065e93aace667fd4a87bce6fc1dbd13, from 2017-10-27, which foreshadows a change with "Now when we are about to add alternative month names to LC_TIME (BZ#10871) this will fail again.". This appears to be BugZilla #10871, which had been open forever, but the patches seem to have finally been pushed on 2018-01-22, which is about the right timing for the problems seen. Building locales with a glibc from after that and loading them with an older (statically linked) libc, or vice versa, is going to break. (I think this might be the commit that breaks things, and if not, it's very near that commit in the history. Applied 2018-01-22.)
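To see which side of that flag day a given system falls on, its glibc version can be compared against 2.27. A minimal sketch, assuming (as guessed above) that 2.27 is the first release with the new LC_TIME layout; the function name glibc_side_of_flag_day is hypothetical. On a glibc system, getconf GNU_LIBC_VERSION reports the installed version (e.g. "glibc 2.24"); for a statically linked standalone binary, the version that matters is the one on the build host.

```sh
# Classify a glibc version relative to 2.27, assumed here to be the first
# release whose LC_TIME locale data includes the alternative month names
# (BZ #10871). Only major.minor is compared; any suffix is ignored.
# Prints "pre-2.27" or "2.27+".
glibc_side_of_flag_day() {
    version=$1
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%[!0-9]*}    # strip suffixes such as "-3ubuntu1" or ".90"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 27 ]; }; then
        echo "2.27+"
    else
        echo "pre-2.27"
    fi
}
```

For example, glibc_side_of_flag_day "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" on the machine that compiled the locale files; if the answer differs from the side of the glibc linked into the binary, the loadlocale.c assertion is the expected outcome.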
I don't see a workaround for statically linked binaries using system locale files, short of building versions with code from either side of that flag day and suggesting people use the right version depending on their locale files...
... the best kludge I've thought of so far that might work is to turn LC_ALL into individual LC_... settings, one per category, except LC_TIME=C, in the hope that maybe that'll prevent the (changed) time part from loading. But I haven't tested that myself.
Ewen
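The untested kludge above could look something like this in a wrapper script. A sketch, assuming POSIX sh and that the function is called directly (not in a subshell) before exec'ing the real program; the function name split_lc_all_except_time is hypothetical, and the category list is taken from the locale directory listing earlier in this report. Whether glibc then actually skips loading the LC_TIME file is exactly the untested part.

```sh
# Untested kludge: replace a single LC_ALL (or, failing that, LANG) setting
# with per-category variables, forcing only LC_TIME to the built-in C locale.
# Does nothing if neither LC_ALL nor LANG is set.
split_lc_all_except_time() {
    base=${LC_ALL:-${LANG:-}}
    [ -n "$base" ] || return 0
    # LC_ALL would override every per-category variable, so drop it.
    unset LC_ALL
    for category in LC_CTYPE LC_NUMERIC LC_COLLATE LC_MONETARY LC_MESSAGES \
                    LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE \
                    LC_MEASUREMENT LC_IDENTIFICATION; do
        eval "$category=\$base"
        export "$category"
    done
    LC_TIME=C
    export LC_TIME
}
```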
PS: One might hope that this change could have been made in a backwards compatible manner, but that does not appear to be the case here. glibc seems to have been perfectly fine with creating a flag day, presumably reasoning that distros can always force the locales to be recompiled so they'll always stay in sync.
In case it helps, I think this is the change to the lc_time structure definition, and this is an example of the change needed to a program loading them. It looks like two new arrays (one char *, one wide char), each 12 entries long, and maybe a flag for whether the "alt month" names are present (alt_mon_defined), but I'm not clear whether or not that is in the loaded files. Both are linked from the commit I mentioned in the previous comment.
While the parser in that program seems to be able to cope with various formats to some extent, the loadlocale.c code has an assert check on size that makes it more brittle. (And at this point I think I've seen it fail both with "newer locale files, older binary" and "older locale files, newer binary", so I suspect both are at least "untested".)
For the Synology NAS case, building on Debian Stretch might still work, and building on Debian Jessie looks like it should still work, but building on Debian Testing/Unstable presumably will not. Presumably this incompatibility issue will just get more common over 2018/2019 as more things are (or aren't) upgraded. Joy.
Ewen
FTR, Synology finally released the DSM 6.2.1-23824 NAS software to my region (New Zealand; last, I assume), so I've been able to install the September software update. That does work with the 64-bit git-annex standalone version, provided the LC_ALL=C workaround is still in place:
so at least for now I'm leaving my hand-edited workaround in place:
Interestingly, I hadn't noticed previously that it seems to be rm and cmp and the like which are running into locale loading problems, at least now. Ironically, line 132 is the point where it tries to clean up locale caches... and 142 is the bit that tries to refresh the locale cache:
so it's possible the new issue might just be that bits of the git-annex workaround for locale issues are now unreachable... due to locale issues. In which case, maybe only the workaround step needs LC_ALL=C? I haven't experimented with that, but it does seem plausible that if it can get far enough to build/use locales in the right format for its own libc, it'd then work properly.
Ewen
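Testing that idea would mean scoping LC_ALL=C to just the locale-cache maintenance commands rather than the whole run. A minimal sketch using a subshell, so the override cannot leak into the environment used afterwards; the function name with_c_locale and the $cachedir variable in the usage line are hypothetical and do not reflect the actual contents of the standalone build's wrapper script.

```sh
# Run one command with LC_ALL=C inside a subshell (note the parentheses as
# the function body), so the override applies to that step only, e.g. the
# rm/cmp calls that clean up or refresh the locale cache, and the caller's
# locale settings are untouched for everything that follows.
with_c_locale() (
    LC_ALL=C
    export LC_ALL
    "$@"
)
```

For example, with_c_locale rm -rf "$cachedir" for the cleanup step, followed later by running git-annex itself with the user's own locale, which would keep unicode filenames and translated messages (the concerns raised earlier) unaffected.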