Please describe the problem.
Since upgrading to git-annex version 10.20250925 from the macOS HomeBrew build, git annex importfeed usually seems to fail with an Enum.toEnum{Word8} out-of-bounds error. The out-of-bounds value reported varies between feed URLs, but from a little testing the error appears to be deterministic for each feed URL.
A few examples (plus one more in the reproducer below):
importfeed https://popculturedetective.agency/feed/podcast
git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
failed
importfeed https://contextualelectronics.com/feed/podcast/
git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
failed
importfeed https://theamphour.libsyn.com/rss
git-annex: Enum.toEnum{Word8}: tag (8211) is outside of bounds (0,255)
failed
(A couple of podcast feeds with no new changes just report "ok", but I'd also expect most of the above not to have any recent changes, as they're weekly-or-less podcasts.)
What steps will reproduce the problem?
Indicative example (one of the feed URLs I follow, but it's happening on all or most of them, all of which worked with the previous version of git-annex):
ewen@basadi:/tmp/podcasts$ git init
Initialized empty Git repository in /private/tmp/podcasts/.git/
ewen@basadi:/tmp/podcasts$ git annex init 'Test repo'
init Test repo ok
(recording state in git...)
ewen@basadi:/tmp/podcasts$ TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
ewen@basadi:/tmp/podcasts$ git annex importfeed --template="${TEMPLATE}" "https://risky.biz/feeds/risky-business"
importfeed gathering known urls ok
importfeed https://risky.biz/feeds/risky-business
git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
failed
importfeed: 1 failed
ewen@basadi:/tmp/podcasts$
The --template part does not seem necessary to reproduce the problem either, as I get the same error without it (the --template is just part of the standard invocation I've used for years):
ewen@basadi:/tmp/podcasts$ git annex importfeed "https://risky.biz/feeds/risky-business"
importfeed gathering known urls ok
importfeed https://risky.biz/feeds/risky-business
git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
failed
importfeed: 1 failed
ewen@basadi:/tmp/podcasts$
What version of git-annex are you using? On what operating system?
ewen@basadi:~$ git annex version
git-annex version: 10.20250925
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV OsPath
dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask
operating system: darwin x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
ewen@basadi:~$
This is on macOS 15.6.1 (Sequoia), which is the latest release apart from macOS 26, released this month. This machine is Intel, but the problem also seems to reproduce on the same macOS 15.6.1 (Sequoia) on an Apple M2 processor, with the same HomeBrew build of git-annex.
Please provide any additional information below.
ewen@basadi:/tmp/podcasts$ git annex --verbose --verbose importfeed --verbose --verbose "https://risky.biz/feeds/risky-business"
importfeed gathering known urls ok
importfeed https://risky.biz/feeds/risky-business
git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
failed
importfeed: 1 failed
ewen@basadi:/tmp/podcasts$
At this stage I don't know if this is specific to importfeed or specific to the HomeBrew build of git-annex.
Other annexes that track files do seem to work with this version of git-annex (git annex add / git annex sync / git annex copy ... all work). So I suspect it's somehow specific to importfeed and/or the HomeBrew build.
And it seems to be a fairly recent breakage, as IIRC the previously installed version was from 2025-08.
HomeBrew git-annex package information
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, for many years. git-annex has worked very well for downloading/collecting podcasts for years, which is why it was surprising that it's suddenly failing like this.
For context, the previous HomeBrew build that I had installed, and that was working, was 20250828.
Ewen
Having found --debug (by trying to scan the source; I barely know Haskell, but found almost no explicit toEnum calls, and none that have changed in the last month AFAICT), it does seem like it's getting as far as downloading the feed URL contents and then failing, presumably while parsing them. Request headers (via curl; note there's a 301 redirect, but asking git-annex to download the version at the end of the redirect doesn't change the git-annex symptoms):
My hunch is that this error occurs while parsing the feed XML, based on the feed title and "ok" not being displayed in the error case. But I'm not sure if there's a specific way to test just that.
Example of a podcast feed that still works:
https://www.2600.com/oth-broadband.xml
There's no redirect on this one, and the Content-Type header has an explicit charset=utf-8, but so far I don't know if that matters. The failing feed has encoding="utf-8" in the <?xml ...?> header of the file, which in theory is functionally equivalent as a way of telling an XML parser how the file is encoded. But maybe git-annex is no longer treating the two the same way?
second import attempt above, matching what my podcast downloads normally do; the first one was also --relaxed but with --debug, and the debug output is quite long, so here's just the start of it, showing it got a lot further than the feeds that don't work:
Based on looking at some examples, I'm fairly convinced that the podcast feeds are now being parsed into 8-bit characters (extended ASCII?), even when (only when?) they have encoding="UTF-8" on the <?xml ...?> prelude tag. UTF-8 decoding can easily result in characters outside the 8-bit range, which seems to be the exception being thrown, based on examining the feed contents (below) and the out-of-range "tag" values. 8217 == 0x2019 (in hex).
And U+2019 is a right single quotation mark (the typographic apostrophe), which encodes in UTF-8 as 0xE2 0x80 0x99. The first problematic feed is littered with that exact byte sequence:
Another of the problematic feeds (reported as 8211; see first post) has lots of the UTF-8 sequence e2 80 93 for U+2013 (an en dash), and 8211 == 0x2013:
The working feed appears to have no non-ASCII characters in it:
So it appears non-ASCII UTF-8 encoding is required to trigger this problem.
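To check a saved copy of a feed for these characters, a short Python sketch can decode it as UTF-8 and list each non-ASCII code point with its UTF-8 bytes. This is a standalone helper, not part of git-annex, and the function names are made up for illustration. Any code point above 255 cannot be represented as a Word8, which is consistent with the 8217 and 8211 values in the errors above.

```python
import sys
from collections import Counter
from pathlib import Path


def non_ascii_codepoints(data: bytes) -> Counter:
    """Count each non-ASCII character in UTF-8 encoded data, keyed by code point."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return Counter(ord(ch) for ch in text if ord(ch) > 0x7F)


def report(data: bytes) -> None:
    """Print each non-ASCII code point, its UTF-8 bytes, and whether it fits in a Word8."""
    for cp, count in sorted(non_ascii_codepoints(data).items()):
        utf8 = chr(cp).encode("utf-8").hex(" ")
        print(f"U+{cp:04X} (decimal {cp}) x{count}: UTF-8 {utf8}, fits in 0..255: {cp <= 255}")


if __name__ == "__main__":
    # Pass a saved feed file as the first argument, or fall back to a tiny sample.
    if len(sys.argv) > 1:
        data = Path(sys.argv[1]).read_bytes()
    else:
        data = "It\u2019s a feed \u2013 with smart punctuation".encode("utf-8")
    report(data)
```

Run it with a feed file saved via curl as the first argument. With no argument it uses the built-in sample, which should report U+2013 (decimal 8211) and U+2019 (decimal 8217), both as not fitting in 0..255.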
Ewen
I think the relevant change is likely to be:
from https://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c
Based on the fact that it's a 2025-09-04 change (so since the previous release) and refers to parseFeedFromFile, the relevant commit seems to be: http://source.git-annex.branchable.com/?p=source.git;a=commit;h=2b1e9eced2fe825c882b4e9549a3a12f41d08055
and in particular this file:
http://source.git-annex.branchable.com/?p=source.git;a=blobdiff;f=Command/ImportFeed.hs;h=e36e72370204ece44a05bfae5954272a46f34f5c;hp=7b66a2b5077613b7e33dc8597a8272e7fdea7102;hb=2b1e9eced2fe825c882b4e9549a3a12f41d08055;hpb=56cd59a9f4e24c5a6842179e0da9180875d837cc
My reading of that code is that the feed parsing switched from (implicitly) "just bytes" (openBinaryFile) to decoding UTF-8 into full Unicode characters, but there's either (a) something in the later git-annex code or (b) the XML parser that does not expect to receive non-ASCII Unicode characters, which result from opening the file in "character" mode rather than "binary" mode, leading to out-of-range values. This results in the crash on encountering the first non-ASCII character in the feed.
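The failure mode described above can be modelled in a few lines of Python. This is a hedged analog, not git-annex's code: it assumes that after the file is read in text mode, each decoded character is converted to a byte with a bounds-checked conversion like Haskell's toEnum for Word8, and to_word8 and bytes_via_text_mode are hypothetical names. It shows why a pure-ASCII feed passes while a single smart quote aborts with tag 8217.

```python
# Python analog of the suspected failure mode; NOT git-annex's actual code.
# Assumption: after reading, each character is converted to a byte with a
# bounds-checked conversion, like Haskell's toEnum :: Int -> Word8.


def to_word8(n: int) -> int:
    """Hypothetical stand-in for a bounds-checked Int -> Word8 conversion."""
    if not 0 <= n <= 255:
        raise ValueError(f"Enum.toEnum{{Word8}}: tag ({n}) is outside of bounds (0,255)")
    return n


def bytes_via_text_mode(raw: bytes) -> bytes:
    """Decode the contents as UTF-8 text first, then treat each character as one byte."""
    text = raw.decode("utf-8")  # each multi-byte sequence becomes one code point
    return bytes(to_word8(ord(ch)) for ch in text)


if __name__ == "__main__":
    ascii_feed = b"<title>Plain ASCII</title>"
    print(bytes_via_text_mode(ascii_feed) == ascii_feed)  # ASCII survives unchanged

    smart_quote_feed = "<title>It\u2019s here</title>".encode("utf-8")
    try:
        bytes_via_text_mode(smart_quote_feed)
    except ValueError as err:
        print(err)  # same shape as the importfeed error, with tag 8217
```

Running it should print True for the ASCII input, then "Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)" for the input containing U+2019.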
It's not clear to me why, in fixing "set close-on-exec bit on open files", the feed parsing was changed from bytes (binary mode) to decoded characters. But it appears it wasn't tested on feeds where the text has been through a word processor that threw smart quotes, smart dashes and the like all over the place.
Ewen
Thanks for some really good detective work @ewen.
Note that this only happens when git-annex is built with the OsPath build flag.
That seems to indicate that the problem is in Utility.FileIO.openBinaryFile, which is the only way that parseFeedFromFile' varies depending on that build flag.
Aha yes, the problem is that it uses withOpenFileEncoding, which is inappropriate for a binary file!
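For contrast, the binary-mode behaviour can be sketched the same way: read the file as raw bytes, with no text encoding applied, and let the XML parser decode them according to the encoding="utf-8" declaration. This is again a Python analog with hypothetical names (read_feed_bytes, feed_title, demo), not the actual git-annex patch, which this thread does not show.

```python
# Python analog of reading a feed as bytes; NOT git-annex's actual fix.
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE_FEED = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<rss><channel><title>It\u2019s a feed \u2013 test</title></channel></rss>"
)


def read_feed_bytes(path: str) -> bytes:
    """Open in binary mode: no text encoding is applied, so each element is a byte 0..255."""
    with open(path, "rb") as f:
        return f.read()


def feed_title(raw: bytes) -> str:
    """Let the XML parser decode the bytes using the encoding declared in <?xml ...?>."""
    root = ET.fromstring(raw)
    return root.findtext("channel/title")


def demo() -> tuple:
    """Write the sample feed to a temporary file, then read it back as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "feed.xml")
        with open(path, "wb") as f:
            f.write(SAMPLE_FEED.encode("utf-8"))
        raw = read_feed_bytes(path)
    return raw, feed_title(raw)


if __name__ == "__main__":
    raw, title = demo()
    # The smart quote stays as its three UTF-8 bytes until the parser decodes it.
    print(raw.count(b"\xe2\x80\x99"), title)
```

The design point is that only one layer decodes: the file handle passes bytes through untouched, and the XML parser is the single place where UTF-8 becomes characters.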
Thanks for the very quick turn around on a new release!
Conveniently, HomeBrew also turned around building the new release quickly (I suspect it might be one of the packages in their CI for auto-upgrade now), so I've been able to test the HomeBrew build of 20250929.
20250929 seems to be working correctly to download podcast feeds, parse them, and download the media attachments as before.
Ewen
PS: Test example below. But also worked for my regular podcast downloads, which were failing with 20250926.