Please describe the problem.

Since upgrading to 6.20180626-gdf91a5cff (pre-built OS X binary on OS X 10.11) it appears git annex importfeed is unable to properly download a RSS feed from one particular site -- it gets back 400 Bad Request from the website, and then unsurprisingly reports warning: bad feed content; no enclosures to download. The relevant RSS feed appears sane when downloaded from the same URL with wget or curl. The previous git-annex version I was using on this machine (before the security update), 6.20170320-g41c5d9d was able to download this RSS feed (I realise that's a rather large window of changes!).

It appears the older version (6.20170320-g41c5d9d) used wget to download the podcast RSS, and the newer version (6.20180626-gdf91a5cff) seems to fetch the RSS internally in a way that does not work with this RSS feed (nor does the newer version appear to provide debug output of what it is doing at that step that goes wrong, unlike the previous version which shows the explicit wget command run, even if multiple --debug and/or --verbose options are provided).

All other podcast RSS feeds work on both versions.

As best I can tell from packet captures and telnet debugging, the issue is that the new git annex importfeed does not send a User-Agent: header. Possibly some anti-DoS/anti-spam front end is rejecting it as a result?

Can a suitable User-Agent: header be added to the git annex importfeed HTTP requests?

What steps will reproduce the problem?

Attempt to git annex importfeed the feed 'https://theythempodcast.com/episodes?format=RSS'. The problem also appears reproducible with 'http://theythempodcast.com/episodes?format=RSS' (ie, unencrypted), which makes debugging a wee bit simpler (ie, packet captures can help).

Failing, on 6.20180626-gdf91a5cff:

ewen@ashram:~/Music/podcasts$ git annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1 
importfeed checking known urls [2018-07-15 13:36:53.298188] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2018-07-15 13:36:53.306676] process done ExitSuccess
[2018-07-15 13:36:53.306768] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2018-07-15 13:36:53.314672] process done ExitSuccess
[2018-07-15 13:36:53.315353] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","."]
[2018-07-15 13:36:53.325249] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-07-15 13:36:53.32666] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2018-07-15 13:36:53.768979] process done ExitSuccess
ok
importfeed https://theythempodcast.com/episodes?format=RSS download failed: <html>
<head>
<title>400 Bad Request</title>
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
</head>
<body>
<h1>400 Bad Request</h1>
<p><pre>e64OsAKG/PqPs52sZ @ Sun, 15 Jul 2018 01:36:57 GMT</pre>
<p><pre>SEC-43</pre>
<p><pre></pre>
</body>
</html>

  warning: bad feed content; no enclosures to download
ok
[2018-07-15 13:36:58.076606] process done ExitSuccess
[2018-07-15 13:36:58.077183] process done ExitSuccess
ewen@ashram:~/Music/podcasts$ 

Working, on 6.20170320-g41c5d9d:

ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS'
importfeed checking known urls ok
importfeed https://theythempodcast.com/episodes?format=RSS 
/var/folders/p1/gpj 100%[===================>]  32.62K  61.4KB/s    in 0.5s    
2018-07-15 13:31:17 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
ok
ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1 | grep -A 1 "format=RSS"
importfeed https://theythempodcast.com/episodes?format=RSS 
[2018-07-15 13:33:05.367548] call: wget ["-nv","--show-progress","--clobber","-c","-O","/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249","https://theythempodcast.com/episodes?format=RSS","--user-agent","git-annex/6.20170320-g41c5d9d"]

     0K .......... .......... .......... ..                   100%  121K=0.3s2018-07-15 13:33:07 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
[2018-07-15 13:33:07.050443] process done ExitSuccess
ewen@ashram:~/Music/podcasts$ 

In both cases the TEMPLATE value is the same one I've used for years, and works with all the other podcast feeds:

ewen@ashram:~/Music/podcasts$ set | grep TEMPLATE
TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
ewen@ashram:~/Music/podcasts$ 

What version of git-annex are you using? On what operating system?

OS X 10.11 (El Capitan), using the pre-built OS X binaries downloaded just after the security release came out.

ewen@ashram:~/Music/podcasts$ git annex version
git-annex version: 6.20180626-gdf91a5cff
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV FsEvents ConcurrentOutput TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client-0.5.7.0 persistent-sqlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
operating system: darwin x86_64
supported repository versions: 3 5 6
upgrade supported from repository versions: 0 1 2 3 4 5
local repository version: 5
ewen@ashram:~/Music/podcasts$ 

Please provide any additional information below.

Working request (created with wget, then replicated with telnet):

ewen@ashram:~$ telnet 198.185.159.144 80
Trying 198.185.159.144...
Connected to 198.185.159.144.
Escape character is '^]'.
GET /episodes?format=RSS HTTP/1.1
User-Agent: Wget/1.19.5 (darwin15.6.0)
Accept: */*
Accept-Encoding: identity
Host: theythempodcast.com
Connection: Keep-Alive
Range: bytes=0-

HTTP/1.1 301 Moved Permanently
Date: Sun, 15 Jul 2018 02:02:53 GMT
X-ServedBy: web012
Location: https://theythempodcast.com/episodes?format=RSS
Transfer-Encoding: chunked
x-contextid: ZBu72tcg/OJSeq1q9
x-via: 1.1 echo017

0

^]
telnet> quit
Connection closed.
ewen@ashram:~$ 

Failing request (as sent by git-annex importfeed):

ewen@ashram:~$ telnet 198.185.159.144 80
Trying 198.185.159.144...
Connected to 198.185.159.144.
Escape character is '^]'.
GET /episodes?format=RSS HTTP/1.1
Host: theytheypodcast.com
Range: bytes=0-
Accept-Encoding: identity

HTTP/1.1 400 Bad Request
content-length: 378
x-synthetic: true
expires: Thu, 01 Jan 1970 00:00:00 UTC
pragma: no-cache
cache-control: no-cache, must-revalidate
content-type: text/html; charset=UTF-8
connection: close
date: Sun, 15 Jul 2018 02:01:14 UTC
x-contextid: 1WIJdGgE/Y1Tq8lJu
x-via: 1.1 echo008

<html>
<head>
<title>400 Bad Request</title>
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
</head>
<body>
<h1>400 Bad Request</h1>
<p><pre>1WIJdGgE/Y1Tq8lJu @ Sun, 15 Jul 2018 02:01:14 GMT</pre>
<p><pre>SEC-43</pre>
<p><pre></pre>
</body>
</html>Connection closed by foreign host.
ewen@ashram:~$ 

Also failing request, with minimal HTTP/1.1 headers:

ewen@ashram:~$ telnet 198.185.159.144 80
Trying 198.185.159.144...
Connected to 198.185.159.144.
Escape character is '^]'.
GET /episodes?format=RSS HTTP/1.1
Host: theythempodcast.com

HTTP/1.1 400 Bad Request
content-length: 378
x-synthetic: true
expires: Thu, 01 Jan 1970 00:00:00 UTC
pragma: no-cache
cache-control: no-cache, must-revalidate
content-type: text/html; charset=UTF-8
connection: close
date: Sun, 15 Jul 2018 02:07:14 UTC
x-contextid: c05RnFlG/h78z0OqB
x-via: 1.1 echo034

<html>
<head>
<title>400 Bad Request</title>
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
</head>
<body>
<h1>400 Bad Request</h1>
<p><pre>c05RnFlG/h78z0OqB @ Sun, 15 Jul 2018 02:07:14 GMT</pre>
<p><pre>SEC-43</pre>
<p><pre></pre>
</body>
</html>Connection closed by foreign host.
ewen@ashram:~$ 

Apparently minimal working request:

ewen@ashram:~$ telnet 198.185.159.144 80
Trying 198.185.159.144...
Connected to 198.185.159.144.
Escape character is '^]'.
GET /episodes?format=RSS HTTP/1.1
User-Agent: Wget/1.19.5 (darwin15.6.0)
Host: theythempodcast.com

HTTP/1.1 301 Moved Permanently
Date: Sun, 15 Jul 2018 02:09:09 GMT
X-ServedBy: web025
Location: https://theythempodcast.com/episodes?format=RSS
Transfer-Encoding: chunked
x-contextid: veJm4JC0/sT3ENh27
x-via: 1.1 echo017

0

^]
telnet> quit
Connection closed.
ewen@ashram:~$ 

Note that the only addition was adding a User-Agent: header (copied from the wget request). The redirect to HTTPS is expected here, as we are initially requesting via HTTP for telnets benefit (I did try to get openssl s_client working, but it appeared the TLS connection timed out before I could manually complete a request, and I had no luck piping the request into openssl s_client).

Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Absolutely :-) git annex has been my podcatcher, and media management tool for years. Thanks for writing it!

fixed --Joey