You can use git-annex as a podcatcher to download podcast contents. No additional software is required, but your git-annex must be built with the Feeds feature (run git annex version to check).

All you need to do is put something like this in a cron job:

cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url
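
For example, a crontab entry that checks the feeds every morning might look like this (the repository path and feed url are placeholders):

0 6 * * * cd /home/me/podcasts && git annex importfeed http://example.com/feed.rss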

This downloads the urls and parses them as RSS, Atom, or RDF feeds. All enclosures are downloaded and added to the repository, the same as if you had manually run git annex addurl on each of them.

git-annex will avoid downloading a file from a feed if its url has already been stored in the repository. So once a file is downloaded, you can move it around, delete it, git annex drop its content, etc., and it will not be downloaded again by repeated runs of git annex importfeed. Just how a podcatcher should behave.

templates

To control the filenames used for items downloaded from a feed, there's a --template option. The default is --template='${feedtitle}/${itemtitle}${extension}'

Other available template variables:
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid, itempubdate
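
For example, to include the publication date in each filename (the feed url is a placeholder):

git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' http://example.com/feed.rss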

catching up

To catch up on a feed without downloading its contents, use git annex importfeed --relaxed, and delete the symlinks it creates. Next time you run git annex importfeed it will only fetch new items.
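
A minimal sketch of this, assuming a hypothetical feed whose title is Some Podcast:

git annex importfeed --relaxed http://example.com/feed.rss
git rm -r 'Some Podcast'    # delete the symlinks for the back catalog
git commit -m 'catch up without downloading the back catalog'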

fast mode

To add a feed without downloading its contents right now, use git annex importfeed --fast. Then you can use git annex get as usual to download the content of an item.
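
For example (the feed url and file path are placeholders):

git annex importfeed --fast http://example.com/feed.rss
git annex get 'Some Podcast/Episode_1.mp3'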

storing the podcast list in git

You can check the list of podcast urls into git right next to the files it downloads. Just make a file named feeds and add one podcast url per line.
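
For example, a feeds file might contain (placeholder urls):

http://example.com/podcast1.rss
http://example.org/podcast2.rss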

Then you can run git-annex on all the feeds:

xargs git-annex importfeed < feeds

distributed podcatching

A nice benefit of using git-annex as a podcatcher is that you can run git annex importfeed on the same url in different clones of a repository, and git annex sync will sync it all up.
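
For example, a sketch assuming two clones that are configured as git remotes of each other:

cd ~/laptop/podcasts && git annex importfeed http://example.com/feed.rss
cd ~/desktop/podcasts && git annex importfeed http://example.com/feed.rss
git annex sync    # merges the url tracking information from both clones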

centralized podcatching

You can also have a designated machine that always fetches all podcasts to local disk and stores them. That way, you can archive podcasts with time-delayed deletion of upstream content. You can also work around slow upstream downloads by podcatching on a server with ample bandwidth, or work around a slow local Internet connection by podcatching on your home server and transferring files to your laptop on demand.

youtube playlists

If your git-annex is built with quvi support, you can also use git annex importfeed on youtube playlists. It will automatically download the videos linked to by the playlist.
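
For example (the playlist url is a placeholder):

git annex importfeed 'https://www.youtube.com/playlist?list=PLAYLISTID'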

It seems that some of my feeds get stored in keys that generate too long a filename:

podcasts/.git/annex/tmp/b1f_325_URL-s143660317--http&c%%feedproxy.google.com%~r%mixotic%~5%urTIRWQK2OQ%Mixotic__258__-__Michael__Miller__-__Galactic__Technolgies.mp3.log.web:
openBinaryFile: invalid argument (File name too long)

Is there a way to work around this?

Comment by ckeen Tue Jul 30 14:39:44 2013
@ckeen You seem to be using a filesystem that does not support filenames 150 characters long. This is unusual -- even Windows and Android support filenames up to 255 characters long. git-annex addurl already deals with this sort of problem by limiting the filename to 255 characters. If you'd like to file a bug report with details about your system, I can try to make git-annex support its limitations, I suppose.
Comment by joeyh.name Tue Jul 30 17:16:07 2013

Looking forward to seeing it in Debian unstable, where it will definitely replace my hpodder setup.

I guess there is no easy way to re-use the files already downloaded with hpodder? At first I thought that git annex importfeed --relaxed followed by adding the files to the git annex would work, but importfeed stores URLs, not content-based hashes, so it wouldn’t match up.

Comment by nomeata Tue Jul 30 21:21:57 2013

@nomeata, well, you can, but it has to download the files again.

When run without --fast, importfeed does use content-based hashes, so if you run it in a temporary directory, it will download the content redundantly, hash it and see that it's the same, and add the url to that hash. You can then delete the temporary directory, and the files hpodder had downloaded will have the url attached to them. I don't know if this really buys you anything over deleting the hpodder files and starting over, though.

Comment by joeyh.name Tue Jul 30 21:29:50 2013
Currently I have my podcasts imported with --fast. For some reason there are podcast episodes missing. This probably happened during my period of toying with the feature. If I retry on a clean annex I see all episodes. My suspicion is that git-annex was interrupted while downloading a feed but now somehow thinks it's already there. How can I debug this situation and/or force git annex to retry all the links in a feed?
Comment by ckeen Wed Jul 31 10:35:50 2013

The only way it can skip downloading a file is if its url has already been seen before. Perhaps you deleted them?

I've made importfeed --force re-download files it's seen before.

Comment by joeyh.name Wed Jul 31 16:20:39 2013
Is it intentional that some files are saved with a 2_ prefix? I have sorted out all the missing URLs and renamed them, so no harm done, but it has been a bit of a hassle to get there.
Comment by ckeen Thu Aug 1 09:47:34 2013
I've now made importfeed --force a bit smarter about reusing existing files.
Comment by joeyh.name Thu Aug 1 16:05:10 2013

Joey - your initial post said:

git-annex must be built with the Feeds feature (run git annex version to check).

...but how do I actually switch on the feeds feature?

I install git-annex from cabal, so I do

cabal update
cabal install git-annex

which I did this morning and now git annex version gives me:

git-annex version: 4.20130802
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV FsEvents XMPP DNS

So it is the latest version, but without Feeds. :-(

Comment by a-or-b [myopenid.com] Mon Aug 5 04:52:41 2013
cabal install feed should get the necessary library installed so that git-annex will build with feeds support.
Comment by joeyh.name Mon Aug 5 16:47:30 2013
$ cabal install feed
Resolving dependencies...
All the requested packages are already installed:
feed-0.3.9.1
Use --reinstall if you want to reinstall anyway.

Then I reinstalled git-annex, but it still doesn't show the Feeds flag.

$ git annex version
git-annex version: 4.20130802
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV FsEvents XMPP DNS

Do I need to do something like:

cabal install git-annex --bindir=$HOME/bin -f"-assistant -webapp -webdav -pairing -xmpp -dns -feed"

...but what are the default flags to include in addition to -feed?

Comment by a-or-b [myopenid.com] Tue Aug 6 04:20:16 2013

-f-Feed will disable the feature. -fFeed will try to force it on.

You can probably work out what's going wrong using cabal install -v3

Comment by joeyh.name Tue Aug 6 04:24:10 2013

So I ran cabal install -v3 and looked at the output:

Flags chosen: feed=True, tdfa=True, testsuite=True, android=False,
production=True, dns=True, xmpp=True, pairing=True, webapp=True,
assistant=True, dbus=True, inotify=True, webdav=True, s3=True

This looks like feed should be on.

There don't appear to be any errors in the compile either.

Is it as simple as a bug where this flag just doesn't show in the git annex version command?

Comment by a-or-b [myopenid.com] Tue Aug 6 05:42:45 2013
Yes, it did turn out to be as simple as my having forgotten that I have to manually add features to the version list.
Comment by joeyh.name Wed Aug 7 16:03:12 2013
It seems git-annex is a bit overzealous when sanitizing the file extension. Currently I get "Nerdkunde/Let_s_go_to_the_D_M_C_A_m4a" from http://www.nerdkunde.de/episodes.m4a.rss with the default template, and only "Nerdkunde/Let_s_go_to_the_D_M_C_A._m4a" if I add the "." to the template myself...
Comment by 23.gs Mon Aug 12 13:21:50 2013
The filename extension is a known issue that is already fixed in the development version; see http://git-annex.branchable.com/bugs/importfeed_uses___34____95__foo__34___as_extension/
Comment by arand Mon Aug 12 13:32:46 2013
If a podcast requires authentication, is there a way to pass credentials through? I tried http://user:pass@site.com/rss.xml but it didn't work.
Comment by Stephen Tue Aug 13 13:32:52 2013

Hi,

the explanations of --fast and --relaxed on this page could be expanded a bit. I looked them up in the man page, but it is not yet clear to me when I would use one or the other with feeds. Also, does “Next time you run git annex importfeed it will only fetch new items.” really apply only to --relaxed, and not --fast?

Furthermore, it would be good if there were a template variable itemnum that I can use to ensure that ls prints the casts in the right order, even when the titles of the items are not helpful.

Greetings, Joachim

Comment by nomeata Fri Aug 16 07:27:59 2013
I would expect user:pass@site.com to work if the site is using http basic auth. importfeed just runs wget (or curl) to do all downloads, and wget's documentation says that works. It also says you can use ~/.netrc to store the password for a site.
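
For example, a ~/.netrc entry for basic auth would look like this (hostname and credentials are placeholders):

machine site.com
login user
password pass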
Comment by joeyh.name Thu Aug 22 15:25:02 2013

The git-annex man page has a bit more to say about --relaxed and --fast. Their behavior when used with importfeed is the same as with addurl.

If the podcast feed provides an itemid, you can use that in the filename template. I don't know how common that is. Due to the way importfeed works, it cannot keep track of, e.g., an incrementing item number itself.

Comment by joeyh.name Thu Aug 22 15:29:11 2013