NAME
git-annex importfeed - import files from podcast feeds
SYNOPSIS
git annex importfeed [url ...]
DESCRIPTION
Imports the contents of podcasts and other rss and atom feeds. Only downloads files whose content has not already been added to the repository before, so you can delete, rename, etc the resulting files and repeated runs won't duplicate them.
When yt-dlp
is installed, it can be used to download links in the feed.
This allows importing e.g., YouTube playlists.
(However, this is disabled by default as it can be a security risk.
See the documentation of annex.security.allowed-ip-addresses
in git-annex(1) for details.)
To make the import process add metadata to the imported files from the feed,
git config annex.genmetadata true
By default, the downloaded files are put in a directory with the title of the feed, and files are named based on the title of the item in the feed. This can be changed using the --template option.
Existing files are not overwritten by this command. If "some feed/foo.mp3" already exists, it will instead write to "some feed/2_foo.mp3" (or 3, 4, etc). Sometimes a feed will change an item's url, resulting in the new url being downloaded to such a filename.
OPTIONS
--force
Force downloading items it's seen before.
--fast
,--relaxed
,--verifiable
,--raw
,--raw-except
These options behave the same as when using git-annex-addurl(1).
--fast
Avoid immediately downloading urls. The url is still checked (via HEAD) to verify that it exists, and to get its size if possible.
--relaxed
Don't immediately download urls, and avoid storing the size of the url's content. This makes git-annex accept whatever content is there at a future point.
--raw
Prevent special handling of urls by yt-dlp, bittorrent, and other special remotes. This will for example, make importfeed download a .torrent file and not the contents it points to.
--no-raw
Require content pointed to by the url to be downloaded using yt-dlp or a special remote, rather than the raw content of the url. if that cannot be done, the import will fail, and the next import of the feed will retry.
--scrape
Rather than downloading the url and parsing it as a rss/atom feed to find files to import, uses yt-dlp to screen scrape the equivilant of a feed, and imports what it found.
--template
Controls where the files are stored.
The default template is '${feedtitle}/${itemtitle}${extension}'
The available variables in the template include these that are information about the feed: feedtitle, feedauthor, feedurl
And these that are information about individual items in the feed: itemtitle, itemauthor, itemsummary, itemdescription, itemrights, itemid.
Also, title is itemtitle but falls back to feedtitle if the item has no title, and author is itemauthor but falls back to feedauthor.
(All of the above are also added as metadata when annex.genmetadata is set.)
The extension variable is the extension of the file in the feed, or sometimes ".m" if no extension can be determined.
The template also has some variables for when an item was published.
itempubyear (YYYY), itempubmonth (MM), itempubday (DD), itempubhour (HH), itempubminute (MM), itempubsecond (SS), itempubdate (YYYY-MM-DD or if the feed's date cannot be parsed, the raw value from the feed).
(These use the UTC time zone, not the local time zone.)
--no-check-gitignore
By default, gitignores are honored and it will refuse to download an url to a file that would be ignored. This makes such files be added despite any ignores.
--jobs=N
-JN
Runs multiple downloads parallel. For example:
-J4
Setting this to "cpus" will run one job per CPU core.
--backend
Specifies which key-value backend to use.
--json
Enable JSON output. This is intended to be parsed by programs that use git-annex. Each line of output is a JSON object.
--json-progress
Include progress objects in JSON output.
--json-error-messages
Messages that would normally be output to standard error are included in the JSON instead.
Also the git-annex-common-options(1) can be used.
SEE ALSO
git-annex(1)
AUTHOR
Joey Hess id@joeyh.name
Warning: Automatically converted into a man page by mdwn2man. Edit with care.
An example rss item entry:
Using --template '${itemtitle}${extension}'
The resulting filename is 00135929509939_04.jpg (The dash was replaced with an underscore).
I decided to just change all the filenames in each blog repository to use underscores, however, I thought you might want to know about this.
All spaces and punctuation (other than '.') and other wacky stuff are replaced with '_' when git-annex builds a filename from some untrusted source like a feed.
I think it makes sense to do that even for '-', at least if it's at the start of a filename. "--force" is not a filename you want to let a feed inject into your work tree. I could perhaps be convinced to let '-' elsewhere in the filename through unmunged, but simplicity and consistency suggests it's just as good to always munge it.
I just thought it was surprising behavior, especially when
git-annex addurl --file 123-45.ext $url
preserves the dash in the filename, yet the rss I constructed to add multiple urls, with files, didn't do this.The new
--batch --with-files
options to addurl will eliminate the need to create specially crafted rss files.Thanks!