Crawling a website for media files.

Linux for blind general discussion blinux-list at redhat.com
Wed May 26 18:00:51 UTC 2021


Tim here.  Both seem to have RSS feeds buried in the HTML source,
so I'd point a podcatcher at the feed URLs and have it do all the
heavy lifting.  As an added bonus, when new episodes are released, it
should just download the new ones.  I like "castget", so here's a
simple ~/.castgetrc example that should slurp them down:

  $ cat >> ~/.castgetrc <<EOF
  [mediamd]
  spool=$HOME/Podcasts/MediaMD/
  url=http://www.mediamdpodcast.com/feed/podcast

  [parahuman]
  spool=$HOME/Podcasts/Parahuman/
  url=http://parahumanaudio.com/feed/podcast
  EOF
  $ mkdir -p $HOME/Podcasts/{MediaMD,Parahuman}

and then run

  $ castget

(optionally with "-v" for verbose output) to fetch all the episodes.
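
In case you want to verify those feed URLs yourself, they're
advertised in each site's front-page markup, and a rough grep will
surface them.  This isn't a real HTML parser, so the pattern may
need tweaking if the markup differs from what I'm guessing:

  $ wget -qO- http://www.mediamdpodcast.com/ |
      grep -oiE '<link[^>]*rss[^>]*>'

Point the same pipeline at http://parahumanaudio.com/ for the other
show.
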
If you have some other podcatcher that you like, putting that URL
into it should be enough to let it do the hard work.  If new episodes
come out, re-running just "castget" will fetch the updates.
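
And if you'd rather not think about it at all, a cron entry along
these lines would check for new episodes every morning at 06:00.
I'm assuming castget ends up somewhere on cron's default PATH; if
not, substitute the full path ("which castget" will tell you):

  $ crontab -e
  # then add a line like:
  0 6 * * * castget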

If either of these gives you grief, I can toss together some scraping
utility to extract the direct MP3 URLs and pass those off to wget if
needed.
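
For the record, that fallback boils down to something like this:
grab the page, grep out anything that looks like a direct MP3 URL
(the pattern is a guess until I actually inspect the markup), and
hand the list to wget, which reads URLs from a text file via "-i":

  $ wget -qO- http://www.mediamdpodcast.com/pact-audiobook-project/ |
      grep -oE 'https?://[^"]+\.mp3' | sort -u > urls.txt
  $ wget -c -i urls.txt -P ~/Podcasts/MediaMD/

The "-c" flag lets wget resume partial downloads if the connection
drops mid-episode.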

Hope this helps.

-Tim



On May 26, 2021, Linux for blind general discussion wrote:
> Okay, so I've known for a while that someone has been recording
> audio books of two completed works from one of my favorite web
> serial writers. Thing is, both works in question span hundreds of
> chapters and the people doing the audio books don't, best I can
> tell, offer any convenient means of downloading everything they've
> recorded thus far, and I don't like the idea of tabbing through and
> control entering over 100 links for one or going through a couple
> hundred blog posts and locating the download button for the other.
> 
> I'm using Firefox-ESR 78.10... Does anyone know of an accessible
> browser extension that can either download all of the media files
> linked on the page in the active tab or recursively download media
> files from the current page and pages it links to on the same
> domain, ideally by adding a "Download All" option to the context
> menu? Alternatively, does anyone know of a command-line tool that
> can do this, ideally one that can read URLs from a txt file?
> 
> If it helps, the pages I have bookmarked for the two audiobooks are:
> 
> http://www.mediamdpodcast.com/pact-audiobook-project/
> 
> http://parahumanaudio.com/
> 



