[pskmail] Re: New screen scraper

  • From: Joerg Schroeder <dl9ycs@xxxxxxxxxxxxxx>
  • To: pskmail@xxxxxxxxxxxxx
  • Date: Sun, 15 Aug 2010 08:51:00 +0200

Good idea, Rein.
I installed it on my server.

73
Joerg

2010/8/14 Rein Couperus <rein@xxxxxxxxxxxx>

> I noticed that most of the links that provide content for PI4TUE and others
> were dead.
> That prompted me to write a better web scraper, which is now available at
> http://hermes.esrac.ele.tue.nl/pskmail/utililities
>
> The files are scraper.pl and scraper.cfg.
>
> The new scraper works like the URL downloads in the pskmail client: you can
> define a *begin* word and an *end* word to select a range of lines, and
> also the start and end column within each line.
> So you can cut a rectangular text field from anywhere on the web page,
> killing links, banners, ads and nav columns.
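
The cutting step described above can be sketched in Python roughly as follows. This is not the code from scraper.pl, only a minimal illustration: the function name cut_block is hypothetical, and it assumes that the line containing the begin word is kept, the line containing the end word is dropped, and columns are 0-based with the end column exclusive. The real script may count differently.

```python
def cut_block(text, begin_word, end_word, begin_col=0, end_col=None):
    """Return the lines between begin_word and end_word, cut to a column range.

    Assumptions (not confirmed against scraper.pl): the begin-word line is
    included, the end-word line is excluded, columns are 0-based and
    end_col is exclusive.
    """
    selected = []
    inside = False
    for line in text.splitlines():
        if not inside:
            if begin_word in line:
                inside = True          # first matching line opens the block
            else:
                continue               # still above the block, skip
        elif end_word in line:
            break                      # end word closes the block
        # Keep only the requested columns and drop trailing padding.
        selected.append(line[begin_col:end_col].rstrip())
    return selected


# Made-up page text: a content column followed by a banner column.
page = "\n".join([
    "Home | News | Login",
    "   " + "Weer vandaag: zonnig".ljust(26) + "[banner]",
    "   " + "Wind: ZW 4".ljust(26) + "[advert]",
    "Uitleg van de termen",
    "Footer | Contact",
])

for row in cut_block(page, "Weer", "Uitleg", 3, 28):
    print(row)
```

With the sample text, only the two content lines survive and the right-hand banner column is cut away.
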
>
> The downloads are now in a config file called scraper.cfg, which could look
> like:
> danish_sea_areas,http://www.dmi.dk/eng/print/index/forecasts/forecast_for_sea_areas.htm,Forecast,http
> dutch_wx,http://www.knmi.nl/waarschuwingen_en_verwachtingen/,Weer,Uitleg,3,60
>
> Each line contains: filename, URL, begin word, end word, begin column,
> end column.
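
To make the line format concrete, here is a small Python sketch of how such a config could be read. It is an illustration only, not Rein's Perl code: ScrapeJob and parse_config are hypothetical names, and it assumes that fields are separated by plain commas, that URLs contain no commas, and that the two column fields may be left out (as in the danish_sea_areas line).

```python
from typing import NamedTuple, Optional


class ScrapeJob(NamedTuple):
    """One line of scraper.cfg (hypothetical representation)."""
    filename: str
    url: str
    begin_word: str
    end_word: str
    begin_col: Optional[int] = None   # assumed optional
    end_col: Optional[int] = None     # assumed optional


def parse_config(text):
    """Parse config text into a list of ScrapeJob entries."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                  # ignore blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) not in (4, 6):
            raise ValueError(
                f"expected 4 or 6 fields, got {len(fields)}: {line!r}")
        cols = [int(f) for f in fields[4:]]   # empty when columns are omitted
        jobs.append(ScrapeJob(*fields[:4], *cols))
    return jobs


sample_cfg = (
    "danish_sea_areas,http://www.dmi.dk/eng/print/index/forecasts/"
    "forecast_for_sea_areas.htm,Forecast,http\n"
    "dutch_wx,http://www.knmi.nl/waarschuwingen_en_verwachtingen/,"
    "Weer,Uitleg,3,60\n"
)

for job in parse_config(sample_cfg):
    print(job.filename, job.begin_word, job.end_word,
          job.begin_col, job.end_col)
```
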
>
> Just start scraper.pl periodically with a cron job...
>
> For an example, look at
> http://www.knmi.nl/waarschuwingen_en_verwachtingen/index.html ;
> after scraping, it looks like the attached file...
>
> 73,
>
> Rein PA0R
>
>
> http://pa0r.blogspirit.com
