[pskmail] New screen scraper

  • From: Rein Couperus <rein@xxxxxxxxxxxx>
  • To: pskmail@xxxxxxxxxxxxx
  • Date: Sat, 14 Aug 2010 13:58:39 +0200 (CEST)

I noticed that most of the links to provide content for PI4TUE and others were 
That triggered me to write a better web scraper, which is now available in 

The files are scraper.pl and scraper.cfg.

The new scraper works like the URL downloads in the pskmail client: you can 
define a *begin* and an *end* word to select a block of lines, and also the 
start and end column within each line.
So you can cut a rectangular block of text from anywhere on the web page, 
stripping out links, banners, ads and navigation columns.
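To make the begin/end word and column idea concrete, here is a minimal Python sketch. It is not the code in scraper.pl. It assumes the page has already been turned into plain text lines, that the block runs from the first line containing the begin word to the next line containing the end word (both kept), and that columns are 1-based and inclusive. The function name scrape_block and the sample page are hypothetical.

```python
def scrape_block(lines, begin_word, end_word, begin_col, end_col):
    """Cut a rectangular block out of a page that is already plain text.

    Assumptions (not taken from scraper.pl): the block starts at the first
    line containing begin_word, ends at the next line containing end_word
    (both lines kept), and columns are 1-based and inclusive.
    """
    block = []
    inside = False
    for line in lines:
        if not inside:
            if begin_word in line:
                inside = True  # found the begin word, start collecting
            else:
                continue  # still above the block
        elif end_word in line:
            inside = False  # end word found; this line is still kept
        # Keep only the requested column range, dropping trailing blanks.
        block.append(line[begin_col - 1:end_col].rstrip())
        if not inside:
            break
    return block


# Hypothetical page with a navigation column on the left (columns 1-11).
page = [
    "Menu     | Weather warnings",
    "Home     | Forecast for today:",
    "News     | Cloudy, 18 C",
    "Contact  | End of forecast",
    "Login    | (c) banner",
]
print(scrape_block(page, "Forecast", "End", 12, 40))
```

Here the left-hand menu, the header line above the begin word and the banner below the end word are all cut away, leaving only the three forecast lines.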

The downloads are now in a config file called scraper.cfg, which could look 

Each line contains: filename, url, beginword, endword, begincolumn, endcolumn.
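A possible way to read such a config is sketched below in Python. It assumes the six fields are separated by commas with optional spaces, one job per line. The real scraper.cfg syntax may differ. The ScrapeJob type, parse_config function and the sample line (file name, begin and end words) are hypothetical.

```python
from typing import List, NamedTuple


class ScrapeJob(NamedTuple):
    """One download job; field order follows the list in the post."""
    filename: str
    url: str
    begin_word: str
    end_word: str
    begin_col: int
    end_col: int


def parse_config(text: str) -> List[ScrapeJob]:
    """Parse hypothetical comma-separated scraper.cfg lines into jobs.

    Assumption: fields are separated by commas, so a URL that itself
    contains a comma would not be handled by this sketch.
    """
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {raw!r}")
        name, url, begin, end, bcol, ecol = fields
        jobs.append(ScrapeJob(name, url, begin, end, int(bcol), int(ecol)))
    return jobs


# Hypothetical config line: file name, words and columns are made up.
sample = "knmi.txt, http://www.knmi.nl/waarschuwingen_en_verwachtingen/index.html, Forecast, End, 1, 80"
print(parse_config(sample))
```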

Just run scraper.pl periodically from a cron job.

As an example, look at 
http://www.knmi.nl/waarschuwingen_en_verwachtingen/index.html ;
after scraping, it looks like the attached file.


Rein PA0R

