Re: parsing:retrieving data from websites

In order to do that, you need to follow some steps:

1. Find the full URL of the page you want to parse. If the page is static, this is simple, but if you need to submit a form to reach the page, copy the URL shown in the address bar after the page opens. If the page is opened by submitting a form with a POST request, or by a link whose JavaScript code sends a POST request, it is more complicated: in that case you will need an HTTP headers monitor to see the URL that was accessed and the parameters that were sent to the server.
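For the POST case, once the headers monitor shows you the URL and the parameters, you can rebuild the same request in code. Here is a minimal sketch using HTTP::Request::Common from libwww-perl (the URL and form fields below are just assumptions for illustration):

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical URL and form fields, as discovered with an HTTP headers monitor
my $req = POST('http://www.weather.com/temperature',
               [ city => 'shanghai', date => '2009-01-20' ]);

# Show the request that would be sent
print $req->as_string;

# To actually send it:
# use LWP::UserAgent;
# my $ua = LWP::UserAgent->new;
# my $response = $ua->request($req);
# print $response->content if $response->is_success;
```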

2. Make a program that downloads that page. For example, let's say the address of the page is something like:
http://www.weather.com/temperature?city=shanghai&date=2009-01-20

With Perl, you can get that page with code as simple as:

use strict;
use warnings;
use LWP::Simple;

# get() returns the page body, or undef on failure
my $page_content = get('http://www.weather.com/temperature?city=shanghai&date=2009-01-20')
    or die "Could not download the page\n";

3. You need to parse the HTML code in $page_content.
Perl offers several modules for parsing HTML (such as HTML::Parser or HTML::TreeBuilder), but I usually do it with regular expressions because I find them more flexible and easier to use.
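To show the regular-expression approach on step 3, here is a small sketch; the HTML fragment and the pattern are invented for illustration, since the real page layout depends on the site:

```perl
use strict;
use warnings;

# Hypothetical fragment of the downloaded page (this would be $page_content from step 2)
my $page_content = '<div id="temp">Current temperature: <b>-2&deg;C</b></div>';

# Capture the number that appears between <b> and &deg;C
if ($page_content =~ /<b>(-?\d+)&deg;C<\/b>/) {
    print "Temperature: $1\n";
}
```

In a real script you would first look at the page's HTML source, find some markup that always surrounds the value you want, and build the pattern around that.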

Octavian

----- Original Message ----- From: "Tyler Littlefield" <tyler@xxxxxxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Tuesday, January 13, 2009 8:48 PM
Subject: parsing:retrieving data from websites


Hello list,
I've seen a few scripts that will connect to a site, (weather for example), send the zip and somehow parse the weather out of the data returned.
Any pointers on where to get started? I'm totally lost here.


Thanks,
Tyler Littlefield
http://tysdomain.com

__________
View the list's information and change your settings at http://www.freelists.org/list/programmingblind