[net-gold] INTERNET: SEARCH: TOOLS : STATISTICS : DATA MINING: Researchers Unleash Crawlers into Deep Web Data

  • From: "David P. Dillard" <jwne@xxxxxxxxxx>
  • To: Net-Gold <Net-Gold@xxxxxxxxxxxxxxx>, Temple University Net-Gold Archive <net-gold@xxxxxxxxxxxxxxxxxxx>, Temple Gold Discussion Group <TEMPLE-GOLD@xxxxxxxxxxxxxxxxxxx>, Net-Gold <net-gold@xxxxxxxxxxxxxxxx>, Sean Grigsby <myarchives1@xxxxxxxxxxxxxxx>, Educator Gold <Educator-Gold@xxxxxxxxxxxxxxx>, Educator Gold <Educator-Gold@xxxxxxxxxxxxxxxx>, K12AdminLIFE <K12AdminLIFE@xxxxxxxxxxxxxxx>, Net-Platinum <net-platinum@xxxxxxxxxxxxxxx>, "Net-Gold @ Nabble" <ml-node+3172864-337556105@xxxxxxxxxxxxx>, MediaMentor <mediamentor@xxxxxxxxxxxxxxx>, Digital Divide Diversity MLS <mls-digitaldivide@xxxxxxxxxxxxxxx>, Discussion of Digital Reference <DIG_REF@xxxxxxxxxxxxxxxx>, Discussion of Library Reference Issues <LIBREF-L@xxxxxxxxxxxxxxxxx>, Business Librarians <BUSLIB-L@xxxxxxxxxxxxxxxxx>, net-gold@xxxxxxxxxxxxx
  • Date: Mon, 18 Jan 2010 18:14:51 -0500 (EST)





Researchers Unleash Crawlers into Deep Web Data
Jennifer Foreshew | From: The Australian | January 19, 2010 12:00AM
<http://www.theaustralian.com.au/australian-it/researchers-unleash-crawlers-into-deep-web-data/story-e6frgakx-1225820997337>


A shorter URL for the above link:


<http://tinyurl.com/ybkvzaj>


STRUCTURED data on the web presents a number of difficult technical challenges because it is hard to extract, and often disorganised and messy, a visiting Google engineer says.

Alon Halevy is in Australia for Australasian Computer Science Week, which started yesterday at Queensland University of Technology. His research looks at the difficulties of using the millions of structured databases on the web.

Professor Halevy, who heads Google's structured data management research group in the US, is a keynote speaker at the event, which has attracted more than 250 leading computer science researchers and IT experts from 21 countries.


<snip>


"First, the data is embedded in textual web pages and must be extracted prior to use," the paper, Structured Data on the Web, says.

"Second, there is no centralised data design, as there is in a traditional database."


<snip>


Google has two research projects on these problems.

The first, WebTables, compiles a huge collection of databases by crawling the web and finding small relational databases that use the HTML table tag.

"By performing data mining on the resulting extracted information, we can also introduce a number of brand-new data-centric applications," the paper says.
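The article does not describe how WebTables is implemented. As a rough illustration only of the idea of finding small relational tables in HTML table tags, here is a minimal Python sketch using the standard library. The names TableExtractor, looks_relational and extract_relational_tables are hypothetical, and the filter (at least two rows, at least two columns, every row the same width) is an assumed heuristic, not the project's actual method.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> in a page as a list of rows of cell text.

    Simplification: nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._rows = None   # rows of the table being read, or None
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self._rows is None:
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = self._row = self._cell = None


def looks_relational(rows):
    """Assumed heuristic: 2+ rows, 2+ columns, all rows the same width."""
    if len(rows) < 2:
        return False
    width = len(rows[0])
    return width >= 2 and all(len(row) == width for row in rows)


def extract_relational_tables(html):
    """Return the tables in `html` that pass the heuristic above."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return [table for table in parser.tables if looks_relational(table)]
```

Calling extract_relational_tables on a page's HTML text returns only the tables that look like data. Tables used purely for page layout, which often have a single cell or ragged rows, are dropped by the filter.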

The second project attempts to extract information from the Deep Web, which refers to data on the web that is only available by filling web forms, and therefore invisible to traditional search crawlers.

"We crawled the content of millions of databases behind forms and now serve content from these databases to over 1000 queries per second," Professor Halevy said.
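The article says only that content behind forms was crawled, not how. One way to picture the idea is to turn a form into ordinary URLs by filling its fields with candidate values. The Python sketch below assumes a form submitted with GET. The function name candidate_form_urls, the example.com address and the field names are all hypothetical, and no request is actually sent.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_form_urls(action_url, field_values, limit=100):
    """Build GET URLs that fill a form with combinations of candidate values.

    action_url   -- the form's action URL (hypothetical in the example below)
    field_values -- dict mapping each input name to a list of values to try
    limit        -- cap on the number of URLs, since combinations grow quickly
    """
    names = sorted(field_values)  # fixed order so the output is reproducible
    combos = product(*(field_values[name] for name in names))
    return [
        f"{action_url}?{urlencode(list(zip(names, combo)))}"
        for combo in islice(combos, limit)
    ]


# Hypothetical form with two inputs; each URL is one filled-in submission.
urls = candidate_form_urls(
    "http://example.com/search",
    {"state": ["qld", "nsw"], "type": ["car"]},
)
```

Each generated URL can then be fetched and indexed like any other page. The limit matters because the number of combinations is the product of the number of values per field.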



------------------------------------



The complete article may be read at the URL above.



Sincerely,
David Dillard
Temple University
(215) 204 - 4584
jwne@xxxxxxxxxx
<http://daviddillard.businesscard2.com>
Net-Gold
<http://groups.yahoo.com/group/net-gold>
Index: <http://tinyurl.com/myxb4w>
<http://listserv.temple.edu/archives/net-gold.html>
<http://groups.google.com/group/net-gold?hl=en>
Twitter: davidpdillard

