RESOUR> Techniques to compute web page rankings faster

  • From: Gleason Sackmann <gleason@xxxxxxxxxxxxxxx>
  • To: NetHappenings <nethappenings@xxxxxxxxxxxxx>
  • Date: Wed, 14 May 2003 07:39:30 -0500

**************************************************************
Net Happenings - From Educational CyberPlayGround
**************************************************************

National Science Foundation
4201 Wilson Blvd.
Arlington, VA 22230
"Where discoveries begin"

For Immediate Release
May 13, 2003

NSF PR03-56

NSF Media Contact: David Hart, 703-292-7737, dhart@xxxxxxx

   RESEARCHERS DEVELOP TECHNIQUES FOR COMPUTING GOOGLE-STYLE WEB
                 RANKINGS UP TO FIVE TIMES FASTER
    Speed-up may make "topic-sensitive" page rankings feasible

ARLINGTON, Va. - Computer science researchers at Stanford
University have developed several new techniques that together
may make it possible to calculate Web page rankings as used in
the Google search engine up to five times faster.  The speed-ups
to Google's method may make it realistic to calculate page
rankings personalized for an individual's interests or customized
to a particular topic.

The Stanford team includes graduate students Sepandar Kamvar and
Taher Haveliwala, noted numerical analyst Gene Golub and computer
science professor Christopher Manning.  They will present their
first paper at the Twelfth Annual World Wide Web Conference
(WWW2003) in Budapest, Hungary, May 20-24, 2003.  The work was
supported by the National Science Foundation (NSF), an
independent federal agency that supports fundamental research and
education in all fields of science and engineering.

Computing PageRank, the ranking algorithm behind the Google
search engine, for a billion Web pages can take several days.
Google currently ranks and searches 3 billion Web pages.  Each
personalized or topic-sensitive ranking would require a separate
multi-day computation, but the payoff would be less time spent
wading through irrelevant search results.  For example, searching
a sports-specific Google site for "Giants" would give more
importance to pages about the New York or San Francisco Giants
and less importance to pages about Jack and the Beanstalk.

"This work is a wonderful example of how NSF support for basic
computer science research, including applied mathematics and
algorithm research, has impacts in daily life," said NSF program
officer Maria Zemankova.  In the mid-1990s, an NSF digital
library project and an NSF graduate fellowship also supported
Stanford graduate students Larry Page and Sergey Brin while they
developed what would become the Google search engine.

To speed up PageRank, the Stanford team developed a trio of
techniques in numerical linear algebra. First, in the WWW2003
paper, they describe so-called "extrapolation" methods, which
make some assumptions about the Web's link structure that aren't
true, but permit a quick and easy computation of PageRank.
Because the assumptions aren't true, the PageRank isn't exactly
correct, but it's close and can be refined using the original
PageRank algorithm.  The Stanford researchers have shown that
their extrapolation techniques can speed up PageRank by 50
percent in realistic conditions and by up to 300 percent under
less realistic conditions.
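The flavor of these extrapolation methods can be sketched in a few lines of Python. This is a toy illustration of Aitken-style delta-squared extrapolation layered on ordinary power iteration, not the Stanford team's code; the adjacency-list format, damping factor and tolerances are invented for the example.

```python
import numpy as np

def transition_matrix(links):
    """Column-stochastic matrix for a toy link graph.
    links[i] lists the pages that page i links to."""
    n = len(links)
    M = np.zeros((n, n))
    for i, outs in enumerate(links):
        if outs:
            for j in outs:
                M[j, i] = 1.0 / len(outs)
        else:
            M[:, i] = 1.0 / n  # dangling page: treat as linking everywhere
    return M

def pagerank(links, d=0.85, tol=1e-12, extrapolate_every=0):
    """Power iteration; optionally applies Aitken delta-squared
    extrapolation every `extrapolate_every` iterations."""
    M = transition_matrix(links)
    n = len(links)
    x_prev = None
    x = np.full(n, 1.0 / n)
    for it in range(1, 10_000):
        x_new = d * (M @ x) + (1 - d) / n
        if extrapolate_every and it % extrapolate_every == 0 and x_prev is not None:
            # Aitken extrapolation pretends each component converges
            # geometrically (x_k ~ x* + c * lambda^k); the jump it makes
            # is inexact, so ordinary power steps refine it afterwards.
            den = x_new - 2 * x + x_prev
            safe = np.abs(den) > 1e-14
            x_new = np.where(safe,
                             x_new - (x_new - x) ** 2 / np.where(safe, den, 1.0),
                             x_new)
            x_new = np.clip(x_new, 0, None)
            x_new /= x_new.sum()  # renormalize back to a distribution
        if np.abs(x_new - x).sum() < tol:
            return x_new, it
        x_prev, x = x, x_new
    return x, it
```

On a real Web graph the matrix is sparse and never formed densely; a dense array just keeps the sketch short.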

A second paper describes an enhancement, called "BlockRank,"
which relies on a feature of the Web's link structure, a feature
that the Stanford team is among the first to investigate and
exploit.  Namely, they show that approximately 80 percent of the
pages on any given Web site point to other pages on the same
site.  As a result, they can compute many single-site PageRanks,
glue them together in an appropriate manner and use that as a
starting point for the original PageRank algorithm.  With this
technique, they can realistically speed up the PageRank
computation by 300 percent.
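The "glue local rankings together" idea can be illustrated with a small sketch. It is a simplified take on the approach described above, not the BlockRank implementation: hosts are weighted uniformly here, whereas the actual method weights each host by a PageRank computed on the host-level graph.

```python
import numpy as np

def power_iteration(M, x0, d=0.85, tol=1e-12):
    """Standard PageRank power iteration started from x0."""
    n = len(x0)
    x = x0.copy()
    for it in range(1, 10_000):
        x_new = d * (M @ x) + (1 - d) / n
        if np.abs(x_new - x).sum() < tol:
            return x_new, it
        x = x_new
    return x, it

def stochastic(A):
    """Column-normalize A, sending dangling (all-zero) columns to uniform."""
    n = A.shape[0]
    s = A.sum(axis=0)
    return np.where(s > 0, A / np.where(s > 0, s, 1.0), 1.0 / n)

def blockrank_start(A, hosts):
    """Starting vector built from per-host local PageRanks.
    `hosts` is a list of index arrays partitioning the pages."""
    n = A.shape[0]
    x0 = np.zeros(n)
    for idx in hosts:
        # Local PageRank on the intra-host subgraph only.
        local = stochastic(A[np.ix_(idx, idx)])
        l, _ = power_iteration(local, np.full(len(idx), 1.0 / len(idx)))
        x0[idx] = l / len(hosts)  # simplification: uniform host weights
    return x0 / x0.sum()
```

Because most links stay within a site, the glued-together vector starts close to the true answer, so the global iteration needs fewer passes.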

Finally, the team notes in a third paper that the rankings for
some pages are calculated early in the PageRank process, while
the rankings of many highly rated pages take much longer to
compute. In a method called "Adaptive PageRank," they eliminate
redundant computations associated with those pages whose
PageRanks finish early.  This speeds up the PageRank computation
by up to 50 percent.
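A minimal sketch of that idea in Python: once a page's rank stops changing, freeze it and skip its update on later passes. This is a simplified illustration, not the Adaptive PageRank implementation, which also exploits the link matrix's sparsity when skipping converged rows.

```python
import numpy as np

def adaptive_pagerank(M, d=0.85, freeze_tol=1e-9, tol=1e-12):
    """Power iteration that stops recomputing pages whose ranks
    have individually converged (frozen pages keep their last value)."""
    n = M.shape[0]
    x = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)
    for it in range(1, 10_000):
        x_new = x.copy()
        # Only rows still marked active are recomputed.
        x_new[active] = d * (M[active] @ x) + (1 - d) / n
        delta = np.abs(x_new - x)
        if delta.sum() < tol:
            return x_new, it
        # Freeze pages whose rank changed by less than freeze_tol.
        active &= delta >= freeze_tol
        x = x_new
    return x, it
```

Freezing early introduces a small, controllable error (on the order of the freeze threshold), which is why the speed-up is bounded rather than free.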

"Further speed-ups are possible when we use all these methods,"
Kamvar said.  "Our preliminary experiments show that combining
the methods will make the computation of PageRank up to a factor
of five faster.  However, there are still several issues to be
solved.  We're closer to a topic-based PageRank than to a
personalized ranking."

The complexities of a personalized ranking would require even
greater speed-ups to the PageRank calculations.  In addition,
while a faster algorithm shortens computation time, the issue of
storage remains.  Because the results from a single PageRank
computation on a few billion Web pages require several gigabytes
of storage, saving a personalized PageRank for many individuals
would rapidly consume vast amounts of storage.  Saving a limited
number of topic-specific PageRank calculations would be more
practical.

The reason for the expensive computation and storage requirements
lies in how PageRank generates the rankings that have led to
Google's popularity.  Unlike page-ranking methods that rate each
page separately, PageRank bases each page's "importance" on the
number and importance of pages that link to the page.
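That circular dependence is usually written as a single fixed-point equation coupling all pages at once (the standard textbook formulation, not quoted from the release):

```latex
% x is the vector of all n page ranks, solved for simultaneously.
% P is the column-stochastic link matrix (P_{ij} = 1/outdeg(j) if j links to i),
% d (around 0.85) is the damping factor, and e is the all-ones vector.
x = d\,P\,x + \frac{1-d}{n}\,e
```
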

Therefore, PageRank must consider all pages at the same time and
can't easily omit pages that aren't likely to be relevant to a
topic.  It also means that the faster method will not affect how
quickly Google presents results to users' searches, because the
rankings are computed in advance and not at the time a search is
requested.

The Stanford team's conference paper and technical reports on
enhancing the PageRank algorithm, as well as the original paper
describing the PageRank method, are available on the Stanford
Database Group's Publication Server
(http://dbpubs.stanford.edu/).

                               -NSF-

WWW2003: http://www.www2003.org/
The papers are also available from
http://www.stanford.edu/~sdkamvar/research.html
Stanford Global InfoBase: http://www-db.stanford.edu/~mor/GIB/

NSF Program Officers: Maria Zemankova, 703-292-8930, mzemanko@xxxxxxx
John Staudhammer, 703-292-8918, jstaudham@xxxxxxx

Project Contacts: Sepandar Kamvar, 650-799-5063, sdkamvar@xxxxxxxxxxxx
Christopher Manning, 650-723-7683, manning@xxxxxxxxxxxxxxx

National Science Foundation is an independent federal agency that
supports fundamental research and education across all fields of
science and engineering, with an annual budget of nearly $5
billion.  NSF funds reach all 50 states through grants to nearly
2,000 universities and institutions.  Each year, NSF receives
about 30,000 competitive requests for funding, and makes about
10,000 new funding awards.  NSF also awards over $200 million in
professional and service contracts yearly.

