Hi Victor,

Thursday, August 28, 2003, 3:02:48 PM, you wrote:

VF> The only webpages that Google will never find are company internal
VF> Intranet webpages - all the rest eventually will end up in Google's
VF> database. [Which, thinking about it must be huge / gigantic. I
VF> wonder what they run their database on and what type of database
VF> server - I am sure it must be at least a 4 processor machine].

I guess you haven't seen this page before:

http://www.google.com/technology/pigeonrank.html

It's obviously humor, yet it does tell you how Google works behind the
scenes (replace "PigeonRank" with "PageRank", etc.).

Looks like they're running Linux (presumably with Beowulf clustering
software) on standard rack-mounted computers (apparently they don't use
backplanes or blades). The fact that they keep a cache of all of the
text on each web page means that they need huge amounts of storage
space just for that, even if it is stored compressed.

My experiments with high-speed full-text indexing were interesting; you
can get impressive speed as long as you don't mind huge indices. On the
other hand, if you're indexing huge quantities of data, the index size
becomes more reasonable by comparison.

My experiment stored blocks of 8 (or whatever it was) characters of
text along with a pointer to which record(s) that text could be found
in. For example, this paragraph would be stored as:

"My exper"   Msg 2177 line 25, msg 2177 line 28
"y experi"   Msg 2177 line 25, msg 2177 line 28
" experim"   Msg 2177 line 25, msg 2177 line 28
"experime"   Msg 2177 line 25, msg 2177 line 28
"xperimen"   Msg 2177 line 25, msg 2177 line 28
"periment"   Msg 2177 line 25, msg 2177 line 28
"eriments"   Msg 2177 line 25
"riments "   Msg 2177 line 25
"iments w"   Msg 2177 line 25
etc.

I used a standard index for the blocks themselves. The space could be
optimized greatly by using LZW compression.

--Scott.
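
For illustration, the block index described above might look roughly like this in Python. This is only a sketch under assumptions, not the code from the original experiment: it keeps everything in an in-memory dict instead of a database index, it fixes the block size at 8, and the names build_block_index and search are made up for the example.

```python
from collections import defaultdict

BLOCK_SIZE = 8  # assumption: the post says "8 (or whatever it was)"


def build_block_index(records, block_size=BLOCK_SIZE):
    """Map every block_size-character window of text to the records containing it.

    records: dict mapping a record id, e.g. ("msg 2177", 25), to its text.
    """
    index = defaultdict(set)
    for record_id, text in records.items():
        # Slide a window one character at a time, as in the table above.
        for start in range(len(text) - block_size + 1):
            index[text[start:start + block_size]].add(record_id)
    return index


def search(index, records, query, block_size=BLOCK_SIZE):
    """Return sorted ids of records whose text contains query."""
    if len(query) < block_size:
        raise ValueError("query must be at least block_size characters long")
    candidates = None
    # Intersect the record sets of every block in the query.
    for start in range(len(query) - block_size + 1):
        hits = index.get(query[start:start + block_size], set())
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            break
    # Blocks could match in different places, so confirm against the text.
    return sorted(r for r in candidates if query in records[r])


# Hypothetical records modelled on the two lines used in the example table.
records = {
    ("msg 2177", 25): "My experiments with high-speed full-text indexing were interesting;",
    ("msg 2177", 28): "My experiment stored blocks of 8 (or whatever it was) characters",
}
index = build_block_index(records)

print(sorted(index["My exper"]))  # both line 25 and line 28
print(sorted(index["eriments"]))  # line 25 only
print(search(index, records, "experiment stored"))  # line 28 only
```

Every character position produces its own index entry, which is why the index gets so large on small inputs. On a big corpus most blocks repeat, so only the pointer lists keep growing.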