[lit-ideas] Re: Petrabytes, Exabytes, and Yottabytes, oh my!

From: "Julie Krueger" <juliereneb@xxxxxxxxx>
To: lit-ideas@xxxxxxxxxxxxx
Date: Thu, 27 Nov 2008 19:02:47 -0600
>
> Yeah, I know.... the below explains everything, and once you read it
> everything will become clear ...however, you must read it in order to read
> the article following the definitions.  Then get back to me and talk to me
> about modern miracles like loading hundreds of songs on a little device you
> can hang around your neck...
> terabyte
>
>    1. A unit of computer memory or data storage capacity equal to 1,024
>    gigabytes (2^40 bytes).
>    2. One trillion bytes.
>
>   WordNet: <http://www.answers.com/library/WordNet-cid-2257626> terabyte
>
> The *noun* has one meaning:
> Meaning #1 <http://www.answers.com/topic/terabyte-tb>: a unit of
> information equal to one trillion (1,000,000,000,000) bytes
>   Synonym: TB <http://www.answers.com/topic/tb-abbreviation>
>
> Dictionary: <http://www.answers.com/library/Dictionary-cid-2257172>
> petabyte (pĕt'ə-bīt) *n.*
>
>    1. A unit of computer memory or data storage capacity equal to 1,024
>    terabytes (2^50 bytes).
>    2. One quadrillion bytes.
>
> Quantities of bytes <http://www.answers.com/topic/byte>
> (SI prefixes <http://www.answers.com/topic/si-prefix>; historical use and
> binary prefixes <http://www.answers.com/topic/binary-prefix>)
>
> SI prefixes                      | Historical use          | Binary prefixes
> Symbol (name)    Value           | Symbol  Value           | Symbol (name)    Value
> kB (kilobyte)    1000^1 = 10^3   | KB      1024^1 = 2^10   | KiB (kibibyte)   2^10
> MB (megabyte)    1000^2 = 10^6   | MB      1024^2 = 2^20   | MiB (mebibyte)   2^20
> GB (gigabyte)    1000^3 = 10^9   | GB      1024^3 = 2^30   | GiB (gibibyte)   2^30
> TB (terabyte)    1000^4 = 10^12  | TB      1024^4 = 2^40   | TiB (tebibyte)   2^40
> PB (petabyte)    1000^5 = 10^15  | PB      1024^5 = 2^50   | PiB (pebibyte)   2^50
> EB (exabyte)     1000^6 = 10^18  | EB      1024^6 = 2^60   | EiB (exbibyte)   2^60
> ZB (zettabyte)   1000^7 = 10^21  | ZB      1024^7 = 2^70   | ZiB (zebibyte)   2^70
> YB (yottabyte)   1000^8 = 10^24  | YB      1024^8 = 2^80   | YiB (yobibyte)   2^80
>
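For anyone who wants to check the table by hand, here is a small Python sketch of the two conventions. The function names are made up for illustration; the only facts used are the ones in the table (SI units are powers of 1000, binary units are powers of 1024).

```python
# Decimal (SI) vs. binary prefixes, following the table quoted above.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def si_bytes(n):
    """Bytes in the n-th SI unit: 1000**n (n=1 is the kilobyte)."""
    return 1000 ** n

def binary_bytes(n):
    """Bytes in the n-th binary unit: 1024**n == 2**(10*n) (n=1 is the kibibyte)."""
    return 1024 ** n

def gap_percent(n):
    """How much larger the binary unit is than the SI unit, in percent."""
    return (binary_bytes(n) / si_bytes(n) - 1) * 100

if __name__ == "__main__":
    for n, name in enumerate(PREFIXES, start=1):
        print(f"{name:>5}: 10^{3 * n:<2} vs 2^{10 * n:<2}  gap {gap_percent(n):.1f}%")
```

The gap grows with each step: about 2.4% at the kilo level and just under 10% at the tera level, which is why the two definitions of "terabyte" given above do not agree.
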
> Now that that's all cleared up (*yottabytes??*), here's the article....
>
> <<Google Sorts One Petabyte Of Data In 6 Hours
>  Posted by *Roger Smith* <rsmith@xxxxxxxxxxx>*, Nov 26, 2008 02:49 PM*
>
>
> According to last Friday's Official Google Blog
> <http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html>,
> the Google Systems Infrastructure Team has sorted a record 1 terabyte of
> data on 1,000 computers in only 68 seconds, which breaks the previous mark
> of 209 seconds established in July by Yahoo
> <http://www.hpl.hp.com/hosted/sortbenchmark/>.
>
>  Team leader Grzegorz Czajkowski wrote that the team followed the rules of
> a standard terabyte sort benchmark and used Google's MapReduce software
> framework that supports parallel computations over large (multiple petabyte)
> data sets on clusters of computers. Yahoo's effort had featured a 910-node
> cluster, and used Hadoop, an open-source MapReduce implementation.
>
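The article does not say how a sort is expressed in MapReduce, so the following is only a toy, single-process Python sketch of one common approach (range partitioning), not Google's or Hadoop's actual code. All function names and the split keys are hypothetical. The idea: the map step assigns each record to a key range, the shuffle groups records by range, and each reducer sorts its own range, so concatenating the ranges in order gives a fully sorted result.

```python
from bisect import bisect_right
from collections import defaultdict

def map_phase(records, boundaries):
    """Emit (partition_id, record) pairs; boundaries are sorted split keys."""
    for rec in records:
        yield bisect_right(boundaries, rec), rec

def shuffle(pairs):
    """Group records by partition id, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for pid, rec in pairs:
        groups[pid].append(rec)
    return groups

def reduce_phase(groups, num_partitions):
    """Sort each partition independently, then concatenate in partition order."""
    out = []
    for pid in range(num_partitions):
        out.extend(sorted(groups.get(pid, [])))
    return out

def mapreduce_sort(records, boundaries):
    """Range-partitioned sort: len(boundaries) + 1 partitions in total."""
    pairs = map_phase(records, boundaries)
    return reduce_phase(shuffle(pairs), len(boundaries) + 1)

if __name__ == "__main__":
    data = ["pear", "apple", "kiwi", "zebra", "mango", "fig"]
    print(mapreduce_sort(data, boundaries=["h", "p"]))
```

On a real cluster each partition would live on a different machine, and choosing split keys that balance the partitions is a large part of the work.
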
> The sort benchmark, which was created in 1998 by computer scientist Jim
> Gray, specifies the input data (10 billion 100-byte records in uncompressed
> text files), which must be completely sorted and written to disk. Not
> content with just rewriting the record book, the Google team then decided to
> up the ante in sorting massive volumes of data.
>
> "Sometimes you need to sort more than a terabyte, so we were curious to
> find out what happens when you sort more and gave one petabyte (PB) a try,"
> said Czajkowski. "It took six hours and two minutes to sort 1 PB (10
> trillion 100-byte records) on 4,000 computers. We're not aware of any other
> sorting experiment at this scale and are obviously very excited to be able
> to process so much data so quickly."
>
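To put the two runs side by side, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which matches the record counts quoted above, and it divides the total evenly across machines, which is a simplification. The helper name is made up.

```python
TB = 10 ** 12  # decimal terabyte, assumed
PB = 10 ** 15  # decimal petabyte, assumed

def throughput(total_bytes, seconds, machines):
    """Return (aggregate bytes/s, per-machine bytes/s) for a sort run."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / machines

# Record counts from the article: 10 billion and 10 trillion 100-byte records.
tb_records_bytes = 10 ** 10 * 100
pb_records_bytes = 10 ** 13 * 100

tb_run = throughput(1 * TB, 68, 1000)                  # 68 seconds, 1,000 computers
pb_run = throughput(1 * PB, 6 * 3600 + 2 * 60, 4000)   # 6 h 2 min, 4,000 computers

if __name__ == "__main__":
    for label, (agg, per) in (("1 TB", tb_run), ("1 PB", pb_run)):
        print(f"{label}: {agg / 1e9:.1f} GB/s overall, {per / 1e6:.1f} MB/s per machine")
```

By this measure the terabyte run sorted about 14.7 MB per second per machine and the petabyte run about 11.5 MB per second per machine, so the per-machine rate held up fairly well at a thousand times the data.
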
> One petabyte is a thousand terabytes, or roughly 12 times the amount of
> archived Web data in the U.S. Library of Congress as of May 2008. One way to
> put that amount in perspective, according to Czajkowski, is to consider that
> the aggregate size of data processed by all instances of MapReduce at Google
> was, on average, 20 PB per day in January 2008. A paper
> <http://labs.google.com/papers/mapreduce.html> explaining MapReduce on the
> Google labs site says that upwards of one thousand MapReduce jobs are
> executed on Google's clusters every day. So the
> infrastructure team's MapReduce job that extended the benchmark factors out
> to 50 typical MapReduce jobs, or one-twentieth the total of all daily
> MapReduce jobs run on Google's clusters.
>
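The "50 typical jobs" figure can be reproduced with a few lines of Python. This treats "upwards of one thousand" as exactly 1,000 jobs per day and assumes the 20 PB is spread evenly across them; both are simplifications of mine, not claims from the article.

```python
from fractions import Fraction

PB = 10 ** 15  # decimal petabyte, assumed

daily_bytes = 20 * PB   # average processed per day in January 2008
daily_jobs = 1000       # "upwards of one thousand", taken as exactly 1,000

typical_job = Fraction(daily_bytes, daily_jobs)    # bytes per typical job
equivalent_jobs = Fraction(1 * PB) / typical_job   # the 1 PB sort, in typical jobs
share_of_daily_jobs = equivalent_jobs / daily_jobs

if __name__ == "__main__":
    print(f"typical job: {typical_job / 10 ** 12} TB")
    print(f"1 PB sort = {equivalent_jobs} typical jobs = {share_of_daily_jobs} of a day's jobs")
```

So a typical job handles about 20 TB, and the one-petabyte sort is worth 50 of them, or one-twentieth of a day's jobs.
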
> As I reported
> <http://www.informationweek.com/blog/main/archives/2008/08/micosofts_sql_s.html;jsessionid=B0PEFQE0UXWO4QSNDLRSKHSCJUNN2JVN>
> a couple of months ago, Microsoft has its own strategy for sorting massive
> data sets, which I gleaned from reading a white paper
> <http://research.microsoft.com/%7Ejrzhou/pub/Scope.pdf> presented at a
> database conference. All companies that operate
> Internet-scale cloud services have the need to store and process massive
> data sets, such as search logs, Web content collected by crawlers, and
> click-streams collected from a variety of Web services. Google, Yahoo, and
> Microsoft have developed their own systems that support parallel
> computations over multiple petabyte data sets on clusters of computers.
> While Google and Yahoo rely on the *map-reduce* programming model,
> Microsoft's *Scope* programming model intentionally builds on end-user
> knowledge of relational data and SQL. Microsoft's sorting strategy at this
> point appears to be primarily conceptual since, unlike Google and Yahoo, it
> hasn't competed in any recent benchmark tests.>>
>
> http://www.informationweek.com/blog/main/archives/2008/11/google_sorts_on.html
>
>