Re: Storage array advice anyone?

  • From: "Terry Sutton" <terrysutton@xxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 14 Dec 2004 17:20:56 -0800

As Cary says, the probabilities have been worked out.  But people don't tend
to internalize probabilities (at least accurately).  They tend to think
either "lucky" or "unlucky".  I personally like probabilities, but I
generally tell managers my experience.  In the last 7 years, at least 4
systems I had some contact with (clients/employers/former
employers/whatever) had 2 disks fail in close time proximity (within roughly
30 minutes).  Calculating the implicit probabilities of these real-life
occurrences is way over my head (some were small shops with 1-2 servers,
some larger).  But I can safely say that, in human terms, such an event is
not "unbelievably unlikely".

The likelihood of serious data loss from 2 failed drives is much lower with
RAID 10 than with RAID 5.  So you've got a performance benefit and a fault
tolerance benefit with RAID 10.  And I won't even detail the crippling
effect I've seen when a RAID 5 disk failed and the system rebuilt it with
the hot spare.
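
A rough way to put numbers on the fault tolerance point is sketched below. It assumes one disk has already failed, that exactly one more disk then fails, and that the second failure is equally likely to hit any surviving disk. It also assumes RAID 10 is built from two-disk mirror pairs. The function name and the disk counts are illustrative only, not taken from any vendor tool.

```python
from fractions import Fraction


def second_failure_loss_probability(layout: str, disks: int) -> Fraction:
    """P(data loss | one disk is already down and one more surviving disk fails).

    Assumption: the second failure hits each surviving disk with equal
    probability. Real failures are often correlated, so treat this as a
    best-case comparison rather than a prediction.
    """
    if layout == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # Single parity: any second failure in the group loses data.
        return Fraction(1)
    if layout == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Only the failed disk's mirror partner is fatal: 1 of the survivors.
        return Fraction(1, disks - 1)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for n in (4, 8, 16):
        r5 = second_failure_loss_probability("raid5", n)
        r10 = second_failure_loss_probability("raid10", n)
        print(f"{n} disks: RAID 5 = {r5}, RAID 10 = {r10}")
```

Under these assumptions the RAID 10 exposure shrinks as the array grows, while a second failure in a single RAID 5 group is always fatal.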

--Terry

----- Original Message ----- 
From: "Cary Millsap" <cary.millsap@xxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Sent: Tuesday, December 14, 2004 2:38 PM
Subject: RE: Storage array advice anyone?


The probabilities are already worked out, and they're publicly available in
the paper called "RAID: High-Performance, Reliable Secondary Storage" (an
ACM Surveys article) by Messrs. Chen, Lee, Gibson, Katz, and Patterson.

Not many people bother to put them into Excel, but when I once played with
the numbers a bit, I realized pretty quickly that the probability of an
outage-causing double-whammy is a lot worse than most people think. The
article mentions that point specifically, if I remember correctly.

The key idea is that the failures of two disks in an array are frequently
not independent events. Often, the event that just screwed up disk #1 has a
higher probability now of screwing up disk #2 before you can fix #1.
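
To see how much the independence assumption matters, here is a minimal sketch. It assumes exponentially distributed disk lifetimes and models correlation crudely as a multiplier on the failure rate of the surviving disks during the repair window. The MTTF, repair time, and `correlation_factor` values are hypothetical inputs chosen for illustration; they are not figures from the paper.

```python
import math


def p_second_failure(surviving_disks: int,
                     mttf_hours: float,
                     repair_hours: float,
                     correlation_factor: float = 1.0) -> float:
    """Probability that at least one surviving disk fails before repair completes.

    Assumes exponential lifetimes. correlation_factor = 1.0 means failures
    are independent; larger values are a hypothetical stand-in for a shared
    cause (power event, heat, bad batch) raising the failure rate.
    """
    rate = correlation_factor * surviving_disks / mttf_hours
    return 1.0 - math.exp(-rate * repair_hours)


if __name__ == "__main__":
    # Hypothetical example: 8-disk array, one disk down, 24-hour repair window.
    independent = p_second_failure(7, 500_000, 24)
    correlated = p_second_failure(7, 500_000, 24, correlation_factor=50)
    print(f"independent: {independent:.6f}")
    print(f"correlated:  {correlated:.6f}")
```

The independent case looks comfortably small, which is presumably why the event gets called "unbelievably unlikely"; even a modest shared cause changes the picture considerably.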


Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:
- Performance Diagnosis 101: 1/4 Calgary
- SQL Optimization 101: 2/7 Dallas
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit www.hotsos.com for schedule details...


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jared Still
Sent: Tuesday, December 14, 2004 3:58 PM
To: chris@xxxxxxxxxxxxxxxxxxxxx
Cc: Stephen.Lee@xxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Storage array advice anyone?

On Tue, 14 Dec 2004 10:47:20 +0000, chris@xxxxxxxxxxxxxxxxxxxxx
> My experience is that with either RAID 5 or 10 you have to be unbelievably
> unlucky to lose data providing disks are replaced when they fail and not left
> for a few days or even more. You are talking extremely remote. It might be an
> idea to get someone to do the maths and work out the probabilities.

I, for one, have been that unlucky on at least one occasion.

--
Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
--
//www.freelists.org/webpage/oracle-l
