Re: raid 5 disaster

  • From: Mogens Nørgaard <mln@xxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Sat, 14 Aug 2004 16:05:28 +0200

If you really want a cool, technical explanation, here's what I did at 
such a site some years ago. I had been told that it was NOT the disk IO 
system, because it was the Shark RAID-4 from IBM themselves, and the 
client had bought it just two years earlier for slightly less than a B-2 
bomber would have cost them (complete with ground staff, runways, 
buildings and munitions) - so, as I could clearly see, it was not the IO 
system.

Right.

So I did the 10046 thing on various important and/or demanding reports 
on the system and it showed that 80-90% of the time went to db file 
sequential/scattered reads.
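
(The "10046 thing" is just extended SQL trace plus tkprof. A minimal 
sketch - the trace file and output names are of course only examples, 
use whatever fits your setup:

  alter session set timed_statistics = true;
  alter session set events '10046 trace name context forever, level 8';
  -- ... run the report's SQL here ...
  alter session set events '10046 trace name context off';

  -- then, on the server, against the trace file in user_dump_dest:
  -- tkprof ora_12345.trc report.prf sys=no sort=exeela

The wait-event summary in the output tells you where the elapsed time 
went.)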

Then I set out to see whether the IO took such a large amount of time 
because there were many, but fast, IO's, or because there were few, but 
slow, IO's.

In my case it was the latter - very slow IO's. My handwriting was faster 
than this zillion-Euro-IO-subsystem (it was more "sub" than "system", I 
would argue).
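
How do you tell the two apart? The WAIT lines in the raw trace file 
carry the elapsed time of every single IO, but a quick-and-dirty sanity 
check (9i and up; the numbers are system-wide since instance startup) is 
something along these lines:

  select event,
         total_waits,
         round(time_waited_micro / total_waits / 1000, 1) as avg_ms_per_io
    from v$system_event
   where event in ('db file sequential read', 'db file scattered read');

Much more than about 10 ms per single-block read and it's the disks (or 
the queues in front of them) that are slow, not the number of IO's.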

Then Peter Gram from Miracle went to work, getting access to a small, 
embedded DB2 database built into the Shark that could actually show the 
number of IO's per 8-pack and even per disk.

Since this was the old Shark, we knew it was using cheap, slow, old 
disks (7,500 RPM), each good for roughly 50 random IO's per second. An 
8-pack contains 6 data disks, one dedicated parity disk (RAID-4) and one 
hot spare, so a total of about 300 IO's per second per 8-pack was to be 
expected.

Lots of Excel-related work followed, and we could see that many of the 
8-packs were - surprise, surprise - trying to handle more than 300 IO's 
per second.

Then I had the good fortune of being allowed to tell all this to the 
board, and one thing they understood was that if you start out buying 
8-packs where 2 of the disks are not used for data, you've wasted 25% of 
your 8-pack investment up front. If, that is, you need the disks 
mirrored anyway (and of course this RAID-4 system was mirrored). Next, I 
told them how many of the IO's were small, sequential writes, and how 
each such write on RAID-4 means reading the old data block and the old 
parity block and then writing both back - roughly a factor of 4 in 
physical IO's on the disks (more waste of investment). Then I argued 
that having 33% more disks than required (8 disks where only 6 worked 
with data) also meant higher investments in cabinets, controllers, 
rooms, and what have you.

They decided to buy a new system. They bought (for political reasons) 
the new Shark 800, which can of course be configured with the same 
silly 8-pack/RAID-4 setup, but thank God also as RAID-1+0 - which is 
what management wanted and had decided on at the meeting.

My job (and Peter Gram's) was over, and we left the building.

A few months later I heard that it was running twice as fast as the old 
system. Only twice, I asked? Well, that could only mean that they were 
getting the benefit of the new, faster 15,000 RPM disks, but NOT the 
benefit of the RAID-1+0! So I asked the sys admin guy, and it turned out 
that he had never really believed all this bad-mouthing of RAID-4 (it 
had to be RAID-5, he insisted, because the IBM docs said so, even when 
he acknowledged the presence of a dedicated parity disk), so he had - 
believe it or not - configured the whole, new, very expensive Shark 800 
SAN as ... RAID-4 (or 5 :-) ).

So I got mad and told him what I thought, and he didn't care at all. 
Then I realised two things: 1) they would have to call us in again 
sooner or later because of this, and 2) I could never convince this guy 
anyway.

Hey, guess where my consulting outfit is making some decent money 
currently... and I'm not kidding.

Long story, I know. But find out where the time goes, then find out why. 
The same goes for your considerations regarding dbms_output. It's really 
not enough to believe or think or to have heard something somewhere. 
Measure it, and then you can argue sense into most heads around you.
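
(And measuring is cheap here. A plain SQL*Plus spool of the same query, 
traced with 10046 next to the dbms_output version, settles it in an 
afternoon. A minimal sketch - the file name, column list and table are 
of course made up:

  set termout off heading off feedback off pagesize 0
  set linesize 500 trimspool on
  spool /tmp/client_report.lst
  select col1 || ',' || col2 || ',' || col3
    from adhoc_report_table;
  spool off

One pass through the SQL engine, one sequential file, and no PL/SQL loop 
writing a line at a time.)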

Mogens

Ken Naim wrote:
> Raid5 is notoriously slow for writes, and instead of just selecting and
> spooling the data they are making several copies of the data (with logging
> to boot). Converting the statements to simple SQL scripts with spool
> commands followed by the select statements will avoid the write penalty of
> raid5 and will gain the performance benefit of only using the SQL engine (no
> context switching etc.). Now if they are not inclined to rewrite the reports,
> you can always throw money at the problem and switch from raid5 to raid 10,
> which will speed everything up tremendously, but it isn't the optimal
> solution, as you will see better performance gains by redesigning the
> reports (you could also do both if you want to shock and amaze them).
> 
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Michael McMullen
> Sent: Thursday, August 12, 2004 2:21 PM
> To: oracle-l@xxxxxxxxxxxxx
> Subject: raid 5 disaster
> 
> Oracle 9.2.0.3
> solaris 5.8
> raid 5 T3's
> 
> I have a server which was created for adhoc purposes. What has happened is
> this. Instead of users running adhocs, they have created a package which
> creates tables (all logging of course) based on their adhoc query. The
> package then uses dbms_output.put_line to write the created table to a file,
> which is then ftp'd to the appropriate client. Roughly 200 reports are
> written a day, most would be over 1 million lines, and the total file size
> would be ~10Gb. Of course everybody wants their report by 9:00 a.m., so the
> box is routinely pegged at 60% i/o wait.
> So everybody is complaining about database performance. I have stated that
> dbms_output was probably the worst way to implement this. As I recall,
> dbms_output will write out one line at a time. Now would the use of
> dbms_output affect other DDL? A truncate of a small table can take 15
> minutes. Or is that just a function of the i/o waits? I guess I'm looking
> for some really technical explanation as to why their performance is so bad.
> 
> Thanks
> Mike
> 

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
