Re: Raid 50

  • From: "Craig I. Hagan" <hagan@xxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 8 Jul 2004 07:33:26 -0700 (PDT)

> are you sure that they aren't using RAID 5 sets with 5
> or 9 members?

You're right, I forgot to add the parity disk when I worked out the #disks/set;
however, the points remain.

Note that this still doesn't violate the statements that I made (I had a
feeling that I might have been off by one).

Next, your statement talks about reads, which don't have the stripe width
problem (just chunk size/individual disk) save when operating in degraded mode
and a read is performed against data on the failed disk. Raid5 isn't all that
bad for random reads -- it is just that most random read systems also come with
random writes, which you didn't address.

This leaves you with two io possibilities (one, if the array's minimum
io size is a stripe):

1) read just the chunk(s) requested if the data being read is less than
        stripe width and no drives have failed

        send io to sub-disk(s), return result

        NB: this is comparable to raid1 (one iop per disk)

2) read the entire stripe
        if drives have failed:
        read the stripe's chunks from the surviving subdisks; unless the chunk
        holding the parity has failed, use it to compute the missing data
        (see the sketch below)

        if no drives have failed:
        read the stripe's chunks from the subdisks, return the result

        NB: this is also comparable to raid1 (one iop per disk) save in
        degraded mode, where you also have a parity computation.
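
To make the degraded-read reconstruction concrete, here is a minimal sketch
(plain Python, hypothetical chunk layout -- not any particular array's code):
the missing chunk is simply the XOR of the surviving data chunks and the
parity chunk.

    # Minimal sketch of a raid5 degraded read (hypothetical layout).
    # A stripe holds N-1 data chunks plus one parity chunk; parity is the
    # XOR of the data chunks, so any single missing chunk is the XOR of
    # everything that survived.

    def xor_chunks(chunks):
        """XOR a list of equal-sized byte chunks together."""
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    def reconstruct_chunk(stripe, failed_index):
        """Rebuild the chunk on the failed disk from the survivors."""
        survivors = [c for i, c in enumerate(stripe) if i != failed_index]
        return xor_chunks(survivors)   # one read iop per surviving disk

    # 4 data chunks + parity; lose chunk 2, rebuild it.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    stripe = data + [xor_chunks(data)]
    assert reconstruct_chunk(stripe, 2) == b"CCCC"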

In both cases at most one iop is submitted to each subdisk. This is
important -- and part of why raid5 often has radically different read vs. write
performance.

You discussed reads earlier, which is an area where raid5 often does quite
well. Writes can be a different matter. The way to achieve stripe-sized writes
is to issue them to the OS either as a single large write, or (for
OSes/storage which are smart enough to coalesce) as a series of adjacent
smaller writes.
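
As a toy illustration of that coalescing (simple Python, hypothetical stripe
geometry -- not a model of any real OS elevator): adjacent writes are merged,
and only runs that start on a stripe boundary and cover whole stripes can be
issued as single full-stripe writes.

    STRIPE = 4 * 8192   # hypothetical: 4 data chunks of 8k each

    def coalesce(writes):
        """Merge adjacent (offset, length) writes and flag full-stripe runs."""
        runs = []
        for off, length in sorted(writes):
            if runs and runs[-1][0] + runs[-1][1] == off:
                runs[-1][1] += length          # extends the previous run
            else:
                runs.append([off, length])     # starts a new run
        return [(off, length, off % STRIPE == 0 and length % STRIPE == 0)
                for off, length in runs]

    # Four adjacent 8k writes merge into one stripe-aligned 32k write;
    # the lone random 8k write stays sub-stripe (read-modify-write path).
    print(coalesce([(0, 8192), (8192, 8192), (16384, 8192), (24576, 8192),
                    (131072, 8192)]))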

When your submitted writes are smaller than the stripe size and random, so no
coalescing can be performed (think oltp with blocksize < stripesize), then you
will see this:

read the stripe in
modify the 8k region
compute parity for the stripe
write out to the disks

This requires two operations against all disks in the set, as well as a
parity computation (a rough iop count is sketched below). This is inferior to
raid1, which would have emitted one iop to each disk. This is a major reason
why raid5 isn't chosen for truly random io situations unless the sustained
write rate is below what can be sunk to disk and the cache can handle the
burst workload.
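
As a rough illustration (simple arithmetic, assuming the whole-stripe
read-modify-write described above; some arrays instead read only the old data
plus the old parity, which this does not model):

    # Rough iop accounting for one small random write.

    def raid5_small_write_iops(disks_in_set):
        reads = disks_in_set     # read the whole stripe in
        writes = disks_in_set    # write the modified stripe back out
        return reads + writes    # plus a parity computation in between

    def raid1_small_write_iops():
        return 2                 # one write to each side of the mirror

    print(raid5_small_write_iops(5))   # 10 iops for a 4+1 set
    print(raid1_small_write_iops())    # 2 iops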

When the submitted write is coalesced to (or arrives as) an integer number of
stripes, your io pattern looks like this:

compute parity for the data
write the full stripe

which goes back to one io per subdisk. 

This is an area where raid5 tends to do quite well -- often better than a
raid1 pair, because you're splitting the load across more disks (a similar #
of iops) rather than duplicating it (the write speed of a raid1 pair == the
write speed of a single disk); see the sketch below.
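
A minimal sketch of that full-stripe path (same hypothetical 4+1 geometry as
above): parity is computed once and every subdisk receives exactly one write.

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def full_stripe_write(data_chunks):
        """Return the chunks to issue, one write iop per subdisk."""
        parity = reduce(xor_bytes, data_chunks)   # compute parity once
        return data_chunks + [parity]

    # A 32k write on a 4+1 set becomes five 8k writes (four data + parity),
    # while the same 32k on a raid1 pair is written in full to both mirror
    # sides -- each side carries the whole write, like a single disk.
    print(len(full_stripe_write([b"A" * 8192] * 4)))   # 5 subdisk writes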

Raid5 is often chosen for streaming read/write applications where the
submitted requests (from the array's perspective) are sequential io, as raid5
is pretty dang good at that.

This is why I posted in the form of pro/con. The assumption is that folks
should have an understanding of their system so that they can gauge their io
rate, look at a storage solution, and choose one which fits their io, space,
reliability, and budget requirements.

-- craig