Re: Storage array advice anyone?

  • From: chris@xxxxxxxxxxxxxxxxxxxxx
  • To: Stephen.Lee@xxxxxxxx
  • Date: Tue, 14 Dec 2004 10:47:20 +0000


This is a classic debate / argument that's been going on in one form or another
for years. Anyway, for what it's worth, here's my "two penny worth", or 2 cents
if you prefer.

Last year I was involved in setting up an IBM ESS 8000 "Shark" which had 80 x
146 GB drives, so similar to, but not quite the capacity of, your array. We had
to go through a number of decisions:

1. RAID 5 or 10 ?
(I've heard some people say that on the Shark RAID 5 is actually RAID 4, but I
don't want to go there now.)
There are the usual trade-offs:
  RAID 5 gives more capacity.
  RAID 10 gives better protection against disk failures, although with hot
spares etc. you'd have to be very, very unlucky to suffer data loss using RAID
5.

  On performance RAID 10 is generally better, but it depends on things such as
the read/write ratio and whether the RAID 10 implementation uses both plexes
for reading or only reads from the primary. RAID 5 suffers when there's been a
disk failure, especially while it's re-building onto the hot spare.

In our situation RAID 5 was chosen due to price and capacity requirements. Also,
given the nature of our Oracle databases, the performance benefits of RAID 10
were likely to be marginal except when recovering from a disk failure.
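To put rough numbers on that trade-off, here's a back-of-an-envelope Python
sketch using the 8-disk group size from our Shark. The per-write I/O counts are
the standard textbook figures, not measurements from any particular array:

```python
DISKS_PER_GROUP = 8   # one RAID group on our Shark
DISK_GB = 146

def raid5_usable_gb(disks, disk_gb):
    # RAID 5 loses one disk's worth of capacity to parity
    return (disks - 1) * disk_gb

def raid10_usable_gb(disks, disk_gb):
    # RAID 10 mirrors everything, so half the raw capacity is usable
    return (disks // 2) * disk_gb

raid5_gb = raid5_usable_gb(DISKS_PER_GROUP, DISK_GB)    # 1022 GB per group
raid10_gb = raid10_usable_gb(DISKS_PER_GROUP, DISK_GB)  # 584 GB per group

# Back-end I/Os generated by one small random write (textbook figures):
RAID5_WRITE_IOS = 4   # read data, read parity, write data, write parity
RAID10_WRITE_IOS = 2  # write both mirror copies
```

So RAID 5 gives you roughly 75% more usable space per group, at the cost of
double the back-end I/O on small writes.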

2. Striping etc

The first question you need to ask is:
  "Do I have different workloads e.g. dev, live, performance critical databases
etc ?"
  If you do, which is likely, then you need to decide whether you want to
segregate the workloads / databases onto separate groups of disks to avoid any
performance contention between them (at the disk level; you can't avoid it at
the cache level). James Morle has written an excellent paper on this, "Sane
SAN"; it should be available on his website.

In our case we effectively had a single critical workload (a group of databases
and flat files). When this workload was running, nothing else would be. So to
maximise performance we did the following:

a) Divided each disk group (set of 8 disks as a RAID 5 set) into 20 GB LUNs i.e.
"disks" / Physical volumes (PVs) from the OS view.

b) Created volume groups made up of an equal number of LUNs from each disk
group. e.g. VG02 contained 2 LUNs from each of the 10 disk groups so 400 GB.

c) Created filesystems from these volumes, striped with a 4 MB stripe size
across all disks in the VG. This was done using "extent-based striping"
performed by the volume manager (both HP-UX and AIX).

This meant that our critical workload had access to all the physical disks all
the time and the IO was evenly spread across all of them, hence maximising
throughput.
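To illustrate what the layout in a) to c) buys you, here's a minimal Python
sketch of the round-robin placement that extent-based striping produces,
assuming the VG02 figures above (20 LUNs, 4 MB stripe). The volume manager does
this internally, of course; this just shows why every disk group sees a share
of the IO:

```python
STRIPE_MB = 4   # stripe size used on our filesystems
LUNS = 20       # VG02: 2 LUNs from each of 10 RAID 5 disk groups

def lun_for_offset(offset_mb):
    # Stripes are laid out round-robin across the LUNs, so the LUN
    # holding a logical offset is just (stripe number) mod (LUN count)
    return (offset_mb // STRIPE_MB) % LUNS

# A 160 MB sequential read covers 40 stripes, so it touches every one
# of the 20 LUNs (twice each), i.e. every disk group in the array
touched = {lun_for_offset(mb) for mb in range(0, 160, STRIPE_MB)}
```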

If you decide you want to segregate workloads then you need to allocate a
physically separate group of disks to each workload. Then I'd suggest you
stripe across each separate group of disks as shown above. So you might end up
with 3 or 4 groups of disks, each with its own separate striping.

When doing this you need to bear in mind what you are going to do when extra
capacity is added in a year or two's time. This can be quite a challenge.

Hope that all made sense.

3. Disk failures

My experience is that with either RAID 5 or 10 you have to be unbelievably
unlucky to lose data, provided disks are replaced when they fail and not left
for a few days or even more. The probability is extremely remote. It might be
an idea to get someone to do the maths and work out the probabilities.
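As a starting point for that maths, here's a rough Python sketch using a simple
exponential failure model for a single RAID 5 group. The MTBF and rebuild-time
figures are purely illustrative assumptions, not vendor numbers:

```python
MTBF_HOURS = 1_000_000   # assumed per-disk MTBF (illustrative only)
DISKS = 8                # one RAID 5 group
REBUILD_HOURS = 8        # assumed time to rebuild onto a hot spare
HOURS_PER_YEAR = 24 * 365

# Expected first-disk failures in the group per year
first_failures_per_year = DISKS * HOURS_PER_YEAR / MTBF_HOURS

# Given one failure, probability that one of the 7 survivors fails
# before the rebuild completes (losing the group)
p_second_during_rebuild = (DISKS - 1) * REBUILD_HOURS / MTBF_HOURS

# Approximate probability of data loss in this group per year
p_data_loss_per_year = first_failures_per_year * p_second_during_rebuild
```

With these assumptions the result comes out at a few in a million per group per
year, which is why you have to be so unlucky provided failed disks are replaced
promptly. Leave a failed disk for days and the exposure window, and hence the
risk, grows accordingly.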

Well I hope that helps.


PS I know about BAARF and in an ideal world we wouldn't use RAID 5 but sometimes
when managers are managers and bean counters are counting their beans you can't
justify RAID 10 over RAID 5. You just need to make management aware of the
trade-offs and understand the implications of the decision.

Quoting Stephen Lee <Stephen.Lee@xxxxxxxx>:

> There is a little debate going on here about how best to setup a new
> system which will consist of IBM pSeries and a Hitachi TagmaStore 9990
> array of 144 146-gig drives (approx. 20 terabytes).  One way is to go
> with what I am interpreting is the "normal" way to operate where the
> drives are all aggregated as a big storage farm -- all reads/writes go
> to all drives.  The other way is to manually allocate drives for
> specific file systems.
> Some around here are inclined to believe the performance specs and
> real-world experience of others that say the best way is keep your hands
> off and let the storage hardware do its thing.
> Others want to manually allocate drives for specific file systems.
> Although they might be backing off (albeit reluctantly) on their claims
> that it is required for performance reasons, they still insist that
> segregation is required for fault tolerance.  Those opposed to that
> claim insist that the only way (practically speaking) to lose a file
> system is to lose the array hardware itself in which case all is lost
> anyway no matter how the drives were segregated, and if they really
> wanted fault tolerance they would have bought more than one array.  And
> around and around the arguments go.
> Is there anyone on the list who would like to weigh in with some real
> world experience and knowledge on the subject of using what I suppose is
> a rather beefy, high-performance array.
> --

Chris Dunscombe

Christallize Ltd
