RE: Storage array advice anyone?

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <Stephen.Barr@xxxxxxxxx>, <Amir.Hameed@xxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 16 Dec 2004 10:32:54 -0500

I would be very interested to see the results at parallel 3, 4, 8, and 16.

Also the results at parallel 2 and 1 for a single query.

Why, you ask? That will quickly tell you whether you are seeing the
synergistic effect of parallel access on your disk heads that obtains only
when the activity is effectively single-user access to the disk farm.

If 3, 4, 8, 16 are degraded only incrementally to the point where CPU is the
bottleneck, and if parallel 2 and 1 are faster than you would expect from
the reduced parallelism, that would likely be the case.
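
Something along these lines would do for those timing runs -- a minimal
sketch in SQL*Plus, with BIG_FACT standing in for whatever table you were
scanning:

  -- Time the same full scan at each degree of parallelism.
  -- Repeat with the degree set to 2, 3, 4, 8 and 16 and note the elapsed
  -- time for each run.
  SET TIMING ON

  SELECT /*+ FULL(t) PARALLEL(t, 4) */ COUNT(*)
    FROM big_fact t;

  -- Serial (degree 1) baseline for comparison:
  SELECT /*+ FULL(t) NOPARALLEL(t) */ COUNT(*)
    FROM big_fact t;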

(This is entirely different from finding out what you are waiting for with
an extended trace, which may be your most productive tuning activity.)
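(If you do go the extended trace route, the usual recipe is the 10046 event
at level 8, which records the wait events along with the SQL -- roughly:

  -- Turn on extended SQL trace, waits included, for the current session.
  ALTER SESSION SET timed_statistics = TRUE;
  ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

  -- ... run the query of interest here ...

  ALTER SESSION SET EVENTS '10046 trace name context off';

then read the trace file from user_dump_dest. With PQ, bear in mind the
slave processes write their own trace files.)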

Then the question becomes whether you can achieve higher throughput in the
multi-user case by widening the stripe unit so that you tend to hit no more
than two heads per fresh read. This all presumes that you've been degraded
to spinning-rust speed rather than to cache, controller, or wire-speed
limits on the transmission of data.

Have you checked the aggregate dataflow from your media to your computer?
Once that is pegged, it doesn't really matter how the disk is configured.
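
A crude way to eyeball that from inside the database is to sample v$sysstat
twice, a known number of seconds apart, and turn the delta into MB/s (this
sketch assumes an 8K db_block_size):

  -- Take this snapshot, wait N seconds, then take it again.
  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('physical reads', 'physical reads direct');

  -- MB/s off the media is then roughly:
  --   (value_2 - value_1) * 8192 / N / 1048576
  -- Compare that with the rated bandwidth of your channels/HBAs.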

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of Barr, Stephen
Sent: Wednesday, December 15, 2004 3:35 PM
To: Amir.Hameed@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: Storage array advice anyone?


Hi Amir,
        We also have a DMX 3000 box and have it striped 8 ways.

        We have 83 meta devices; each meta device is ~67GB in size and is
made up of eight 8.43GB volumes. Each volume is RAID 1; however, each meta
volume is striped across its eight individual volumes with a stripe size of
0.94MB.

        The issue we have at present is that we are a data warehouse doing
lots of 1MB direct path reads. Each read will hit 8 physical devices (with a
1MB stripe unit size at the OS level). I'm assuming this is a bad thing -
surely each of our reads should be hitting only a single device? i.e., we're
waiting on 8 devices instead of only one.

        I've performed a number of tests with PQ on the current setup, and
it looks like the I/O subsystem is saturated by a single PQ query (degree 4)
to such an extent that two PQ queries running together BOTH take twice as
long to complete. Surely this isn't the pattern we should be seeing? It
essentially means that the system is 100% non-scalable.

1 query PARALLEL 4 (FTS)

  1MB stripe unit      5 mins 18 secs
  512KB stripe unit    5 mins 18 secs
  128KB stripe unit    5 mins 52 secs
  CONCAT               5 mins 10 secs


2 queries hitting same table PARALLEL 4 (FTS)

  1MB stripe unit      8 mins 43 secs (each)
  512KB stripe unit    10 mins 12 secs (each)
  128KB stripe unit    8 mins 35 secs (each)
  CONCAT               8 mins 10 secs (each)


Does anyone have any experience of setting up this type of storage solution
for a data warehouse?

Thanks,

Steve.

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Hameed, Amir
Sent: 15 December 2004 19:31
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Storage array advice anyone?

While the discussion is going on about these heavy-duty SAN boxes, I would
also like to bounce a question about disk layout in a SAN. We have
recently acquired an EMC DMX 3000 box. Our current production is running
on an EMC 8830, four-way striped, and it is going off lease in a few
months. So, we will be migrating our mission-critical production system
to the newly arrived DMX 3000 box soon. I have gone through a white
paper from James Morle, "Sane SAN", which basically suggests that for an
optimal SAN disk layout you should assume there is no cache available,
stripe the disks optimally, and treat the cache as an added benefit.

In our existing configuration on the 8830 frame, each Meta Volume is
created from four hypers and is 20GB in size. The Metas are then
presented as volumes to the server, and each mount point is based on a
20GB volume. We are not double-striping the volumes at the host level.
The drives in the 8830 frame are 73GB in size and do an average of
~120 reads/second and ~110 writes/second. So, the I/O bandwidth of a
Meta would be ~480 r/s (4 x 120) and ~440 w/s (4 x 110).

Having said that, I have done some basic calculations on the I/Os that
Oracle is issuing (on the 8830 frame) from the v$filestat and v$tempstat
views, aggregated on a per-mount-point basis (a rough sketch of the query
is below). What I have seen is that on some mount points Oracle is doing
up to 800 reads per second. Given that on a highly available system it is
not always possible to move hot files around without incurring downtime,
I am exploring the idea of striping the new DMX frame 8-way. This DMX
frame has 146GB drives, and based upon those drives' specifications they
can do ~130 r/s and ~120 w/s. So, an 8-way striped Meta volume would be
able to do ~1040 r/s and ~960 w/s. I was at the HotSos symposium this
summer and asked Steve Adams this question, and he also suggested going
with 8-way striping. Is there anyone on this DL who is using a DMX frame
striped 8-way?
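
For reference, the aggregation I did was essentially along these lines --
only a rough sketch, and it assumes the mount point is everything before
the second '/' in the file name:

  -- Cumulative reads/writes per data file since instance startup,
  -- rolled up by mount point; difference two snapshots taken a known
  -- number of seconds apart to get per-second rates.
  SELECT SUBSTR(df.name, 1, INSTR(df.name, '/', 1, 2) - 1) AS mount_point,
         SUM(fs.phyrds)  AS phys_reads,
         SUM(fs.phywrts) AS phys_writes
    FROM v$filestat fs, v$datafile df
   WHERE fs.file# = df.file#
   GROUP BY SUBSTR(df.name, 1, INSTR(df.name, '/', 1, 2) - 1);

(v$tempstat joins to v$tempfile the same way for the temp files.)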

Does anyone have any advice on 4-way versus 8-way striping? EDS is our
service provider, and they are not buying the idea of 8-way striping, as
they and EMC think that the cache on the frame can resolve all the
issues. That is not true, because the cache has to de-stage at some
point; I have seen high I/O waits on the 8830 frame in sar, and I don't
believe that cache is nirvana.

Thank you
Amir

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Stephen Lee
Sent: Wednesday, December 15, 2004 10:33 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Storage array advice anyone?

I appreciate the discussion on the topic.  An additional consideration on
this particular array (Hitachi TagmaStore 9990) is that the "normal"
configuration (according to Hitachi) puts the disks in groups of 8; each
group is a stripe with parity, and the parity cycles around all drives.
When a bad block occurs, the block is NOT replaced by a spare block on the
drive; instead the drive is failed and replaced by a hot spare, and a
phone-home occurs.  Which -- I guess -- is a fairly aggressive drive
replacement scheme.

There appears to be agreement that the best performance for most cases
(note: most cases) is to stripe everything across all drives.  There
does appear to be some remaining discussion, from a fault tolerance
standpoint, about whether to go strictly with stripe + parity and trust
that Hitachi really has worked out the fault tolerance issues, or assume
that claims from Hitachi are just a bunch of sales hype and insist on
stripe + mirror.  Healthy skepticism is useful, but one does not want to
be basing that skepticism on outdated ideas.  That is what a lot of this
comes down to: Which ideas and rules are outdated -- given the
capabilities of this new gee whiz hardware -- and which still hold.

The astute reader will note that stripe + parity is, more or less,
RAID 5-ish.  But yet again, we have a manufacturer who claims that in
their case the I/O speed penalty is no longer an issue.  In the case of
this array, there appears to be some real-world experience to support
that claim.  Any comments from those who know otherwise are most
welcome.  Again, it's another one of those "Have some of the ideas about
this become outdated?" sort of things.





--
//www.freelists.org/webpage/oracle-l
