RE: How many of you use S.A.M.E?

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <gorbyx@xxxxxxxxx>, <ax.mount@xxxxxxxxx>
  • Date: Fri, 2 Feb 2007 12:54:19 -0500

(I'm not sure how my email is screwing up, but I haven't seen this be
delivered. I'll check with an individual later to see if it comes through;
no need to busy the list with "I got it"s.)

-----Original Message-----
From: Mark W. Farnham [mailto:mwf@xxxxxxxx]
Sent: Thursday, February 01, 2007 6:22 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: How many of you use S.A.M.E?

Okay, so there are whole books and many papers good, bad, and ugly on this
topic.

Grossly oversimplifying, and remembering that cache mitigates the downside
of i/o demand collisions, SAME operates like a statmux: every requesting
process sticks its straw into a single firehose (or garden hose if you're
unlucky) and drinks and blows bubbles in competition with all the
other processes and their straws.

I think it was JL who remarked emphatically that if someone else running a
database on the same disk farm as him wanted to destroy their own
performance, that was okay with him, but he would prefer that they could not
destroy his performance. Whether that is parceled out as
different tablespaces isolated from each other within a single database or
multiple databases doesn't matter much for the central bit I'm trying to
convey. SAME avoids hot spots and tends to even out the load, and that by
definition means that if one use is making the disk farm go crazy, everyone
suffers equally. That is neither all good nor all bad.

Let's say you have three databases designed to serve EMEA (Europe, the
Middle East, and Africa), AMER (The Americas Region, you know from north of
Canada all the way south to that bit that almost reaches Antarctica), and
ASIA. If those are peak load oriented to 9AM to 5PM in the local time zones
and you smear everything across all the disks evenly like SAME, then you
effectively get triple the i/o horsepower for each when you need it. That is
the polar case where SAME shines best.
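
To put rough numbers on that "triple the horsepower" effect, here is a toy
back-of-the-envelope sketch in Python; the disk counts and per-spindle IOPS
are invented purely for illustration:

# Toy arithmetic (made-up numbers) comparing SAME-style pooling with
# per-region isolation when the three regions peak at different hours.
REGIONS = ["EMEA", "AMER", "ASIA"]
DISKS_PER_REGION = 40          # hypothetical: 120 disks split three ways
IOPS_PER_DISK = 150            # hypothetical per-spindle throughput

total_disks = DISKS_PER_REGION * len(REGIONS)

# Isolated: each region owns its own disks, peak or not.
isolated_peak_iops = DISKS_PER_REGION * IOPS_PER_DISK

# SAME with staggered peaks: whichever region is in its 9AM-5PM window gets
# (nearly) the whole farm, because the other two are close to idle.
same_staggered_peak_iops = total_disks * IOPS_PER_DISK

print(f"isolated, per-region peak:      {isolated_peak_iops:,} IOPS")
print(f"SAME, staggered regional peaks: {same_staggered_peak_iops:,} IOPS")
print(f"ratio: {same_staggered_peak_iops / isolated_peak_iops:.0f}x")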

Now let's say you have three applications that don't share data between them
but which simultaneously peak in activity (meaning i/o demand in this case).
SAME will minimize hot spots, but it will also maximize seek, read, and
write collisions. (I guess DBWR will mitigate the write collisions somewhat,
especially if you segregate the redo destinations from the database files
[ignoring SAME in that case]).

What if two of the applications are beating the disk drives to death with
batch jobs and one of the applications is trying to service interactive user
requests? You lose. SAME applies the pain equally to everyone.
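
To show the shape of that effect, here is a crude single-queue sketch in
Python; it ignores cache and uses made-up service times and utilizations, so
treat it as an illustration of the trend, not a prediction:

# Crude M/M/1-style illustration of why "SAME applies the pain equally":
# batch I/O drives up utilization on the shared spindles, and interactive
# response time climbs with it.
def response_time(service_ms: float, utilization: float) -> float:
    """Average response time for a single queue at the given utilization."""
    assert 0.0 <= utilization < 1.0
    return service_ms / (1.0 - utilization)

SERVICE_MS = 6.0   # hypothetical average disk service time

# Isolated: the interactive app alone keeps its disks about 30% busy.
print("interactive alone:   %.1f ms" % response_time(SERVICE_MS, 0.30))

# Shared (SAME): two batch workloads push the same spindles to 85% busy.
print("interactive + batch: %.1f ms" % response_time(SERVICE_MS, 0.85))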

Now I'm not sure what became of a paper by Gary Sharpe that I helped write,
but it had the neatest pictures of a big disk farm, showing how quickly it
becomes incomprehensible for humans to make good choices (like in your case
of 120 disks with 32 slices each) when assembling volumes for Oracle (or
anything else) to use. By the way, I'm looking for that paper if anyone has
a copy with the animated powerpoint. I suppose I could redo the work, but
that thing is a piece of art and I wouldn't do it justice. We introduced the
concept of "stripe sets", that is if you take some number of those 120 disks
and line them up and paint a different color across all the disks on each of
those 32 slices, you would be looking at 32 stripes and one stripe set.
Which disks and how many disks per stripe set is something you have to
determine for a particular disk farm, taking into account all the things
that queue on a disk request, redundancy, the most efficient pairwise
hardware mirroring of the drives if that is possible, etc., etc., etc.
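
For anyone who wants to see the bookkeeping rather than the picture, here is
a minimal sketch in Python of the stripe set idea; the disk names, the
eight-disk group size, and the layout are invented for illustration:

# Each slice index painted across a fixed group of disks is one stripe;
# the full group of 32 stripes over those disks is one stripe set.
DISKS = [f"disk{n:03d}" for n in range(120)]    # your 120 disks
SLICES_PER_DISK = 32
DISKS_PER_STRIPE_SET = 8                        # a site-specific choice

stripe_sets = {}
for i in range(0, len(DISKS), DISKS_PER_STRIPE_SET):
    name = f"SS{i // DISKS_PER_STRIPE_SET:02d}"
    members = DISKS[i:i + DISKS_PER_STRIPE_SET]
    # stripe s of this set = slice s on every member disk
    stripe_sets[name] = [[f"{d}:s{s:02d}" for d in members]
                         for s in range(SLICES_PER_DISK)]

print(len(stripe_sets), "stripe sets of", SLICES_PER_DISK, "stripes each")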

So then if you look at the picture of the whole disk farm and you want to
parcel out storage to different applications or databases it is child's
play, almost boring, to allocate a good solution that makes sense.

In general, though, when you add storage, the minimum increment tends to be a
whole tray full of disks (because you want to clone your stripe set
definitions for ease of use; if you just stick in one drive instead and add a
little piece of it to each stripeset-based volume to grow the volume, you
will immediately produce a hot spot so intense that it has been known to
destroy new drives well before their normal mean time between failures). SAME
has a protocol for adding single drives, and ASM automates blending in
additional storage over time.
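
Here is rough arithmetic in Python showing why the single-drive shortcut
hurts; the volume counts and I/O rates are made up, and it ignores caching
entirely:

# Growing every stripeset-based volume with a sliver of one new drive puts
# that one spindle in the I/O path of every volume at once.
N_VOLUMES = 15
IOPS_PER_VOLUME = 800                 # hypothetical steady-state demand
OLD_DISKS_PER_VOLUME = 8              # spindles per volume before the add

# Before: each volume's load is spread over its own spindles.
old_per_disk = IOPS_PER_VOLUME / OLD_DISKS_PER_VOLUME

# After: each volume gains a ninth "member", the one new drive shared by all.
per_volume_share = IOPS_PER_VOLUME / (OLD_DISKS_PER_VOLUME + 1)
new_disk_load = per_volume_share * N_VOLUMES

print(f"typical old spindle: ~{old_per_disk:.0f} IOPS")
print(f"the one new spindle: ~{new_disk_load:.0f} IOPS  <- the hot spot")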

It is entirely possible to arrange the Meta Devices to be stripes of a
stripeset and then to allocate the Meta Devices from a given stripeset to
only one database. This is part of the BORING protocol. You can implement it
with disk groups in ASM. If isolation of disk i/o demand is what you want,
that is as good a way to do it as any, either with ASM or by hand. For the
disk farm interfaces I am aware of, you have to do the bookkeeping to keep
track of which [meta devices, volumes, plexes, plex sets, make up your own
new name] are which and which disks comprise them. Using consistent
nomenclature can automatically create a virtual notebook, but you have to
remember that the volume managers are not going to enforce your nomenclature
against your typos.
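
Since the volume manager will not police the names for you, a small checker
can serve as that virtual notebook; this Python sketch assumes an invented
naming convention (database_SSnn_Vnn) purely to show the idea:

import re

# e.g. ERP_SS03_V17 -> database ERP, stripe set 03, volume 17
NAME_PATTERN = re.compile(r"^(?P<db>[A-Z]+)_SS(?P<set>\d{2})_V(?P<vol>\d{2})$")

def check_names(volume_names):
    """Split names into ones that parse and likely typos that don't."""
    parsed, rejects = [], []
    for name in volume_names:
        m = NAME_PATTERN.match(name)
        if m:
            parsed.append(m.groupdict())
        else:
            rejects.append(name)
    return parsed, rejects

ok, bad = check_names(["ERP_SS03_V17", "CRM_SS07_V01", "ERP_S303_V2"])
print("parsed:", ok)
print("typos?:", bad)   # the misspelled third name gets flagged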

Arranging things in this BORING way is also conducive to producing good
thinking about adding faster media types to an existing disk farm. Oh,
BORING is Balanced Organization of Resources in Natural Groups.
So if you add some 15Krpm, 256MB-cache drives to a farm previously made of
7.2Krpm, 4MB-cache drives, don't mix them into existing stripe sets.
Likewise if you add some mirrored (aka duplexed) persistent ram disk
devices. Make them be separate stripesets and separate disk groups if you're
using ASM.
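
A quick sanity check along those lines is easy to script; in this Python
sketch the inventory, stripe set membership, and media classes are all
invented:

# Flag any stripe set that mixes media classes; each class should get its
# own stripe sets (and its own ASM disk group, if you use ASM).
DISK_CLASS = {                      # hypothetical inventory
    "disk001": "7.2Krpm/4MB",  "disk002": "7.2Krpm/4MB",
    "disk120": "15Krpm/256MB", "disk121": "15Krpm/256MB",
}
STRIPE_SETS = {                     # hypothetical membership
    "SS00": ["disk001", "disk002"],
    "SS15": ["disk120", "disk121"],
    "SS16": ["disk002", "disk121"],  # deliberately mixed: should be flagged
}

for ss, members in STRIPE_SETS.items():
    classes = {DISK_CLASS[d] for d in members}
    verdict = "ok" if len(classes) == 1 else "MIXED MEDIA - split this set"
    print(f"{ss}: {sorted(classes)} -> {verdict}")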

So you still stripe and mirror everything. Just not all in one piece. And to
the extent you are able to divide the use of channels, cache, and disk
platters you will isolate (protect) the response time of one application
from high use by other applications. Isolating cache usage runs from easy to
impossible, depending on what the disk array supports. Interestingly enough,
if you cannot partition cache and your i/o demand underflows what the cache
is capable of, then after warmup any old SAME and a perfectly arranged
BORING layout will perform the same.  (You also won't be asking the question
below if your load demand underflows cache capability).
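
To see why layout stops mattering when the cache absorbs the load, here is a
deliberately simple Python model; the hit ratios and service times are
invented, and real arrays are far messier:

# Front-end response = cache time for hits plus back-end disk time for misses.
CACHE_HIT_MS = 0.5
DISK_MS_SHARED = 12.0      # hypothetical back-end time, busy shared layout
DISK_MS_ISOLATED = 7.0     # hypothetical back-end time, isolated layout

def avg_response(hit_ratio: float, disk_ms: float) -> float:
    return hit_ratio * CACHE_HIT_MS + (1.0 - hit_ratio) * disk_ms

for hit_ratio in (1.00, 0.99, 0.80):
    shared = avg_response(hit_ratio, DISK_MS_SHARED)
    isolated = avg_response(hit_ratio, DISK_MS_ISOLATED)
    print(f"hit ratio {hit_ratio:.2f}: shared {shared:.2f} ms, "
          f"isolated {isolated:.2f} ms")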

Now, lest someone think I am being unfair to SAME, remember that if you
don't actually have a disk performance problem, then some variety of SAME is
probably the cheapest to configure and maintain.
Also, notice that in the timezone peak load case, with BORING you have less
total disk throughput to serve each timezone while the disks for the other
time zones sit nearly idle. Of course that might be a good time to run batch
jobs and back up the other time zones, but SAME would have made all the
horsepower available to each time zone.

BORING takes a bit more configuration, or a lot more, depending on the
technology and tools you have. If you have no idea what the load demands of
the different applications or databases will be, then you don't really have
a basis for configuring BORING for performance advantage immediately, but if
you keep it really boring it will be easy to reallocate. There was a time
when it seemed like the vendors assembled the disk farms in the worst
possible way at the factory and then you had to pay extra for tool suites to
rebuild them sensibly, but I must have been imagining that, which I write for
legal purposes.

SAME and BORING each have strong points and best use cases. What you seem to
indicate you have below may be what I call "HAPHAZARD", for which I have no
spelled-out words. Autogenerated HAPHAZARD may be okay as long as you never
have to look at it and understand it. And you might not have to look at it,
except that you seem to think you are having i/o problems, so I guess you do
have to look at it.

Finally, if perchance you acquire a disk farm that is 50/50 divided for test
and production so that your load simulations in test will be "just like the
real thing" make very certain you understand which way it was cut in half
before you let someone start a load test after production is in service. If
they allocated half the disks and half the channels, etc. to each, you'll be
fine. If they cut each disk platter in half by defining partitions... you
likely won't be fine.
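
One way to find out before it bites you is to compare the spindle lists
behind each side; this Python sketch uses invented device names just to show
the check:

# If any physical spindle shows up behind both environments, the farm was
# cut by partitioning platters, and a load test in test will hammer
# production's disks too.
PROD_VOLUMES = {"PROD_V01": ["disk001", "disk002"],
                "PROD_V02": ["disk003", "disk004"]}
TEST_VOLUMES = {"TEST_V01": ["disk060", "disk061"],
                "TEST_V02": ["disk003", "disk062"]}   # shares disk003!

prod_disks = {d for disks in PROD_VOLUMES.values() for d in disks}
test_disks = {d for disks in TEST_VOLUMES.values() for d in disks}

shared = prod_disks & test_disks
if shared:
    print("WARNING: test and production share spindles:", sorted(shared))
else:
    print("Clean split: no spindles shared between test and production.")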

Regards,

mwf


