RE: Oracle RAC cost justification?

  • From: "Marquez, Chris" <cmarquez@xxxxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 2 Jun 2005 14:33:57 -0400

"Oracle RAC cost justification?"
It is not hard to justify the COST once the NEED is justified.
---REGARDING COST:
>> "RAC and a cost effective solution is an oxymoron."
There is some truth to this too, but let's be fair.
RAC was once only for those with multi-million dollar budgets, and few DBAs
could even put their hands on RAC (aka OPS).
Today, that is not true.  Not to start an Oracle license flame war, but this doc was
recently posted on this list.
RAC is *included* with 10g SE (but not SE One)!?
  Oracle Database 10g Product Family: An Oracle White Paper
  Jan. 2004
  ...
  Feature/Option:  Oracle Real Application Clusters
  SE One: N
  SE: Y
  EE: Y
  Notes: Extra cost with Enterprise Edition, included with SE, not in SE One

PS: RAC is listed as a "Scalability" option and *not* a "High Availability" option in
this doc!...but I disagree. ;o)


---REGARDING NEED:
Mogens Nørgaard's document and Tim Gorman's post are great regarding the need for and
use of RAC.

"You Probably Don't Need RAC" is probably one of the best and most "right on" docs
about RAC and the history of Oracle Clusters (OPS) I have ever seen.
However, I think the doc is lacking in two ways.
While the author (Mogens Nørgaard) gives much (deserved) credit to the developers
of Oracle Cluster technology, he underplays one significant and key change that
separates RAC from OPS:
the ability to complete WRITE-WRITE transactions between instances in Cache Fusion.
Most people are surprised, as I was, to find out that "Cache Fusion"
existed in 8i.
The problem is that until Oracle Cluster technology gained the ability for
all nodes to WRITE-share-WRITE the same block without writing it to disk
*first* (a PING), OPS was just neither practical nor scalable.
Without this (cache) WRITE-WRITE ability, we would not even be discussing RAC 
here today.
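
To put rough numbers on the PING point, here is a toy Python sketch. It is a
simplified model assumed for illustration, not Oracle's actual algorithm: the
function name, the "2 disk I/Os per ping" cost, and the instance names are all
hypothetical. It counts what happens when two instances take turns modifying
the same block.

```python
# Toy model (an assumption for illustration, not Oracle's real protocol):
# count the work needed when instances alternate writes to one shared block.

def transfer_cost(writers, cache_fusion):
    """Return (disk_ios, interconnect_transfers) for a sequence of writers.

    writers: instance names, in the order they modify the same block.
    cache_fusion: True  -> the dirty block is shipped over the interconnect;
                  False -> the dirty block is "pinged": written to disk by the
                           holder, then read back from disk by the requester.
    """
    disk_ios = 0
    transfers = 0
    holder = None  # instance currently holding the dirty block
    for inst in writers:
        if holder is not None and inst != holder:
            if cache_fusion:
                transfers += 1   # block copied memory-to-memory
            else:
                disk_ios += 2    # assumed: one write by holder + one read by requester
        holder = inst
    return disk_ios, transfers


writers = ["A", "B", "A", "B", "A"]
print(transfer_cost(writers, cache_fusion=False))  # (8, 0)
print(transfer_cost(writers, cache_fusion=True))   # (0, 4)
```

In this model every change of writer costs disk I/O under the PING scheme and
only an interconnect transfer under write-write Cache Fusion, which is the gap
described above.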

What I like to call "The RAC Reality", as stated by Oracle:
"if *your* application will not scale by adding CPU on a single server, it will
not scale on RAC"
This is a fact that many developers/managers don't know (or don't want to hear.)
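
One way to see why that rule of thumb is plausible is Amdahl's law. The sketch
below is only an illustration under that assumed model; the parallel fractions
are made-up numbers, not measurements, and the model ignores interconnect
overhead, which would only make the RAC case worse.

```python
# Amdahl's law: speedup is limited by the part of the work that stays serial.
# The parallel fractions used below are hypothetical, for illustration only.

def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speedup on n_cpus when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


for p in (0.50, 0.95):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8, 16)]
    print(f"parallel fraction {p}: {speedups}")
```

An application that is only half parallel never reaches a 2x speedup no matter
how many CPUs it gets, whether those CPUs sit in one server or are spread
across RAC nodes.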

I have worked with OPS/RAC since Oracle 7.  It is a cool technology and has
very practical uses, and I will continue to use and recommend it.
However, I personally feel that, whether it is Oracle's fault or the application's
fault, RAC never lives up to the usability (running active-active, TAF, etc.)
expectations.
The 3 clients I have worked for who use RAC/OPS have all, in the end, only been
able to use and benefit from RAC by running it in "Failover Mode" ("active-passive"),
meaning all end user sessions run from a single instance at any one time.  Should
that instance die/fail/terminate, we "Failover" to the other "passive" instance.
Again, there are many reasons for this...most are technical issues that lead to political
issues.  After the political issues, no one, including the DBAs, is willing to
"stand behind" RAC and running in full RAC "active-active" mode.
Even the savvy RAC newbies start asking "why do we have "global cache"
waits now?" :o|

Also, I must admit that I have seen as many OPS/RAC successes as I have seen
failures.
I have seen cluster and OPS/RAC hangs *cause* the downtime...as many times
as I have seen downtime reduced by RAC/OPS during the loss of a node.
It just tends to be that the loss of a node/server is a longer outage and thus
OPS/RAC pays off...but trust me, it is not a good feeling being in a "day after"
meeting explaining that an OPS/RAC hang *caused* an application outage!
Enough of these meetings and week-long TARs and you will run in RAC "Failover"
or "active-passive" mode too!


---REGARDING SCALABILITY:
I have mixed feelings on this, but no technical data to support my ideas...so I
won't spew them.
However, to be fair, RAC has an often-missed scalability, or rather administrative,
benefit.
You can reduce the number of databases and the amount of "application glue" with RAC.
Simply put, I can join databases that once used Snapshots into a single database
and run different user groups on different instances.  I just reduced my
database liability and administration by one, AND eliminated the distributed
nature of my database app.  "Application Partitioning" is not a bad thing in a
centralized world!


---REGARDING HA:
>> My definition HA...
>> you have a different definition of HA
>> well that maybe that's where we're miscommunicating

If having no "single point of failure" is the *rule* for all HA, then true HA can never
be achieved...as planet Earth is a "single point of failure".
I personally believe that if you have adequate redundancy, then you have some
(solid) level of HA.
I find when talking with groups of people it's best to discuss two topics, "HA"
and "Disaster Recovery", rather than just one.
To me HA is the ability to recover from a failure..."having redundancy".
And Disaster Recovery is the ability to recover from a true catastrophic 
disaster, "loss of all redundancy".

Think about it: RAID keeps the "system" running in the event of a
failure...redundant support.
Meanwhile our "system backups" save us from a disaster such as the loss of all
redundancy.

IMHO, HA is the *active* use of redundant resources...while Disaster Recovery
is the *passive* use of duplicate resources.  RAC for me is HA...as I have seen it
work that way.
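
The "adequate redundancy" argument can be put into back-of-envelope numbers.
The Python sketch below is a simplification: it assumes component failures are
independent (cluster hangs like the ones mentioned earlier break that
assumption), and the availability figures are hypothetical.

```python
# Back-of-envelope availability arithmetic, assuming independent failures.
# All availability figures below are hypothetical, for illustration only.

def combined_availability(single, copies):
    """Availability of `copies` redundant components, any one of which suffices."""
    return 1.0 - (1.0 - single) ** copies


def series_availability(*parts):
    """Availability of a chain in which every part must be up."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


node = 0.99       # one database server (hypothetical)
storage = 0.999   # the shared disk subsystem (hypothetical)

two_nodes = combined_availability(node, 2)
print("one node + storage: ", series_availability(node, storage))
print("two nodes + storage:", series_availability(two_nodes, storage))
```

Under these assumptions the second node removes most of the server-related
downtime, while the shared storage still caps the total, which is the SPOF
point raised elsewhere in this thread.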


---REGARDING RAC & DBAs:
>>so getting a RAC implementation on your project 
>>helps you get your next job.

There is some truth to that! :o|


This is a good topic and conversation.


Chris Marquez
Oracle DBA



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx on behalf of Tim Gorman
Sent: Thu 6/2/2005 12:24 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: Oracle RAC cost justification?
 
Instead of arguing about whether RAC is good at scalability or HA or
cost-effectiveness, how about citing specifics?

Q1 - RAC and HA:

    - What does RAC do better than any other possible solution (e.g. OS
      clustering, DataGuard, volume replication, etc)?  How and why?
    - What other solutions are better than RAC at HA and why?

Q2 - RAC and scalability:

    - When does RAC present a better scalability solution than, say,
      simply buying a larger server?
    - What scalability bottlenecks does a RAC solution resolve better
      than alternatives (e.g. larger server, RAM disk, etc)?
    - What other solutions are better than RAC at scalability and why?

Q3 - RAC and cost-effectiveness:

    - Compare and contrast the explicit (and implicit) costs to a
      RAC versus non-RAC configuration for the following scenarios:

        * 8 CPUs of processing capacity required, no HA reqmts stated
        * 8 CPUs of processing capacity required, MTTR less than 1 hour
        * same as above for 64 CPUs of processing capacity

I've got my own ideas as to the answers to these questions, and I'd be glad
to share them:

    Q1: When is RAC the best solution for scalability?

    Tim-A1:
        When you can't buy a larger server.

        - The largest Linux server of which I'm aware is 8 CPUs.
          I'm not sure on this, though...
        - The largest Windows server of which I'm personally aware
          is also 8 CPUs, but I've heard that Win2003 can support
          up to 64 CPUs?
        - For AIX, the largest server is 32 CPUs...
        - For Solaris, the largest server is 144 CPUs...
        - For HP-UX, the largest server is (I think) 128 or 256
          CPUs...

        Of course, these are constantly shifting numbers;  if I'm
        wrong on any of these, my apologies in advance...

        Anything larger can only be accomplished by RAC.  Anything less
        scales better without RAC.  So, RAC as a scalability solution
        is a platform-dependent choice, also dependent on your needs.

    Q2: When is RAC the best solution for HA?

    Tim-A2:
        - For unplanned server or instance outages (a.k.a. "failure"),
          RAC (when running in an active-passive configuration) has
          the fastest service failover of all options.  When RAC is
          running in an active-active configuration, service failover
          takes somewhat longer, possibly as long as other HA
          alternatives.  However, persistent connections using
          "transparent application failover" (TAF) are an attractive
          possibility, though TAF must be configured intelligently
          (e.g. commonly recommended config does not support
          "failback") and has many restrictions (e.g. PL/SQL package
          state does not fail over, DML does not fail over, etc)
        - For planned server or instance outages (a.k.a. "maintenance"),
          RAC is not the very best HA alternative;  OS clustering
          packages (e.g. Veritas, Sun HA, HP MCSG, AIX HACMP, etc)
          are more robust
        - For planned or unplanned storage or data center outages
          (a.k.a. disaster recovery), RAC does not (at this time)
          fit in this discussion

On the cost-effectiveness question, I've run out of time (and energy) for
one response...

What do y'all think?

-Tim


on 6/1/05 8:33 PM, Khemmanivanh, Somckit at
somckit.khemmanivanh@xxxxxxxxxxxxxxxx wrote:

> Whoa, a SAN is non-redundant???
>
> I agree it could still be a SPOF but it certainly is redundant component
> wise...
>
> I guess you're entitled to your opinion regarding rather RAC provides HA
> for the Oracle Instance or not. Keyword here is Instance. RAC provides
> HA at the Oracle instance, that does not exclude you from addressing the
> other SPOFs in your environment (to what degree your budget
> allows)...but if 1 instance in the RAC cluster should go down, there
> should be others available to handle the workload...
>
> My definition HA for the Oracle instance is really just that there is
> minimal downtime should 1 instance in the RAC cluster be unavailable.
> What does any other HA clustering solution provide? It simply restarts
> the Oracle instance on the standby node...
>
> If you have a different definition of HA, well that maybe that's where
> we're miscommunicating...
>
> 
> ________________________________
> 
> From: Jared Still [mailto:jkstill@xxxxxxxxx]
> Sent: Wed 6/1/2005 5:18 PM
> To: Khemmanivanh, Somckit
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: Re: Oracle RAC cost justification?
> 
> 
> HA for the Oracle Instance?
> 
> You're kidding, right?
> 
> If you have SPOF, it isn't HA.
> 
> A non-redundant disk system is a rather glaring SPOF.
> 
> 
> On 6/2/05, Khemmanivanh, Somckit <somckit.khemmanivanh@xxxxxxxxxxxxxxxx> wrote:
>
> Well RAC is not the SAN right? RAC is HA for the Oracle Instance.
>
> If you're saying the total HA solution involves eliminating all SPOFs,
> I'd agree but cost is always a limiting factor in that regard...
>
> Thanks!
>
> 
> ________________________________
> 
> From: Jared Still [mailto:jkstill@xxxxxxxxx]
> Sent: Wednesday, June 01, 2005 4:04 PM
> To: Khemmanivanh, Somckit
> Cc: Vlado Barun; oracle-l@xxxxxxxxxxxxx
> Subject: Re: Oracle RAC cost justification?
>
> On 6/1/05, Khemmanivanh, Somckit <somckit.khemmanivanh@xxxxxxxxxxxxxxxx> wrote:
>
> Let's say we already have Service Guard in house. For new
> implementations should we go with MCSG or look at RAC? RAC is an HA and
> scalability solution (MCSG is purely HA). I'm trying to get a good
>
> RAC might be many things, but HA is not one of them.
>
> The disk subsystem is a single point of failure: you only have one database.

--
//www.freelists.org/webpage/oracle-l



