Re: oracle clusterware: stonith

  • From: "Jeremy Paul Schneider" <jeremy.schneider@xxxxxxxxxxxxxx>
  • To: raindoctor@xxxxxxxxx
  • Date: Tue, 26 Jun 2007 10:44:13 -0500

Depends on your definition of "stonith" I guess.  :)  Oracle does seem to
like redefining terms (like, say, "grid" computing...)  Seems like the
linux-ha project was using the term before Oracle, though, and they usually
mean hardware (power-based) solutions - and most other people seem to
mean the same thing.  That technique is not used by RAC directly (although
Oracle can integrate with vendor clusterware that might support it).
Oracle clusterware itself definitely doesn't do that, though.

Kevin's point was that a local userland-based reset is not failsafe.  (He's
not talking about hangcheck-timer but rather CRS-based reboots - by the way,
it's CRITICAL that you run hangcheck-timer on Linux/RAC deployments.)  CRS
doesn't reboot right away, and it doesn't stop requests that are already
queued in the SCSI drivers from hitting disk.
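
To make that concrete, here's a rough Python sketch of how I think about it -
purely my own toy model with made-up numbers, not Oracle's actual code path:
writes already queued below the database can still be flushed to shared disk
in the window between "decide to reboot" and the node actually going dark.

    # Toy model only: the gap between a software-initiated reboot and the
    # node actually ceasing I/O.  All numbers are invented for illustration.

    def writes_landing_after_eviction(pending_writes, flush_interval_s, reboot_delay_s):
        """Count queued writes that still reach shared disk before the node is really down."""
        landed = 0
        t = 0.0
        for _ in range(pending_writes):
            t += flush_interval_s          # next queued write gets flushed
            if t < reboot_delay_s:         # node hasn't actually gone dark yet
                landed += 1
        return landed

    queued = 50   # writes already sitting in the driver queues
    print("soft reboot :", writes_landing_after_eviction(queued, 0.2, 5.0), "writes still hit disk")
    print("power fence :", writes_landing_after_eviction(queued, 0.2, 0.0), "writes still hit disk")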

However, he kinda skips past hangcheck-timer.  To be fair, it's really a race
condition... there are two userland processes, the process updating the
hangcheck clock and the CRS process.  If neither can get processor time then
the machine reboots (hangcheck-timer fires).  If both can get processor time
then the machine reboots (CRS does it itself).  The main problem would be if
hangcheck gets processor time but CRS does not.
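
To spell out the cases (taking my description above at face value - this is
just my mental model written out as Python, not what the actual code does):

    # Truth table of the four scheduling cases described above.

    def outcome(hangcheck_gets_cpu, crs_gets_cpu):
        if not hangcheck_gets_cpu:
            # hangcheck clock stops advancing -> hangcheck-timer resets the box
            return "node reboots (hangcheck-timer)"
        if crs_gets_cpu:
            # CRS can run, so it reboots the node itself
            return "node reboots (CRS)"
        # hangcheck keeps the timer happy but CRS never gets to act:
        # the node stays up and can still write to shared disk
        return "NO reboot - the problem case"

    for hc in (True, False):
        for crs in (True, False):
            print(f"hangcheck CPU={hc!s:<5}  CRS CPU={crs!s:<5} -> {outcome(hc, crs)}")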

Also, that situation by itself is not yet a split-brain.  On RAC you won't
get corruption until the surviving node has performed instance recovery.  Up
until that point the cluster just hangs until it has figured out the cluster
status.  So it's another race condition - the critical question is not how
long it takes Oracle to reboot the node but rather the relationship
between node 1 rebooting and node 2 starting recovery.  It would be a
problem if the alive node starts recovery but the "dead" node isn't
completely "dead" yet.  Power-based solutions are simple and guaranteed.
Software-based solutions are way more complicated.  I tend to think that
simple is better.
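
Just to illustrate that race between node 1 rebooting and node 2 starting
recovery - another toy Python sketch with invented numbers, nothing more:

    # Toy timeline of the race between the evicted node going quiet and the
    # surviving node starting instance recovery.  Numbers are made up.

    def corruption_window_s(last_possible_write_s, recovery_start_s):
        """Seconds during which recovery overlaps writes from the 'dead' node."""
        return max(0.0, last_possible_write_s - recovery_start_s)

    # Software reset: the hung node might keep flushing queued I/O for a while.
    print("soft reset :", corruption_window_s(230.0, 200.0), "s of overlap")

    # Power fence: the node is provably dead before recovery begins.
    print("power fence:", corruption_window_s(0.0, 200.0), "s of overlap")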

As Barb pointed out in the presentation, the default timeout for CRS is 200s
- or 600s if you're using vendor clusterware.  If anything, this also
reinforces Kevin's paper from a while ago about clusterware... saying that
vendor-based clusterware is FAR from being out of the game.  I think that
this is one clear advantage of using something like ServiceGuard or HACMP -
more mature clusterware.

Also, as a disclaimer, I've worked a fair amount with RAC but I don't
consider myself an "expert"...  there are lurkers on oracle-l who know a
lot more than I do.  And I don't know the exact mechanics of every failure
situation in RAC.

Just my two cents...

-Jeremy



On 6/25/07, Pedro Espinoza <raindoctor@xxxxxxxxx> wrote:

Does Oracle clusterware do a remote power reset? If I recall correctly
from what I have read on Kevin Closson's blog, it does not.
However, this presentation by an Oracle insider claims that they
support stonith.

On slide 11:

IO fencing via stonith algorithm (remote power reset).



http://oukc.oracle.com/static05/opn/oracle9i_database/40168/053107_40168_source/index.htm



Thanks, Pedro.
--
//www.freelists.org/webpage/oracle-l





--
Jeremy Schneider
Chicago, IL
http://www.ardentperf.com/category/technical
