Interesting Issue with RAC - Any Advice Appreciated

  • From: David Barbour <david.barbour@xxxxxxxxxxxxx>
  • To: Oracle_L <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 5 Oct 2005 20:37:01 -0700 (PDT)

I have an Oracle RAC installed on a pair of Dell
PE6850s with 2 processors and 8GB of RAM, running RHEL
4.0 QU1 and Oracle 10.1.0.4.  Storage is ASM on a
direct-attached CX300 with QLogic HBAs.

Thought everything was just fine - until I was testing
the RAC by crashing the nodes.  If I crash a node, I
see the VIP migrate to the survivor.  When I bring up
the crashed box, CRSD starts and the VIP migrates back
to its 'home' box, but the other nodeapps (gsd and
ons) don't start, and neither does the listener or
the instance.  The crsd then goes bye-bye (actually
'defunct').

Here's what I see in crs_stat -t:

[oracle@rhlv005 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.prod1.db   application    ONLINE    ONLINE    rhlv005
ora....11.inst application    ONLINE    OFFLINE
ora....12.inst application    ONLINE    ONLINE    rhlv005
ora....SM1.asm application    ONLINE    ONLINE    rhlv004
ora....04.lsnr application    ONLINE    OFFLINE
ora....004.gsd application    ONLINE    OFFLINE
ora....004.ons application    ONLINE    OFFLINE
ora....004.vip application    ONLINE    ONLINE    rhlv004
ora....SM2.asm application    ONLINE    ONLINE    rhlv005
ora....05.lsnr application    ONLINE    ONLINE    rhlv005
ora....005.gsd application    ONLINE    ONLINE    rhlv005
ora....005.ons application    ONLINE    ONLINE    rhlv005
ora....005.vip application    ONLINE    ONLINE    rhlv005
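
For anyone who wants to pick the stuck resources out of a listing like
this without eyeballing it, here is a minimal sketch in Python. It only
parses text already captured from crs_stat -t; the helper names
parse_crs_stat and not_running are my own and are not part of any Oracle
tool, and the sample rows are copied from the output above. It assumes
the five-column layout shown here and tolerates a host name wrapped onto
its own line.

```python
# Hypothetical helpers for reading captured `crs_stat -t` text.
# Nothing here calls Oracle tooling; it only parses the printed table.

SAMPLE = """\
[oracle@rhlv005 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.prod1.db   application    ONLINE    ONLINE
rhlv005
ora....11.inst application    ONLINE    OFFLINE
ora....04.lsnr application    ONLINE    OFFLINE
ora....004.gsd application    ONLINE    OFFLINE
ora....004.ons application    ONLINE    OFFLINE
ora....004.vip application    ONLINE    ONLINE    rhlv004
"""


def parse_crs_stat(text):
    """Return one dict per resource row (name, type, target, state, host)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # A lone token after a row with no host is a wrapped host name.
        if len(parts) == 1 and rows and rows[-1]["host"] is None:
            rows[-1]["host"] = parts[0]
            continue
        # Skip the prompt, header, and separator: only rows whose third
        # column is a target value seen in the listing are resources.
        if len(parts) < 4 or parts[2] not in ("ONLINE", "OFFLINE"):
            continue
        rows.append({
            "name": parts[0],
            "type": parts[1],
            "target": parts[2],
            "state": parts[3],
            "host": parts[4] if len(parts) > 4 else None,
        })
    return rows


def not_running(rows):
    """Names of resources whose target is ONLINE but whose state is not."""
    return [r["name"] for r in rows
            if r["target"] == "ONLINE" and r["state"] != "ONLINE"]


if __name__ == "__main__":
    for name in not_running(parse_crs_stat(SAMPLE)):
        print(name)
```

Fed the full listing, this should flag the rhlv004 instance, listener,
gsd and ons, which matches what I see by hand.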

The crs log is spectacularly informative:

2005-10-05 18:18:34.127: CRS Daemon Started.
2005-10-05 18:18:34.862: Attempting to stop `ora.rhlv004.vip` on member `rhlv005`
2005-10-05 18:18:35.499: Stop of `ora.rhlv004.vip` on member `rhlv005` succeeded.
2005-10-05 18:18:35.720: Attempting to start `ora.rhlv004.vip` on member `rhlv004`
2005-10-05 18:18:41.822: Start of `ora.rhlv004.vip` on member `rhlv004` succeeded.
2005-10-05 18:18:41.922: CRS-1007: Failed after successful dependency consideration

2005-10-05 18:18:33.090: CRSD-1: [CMDMAIN:1336832] Restart waiting for Oracle CRSD to start
2005-10-05 18:18:42.174: CRSD-1: Complete Restart Application Request


Any ideas? 
