RE: Interesting Issue with RAC - Any Advice Appreciated

  • From: David Barbour <david.barbour@xxxxxxxxxxxxx>
  • To: Bryan Thomas <bthomas@xxxxxxxxxxxxxx>, Oracle_L <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 6 Oct 2005 06:59:35 -0700 (PDT)

Thanks for the reply, Bryan.  I've tried a whole bunch of times: taking the node 
down using init 0, taking the node down using reboot, and turning off the 
power.  Interestingly enough, this is the 'primary' node, the one from which I 
performed the install.  The other node works perfectly.  I've tried several 
different approaches on the 'faulty' node, including leaving the .loc_<node> 
and <node>.pid files in $ORA_CRS_HOME/crs/init and removing them after running 
a srvctl stop nodeapps -n rhlv004 command.  Same results.  I haven't tried to 
disable and then re-enable crsd, though.  I'll let you know how that goes.
 
I forgot to mention that I can bring up the node manually using either crs_ or 
srvctl commands; it just won't start automatically.  I have altered the cluster 
registry so that ons starts automatically, using crs_stat -p and 
crs_register to change auto_start for this service (which 
gets set to 0 when the 10.1.0.4 patch is applied).  That seems to have worked, 
since the ons is targeted online.
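
A rough sketch of the auto_start change described above, not the exact commands that were run. The set_auto_start helper and the /tmp/crs_profiles directory are hypothetical, the resource name ora.rhlv004.ons is inferred from the crs_stat output further down, and the crs_stat / crs_register calls are left as comments because they only make sense on a cluster node; check their syntax against your CRS version before using them.

```shell
#!/bin/sh
# Hypothetical helper: flip AUTO_START from 0 to 1 in a CRS resource
# profile (.cap file) previously exported with crs_stat -p.
set_auto_start() {
    profile="$1"
    # Rewrite only the AUTO_START line; every other attribute is left alone.
    sed 's/^AUTO_START=0$/AUTO_START=1/' "$profile" > "$profile.new" &&
        mv "$profile.new" "$profile"
}

# On a cluster node the surrounding steps would look roughly like this
# (directory and resource name are assumptions):
#   mkdir -p /tmp/crs_profiles
#   crs_stat -p ora.rhlv004.ons > /tmp/crs_profiles/ora.rhlv004.ons.cap
#   set_auto_start /tmp/crs_profiles/ora.rhlv004.ons.cap
#   crs_register -u ora.rhlv004.ons -dir /tmp/crs_profiles
```

Keeping the edit in a small function makes it easy to diff the profile before and after re-registering it.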
 


Bryan Thomas <bthomas@xxxxxxxxxxxxxx> wrote:
David,

How many times did you reboot the "crashed" server? Sometimes it takes
several tries to get RAC fully back up.

You might also want to try to disable crsd and reboot. Then enable crsd and
reboot again. That seems to fix a lot of problems.

I have not worked with RAC on RHEL V4, so I'm not exactly sure what the
problem is.

Let me know if any of this helps.

-Bryan Thomas

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of David Barbour
Sent: Wednesday, October 05, 2005 10:37 PM
To: Oracle_L
Subject: Interesting Issue with RAC - Any Advice Appreciated


I have an Oracle RAC installed on a pair of Dell
PE6850s w/2 processors and 8GB of RAM. Running RHEL
4.0 QU1 and Oracle 10.1.0.4. Using ASM on direct
attached CX300 with qLogic HBAs.

Thought everything was just fine - until I was testing
the RAC by crashing the nodes. If I crash a node, I
see the VIP migrate to the survivor. When I bring up
the crashed box, CRSD starts and the VIP migrates back
to its 'home' box, but the other nodeapps, gsd and
ons, don't start, and neither does the listener nor
the instance. The crsd goes bye-bye (actually
'defunct').

Here's what I see in crs_stat -t:

[oracle@rhlv005 ~]$ crs_stat -t
Name            Type         Target  State    Host
------------------------------------------------------------
ora.prod1.db    application  ONLINE  ONLINE   rhlv005
ora....11.inst  application  ONLINE  OFFLINE
ora....12.inst  application  ONLINE  ONLINE   rhlv005
ora....SM1.asm  application  ONLINE  ONLINE   rhlv004
ora....04.lsnr  application  ONLINE  OFFLINE
ora....004.gsd  application  ONLINE  OFFLINE
ora....004.ons  application  ONLINE  OFFLINE
ora....004.vip  application  ONLINE  ONLINE   rhlv004
ora....SM2.asm  application  ONLINE  ONLINE   rhlv005
ora....05.lsnr  application  ONLINE  ONLINE   rhlv005
ora....005.gsd  application  ONLINE  ONLINE   rhlv005
ora....005.ons  application  ONLINE  ONLINE   rhlv005
ora....005.vip  application  ONLINE  ONLINE   rhlv005
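
To pick out the resources that are targeted ONLINE but sitting OFFLINE, a small filter over this kind of listing can help. This is only a sketch: stuck_resources is a hypothetical helper name, and it assumes the five-column layout (Name, Type, Target, State, Host) shown above.

```shell
#!/bin/sh
# Hypothetical helper: read crs_stat -t style output on stdin and print the
# names of resources whose Target is ONLINE but whose State is OFFLINE.
stuck_resources() {
    # Field 3 is Target and field 4 is State; the header, separator and
    # any wrapped host-only lines do not match and are skipped.
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# On a cluster node:
#   crs_stat -t | stuck_resources
```

Run against the listing above, this would report the instance, listener, gsd and ons resources for the rebooted node.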

The crs log is spectacularly informative:

2005-10-05 18:18:34.127: CRS Daemon Started.
2005-10-05 18:18:34.862: Attempting to stop `ora.rhlv004.vip` on member `rhlv005`
2005-10-05 18:18:35.499: Stop of `ora.rhlv004.vip` on member `rhlv005` succeeded.
2005-10-05 18:18:35.720: Attempting to start `ora.rhlv004.vip` on member `rhlv004`
2005-10-05 18:18:41.822: Start of `ora.rhlv004.vip` on member `rhlv004` succeeded.
2005-10-05 18:18:41.922: CRS-1007: Failed after successful dependency consideration

2005-10-05 18:18:33.090: CRSD-1: [CMDMAIN:1336832] Restart waiting for Oracle CRSD to start
2005-10-05 18:18:42.174: CRSD-1: Complete Restart Application Request
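
When the log is longer than this excerpt, it can help to pull out only the lines that carry a numbered CRS message code. A minimal sketch, assuming the timestamp-then-message format shown above; crs_messages is a hypothetical helper and the log path is deliberately left as a placeholder variable.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that carry a numbered CRS
# message code such as CRS-1007 (CRSD-1 trace lines do not match).
crs_messages() {
    grep -E 'CRS-[0-9]+:'
}

# On a cluster node, with CRS_LOG set to wherever the crsd log lives:
#   crs_messages < "$CRS_LOG"
```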


Any ideas?
