your init scripts are screwed somehow. A couple of thoughts you might
want to look into:
Take a look at the various init files and see if anything looks
incorrect. If it does, don’t try hand-editing these files, as
that might really screw things up even more. :-)
Are there any core dumps? They seem to end up in quite a few different
locations, so try checking as root for core dumps using something like:
cd $ORA_CRS_HOME ; find . -name "*core*"
cd $ORACLE_HOME ; find . -name "*core*"
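
If you want to check both homes in one pass, the two commands above can be wrapped up roughly like this. It's only a sketch: the function name find_cores is made up for illustration, and it adds -type f (not in the commands above) so that directories with "core" in their name are left out.

```shell
#!/bin/sh
# find_cores DIR...: list regular files whose names contain "core"
# under each directory given. (find_cores is a hypothetical helper name.)
find_cores() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # -type f is an addition: it skips directories named *core*
            find "$dir" -type f -name '*core*' 2>/dev/null
        else
            echo "skipping $dir: not a directory" >&2
        fi
    done
}

# Example, run as root:
#   find_cores "$ORA_CRS_HOME" "$ORACLE_HOME"
```

Not every match will be a real core dump, so check the file dates against the time of the crash before reading too much into them.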
Check the location and existence of the OCR file. On Linux, look at
'/etc/oracle/ocr.loc'. This file should contain:
Looking at this file on each node is useful in case one of the
nodes is configured incorrectly and is referring to a different OCR device.
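
One way to do that comparison, assuming you have first copied /etc/oracle/ocr.loc from each node into one directory (the file names in the example are hypothetical, as is the function name same_ocr_loc):

```shell
#!/bin/sh
# same_ocr_loc FILE...: compare every copy against the first one.
# Prints a line for each copy that differs and returns non-zero if any do.
same_ocr_loc() {
    first=$1
    shift
    status=0
    for f in "$@"; do
        # cmp -s is silent and fails on any difference (or a missing file)
        if ! cmp -s "$first" "$f"; then
            echo "MISMATCH: $f differs from $first"
            status=1
        fi
    done
    return $status
}

# Example, with copies fetched beforehand (names are assumptions):
#   same_ocr_loc ocr.loc.rhlv004 ocr.loc.rhlv005
```

A mismatch only says that the copies differ; you still have to read both files to see which node is pointing at the wrong device.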
Run an ocrdump as root so we can access the configuration data:
If you can’t find anything screwy there, try setting _USR_ORA_DEBUG=1
in racgwrap in both CRS_HOME and ORACLE_HOME. This does not require
restarting CRSD, but should give you better diagnostic information.
developers is like herding cats."
Oracle DBA Handbook
it's not. It's much harder than that!"
long-term Oracle DBA
oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of David Barbour
Sent: Friday, 7 October 2005 12:00
To: Bryan Thomas; Oracle_L
Subject: RE: Interesting Issue with RAC - Any Advice Appreciated
Thanks for the reply, Bryan. I've tried a whole bunch of times: taking
the node down using init 0, taking the node down using reboot, and
turning off the power. Interestingly enough, this is the 'primary' node,
the one from which I performed the install. The other node works perfectly.
I've tried several different approaches on the 'faulty' node, including leaving
the .loc_<node> and <node>.pid files in $ORA_CRS_HOME/crs/init and
removing them after running a srvctl stop nodeapps -n rhlv004 command. Same
results. I haven't tried to disable and then re-enable it, though. I'll let
you know how that goes.
I forgot to mention that I can bring up the node manually using either
crs_ or srvctl commands; it just won't start automatically. I have
altered the cluster registry to enable the ons to start automatically, using
crs_stat -p and crs_register to make the modification to auto_start for this
service (which gets set to 0 when the 10.1.0.4 patch is applied). That
seems to have worked, since the ons is targeted online.
How many times did you reboot the "crashed" server? Sometimes it
takes several tries to get RAC fully back up.
You might also want to try to disable crsd and reboot. Then enable crsd and
reboot again. That seems to fix a lot of problems.
I have not worked with RAC on RHEL V4, so I'm not exactly sure what the
Let me know if any of this helps.
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of David Barbour
Sent: Wednesday, October 05, 2005 10:37 PM
Subject: Interesting Issue with RAC - Any Advice Appreciated
I have an Oracle RAC installed on a pair of Dell
PE6850s w/2 processors and 8GB of RAM. Running RHEL
4.0 QU1 and Oracle 10.1.0.4. Using ASM on direct
attached CX300 with qLogic HBAs.
Thought everything was just fine - until I was testing
the RAC by crashing the nodes. If I crash a node, I
see the VIP migrate to the survivor. When I bring up
the crashed box, CRSD starts and the VIP migrates back
to its 'home' box, but the other nodeapps, gsd and
ons, don't start, and neither does the listener nor
the instance. The crsd goes bye-bye (actually
Here's what I see in crs_stat -t:
[oracle@rhlv005 ~]$ crs_stat -t
Name            Type         Target  State    Host
ora.prod1.db    application  ONLINE  ONLINE
ora....11.inst  application  ONLINE  OFFLINE
ora....12.inst  application  ONLINE  ONLINE
ora....SM1.asm  application  ONLINE  ONLINE
ora....04.lsnr  application  ONLINE  OFFLINE
ora....004.gsd  application  ONLINE  OFFLINE
ora....004.ons  application  ONLINE  OFFLINE
ora....004.vip  application  ONLINE  ONLINE
ora....SM2.asm  application  ONLINE  ONLINE
ora....05.lsnr  application  ONLINE  ONLINE
ora....005.gsd  application  ONLINE  ONLINE
ora....005.ons  application  ONLINE  ONLINE
ora....005.vip  application  ONLINE  ONLINE
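
To pick out just the resources that are targeted ONLINE but sitting OFFLINE, output like the listing above can be filtered with a small awk sketch. It assumes the column order shown (Name, Type, Target, State, Host), and the function name offline_resources is made up for illustration.

```shell
#!/bin/sh
# offline_resources: read `crs_stat -t` output on stdin and print the
# name of every resource whose Target is ONLINE but whose State is OFFLINE.
# The header line is skipped naturally because its third field is "Target".
offline_resources() {
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Example:
#   crs_stat -t | offline_resources
```

Against the listing above this would print the instance, listener, gsd and ons entries for the crashed node, which is the set that fails to restart.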
The crs log is spectacularly informative:
2005-10-05 18:18:34.127: CRS Daemon Started.
2005-10-05 18:18:34.862: Attempting to stop `ora.rhlv004.vip` on member `rhlv005`
2005-10-05 18:18:35.499: Stop of `ora.rhlv004.vip` on member `rhlv005` succeeded.
2005-10-05 18:18:35.720: Attempting to start `ora.rhlv004.vip` on member `rhlv004`
2005-10-05 18:18:41.822: Start of `ora.rhlv004.vip` on member `rhlv004` succeeded.
2005-10-05 18:18:41.922: CRS-1007: Failed after successful dependency consideration
2005-10-05 18:18:33.090: CRSD-1: [CMDMAIN:1336832] Restart waiting for Oracle CRSD to start
2005-10-05 18:18:42.174: CRSD-1: Complete Restart