Re: Oracle RAC and VIPs

From: "Alessandro Vercelli" <alever@xxxxxxxxx>
To: "dannorris" <dannorris@xxxxxxxxxxxxx>
Date: Tue, 15 Jul 2008 11:29:45 +0200
The exact time of the crash is not clear; it happened on the morning of May 9th. 
It was a database crash, not a system crash; crsd.log reported many messages like:

2008-05-09 12:32:33.833: [  CRSEVT][3695033264]0CAAMonitorHandler :: 0:Action 
Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for 
ora.<failednode>.ons! (timeout=600)

Each message referred to a different resource.
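
To see at a glance which resources hit these check timeouts, something like 
the following could be run over crsd.log. It is only a rough sketch: the function 
name timed_out_resources is made up, and the pattern is inferred from the message 
quoted above, so other CRS versions may word it differently.

```python
import re
from collections import defaultdict

# Start of a crsd.log entry, e.g. "2008-05-09 12:32:33.833: "
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ", re.MULTILINE)

# "Action Script <path>(<action>) timed out for <resource>! (timeout=<n>)"
# (format assumed from the single message quoted in this post)
TIMEOUT_RE = re.compile(
    r"Action Script (\S+)\((\w+)\) timed out for (\S+)! \(timeout=(\d+)\)"
)


def timed_out_resources(log_text):
    """Map each resource name to a list of (timestamp, action, timeout) tuples."""
    result = defaultdict(list)
    starts = list(TS_RE.finditer(log_text))
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        # Collapse whitespace so entries wrapped over several lines still match.
        body = " ".join(log_text[m.end():end].split())
        hit = TIMEOUT_RE.search(body)
        if hit:
            _script, action, resource, timeout = hit.groups()
            result[resource].append((m.group(1), action, int(timeout)))
    return dict(result)
```

Reading the file with open(...).read() and passing the text to 
timed_out_resources() returns one key per affected resource.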

Last week, I tried to restart the failed node (in the meantime, other people 
made other attempts) and crsd.log reported, among other messages, the following:

2008-07-07 16:10:18.743: [  CRSRES][3781585840]0CRS-1028: Dependency analysis 
failed because of:
'Resource in UNKNOWN state: ora.<failednode>.vip'

According to crs_stat -t, the ora.<failednode>.vip resource was allocated on the 
partner node - not the failed one - and its state was UNKNOWN (as expected).

My opinion is that, at the time of the crash, the partner node attempted an 
automatic failover of the VIP, but it failed; crsd.log of the partner node:

2008-05-09 11:55:55.278: [  CRSRES][3686595504]0Attempting to start 
`ora.<failednode>.vip` on member `<partnernode>`
2008-05-09 11:56:58.305: [  CRSAPP][3686595504]0StartResource error for 
ora.<failednode>.vip error code = -2
2008-05-09 11:57:05.429: [  CRSEVT][3697085360]0CAAMonitorHandler :: 0:Action 
Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for 
ora.<failednode>.vip! (timeout=60)

and, finally:

2008-05-09 11:58:01.422: [  CRSRES][3686595504]0X_OP_StopResourceFailed : Stop 
Resource failed
(File: rti.cpp, line: 1698

2008-05-09 11:58:01.422: [  CRSRES][3686595504][ALERT]0`ora.<failednode>.vip` 
on member `<partnernode>` has experienced an unrecoverable failure.
2008-05-09 11:58:01.422: [  CRSRES][3686595504]0Human intervention required to 
resume its availability.
2008-05-09 11:58:01.444: [  CRSRES][3686595504]0CRS-1028: Dependency analysis 
failed because of:
'Resource in UNKNOWN state: ora.<failednode>.vip'
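
Going by the timestamps above, the partner node went from the start attempt 
(11:55:55.278) to the unrecoverable failure (11:58:01.422) in about 126 
seconds. A small script like this one could lay out such a timeline for a 
single resource; it is just a sketch, the name resource_timeline is 
hypothetical, and it assumes every entry starts with a timestamp in the 
format shown in these excerpts.

```python
import re
from datetime import datetime

# Start of a crsd.log entry, e.g. "2008-05-09 11:55:55.278: "
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ", re.MULTILINE)


def resource_timeline(log_text, resource):
    """Return (offset, message) pairs for entries that mention `resource`.

    `offset` is a timedelta measured from the first matching entry.
    """
    events = []
    starts = list(TS_RE.finditer(log_text))
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        # Join continuation lines; this also collapses runs of spaces.
        body = " ".join(log_text[m.end():end].split())
        if resource in body:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((when, body))
    events.sort(key=lambda e: e[0])
    if not events:
        return []
    first = events[0][0]
    return [(when - first, body) for when, body in events]
```

Printing each pair, for example with print(offset.total_seconds(), message), 
shows how long after the first start attempt each later message was logged.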

Sorry for the *mess* of messages.....

Thanks,

Alessandro


>If you think it's related to the resource not starting because of some 
>dependency, then I'd suggest looking at 
>$CRS_HOME/log/<nodename>/crsd/crsd.log on each node (especially the 
>crashed node) and see what's there around the time of startup.
>
>If the node won't boot, try booting it into single user mode and 
>disabling clusterware from starting if you think clusterware is what's 
>not allowing it to boot completely.
>
>Dan
>
>Alessandro Vercelli wrote:
>> O.S.: RHEL AS4
>> Hardware is HP BL45P, 4 x AMD Dual core, 8 GB RAM.
>> Oracle 10.2.0.1,  RAC and Clusterware
>>
>> Anyway, the issue became "crabbed", since the last attempt to start the 
>> failing node succeeded, so I've one more task now...:)).
>>
>> The failed attempts reported on the console that the listener nodeapp could 
>> not start; looking into network configuration, I noticed vip IP address for 
>> the failing listener was not allocated on that node but on its partner; 
>> please, what log files do you suggest for errors?
>>
>> Thanks,
>>
>> Alessandro
>>   
>

--
//www.freelists.org/webpage/oracle-l