e-sending only to Oracle-L (overquoting....)

> I have a couple of follow up questions:
> 1. When was the last time you executed a successful failover test of this
> environment?

This was the first time I worked on that platform, which is under development by another team, but looking into the log files I saw some failed and a few successful attempts to relocate resources from the node that later failed to its partner, and vice versa.

> 2. What has changed since that last successful test? (assuming nothing)

I assume nothing, too :)), but I cannot be sure...

> 3. What are the public, private, and VIP IPs for these nodes?

Public and virtual IPs are on the same network, xxx.xxx.xxx.xxx/24; the private IPs are on a completely different one, 10.10.10.xxx/24.

> It seems
> at least possible that somehow there's a network misconfiguration
> (however unlikely that may be).
> It seems unusual for a VIP resource to be in UNKNOWN state since VIPs
> are generally lightweight and there's little effort associated with
> failover. When resources are in UNKNOWN, I generally try "crs_stop -f
> <resource_name>" to clear the current state. Then I'd try "crs_start
> -c <resource_name> <node-where-you-want-it-to-start>" to see if you
> can start it manually. Hopefully, that (possibly in combination with
> answers to the above questions) will yield something worth
> investigating.
> Dan

The nodes are remote, so it's difficult to check the whole physical network configuration for problems or conflicts. I didn't try the crs_stop -f command, but I will if this issue arises again.

Many thanks for your help,
Alessandro

> Alessandro Vercelli wrote:
>
>The exact time of the crash is not clearly defined; it was in the morning of May 9th. It was a database crash, not a system crash; crsd.log reported many messages like:
>
>2008-05-09 12:32:33.833: [ CRSEVT][3695033264]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.ons! (timeout=600)
>
>Each message referred to a different resource.
>
>Last week, I tried to restart the failed node (in the meantime, other people made other attempts) and crsd.log reported, among other messages, the following:
>
>2008-07-07 16:10:18.743: [ CRSRES][3781585840]0CRS-1028: Dependency analysis failed because of:
>'Resource in UNKNOWN state: ora.<failednode>.vip'
>
>According to crs_stat -t, the ora.<failednode>.vip resource was allocated on the partner node, not the failed one, and its state was UNKNOWN (as expected).
>
>In my opinion, at the time of the crash the partner node attempted an automatic failover, but it failed; from the partner node's crsd.log:
>
>2008-05-09 11:55:55.278: [ CRSRES][3686595504]0Attempting to start `ora.<failednode>.vip` on member `<partnernode>`
>2008-05-09 11:56:58.305: [ CRSAPP][3686595504]0StartResource error for ora.<failednode>.vip error code = -2
>2008-05-09 11:57:05.429: [ CRSEVT][3697085360]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.vip! (timeout=60)
>
>and, finally:
>
>2008-05-09 11:58:01.422: [ CRSRES][3686595504]0X_OP_StopResourceFailed : Stop Resource failed
>(File: rti.cpp, line: 1698
>
>2008-05-09 11:58:01.422: [ CRSRES][3686595504][ALERT]0`ora.<failednode>.vip` on member `<partnernode>` has experienced an unrecoverable failure.
>2008-05-09 11:58:01.422: [ CRSRES][3686595504]0Human intervention required to resume its availability.
>2008-05-09 11:58:01.444: [ CRSRES][3686595504]0CRS-1028: Dependency analysis failed because of:
>'Resource in UNKNOWN state: ora.<failednode>.vip'
>
>Sorry for the *mess* of messages.....
>Thanks,
>Alessandro
>
>
>If you think it's related to the resource not starting because of some dependency, then I'd suggest looking at $CRS_HOME/log/<nodename>/crsd/crsd.log on each node (especially the crashed node) and seeing what's there around the time of startup.
>
>If the node won't boot, try booting it into single-user mode and preventing clusterware from starting, if you think clusterware is what's keeping it from booting completely.
>
>Dan
>
>Alessandro Vercelli wrote:
>
>
>O.S.: RHEL AS4
>Hardware is HP BL45P, 4 x AMD Dual core, 8 GB RAM.
>Oracle 10.2.0.1, RAC and Clusterware
><cut>
>The failed attempts reported on the console that the listener nodeapp could not start; looking into the network configuration, I noticed that the VIP address for the failing listener was allocated not on that node but on its partner. Which log files would you suggest checking for errors?
>Thanks,
>Alessandro
>
--
//www.freelists.org/webpage/oracle-l
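Dan's suggestion to look at crsd.log on each node around the time of startup can be partly automated. The following is a minimal Python sketch, not an Oracle tool: it assumes the entry layout seen in the excerpts in this thread (a timestamp, `[ COMPONENT][thread]`, then the message, with untimestamped continuation lines such as `'Resource in UNKNOWN state: ...'`). The pattern list, the function names `parse_entries` and `suspicious_entries`, and the script name `scan_crsd.py` are hypothetical, chosen to match the failures quoted above.

```python
import re
import sys
from dataclasses import dataclass
from datetime import datetime

# Entry header as it appears in the crsd.log excerpts in this thread, e.g.
# "2008-05-09 11:56:58.305: [ CRSAPP][3686595504]0StartResource error ..."
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*(\w+)\]\[(\d+)\](.*)$"
)

# Hypothetical list of substrings, taken from the failures quoted in the thread.
SUSPICIOUS = (
    "timed out",
    "UNKNOWN state",
    "unrecoverable failure",
    "StartResource error",
    "StopResourceFailed",
)


@dataclass
class Entry:
    when: datetime
    component: str  # e.g. CRSRES, CRSAPP, CRSEVT
    thread: str
    text: str


def parse_entries(lines):
    """Group raw lines into entries; a line without a timestamp is appended
    to the previous entry (e.g. "'Resource in UNKNOWN state: ...'")."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append(Entry(when, m.group(2), m.group(3), m.group(4)))
        elif entries and line.strip():
            entries[-1].text += " " + line.strip()
    return entries


def suspicious_entries(lines, start, end):
    """Entries logged between start and end (inclusive) that match SUSPICIOUS."""
    return [
        e
        for e in parse_entries(lines)
        if start <= e.when <= end and any(s in e.text for s in SUSPICIOUS)
    ]


if __name__ == "__main__":
    # Usage: python scan_crsd.py crsd.log "2008-05-09 11:00" "2008-05-09 13:00"
    if len(sys.argv) == 4:
        start = datetime.fromisoformat(sys.argv[2])
        end = datetime.fromisoformat(sys.argv[3])
        with open(sys.argv[1], errors="replace") as log:
            for e in suspicious_entries(log, start, end):
                print(e.when, e.component, e.text)
```

Run it against each node's `$CRS_HOME/log/<nodename>/crsd/crsd.log` with a window around the crash or restart being investigated. Plain substring matching keeps the sketch simple, but it will miss failures worded differently from the ones seen here.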