RE: RAC Full cluster outage (almost)

  • From: "Crisler, Jon" <Jon.Crisler@xxxxxxx>
  • To: <exriscer@xxxxxxxxx>, "Oracle-L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 11 Mar 2009 13:48:54 -0400

I have seen this problem occur on Linux, caused by a GLIBC version
issue. I don't know whether it happens on Solaris, but check Metalink
for RAC issues related to GLIBC. The fix is to install a newer GLIBC
version and relink.

 

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of LS Cheng
Sent: Wednesday, March 11, 2009 11:36 AM
To: Oracle-L
Subject: RAC Full cluster outage (almost)

 

Hi 

A couple of days ago one of my customers experienced an almost full
cluster outage on a 2-node 10.2.0.4 RAC on Sun Solaris 10 SPARC (full
Oracle stack).

The sequence was as follows:

1. Node 2 lost the private network; the interface went down.
2. Node 1 evicted node 2 (as expected).
3. Node 1 then evicted itself.
4. After node 1 rejoined and the cluster re-formed from one node to
two, node 2 lost the private network again, and this time the eviction
occurred on node 2.

So it was not really a full cluster outage, but the evictions occurred
one after another, so it looked like a full outage to the users.

My question is: in a 2-node cluster node 1 should always survive, which
was not the case here. My only theory is that node 2 was so ill that it
could not reboot the server, so node 1 then evicted itself to avoid
corruption.
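For reference, the expectation that node 1 survives comes from the split-brain tie-break commonly described for 10g Clusterware: the larger sub-cluster survives, and if the sub-clusters are the same size, the one containing the lowest node number survives. The sketch below models only that rule, under simplifying assumptions: `surviving_subcluster` is a hypothetical helper, nodes are plain integers, and voting-disk access, timeouts, and the actual CSS implementation are ignored.

```python
from typing import FrozenSet, Iterable


def surviving_subcluster(subclusters: Iterable[Iterable[int]]) -> FrozenSet[int]:
    """Return the sub-cluster expected to survive a split brain.

    Simplified model (assumption, not Oracle's code): the sub-cluster
    with the most nodes wins; on a size tie, the sub-cluster that
    contains the lowest node number wins.
    """
    # Materialise each partition first, then drop empty ones.
    groups = [frozenset(g) for g in subclusters]
    groups = [g for g in groups if g]
    if not groups:
        raise ValueError("need at least one non-empty sub-cluster")
    # Sort key: larger size first (negated), then lowest node number.
    return min(groups, key=lambda g: (-len(g), min(g)))


if __name__ == "__main__":
    # 2-node interconnect failure: each side holds one node.
    print(sorted(surviving_subcluster([{1}, {2}])))
    # 3-node example: the two-node side outnumbers node 1.
    print(sorted(surviving_subcluster([{1}, {2, 3}])))
```

With two nodes, each partition holds a single node, so only the tie-break applies and node 1 is the expected survivor under this model. That is why step 3 above, where node 1 evicted itself, looks anomalous.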

Any more ideas?

Cheers

--
LSC
