Re: RAC Full cluster outage (almost)

  • From: LS Cheng <exriscer@xxxxxxxxx>
  • To: "Crisler, Jon" <Jon.Crisler@xxxxxxx>
  • Date: Thu, 12 Mar 2009 00:57:01 +0100

Hi Jon

I think I know which bug you are referring to: it is the one that was
introduced in RHEL 4.5 (or 4.6), right? The only workaround was to upgrade
to 4.7. Unfortunately that bug only applies to Linux platforms.

Thanks!

--
LSC


On Wed, Mar 11, 2009 at 6:48 PM, Crisler, Jon <Jon.Crisler@xxxxxxx> wrote:

>  I have seen this problem occur with Linux, due to a problem with GLIBC
> versions.  I don’t know if this happens under Solaris, but check Metalink
> for RAC issues with GLIBC.  The fix is to install newer versions of GLIBC
> and relink.
>
>
>  ------------------------------
>
> *From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
> oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *LS Cheng
> *Sent:* Wednesday, March 11, 2009 11:36 AM
> *To:* Oracle-L
> *Subject:* RAC Full cluster outage (almost)
>
>
>
> Hi
>
> A couple of days ago one of my customers faced an almost full cluster
> outage in a 2-node 10.2.0.4 RAC on Sun Solaris 10 SPARC (full Oracle stack).
>
> The sequence was as follows
>
> 1. node 2 lost its private network, the interface went down
> 2. node 1 evicted node 2 (as expected)
> 3. node 1 then evicted itself
> 4. after node 1 returned to the cluster and the cluster re-formed from one
> node to two nodes, node 2 lost its private network again and this time the
> eviction occurred on node 2
>
> So it was not really a full cluster outage, but the evictions occurred one
> after another, so it looked like a full outage to the users.
>
> My question is: in a 2-node cluster node 1 should always survive, which
> was not the case here. My only theory is that node 2 was so ill that it
> could not reboot the server, so node 1 then evicted itself to avoid
> corruption.
>
> Any more ideas?
>
> Cheers
>
> --
> LSC
>
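
The expectation in the quoted message, that node 1 survives when a 2-node
cluster splits, can be sketched as below. This is only an illustration. It
assumes the commonly described rule that the largest sub-cluster wins and,
on a tie, the sub-cluster holding the lowest node number wins. The function
name surviving_subcluster is hypothetical and not part of any Oracle tool.

```python
def surviving_subcluster(subclusters):
    """Pick the sub-cluster that stays up after an interconnect split.

    Assumed rule (not taken from Oracle documentation): the largest
    sub-cluster wins; on a tie, the one with the lowest node number wins.
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


# Steps 1-2 of the incident: the private network fails and the
# 2-node cluster splits into two sub-clusters of one node each.
split = [{1}, {2}]
survivor = surviving_subcluster(split)
evicted = sorted(n for nodes in split if nodes is not survivor for n in nodes)
print("survivor:", sorted(survivor), "evicted:", evicted)
```

Under that assumed rule node 1 stays up and node 2 is evicted, which matches
step 2 of the sequence; step 3, node 1 evicting itself, is the part the rule
does not explain.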
