Re: RAC Full cluster outage (almost)

  • From: Christo Kutrovsky <kutrovsky.oracle@xxxxxxxxx>
  • To: exriscer@xxxxxxxxx
  • Date: Wed, 11 Mar 2009 14:09:10 -0400

Hi,

We had a similar problem, except that node 2 evicted node 1 via the
voting disk, and node 1 rebooted itself.

In reality, a 2 node cluster is not reliable enough against network
issues, as it is ambiguous which server should remain up. It's a 50/50
chance.

One approach is to have a 3 node cluster, with only 2 nodes running
instances. The clusterware does not require any licenses; it is free.

The third node only serves as an arbiter for deciding which node should
remain up; see the sketch below.
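
For the archives, a minimal sketch of the quorum logic (Python, purely
illustrative; this is not Oracle's actual CSS code, and the tie-break
rule used here is the commonly described one, so treat it as an
assumption):

    def surviving_subcluster(partitions):
        """partitions: list of sets of node ids that can still reach
        each other. Returns the sub-cluster that stays up."""
        # The largest sub-cluster wins the quorum race.
        best = max(len(p) for p in partitions)
        winners = [p for p in partitions if len(p) == best]
        # Tie (e.g. 1-vs-1 in a 2 node cluster): the sub-cluster holding
        # the lowest-numbered node survives, regardless of which side is
        # actually healthy.
        return min(winners, key=min)

    # 2 node cluster, interconnect cut: decided by node id, a coin toss
    # as far as health is concerned.
    print(surviving_subcluster([{1}, {2}]))     # -> {1}

    # 3 node cluster, node 2 isolated: the majority {1, 3} clearly wins,
    # and the third node never has to run an instance to cast its vote.
    print(surviving_subcluster([{1, 3}, {2}]))  # -> {1, 3}

With 3 votes there is always a strict majority on a single failure, so
the surviving side is determined by reachability rather than a tie-break.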

-- 
Christo Kutrovsky
Senior DBA
The Pythian Group - www.pythian.com
I blog at http://www.pythian.com/blogs/


On Wed, Mar 11, 2009 at 11:35 AM, LS Cheng <exriscer@xxxxxxxxx> wrote:
> Hi
>
> A couple of days ago one of my customers faced an almost full cluster outage
> in a 2 node 10.2.0.4 RAC on Sun Solaris 10 SPARC (full Oracle stack).
>
> The sequence was as follows
>
> 1. node 2 lost the private network, its interface went down
> 2. node 1 evicts node 2 (as expected)
> 3. node 1 then evicts itself
> 4. after node 1 returned to the cluster and the cluster reformed from one
> node to two nodes, node 2 lost the private network again, and this time the
> eviction occurred on node 2
>
> So it was not really a full cluster outage, but the evictions occurred one
> after another, so it looked like a full outage to the users.
>
> My doubt is: in a 2 node cluster node 1 always survives, which was not the
> case here. My only theory is that node 2 was so ill that it could not reboot
> the server, so node 1 then evicted itself to avoid corruption.
>
> Any more ideas?
>
> Cheers
>
> --
> LSC
>
>



--
//www.freelists.org/webpage/oracle-l

