Re: Oracle CRS and Split Brain
- From: Naqi Mirza <naqimirza@xxxxxxxxx>
- To: amonte <ax.mount@xxxxxxxxx>, Kevin Closson <kevinc@xxxxxxxxxxxxx>
- Date: Wed, 28 Mar 2007 16:55:15 -0700 (PDT)
If node1 loses its interconnect, this is where I understand the misscount
parameter comes into play. The Oracle CSSD process checks both the network
heartbeat and the disk heartbeat. Misscount represents the maximum time that a
heartbeat can be missed before a cluster reconfiguration starts to evict a
node. So if node1 were to lose its interconnect, it should be evicted
regardless of whether it is the master, shouldn't it? This would leave one of
the other node(s) to become the new master. As for which one becomes the
master, I guess Kevin has already answered that one.
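The timeout rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Oracle's CSSD code: the 30-second `MISSCOUNT_SECONDS` value and the `HeartbeatMonitor` class are hypothetical, and only a single (network) heartbeat is modelled, whereas CSSD also checks a disk heartbeat.

```python
from __future__ import annotations

import time

# Hypothetical threshold (assumed value): seconds a node may go without a
# heartbeat before a cluster reconfiguration to evict it is triggered.
MISSCOUNT_SECONDS = 30.0


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node (illustrative only)."""

    def __init__(self, misscount: float = MISSCOUNT_SECONDS) -> None:
        self.misscount = misscount
        self.last_seen: dict[str, float] = {}

    def record_heartbeat(self, node: str, now: float | None = None) -> None:
        # Remember when we last heard from this node.
        self.last_seen[node] = time.monotonic() if now is None else now

    def nodes_to_evict(self, now: float | None = None) -> list[str]:
        # Any node silent for longer than misscount is an eviction candidate.
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.misscount
        )


monitor = HeartbeatMonitor()
monitor.record_heartbeat("node1", now=0.0)   # node1 then loses its interconnect
monitor.record_heartbeat("node2", now=25.0)
print(monitor.nodes_to_evict(now=40.0))      # ['node1']
```

Passing explicit `now` values keeps the example deterministic; in a live monitor the calls would rely on `time.monotonic()`.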
When the evicted node is back in business (I assume you mean its interconnect
is now fixed), a cluster reconfiguration should take place, adding that node
back into the cluster.
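The evict-then-rejoin sequence described above can be modelled as a toy membership set in which every change counts as one reconfiguration. The `ClusterMembership` class and its `reconfigurations` counter are hypothetical illustrations, not part of Oracle Clusterware.

```python
from __future__ import annotations


class ClusterMembership:
    """Toy model: each membership change is one reconfiguration (illustrative only)."""

    def __init__(self, nodes: list[str]) -> None:
        self.members: set[str] = set(nodes)
        # Hypothetical counter bumped on every reconfiguration.
        self.reconfigurations = 0

    def evict(self, node: str) -> None:
        # An evicted node is removed from the active membership.
        if node in self.members:
            self.members.remove(node)
            self.reconfigurations += 1

    def rejoin(self, node: str) -> None:
        # Once its interconnect is fixed, the node is added back
        # through another reconfiguration.
        if node not in self.members:
            self.members.add(node)
            self.reconfigurations += 1


cluster = ClusterMembership(["node1", "node2", "node3"])
cluster.evict("node1")    # node1 lost its interconnect
cluster.rejoin("node1")   # interconnect fixed, node1 comes back
print(sorted(cluster.members), cluster.reconfigurations)
# ['node1', 'node2', 'node3'] 2
```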
I'm sure that once Kevin has a glance over this, he'll correct me where I'm wrong.
Naqi
----- Original Message ----
From: amonte <ax.mount@xxxxxxxxx>
To: Kevin Closson <kevinc@xxxxxxxxxxxxx>
Cc: naqimirza@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Thursday, 29 March, 2007 4:32:02 AM
Subject: Re: Oracle CRS and Split Brain
That is what I mean, Kevin: how does a node know which one will be evicted?
For instance, if node 1 (the lower node) loses its interconnect, what happens?
Will the other(s) be evicted? And what happens when the evicted node(s) come
back? Because they cannot contact node 1 through the network (node 1 lost its
private network), what will happen?
How does the voting disk help to determine split brain?
Thanks
Alex
On 3/28/07, Kevin Closson <kevinc@xxxxxxxxxxxxx> wrote:
When the master node loses its private network, the surviving node becomes the
master and a reconfiguration of the cluster takes place: the old master is
ejected from the cluster configuration and rebooted. The following can be seen
in the surviving node's crsd.log file (the log of the CRS component of Oracle
Clusterware, which is responsible for managing Oracle resources):
I AM THE NEW OCR MASTER at incar 6. Node Number = 1
[…lots of good CRS stuff deleted…]
This was a very good follow-up, but the question was about split brain. Split
brain is when there is an equal number of "survivors" and both "think" they are
the sole survivor. I think the original post was asking how Oracle determines
who gets to anoint themselves the new master in a split-brain scenario. I have
not seen the full algorithm Oracle uses documented anywhere on the net, so if
someone has, please let us know. There are a lot of cluster implementations out
there. One common approach is to maintain knowledge of the IP addresses of the
members and use the lowest-IP node as one of the factors in choosing the winner
in a split-brain scenario. That is not how CRS does it, though, as has become
evident in a thread I've had with a reader of my blog. In his two-node case, his
CRS master was also the lowest-IP node, yet in a meltdown scenario the other
node was chosen as the sole survivor. That really surprised me.
I think all I've said is that Oracle is not telling us what the full algorithm
for survivorship in a true split-brain scenario is.
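The generic lowest-IP tie-breaker mentioned above can be sketched as follows. This is explicitly not the CRS algorithm, which, as noted, is undocumented. The `choose_surviving_partition` and `lowest_ip` functions are hypothetical, the sketch assumes IPv4 addresses, and it assumes the common convention that a strictly larger partition wins outright, so the lowest IP only matters when the partitions tie.

```python
from __future__ import annotations

import ipaddress


def lowest_ip(partition: list[str]) -> ipaddress.IPv4Address:
    # Lowest IPv4 address among the members of one partition,
    # compared numerically rather than as strings.
    return min(ipaddress.IPv4Address(ip) for ip in partition)


def choose_surviving_partition(partitions: list[list[str]]) -> list[str]:
    """Hypothetical generic rule, NOT Oracle CRS: the biggest partition wins,
    and a tie goes to the partition holding the lowest member IP address."""
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    # With a single largest partition there is nothing to break; on a tie
    # the partition containing the lowest IP is chosen.
    return min(candidates, key=lowest_ip)


# Two-node cluster split down the middle: an exact tie.
print(choose_surviving_partition([["10.0.0.12"], ["10.0.0.11"]]))
# ['10.0.0.11']

# Three-node cluster: the two-node side has the majority.
print(choose_surviving_partition([["10.0.0.11"], ["10.0.0.12", "10.0.0.13"]]))
# ['10.0.0.12', '10.0.0.13']
```

The blog reader's two-node case is a counterexample to this rule: there, the lowest-IP node (which was also the CRS master) was not the survivor, so the sketch only illustrates the generic approach.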
There are some clusterware topics here:
http://kevinclosson.wordpress.com/kevin-closson-index/real-application-clusters-related-topics/
such as:
http://kevinclosson.wordpress.com/2007/01/10/comparing-10201-and-10203-linux-rac-fencing-also-fencing-failures-split-brain/