RE: Oracle clusterware related question

From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
To: <tim@xxxxxxxxx>, <Mathias.Zarick@xxxxxxxxxxxx>
Date: Tue, 8 May 2012 12:11:25 -0400
So, if voting disks are not updated by a certain node for any reason for
an extended period of time, that node would not be evicted by the remote
nodes from the cluster?
 

From: Tim Gorman [mailto:tim@xxxxxxxxx] 
Sent: Tuesday, May 08, 2012 12:05 PM
To: Mathias.Zarick@xxxxxxxxxxxx; Hameed, Amir
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Oracle clusterware related question

 

Mathias hit the nail on the head.  Think about it this way:  NFS errors
and disconnects typically do not kill running programs, but cause them
to hang.  If the binaries for the clusterware are themselves on NFS,
then clearly they are going to hang also.
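
A rough sketch of the fallback Mathias describes in the quoted message below (symlink CRS_HOME/log to local disks), for cases where the whole CRS home cannot be moved. The function name, the paths, and the throwaway directories in the demo are all hypothetical, and the sketch assumes the clusterware stack on the node is stopped before the directory is moved. It is not an Oracle-documented procedure, only an illustration of the move-then-symlink idea.

```python
import os
import shutil
import tempfile


def relocate_log_dir(crs_home, local_log_dir):
    """Move <crs_home>/log to local_log_dir and leave a symlink behind.

    Hypothetical helper: assumes nothing is writing to the log
    directory while it is being moved.
    """
    log_path = os.path.join(crs_home, "log")
    if os.path.islink(log_path):
        # Already a symlink; report where it points and do nothing.
        return os.readlink(log_path)
    if os.path.exists(local_log_dir):
        # Refuse to merge into an existing directory.
        raise FileExistsError(local_log_dir)
    # Make sure the parent of the target exists on the local disk.
    parent = os.path.dirname(os.path.abspath(local_log_dir))
    os.makedirs(parent, exist_ok=True)
    # shutil.move falls back to copy-and-delete across filesystems.
    shutil.move(log_path, local_log_dir)
    os.symlink(local_log_dir, log_path)
    return local_log_dir


# Demo against throwaway directories standing in for the NFS-mounted
# CRS home and a local disk (both invented for this example).
base = tempfile.mkdtemp()
demo_crs_home = os.path.join(base, "nfs", "crs_home")
os.makedirs(os.path.join(demo_crs_home, "log"))
with open(os.path.join(demo_crs_home, "log", "alert.log"), "w") as f:
    f.write("sample\n")

demo_local = os.path.join(base, "local", "crs_log")
demo_result = relocate_log_dir(demo_crs_home, demo_local)
print(demo_result == demo_local)
```

After the call, anything written under CRS_HOME/log lands on the local directory, which is the point of the suggestion: log writes no longer depend on the NFS mount being reachable.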

 

 

        -----Original Message-----
        From: Mathias Zarick [mailto:Mathias.Zarick@xxxxxxxxxxxx]
        Sent: Tuesday, May 8, 2012 10:00 AM
        To: Amir.Hameed@xxxxxxxxx
        Cc: oracle-l@xxxxxxxxxxxxx
        Subject: RE: Oracle clusterware related question
        
        Hi Amir,

        I have seen similar behavior when the CRS log files also reside
        on an unavailable location. You should install at least the CRS
        home on local disks. If that is not possible, at least point the
        log files to local disks (symlink CRS_HOME/log to local disks).

        HTH
        Mathias

        -----Original Message-----
        From: oracle-l-bounce@xxxxxxxxxxxxx
        [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Hameed, Amir
        Sent: Tuesday, May 08, 2012 5:50 PM
        To: tim@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
        Subject: RE: Oracle clusterware related question

        Thanks Tim,

        The cables remained unplugged for 30 minutes. I am using the
        default values for the "disktimeout" and "misscount" parameters,
        and they are pasted below:

        crsctl get css disktimeout
        CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

        crsctl get css misscount
        CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

        In my mind, the cluster should have evicted the node after 200
        seconds (DTO).

        Amir

        -----Original Message-----
        From: oracle-l-bounce@xxxxxxxxxxxxx
        [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Tim Gorman
        Sent: Tuesday, May 08, 2012 11:32 AM
        To: oracle-l@xxxxxxxxxxxxx
        Subject: Re: Oracle clusterware related question

        Amir,

        Your phrase "kept showing that the node was still part of the
        cluster" doesn't mention how long that state lasted. Clearly,
        from your email, it lasted too long, but equally obviously, at
        some point the clusterware reacted, and I'm wondering how long
        that wait might have been?

        With that information about how long it took for the clusterware
        to react in mind, I'd suggest using the "crsctl query css"
        command as suggested here in the 11.2 docs online...

            crsctl get css

            Use the crsctl get css command to obtain the value of a
            specific Cluster Synchronization Services parameter.

            Syntax

            crsctl get css parameter

            Usage Notes

            * Cluster Synchronization Services parameters include:
              clusterguid diagwait disktimeout misscount reboottime
              priority logfilesize
            * This command only affects the local server

            Example

            To display the value of the disktimeout parameter for CSS,
            use the following command:

            $ crsctl get css disktimeout
            200

        So, you may want to share what the values for "disktimeout" and
        "misscount" were, and whether those values corroborated your
        observations at all?

        Hope this helps?

        --
        Tim Gorman
        consultant -> Evergreen Database Technologies, Inc.
        postal => PO Box 352151, Westminster CO 80035
        website => http://www.EvDBT.com/
        email => Tim@xxxxxxxxx
        mobile => +1-303-885-4526
        fax => +1-303-484-3608
        Lost Data? => http://www.ora600.be/ for info about DUDE...

        On 5/8/2012 8:41 AM, Hameed, Amir wrote:
        > Folks,
        >
        > I have a three-node Oracle RAC running with Grid version
        > 11.2.0.3. So far, there is no database created and only CRS is
        > running on all nodes. I am using NFS for everything (binaries,
        > OCR & voting disk and database files). Each server has two
        > 10GbE NICs for dNFS. The binaries, OCR and voting disks are on
        > an aggregated link (two 1GbE NICs). The OS is Solaris 10.
        >
        > While doing destructive testing to validate the configuration
        > and to observe behavior in extreme scenarios, we pulled the
        > cables on one RAC server from both NICs that are part of the
        > aggregated link for the binaries, voting disk and OCR. I was
        > expecting that, because CRS would not be able to access the
        > voting disks on that node to update its status, clusterware
        > would eject that node from the cluster. The "crsctl status
        > resource -t" command from the other nodes kept showing that
        > the node was still part of the cluster. I am trying to
        > understand this behavior and would appreciate it if someone
        > can explain it.
        >
        > Thanks
        > Amir
        >
        > --
        > //www.freelists.org/webpage/oracle-l
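
A small sketch of the arithmetic behind Amir's expectation: it pulls the parameter values out of the CRS-4678 lines pasted in the thread and compares them with the 30-minute outage. The function names are invented for illustration, and both values are treated as seconds, as Amir does for disktimeout. It only compares numbers; it says nothing about how the clusterware itself decides to evict a node, which is the open question in this thread.

```python
import re

# Message format as pasted in the thread for "crsctl get css <parameter>".
CSS_GET_RE = re.compile(
    r"CRS-4678: Successful get (\w+) (\d+) "
    r"for Cluster Synchronization Services"
)


def parse_css_get(output):
    """Return {parameter: value} for every CRS-4678 line in the text."""
    # Collapse whitespace so messages wrapped by a mail client still match.
    flat = " ".join(output.split())
    return {name: int(value) for name, value in CSS_GET_RE.findall(flat)}


def exceeded_timeouts(outage_seconds, settings):
    """Map each parameter to True if the outage lasted longer than it."""
    return {name: outage_seconds > limit for name, limit in settings.items()}


# The two outputs from Amir's message, wrapped as they were in the mail.
pasted = """crsctl get css disktimeout CRS-4678: Successful get disktimeout 200 for
Cluster Synchronization Services. crsctl get css misscount CRS-4678:
Successful get misscount 30 for Cluster Synchronization Services."""

settings = parse_css_get(pasted)
outage = 30 * 60  # cables unplugged for 30 minutes
print(settings)
print(exceeded_timeouts(outage, settings))
```

With the values from the thread, the 1800-second outage is well past both settings, which is why the node staying in the cluster looked wrong to Amir.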


--
//www.freelists.org/webpage/oracle-l