Re: Oracle clusterware related question

  • From: "Tim Gorman" <tim@xxxxxxxxx>
  • To: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>, Mathias.Zarick@xxxxxxxxxxxx
  • Date: Tue, 08 May 2012 16:43:40 +0000

According to the directives of the "disktimeout" and "misscount" parameters, 
yes.

-----Original Message-----
From: Hameed, Amir [mailto:Amir.Hameed@xxxxxxxxx]
Sent: Tuesday, May 8, 2012 10:11 AM
To: tim@xxxxxxxxx, Mathias.Zarick@xxxxxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: RE: Oracle clusterware related question

So, if voting disks are not updated by a certain node for any reason for an 
extended period of time, that node would not be evicted by the remote nodes 
from the cluster?

From: Tim Gorman [mailto:tim@xxxxxxxxx] 
Sent: Tuesday, May 08, 2012 12:05 PM
To: Mathias.Zarick@xxxxxxxxxxxx; Hameed, Amir
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Oracle clusterware related question


Mathias hit the nail on the head. Think about it this way: NFS errors and 
disconnects typically do not kill running programs, but cause them to hang. If 
the binaries for the clusterware are themselves on NFS, then clearly they are 
going to hang also.




-----Original Message-----
From: Mathias Zarick [mailto:Mathias.Zarick@xxxxxxxxxxxx]
Sent: Tuesday, May 8, 2012 10:00 AM
To: Amir.Hameed@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: RE: Oracle clusterware related question

Hi Amir,

I have seen similar behavior when the CRS log files also reside on an
unavailable location. You should install at least the CRS home on local
disks. If that is not possible, at least point the log files to local disks
(symlink CRS_HOME/log to local disks).

HTH
Mathias

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Hameed, Amir
Sent: Tuesday, May 08, 2012 5:50 PM
To: tim@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: Oracle clusterware related question

Thanks Tim,

The cables remained unplugged for 30 minutes. I am using the default values
for the "disktimeout" and "misscount" parameters, and they are pasted below:

crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

In my mind, the cluster should have evicted the node after 200 seconds (DTO).

Amir

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Tim Gorman
Sent: Tuesday, May 08, 2012 11:32 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: Oracle clusterware related question

Amir,

Your phrase "kept showing that the node was still part of the cluster"
doesn't mention how long that state lasted. Clearly, from your email, it
lasted too long, but equally obviously, at some point the clusterware reacted,
and I'm wondering how long that wait might have been?

Armed with that information about how long it took for the clusterware to
react, I'd suggest using the "crsctl get css" command as described here in
the 11.2 docs online...

    crsctl get css

    Use the crsctl get css command to obtain the value of a specific
    Cluster Synchronization Services parameter.

    Syntax

    crsctl get css parameter

    Usage Notes

    * Cluster Synchronization Services parameters include:
      clusterguid diagwait disktimeout misscount reboottime priority
      logfilesize
    * This command only affects the local server

    Example

    To display the value of the disktimeout parameter for CSS, use the
    following command:

    $ crsctl get css disktimeout
    200

So, you may want to share what the values for "disktimeout" and "misscount"
were, and whether those values corresponded at all with your observations?

Hope this helps?

--
Tim Gorman
consultant -> Evergreen Database Technologies, Inc.
postal => PO Box 352151, Westminster CO 80035
website => http://www.EvDBT.com/
email => Tim@xxxxxxxxx
mobile => +1-303-885-4526
fax => +1-303-484-3608
Lost Data? => http://www.ora600.be/ for info about DUDE...

On 5/8/2012 8:41 AM, Hameed, Amir wrote:
> Folks,
> I have a three-node Oracle RAC running with Grid version 11.2.0.3. So far,
> there is no database created and only CRS is running on all nodes. I am
> using NFS for everything (binaries, OCR & voting disk, and database files).
> Each server has two 10GbE NICs for dNFS. The binaries, OCR and voting disks
> are on an aggregated link (two 1GbE NICs). The OS is Solaris 10.
>
> While doing destructive testing to validate the configuration and to observe
> behavior in extreme scenarios, we pulled the cables on one RAC server from
> both NICs that are part of the aggregated link for the binaries, voting disk
> and OCR. I was expecting that, because CRS would not be able to access the
> voting disks on that node to update its status, clusterware would eject that
> node from the cluster. The "crsctl status resource -t" command from the
> other nodes kept showing that the node was still part of the cluster. I am
> trying to understand this behavior and would appreciate it if someone could
> explain it.
>
> Thanks
> Amir
>
> --
> //www.freelists.org/webpage/oracle-l
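
For anyone following the numbers in this thread, the comparison Amir describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is the CRS-4678 output pasted above, the helper names parse_css_values and exceeded are hypothetical, and comparing the raw outage duration against each value is a simplification that does not model how CSS actually decides on an eviction.

```python
import re

# Output lines exactly as pasted earlier in this thread.
SAMPLE_OUTPUT = """\
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
"""

# Matches the parameter name and its value in a CRS-4678 message.
CSS_LINE = re.compile(r"CRS-4678: Successful get (\w+) (\d+) for")


def parse_css_values(text):
    """Return {parameter: seconds} parsed from 'crsctl get css' output."""
    return {name: int(value) for name, value in CSS_LINE.findall(text)}


def exceeded(outage_seconds, css_values):
    """Hypothetical helper: parameters whose value the outage outlasted."""
    return sorted(
        name for name, limit in css_values.items() if outage_seconds > limit
    )


values = parse_css_values(SAMPLE_OUTPUT)
outage = 30 * 60  # the cables remained unplugged for 30 minutes

print(values)                    # {'disktimeout': 200, 'misscount': 30}
print(exceeded(outage, values))  # ['disktimeout', 'misscount']
```

With the defaults reported above, a 1800-second outage outlasts both values, which is why an eviction was expected well before the cables were plugged back in.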



--
//www.freelists.org/webpage/oracle-l

