Re: Oracle clusterware related question

  • From: "Tim Gorman" <tim@xxxxxxxxx>
  • To: Mathias.Zarick@xxxxxxxxxxxx, Amir.Hameed@xxxxxxxxx
  • Date: Tue, 08 May 2012 16:05:24 +0000

Mathias hit the nail on the head. Think about it this way: NFS errors and 
disconnects typically do not kill running programs, but cause them to hang. If 
the binaries for the clusterware are themselves on NFS, then clearly they are 
going to hang also.
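
For anyone wanting to try the symlink workaround Mathias describes in the quoted message, here is a minimal sketch. The function name relocate_crs_log is hypothetical, both paths are supplied by the caller, and it assumes the clusterware stack is already stopped on the node. It is an illustration, not an Oracle-documented procedure.

```shell
#!/bin/sh
# Sketch only: relocate_crs_log is a hypothetical helper, not an Oracle tool.
# Assumes the clusterware stack is stopped on this node before it is called.
relocate_crs_log() {
    crs_home=$1      # Grid/CRS home that lives on NFS
    local_log=$2     # target directory on a local disk

    # Refuse to run if there is no real log directory or it is already a symlink
    [ -d "$crs_home/log" ] && [ ! -L "$crs_home/log" ] || return 1
    mkdir -p "$local_log" || return 1
    # Copy the existing logs, preserving modes and timestamps where permitted
    cp -Rp "$crs_home/log/." "$local_log/" || return 1
    # Keep the NFS copy under another name as a fallback
    mv "$crs_home/log" "$crs_home/log.nfs" || return 1
    # Point CRS_HOME/log at the local directory
    ln -s "$local_log" "$crs_home/log"
}
```

It would be called with the real Grid home and a local directory of your choosing, e.g. relocate_crs_log "$CRS_HOME" /some/local/dir, where the second argument is a placeholder.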


-----Original Message-----
From: Mathias Zarick [mailto:Mathias.Zarick@xxxxxxxxxxxx]
Sent: Tuesday, May 8, 2012 10:00 AM
To: Amir.Hameed@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: RE: Oracle clusterware related question

Hi Amir,

Have seen similar behavior if logfiles of crs are also residing on a
non available location.
You should install at least the CRS home on local disks. If not possible,
point at least the logfiles (symlink CRS_HOME/log to local disks).

HTH Mathias

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Hameed, Amir
Sent: Tuesday, May 08, 2012 5:50 PM
To: tim@xxxxxxxxx; oracle-l@xxxxxxxxxxxxxxxxxxxx: RE: Oracle clusterware related question

Thanks Tim,

The cables remained unplugged for 30 minutes. I am using the default values
for the "disktimeout" and "misscount" parameters and they are pasted below:

crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

In my mind, the cluster should have evicted the node after 200 seconds (DTO).

Amir

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Tim Gorman
Sent: Tuesday, May 08, 2012 11:32 AM
To: oracle-l@xxxxxxxxxxxxxxxxxxxx: Re: Oracle clusterware related question

Amir,

Your phrase "/kept showing that the node was still part of the cluster/"
doesn't mention how long that state lasted. Clearly, from your email, it
lasted too long, but equally obviously, at some point the clusterware
reacted, and I'm wondering how long that wait might have been?

With that information about how long it took for the clusterware to react
in mind, I'd suggest using the "crsctl query css" command as suggested here
in the 11.2 docs online...

    /crsctl get css/

    /Use the |crsctl get css| command to obtain the value of a specific
    Cluster Synchronization Services parameter./

    /Syntax/

    /crsctl get css parameter/

    /Usage Notes/

      * /Cluster Synchronization Services parameters include:/

        /clusterguid diagwait disktimeout misscount reboottime priority
        logfilesize/

      * /This command only affects the local server/

    /Example/

    /To display the value of the |disktimeout| parameter for CSS, use the
    following command:/

    /$ crsctl get css disktimeout
    200/

So, you may want to share what the values for "disktimeout" and "misscount"
were, and whether those values agreed at all with your observations?

Hope this helps?

--
Tim Gorman
consultant -> Evergreen Database Technologies, Inc.
postal => PO Box 352151, Westminster CO 80035
website => http://www.EvDBT.com/
email => Tim@xxxxxxxxxxxxxxx => +1-303-885-4526
fax => +1-303-484-3608
Lost Data? => http://www.ora600.be/ for info about DUDE...

On 5/8/2012 8:41 AM, Hameed, Amir wrote:
> Folks,
> I have a three-node Oracle RAC running with Grid version 11.2.0.3. So
> far, there is no database created and only CRS is running on all nodes. I
> am using NFS for everything (binaries, OCR & voting disk and database
> files). Each server has two 10GbE NICs for dNFS. The binaries, OCR and
> voting disks are on an aggregated link (two 1GbE NIC). The OS is Solaris
> 10.
>
> While doing destructive testing to validate configuration and to observe
> behavior in extreme scenarios, when we pulled cables on one RAC server
> from both NICs that are part of the aggregated link for the binaries,
> voting disk and OCR, I was expecting that because CRS would not be able
> to access the voting disks on that node to update its status,
> clusterware would eject that node from the cluster. The "crsctl status
> resource -t" command from the other nodes kept showing that the node was
> still part of the cluster. I am trying to understand this behavior and
> would appreciate if someone can explain it.
>
> Thanks
>
> Amir
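
To make the timing comparison in the thread concrete, here is a small sketch that pulls the numbers out of the CRS-4678 lines Amir pasted and compares them with the 30-minute outage. The helper names css_value and outage_exceeds are hypothetical, and the parsing assumes the output format is exactly as quoted in this thread.

```shell
#!/bin/sh
# Sketch only: css_value and outage_exceeds are hypothetical helper names.

# Extract the number from a line such as
# "CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services."
css_value() {
    sed -n 's/^CRS-4678: Successful get [a-z]* \([0-9][0-9]*\) for .*$/\1/p'
}

# Succeed (exit 0) when an outage of $1 seconds is longer than a timeout of $2 seconds.
outage_exceeds() {
    [ "$1" -gt "$2" ]
}

# The two lines below are copied from the thread rather than read from a live cluster.
disktimeout=$(echo 'CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.' | css_value)
misscount=$(echo 'CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.' | css_value)
outage=$((30 * 60))   # cables were unplugged for 30 minutes

if outage_exceeds "$outage" "$disktimeout"; then
    echo "outage of ${outage}s exceeded disktimeout of ${disktimeout}s"
fi
```

On a live node the same filter could be fed from "crsctl get css disktimeout | css_value", assuming the output matches the lines above. The outage was far longer than either timeout, which is why the missing eviction points to a hung clusterware stack rather than to the timeout settings.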

--
//www.freelists.org/webpage/oracle-l

