RAC Question about misscount messages in ocssd.log

From: Harish Kalra <hkalra27@xxxxxxxxxxx>
To: racdba@xxxxxxxxxxxxx
Date: Tue, 14 Mar 2006 19:36:39 +0000 (GMT)

Hi All,
   
  One of my clients is running a two-node cluster on Linux (AMD64). The 
architecture is as follows:
   
  Two ASM instances: +ASM1 and +ASM2
  Two database instances: GCTLOND1 and GCTLOND2
   
  Database version is 10.2.0.1.
  Node1: lon3954xus
  Node2: lon3955xus
   
  Over the last few days node 1 has rebooted itself a couple of times. This 
morning node 1 was in a hung state.
  From my initial analysis, I found a lot of missed checkins for node 1 in the 
ocssd.log of lon3955xus. It seems that when the misscount exceeds 60 seconds, 
lon3954xus hangs.
   
  I have attached the ocssd.log file.
   
  I would appreciate it if anyone could suggest how to further diagnose the 
following message from ocssd.log:
   
  [ CSSD]2006-03-14 01:05:37.364 [196620] >TRACE: clssnmPollingThread: node lon3954xus (1) missed(53) checkin(s)
   
  Is there any way to find out whether
   
  1) the node monitor polling is getting stuck at the interconnect
  or
  2) ocssd.bin on the remote node failed to get CPU cycles for 60 "checkins"?
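   
  As a starting point only, the missed-checkin lines could be pulled out of 
the log so that their timestamps can be compared with interconnect and CPU 
statistics gathered separately on each node. The sketch below is a rough 
illustration, not a diagnosis: the regular expression is modelled solely on 
the one log line quoted above, the function names are made up for this 
example, and it cannot by itself tell case 1 from case 2.

```python
import re
from datetime import datetime

# Pattern modelled on the single log line quoted above (an assumption);
# other CSSD message formats are simply ignored.
MISSED_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
    r".*?clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"missed\((?P<missed>\d+)\) checkin\(s\)"
)


def parse_missed_checkins(lines):
    """Return (timestamp, node name, missed count) for each matching line."""
    events = []
    for line in lines:
        m = MISSED_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("node"), int(m.group("missed"))))
    return events


def peak_missed(events):
    """Return the highest missed-checkin count seen for each node."""
    peaks = {}
    for _, node, missed in events:
        peaks[node] = max(missed, peaks.get(node, 0))
    return peaks
```

  It could be run against the attached file with something like 
parse_missed_checkins(open("ocssd.log")), where the file location is 
whatever applies on the node in question.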
   
  Any help on the above would be highly appreciated.
   
  Thanks & Regards
  -Harish Kalra
   

                                
