Re: Really Strange Problem

From: Kevin Closson <ora_kclosson@xxxxxxxxx>
To: John Smith <john40855@xxxxxxxxx>
Date: Fri, 12 Nov 2010 09:26:04 -0800 (PST)

Wow, re-reading my email... massive "typos." Actually, the contacts on this old
keyboard are nearly worn out and I'm finding myself mashing keys... time to stop
procrastinating and get another one.

Anyway, I don't think suicide is your problem. I was just addressing the bit 
about evidence. I'd check the common components (switches, storage) to see if 
there is anything there.

________________________________
From: John Smith <john40855@xxxxxxxxx>
To: Kevin Closson <ora_kclosson@xxxxxxxxx>
Cc: andrew.kerber@xxxxxxxxx; harish.kumar.kalra@xxxxxxxxx; 
oracle-l@xxxxxxxxxxxxx
Sent: Fri, November 12, 2010 8:36:53 AM
Subject: Re: Really Strange Problem

If it was a node eviction, wouldn't one server go before the other?  In this
case, they appear to be going simultaneously.  If it is, is there any place
besides the clusterware logs that would show evidence?


On Fri, Nov 12, 2010 at 10:25 AM, Kevin Closson <ora_kclosson@xxxxxxxxx> wrote:

>Absolutely no indication of a node eviction.
>
>I'm not saying this is your problem, but... the messages you are looking for
>are sent via syslogd and are buffered writes. Don't expect a catatonic server
>to be able to flush buffered writes to a log. There is a reason Oracle
>implemented IPMI fencing in 11.2... I guess I wasn't such a renegade for
>blogging about fencing approaches all those years...
>
>________________________________
>From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
>To: harish.kumar.kalra@xxxxxxxxx
>Cc: john40855@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
>Sent: Thu, November 11, 2010 8:50:30 PM
>Subject: Re: Really Strange Problem
>
>
>Absolutely no indication of a node eviction.  Nothing in any of the
>clusterware logs (crsd.log, ocssd.log, etc.) indicates a node eviction on
>either node.  They are all normal until they suddenly start back up after an
>unexpected shutdown.
>
>
>On Thu, Nov 11, 2010 at 9:36 PM, Harish Kumar <harish.kumar.kalra@xxxxxxxxx> wrote:
>
>>John,
>>
>>Have you checked ocssd.log and the system log files? Download and install
>>CHM, also known as Cluster Health Monitor, and leave it running until a node
>>is evicted again.
>>
>>Once the nodes are evicted, check and analyze the log files collected by CHM.
>>Oracle may evict a node for various reasons, such as CPU saturation, long I/O
>>latencies, a misconfigured network, etc.
>>
>>I think once you have the log files in place, it will be clearer what the
>>actual problem is.
>>
>>Regards
>>Harish Kumar
>>Independent Database Consultant
>>
>>www.oraxperts.com
>>
>>
>> 
>>On Fri, Nov 12, 2010 at 1:20 PM, John Smith <john40855@xxxxxxxxx> wrote:
>>
>>>Oh yes, in case I didn't make it clear:
>>>
>>>OS - OEL 5.5 x86_64
>>>Clusterware:  11.1.0.7 x86_64
>>>ASM - 11.1.0.7 x86_64 (running over RAW)
>>>Database: 10.1.0.5 x86_64 (running)
>>>Database: 10.2.0.4 x86_64 (installed, but not running at this point) 
>>>
>>>
>>>
>>>---------- Forwarded message ----------
>>>From: John Smith <john40855@xxxxxxxxx>
>>>Date: Thu, Nov 11, 2010 at 8:14 PM
>>>Subject: Really Strange Problem
>>>To: oracle-l@xxxxxxxxxxxxx
>>>
>>>
>>>OK, I don't know if this one is related to the Oracle database, OEL, or
>>>something else entirely.  But here it is:
>>>
>>>We have Oracle Clusterware 11.1 installed and running with ASM 11.1.  We
>>>also have Oracle 10.2 installed, as well as 10.1.  I have created a 10.1
>>>database.  ASM is on RAW against EMC storage.  It has to be on raw because
>>>the intent is to take a 10.1 32-bit database to 10.2 64-bit.  This requires
>>>a stop at 10.1 64-bit along the way, and 10.1 requires ASM on raw.
>>>
>>>Anyway, the problem is that the servers are rebooting every 2-3 days at
>>>2:15 am, and we have not been able to figure out why.  There is nothing in
>>>the ASM, clusterware, or database logs; they show everything running fine,
>>>then a restart.  Nothing in /var/log/messages either; it just shows a
>>>restart.  Any ideas?
>>>
>>>
>
>
>-- 
>Andrew W. Kerber
>
>'If at first you don't succeed, don't take up skydiving.'
>
>

Follow-Ups:
- RE: Really Strange Problem
  - From: Amaral, Rui

References:
- Really Strange Problem
  - From: John Smith
- Fwd: Really Strange Problem
  - From: John Smith
- Re: Really Strange Problem
  - From: Harish Kumar
- Re: Really Strange Problem
  - From: Andrew Kerber
- Re: Really Strange Problem
  - From: Kevin Closson
- Re: Really Strange Problem
  - From: John Smith