Re: Really Strange Problem

From: Kevin Closson <ora_kclosson@xxxxxxxxx>
To: andrew.kerber@xxxxxxxxx, harish.kumar.kalra@xxxxxxxxx
Date: Fri, 12 Nov 2010 08:25:05 -0800 (PST)

>Absolutely no indication of a node eviction.

I'm not saying this is your problem, but... the messages you are looking for are 
sent via syslogd and are buffered writes. Don't expect a catatonic server to be 
able to flush buffered writes to a log. There is a reason Oracle implemented 
IPMI fencing in 11.2... I guess I wasn't such a renegade for blogging about 
fencing approaches all those years...
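A minimal sketch of why buffered log writes can vanish, assuming Python 3.9+ and using an ordinary Python file handle as a stand-in for a logging process. It does not model syslogd's own buffering or its configuration; it only shows that data sitting in a process's buffer is invisible to the file system until it is flushed, and only durable after fsync:

```python
import os
import tempfile


def buffered_write_demo() -> tuple[int, int]:
    """Return the file size seen before and after flushing one log line.

    Illustration only: a Python buffered file handle stands in for a
    logger; this is not how syslogd itself is implemented.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages")
        # Buffered text-mode handle: a short write stays in process memory.
        with open(path, "w", buffering=8192) as log:
            log.write("Nov 12 02:14:59 node1 kernel: example message\n")
            # Nothing has reached the kernel yet, so the file is still empty.
            before = os.path.getsize(path)
            # flush() hands the bytes to the kernel (page cache, not yet disk).
            log.flush()
            # fsync() asks the kernel to write the page cache out to storage.
            os.fsync(log.fileno())
            after = os.path.getsize(path)
    return before, after


if __name__ == "__main__":
    print(buffered_write_demo())
```

If the host hangs before the flush (or resets before the page cache is written out), the last messages are simply lost, which is consistent with logs that look normal right up to the restart.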

________________________________
From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
To: harish.kumar.kalra@xxxxxxxxx
Cc: john40855@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Thu, November 11, 2010 8:50:30 PM
Subject: Re: Really Strange Problem

Absolutely no indication of a node eviction. Nothing in any of the clusterware 
logs (crsd.log, ocssd.log, etc.) indicates a node eviction on either node. They 
are all normal until they suddenly start back up after an unexpected shutdown.
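One way to see exactly what the logs recorded right before each restart is to pair every boot banner in /var/log/messages with the last line logged before it. This is a sketch under stated assumptions: Python 3, and a boot banner containing the text "kernel: Linux version". The function name and the marker string are hypothetical, so adjust them to what your system actually writes at startup:

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical boot marker; change it to match your kernel's boot banner.
BOOT_MARKER = "kernel: Linux version"


def last_lines_before_boot(
    lines: Iterable[str], marker: str = BOOT_MARKER
) -> List[Tuple[Optional[str], str]]:
    """Pair each boot-marker line with the last non-marker line seen before it.

    The first element is None if a boot marker appears before any other line.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    previous: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            # A reboot: record whatever was logged last before it.
            pairs.append((previous, line))
        else:
            previous = line
    return pairs
```

Usage (reading the file normally requires root): `with open("/var/log/messages") as f: pairs = last_lines_before_boot(f)`. If the last pre-boot line is noticeably older than 2:15 am, the missing messages may never have reached disk.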


On Thu, Nov 11, 2010 at 9:36 PM, Harish Kumar <harish.kumar.kalra@xxxxxxxxx> 
wrote:

>John,
> 
>Have you checked ocssd.log and the system log files? Download and install CHM, 
>also known as Cluster Health Monitor, and leave it running until a node is evicted again. 
>
> 
>Once a node is evicted, check and analyze the log files collected by CHM. Oracle may 
>evict a node for different reasons, such as CPU saturation, long I/O latencies, 
>a misconfigured network, etc. 
>
> 
>I think once you have the log files in place, it will be clearer what the 
>actual problem is. 
>
> 
>Regards
>Harish Kumar
>Independent Database Consultant
> 
>www.oraxperts.com
>
>
> 
>On Fri, Nov 12, 2010 at 1:20 PM, John Smith <john40855@xxxxxxxxx> wrote:
>
>>Oh yes, in case I didn't make it clear:
>>
>>OS - OEL 5.5 x86_64
>>Clusterware:  11.1.0.7 x86_64
>>ASM - 11.1.0.7 x86_64 (running over RAW)
>>Database: 10.1.0.5 x86_64 (running)
>>Database: 10.2.0.4 x86_64 (installed, but not running at this point) 
>>
>>
>>
>>---------- Forwarded message ----------
>>From: John Smith <john40855@xxxxxxxxx>
>>Date: Thu, Nov 11, 2010 at 8:14 PM
>>Subject: Really Strange Problem
>>To: oracle-l@xxxxxxxxxxxxx
>>
>>
>>OK, I don't know if this one is related to the Oracle database, OEL, or something 
>>else entirely. But here it is:
>>
>>We have Oracle Clusterware 11.1 installed and running with ASM 11.1. We also 
>>have Oracle 10.2 installed, as well as 10.1. I have created a 10.1 database. 
>> 
>>ASM is on RAW against EMC storage. This has to be on raw because the intent is 
>>to take a 10.1 32-bit database to 10.2 64-bit. This requires a stop at 10.1 
>>64-bit along the way, and 10.1 requires ASM on raw.
>>
>>Anyway, the problem is that the servers are rebooting every 2-3 days at 2:15 
>>am, and we have not been able to figure out why. There is nothing in the ASM, 
>>clusterware, or database logs; they show everything running fine, then a 
>>restart. Nothing in /var/log/messages either; it just shows a restart. Any ideas?
>>
>>


-- 
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'
