Re: Really Strange Problem

  • From: John Smith <john40855@xxxxxxxxx>
  • To: Kevin Closson <ora_kclosson@xxxxxxxxx>
  • Date: Fri, 12 Nov 2010 10:36:53 -0600

If it was a node eviction, wouldn't one server go before the other?  In this
case, they appear to be going simultaneously.  If it is an eviction, is there
anyplace besides the clusterware logs that would show evidence?

On Fri, Nov 12, 2010 at 10:25 AM, Kevin Closson <ora_kclosson@xxxxxxxxx> wrote:

> >Absolutely no indication of a node eviction.
>
> I'm not saying this is your problem, but... the messages you are looking for
> are sent via syslogd and are buffered writes. Don't expect a catatonic
> server to be able to flush buffered writes to a log. There is a reason
> Oracle implemented IPMI fencing in 11.2...I guess I wasn't such a renegade
> for blogging about fencing approaches all those years...
>
>
>
> ------------------------------
> *From:* Andrew Kerber <andrew.kerber@xxxxxxxxx>
> *To:* harish.kumar.kalra@xxxxxxxxx
> *Cc:* john40855@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
> *Sent:* Thu, November 11, 2010 8:50:30 PM
>
> *Subject:* Re: Really Strange Problem
>
> Absolutely no indication of a node eviction.  Nothing in any of the
> clusterware logs indicates a node eviction on either node. (crsd.log,
> ocssd.log, etc)  They are all normal until they suddenly start back up after
> an unexpected shutdown.
>
> On Thu, Nov 11, 2010 at 9:36 PM, Harish Kumar <
> harish.kumar.kalra@xxxxxxxxx> wrote:
>
>> John,
>>
>> Have you checked ocssd.log and the system logfiles? Download and install CHM,
>> also known as Cluster Health Monitor, and leave it running until a node is
>> evicted again.
>>
>> Once the nodes are evicted, check and analyze the logfiles collected by CHM.
>> Oracle may evict a node for different reasons, such as CPU saturation, long
>> I/O latencies, a misconfigured network, etc.
>>
>> I think once you have the logfiles in place, it will be clearer what
>> the actual problem is.
>>
>> Regards
>> Harish Kumar
>> Independent Database Consultant
>>
>> www.oraxperts.com
>>
>>
>>
>> On Fri, Nov 12, 2010 at 1:20 PM, John Smith <john40855@xxxxxxxxx> wrote:
>>
>>> Oh yes, if I didn't make it clear:
>>>
>>> OS - OEL 5.5 x86_64
>>> Clusterware:  11.1.0.7 x86_64
>>> ASM - 11.1.0.7 x86_64 (running over RAW)
>>> Database: 10.1.0.5 x86_64 (running)
>>> Database: 10.2.0.4 x86_64 (installed, but not running at this point)
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: John Smith <john40855@xxxxxxxxx>
>>> Date: Thu, Nov 11, 2010 at 8:14 PM
>>> Subject: Really Strange Problem
>>> To: oracle-l@xxxxxxxxxxxxx
>>>
>>>
>>> OK, I don't know if this one is related to oracle database, OEL, or
>>> something else entirely.  But here it is:
>>>
>>> We have Oracle Clusterware 11.1 installed and running with ASM 11.1.  We
>>> also have Oracle 10.2 installed, as well as 10.1.  I have created a 10.1
>>> database.  ASM is on RAW against EMC storage.  This has to be on raw because
>>> the intent is to take a 10.1 32-bit database to 10.2 64-bit.  This requires a
>>> stop at 10.1 64-bit along the way, and 10.1 requires ASM on raw.
>>>
>>> Anyway, the problem is that the servers are rebooting every 2-3 days at
>>> 2:15 am, and we have not been able to figure out why.  There is nothing in
>>> the ASM, clusterware, or database logs; they show everything running fine,
>>> then a restart.  There is nothing in /var/log/messages either; it just
>>> shows a restart.  Any ideas?
>>>
>>>
>
>
> --
> Andrew W. Kerber
>
> 'If at first you don't succeed, don't take up skydiving.'
>
>
