Wow... re-reading my email... massive "typos." Actually, the contacts on this old keyboard are nearly gone and I'm finding myself mashing keys... time to stop procrastinating and get another one.

Anyway, I don't think suicide is your problem. I was just addressing the bit about evidence. I'd check the common components (switches, storage) to see if there is anything there.

________________________________
From: John Smith <john40855@xxxxxxxxx>
To: Kevin Closson <ora_kclosson@xxxxxxxxx>
Cc: andrew.kerber@xxxxxxxxx; harish.kumar.kalra@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Fri, November 12, 2010 8:36:53 AM
Subject: Re: Really Strange Problem

If it was a node eviction, wouldn't one server go before the other? In this case, they appear to be going simultaneously. If it is, is there anyplace besides the clusterware logs that would show evidence?

On Fri, Nov 12, 2010 at 10:25 AM, Kevin Closson <ora_kclosson@xxxxxxxxx> wrote:

>Absolutely no indication of a node eviction.
>
>I'm not sying this is your problme, bu... the messages you are looking for are
>sent via syslogd and are buffered writes. Don't expect a catatonic server to be
>able to flush buffered writes to a log. There is a reason Oracle implemented
>IPMI fencing in 11.2...I guess I wasn't such a renegade for blogging about
>fencing approaches all those years...
>
>________________________________
>From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
>To: harish.kumar.kalra@xxxxxxxxx
>Cc: john40855@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
>Sent: Thu, November 11, 2010 8:50:30 PM
>Subject: Re: Really Strange Problem
>
>Absolutely no indication of a node eviction. Nothing in any of the clusterware
>logs (crsd.log, ocssd.log, etc.) indicates a node eviction on either node. They
>are all normal until they suddenly start back up after an unexpected shutdown.
>
>On Thu, Nov 11, 2010 at 9:36 PM, Harish Kumar <harish.kumar.kalra@xxxxxxxxx>
>wrote:
>
>>John,
>>
>>Have you checked ocssd.log and system logfiles?
>>Download and install CHM, also known as Cluster Health Monitor, and let it
>>run until a node is evicted again.
>>
>>Once nodes are evicted, check and analyze the logfiles collected by CHM. Oracle
>>may evict a node for different reasons such as CPU saturation, longer IO
>>latencies, misconfigured network, etc.
>>
>>I think once you have the logfiles in place it will be clearer what the
>>actual problem is.
>>
>>Regards
>>Harish Kumar
>>Independent Database Consultant
>>
>>www.oraxperts.com
>>
>>On Fri, Nov 12, 2010 at 1:20 PM, John Smith <john40855@xxxxxxxxx> wrote:
>>
>>>Oh yes, if I didn't make it clear:
>>>
>>>OS - OEL 5.5 x86_64
>>>Clusterware: 11.1.0.7 x86_64
>>>ASM - 11.1.0.7 x86_64 (running over RAW)
>>>Database: 10.1.0.5 x86_64 (running)
>>>Database: 10.2.0.4 x86_64 (installed, but not running at this point)
>>>
>>>---------- Forwarded message ----------
>>>From: John Smith <john40855@xxxxxxxxx>
>>>Date: Thu, Nov 11, 2010 at 8:14 PM
>>>Subject: Really Strange Problem
>>>To: oracle-l@xxxxxxxxxxxxx
>>>
>>>OK, I don't know if this one is related to oracle database, OEL, or something
>>>else entirely. But here it is:
>>>
>>>We have oracle clusterware 11.1 installed and running with asm 11.1. We also
>>>have oracle 10.2 installed, as well as 10.1. I have created a 10.1 database.
>>>ASM is on RAW against EMC storage. This has to be on raw because the intent is
>>>to take a 10.1 32-bit database to 10.2 64-bit. This requires a stop at 10.1
>>>64-bit along the way, and 10.1 requires ASM on raw.
>>>
>>>Anyway, the problem is that the servers are rebooting every 2-3 days at 2:15 am,
>>>and we have not been able to figure out why. There is nothing in the ASM or
>>>clusterware or database logs; they show everything running fine, then a restart.
>>>Nothing in /var/log/messages. Just shows a restart. Any ideas?
>>>
>
>--
>Andrew W. Kerber
>
>'If at first you don't succeed, don't take up skydiving.'
>