Both servers going down simultaneously points to the OS. Even if the SAN went away or all connectivity was lost, something would still get written indicating the problem. The fact that the logs are clean and then, all of a sudden, you see startup messages in the Oracle logs indicates to me that an immediate reboot (an OS crash, if you will) happened, with Oracle having no chance to write anything.

As Kevin indicated, in scenarios like that there would be messages sent to syslogd, but they are typically lost. However, there are ways to try to capture them going forward:

1) Enable netdump on the servers. Netdump runs in its own protected memory and would be able to dump those messages before the machine reboots. I have had SAs do this with some success.

2) Disable the reboot so that the SAs can either iLO into the box or manually connect a terminal to it, do a screen capture of the messages, and then manually restart the box. We have also used this with some success (especially in the very early days of OCFS).

3) Enable remote syslog capture (though I am not too convinced of this one):
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch05_:_Troubleshooting_Linux_with_syslog

Like Niall, I suggest you look at the OS cron; the timing is just too conspicuous. Make sure updatedb is not scheduled to run.

Rui Amaral
Database Administrator
ITS - SSG
TD Bank Financial Group
220 Bay St., 11th Floor
Toronto, ON, CA, M5K1A2
(bb) (647) 204-9106

________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Kevin Closson
Sent: Friday, November 12, 2010 12:26 PM
To: John Smith
Cc: andrew.kerber@xxxxxxxxx; harish.kumar.kalra@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Really Strange Problem

Wow..re-reading my email...massive "typos." Actually, the contacts on this old keyboard are nearly gone and I'm finding myself mashing keys...time to stop procrastinating and get another one.
Anyway, I don't think suicide is your problem. I was just addressing the bit about evidence. I'd check the common components (switches, storage) to see if there is anything there.

________________________________
From: John Smith <john40855@xxxxxxxxx>
To: Kevin Closson <ora_kclosson@xxxxxxxxx>
Cc: andrew.kerber@xxxxxxxxx; harish.kumar.kalra@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Fri, November 12, 2010 8:36:53 AM
Subject: Re: Really Strange Problem

If it was a node eviction, wouldn't one server go before the other? In this case, they appear to be going simultaneously. If it is, is there anyplace besides the clusterware logs that would show evidence?

On Fri, Nov 12, 2010 at 10:25 AM, Kevin Closson <ora_kclosson@xxxxxxxxx> wrote:

>Absolutely no indication of a node eviction.

I'm not sying this is your problme, bu... the messages you are looking for are sent via syslogd and are buffered writes. Don't expect a catatonic server to be able to flush buffered writes to a log. There is a reason Oracle implemented IPMI fencing in 11.2...I guess I wasn't such a renegade for blogging about fencing approaches all those years...

________________________________
From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
To: harish.kumar.kalra@xxxxxxxxx
Cc: john40855@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Thu, November 11, 2010 8:50:30 PM
Subject: Re: Really Strange Problem

Absolutely no indication of a node eviction. Nothing in any of the clusterware logs (crsd.log, ocssd.log, etc.) indicates a node eviction on either node. They are all normal until they suddenly start back up after an unexpected shutdown.

On Thu, Nov 11, 2010 at 9:36 PM, Harish Kumar <harish.kumar.kalra@xxxxxxxxx> wrote:

John,

Have you checked ocssd.log and the system logfiles?
Download and install CHM, also known as Cluster Health Monitor, and leave it running until a node is evicted again. Once nodes are evicted, check and analyze the logfiles collected by CHM. Oracle may evict a node for different reasons, such as CPU saturation, long I/O latencies, a misconfigured network, etc. I think once you have the logfiles in place it will be clearer what the actual problem is.

Regards
Harish Kumar
Independent Database Consultant
www.oraxperts.com

On Fri, Nov 12, 2010 at 1:20 PM, John Smith <john40855@xxxxxxxxx> wrote:

Oh yes, if I didn't make it clear:

OS - OEL 5.5 x86_64
Clusterware: 11.1.0.7 x86_64
ASM - 11.1.0.7 x86_64 (running over RAW)
Database: 10.1.0.5 x86_64 (running)
Database: 10.2.0.4 x86_64 (installed, but not running at this point)

---------- Forwarded message ----------
From: John Smith <john40855@xxxxxxxxx>
Date: Thu, Nov 11, 2010 at 8:14 PM
Subject: Really Strange Problem
To: oracle-l@xxxxxxxxxxxxx

OK, I don't know if this one is related to Oracle database, OEL, or something else entirely. But here it is:

We have Oracle Clusterware 11.1 installed and running with ASM 11.1. We also have Oracle 10.2 installed, as well as 10.1. I have created a 10.1 database. ASM is on RAW against EMC storage. This has to be on raw because the intent is to take a 10.1 32-bit database to 10.2 64-bit. This requires a stop at 10.1 64-bit along the way, and 10.1 requires ASM on raw.

Anyway, the problem is that the servers are rebooting every 2-3 days at 2:15 am, and we have not been able to figure out why. There is nothing in the ASM, clusterware, or database logs; they show everything running fine and then a restart. Nothing in /var/log/messages either; it just shows a restart. Any ideas?

--
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'

NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited.
If received in error, please go to www.td.com/legal for instructions.
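
On the suggestion in this thread to look at the OS cron because the 2:15 am timing is so conspicuous: the following is only a rough sketch of how one might list crontab entries whose minute and hour fields would fire at a given time. The function names and the sample entries are hypothetical and are not taken from the affected servers. It handles only numeric fields with `*`, lists, ranges, and steps, and it ignores the day and month fields, so each hit still needs to be read by eye. Jobs such as updatedb are often started indirectly through a run-parts directory, so an entry that launches run-parts deserves a closer look as well.

```python
def field_matches(field, value, lo, hi):
    """Return True if one numeric cron field (e.g. '*/15', '1-3', '0,30') includes value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def jobs_at(lines, hour, minute):
    """Return crontab lines whose minute and hour fields match the given time."""
    hits = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        fields = text.split(None, 5)
        if len(fields) < 6:
            continue  # environment settings such as SHELL=/bin/bash, or malformed lines
        try:
            if field_matches(fields[0], minute, 0, 59) and field_matches(fields[1], hour, 0, 23):
                hits.append(text)
        except ValueError:
            continue  # non-numeric schedule such as @reboot; not handled in this sketch
    return hits


# Hypothetical crontab contents, for illustration only.
sample = """\
SHELL=/bin/bash
# hypothetical entries
01 * * * * root run-parts /etc/cron.hourly
15 2 * * * root /usr/local/bin/nightly_job.sh
*/15 1-3 * * * root /usr/local/bin/poll.sh
02 4 * * * root run-parts /etc/cron.daily
""".splitlines()

for hit in jobs_at(sample, hour=2, minute=15):
    print(hit)
```

To try this against a real box, one could pass in the lines of /etc/crontab, the files under /etc/cron.d, and each user's crontab, then compare the hits with the reboot times.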