Re: RE: Really Strange Problem

  • From: Gerwin Hendriksen <gerwin.hendriksen@xxxxxxxxx>
  • To: gerwin.hendriksen@xxxxxxxxx
  • Date: Fri, 12 Nov 2010 19:05:26 +0100

Although a lot has already been said on this subject,

I was wondering whether any system statistics, such as sar output, are
available. These could shed light on whether the issue has anything to do
with I/O or CPU resources. I can imagine that when CPU or I/O is saturated
for a short time, certain timeout limits are reached. A mirror sync on the
storage, for example, could also briefly freeze things, which might end in a
timeout and strange side effects in your cluster environment.
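
As a rough sketch of the kind of check I mean, assuming the plain-text output
of "sar -u" with the usual %iowait and %idle columns (the exact layout varies
by sysstat version and locale), something like the following could flag the
intervals worth comparing against the reboot times. The function name and
both thresholds are my own placeholders, not anything taken from your system:

```python
def find_busy_intervals(sar_lines, max_iowait=20.0, min_idle=10.0):
    """Return (timestamp, iowait, idle) for samples that look saturated.

    Assumes `sar -u` style text: a header row containing "CPU", "%iowait"
    and "%idle", followed by data rows with the same number of columns.
    The thresholds are arbitrary illustrative defaults.
    """
    header = None
    busy = []
    for line in sar_lines:
        tokens = line.split()
        # sar repeats the header periodically; remember the latest one.
        if "CPU" in tokens and "%idle" in tokens:
            header = tokens
            continue
        # Skip anything before the first header and rows of another shape.
        if header is None or len(tokens) != len(header):
            continue
        # Skip the trailing summary row.
        if tokens[0].startswith("Average"):
            continue
        cpu_col = header.index("CPU")
        # Only look at the all-CPU aggregate row.
        if tokens[cpu_col] != "all":
            continue
        try:
            iowait = float(tokens[header.index("%iowait")])
            idle = float(tokens[header.index("%idle")])
        except ValueError:
            # Column missing or value not numeric: ignore this row.
            continue
        if iowait >= max_iowait or idle <= min_idle:
            # Everything left of the CPU column is the timestamp
            # (one token in 24h output, two with AM/PM).
            busy.append((" ".join(tokens[:cpu_col]), iowait, idle))
    return busy


SAMPLE = """\
00:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
00:10:01        all      1.20      0.00      0.50      0.30      0.00     98.00
00:20:01        all      2.00      0.00      3.00     45.00      0.00     50.00
00:30:01        all     90.00      0.00      8.00      0.50      0.00      1.50
Average:        all     31.07      0.00      3.83     15.27      0.00     49.83
""".splitlines()

if __name__ == "__main__":
    for timestamp, iowait, idle in find_busy_intervals(SAMPLE):
        print(f"{timestamp}  iowait={iowait:.2f}  idle={idle:.2f}")
```

The sample rows are made up. If the flagged timestamps line up with the
restarts, that would point towards resource starvation rather than Oracle
itself.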

Regards,

Gerwin Hendriksen

2010/11/12 Niall Litchfield <niall.litchfield@xxxxxxxxx>

> Also look for other servers running the same OS that you might have missed
> (because, say, Apache is configured to autostart). Or, flavour of my week
> this week, blame an unspecified server configuration issue - though to
> reproduce the insanity properly you'll need a clear error in the logs and
> confirmation from development that it's a known bug before blaming the
> nebulous. :)
>
> On 12 Nov 2010 17:41, "Amaral, Rui" <Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:
>
> Both servers going down simultaneously points to the OS. Even with the SAN
> going away or all connectivity being lost, something would still get written
> indicating the problem. The fact that the logs are clean and then all of a
> sudden you see startup messages in the Oracle logs indicates to me that an
> immediate reboot (an OS crash, if you will) happened, with Oracle having no
> chance to write. As Kevin indicated, in scenarios like that there would be
> messages captured by syslogd, but they are typically lost in those types of
> cases. However, there are ways to try to capture them going forward:
>
> 1) enable netdump on the servers. Netdump runs in its own protected memory
> and would be able to dump those messages prior to the machine rebooting. I
> have had SAs do this with some success.
>
> 2) disable the reboot so that the SAs can either iLO into the box or
> manually connect a terminal to it, take a screen capture of the messages,
> and then manually restart the box. We have also used this with some success
> (especially in the very early days of OCFS).
>
> 3) or enable remote syslog capture (though I am not too convinced of this
> one):
>
> http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch05_:_Troubleshooting_Linux_with_syslog
>
> Like Niall, I suggest looking at the OS cron - the timing is just too
> conspicuous. Make sure updatedb is not scheduled to run.
>
>
>
> Rui Amaral
> Database Administrator
> ITS - SSG
> TD Bank Financial Group
> 220 Bay St., 11th Floor
> Toron...
> ------------------------------
> *From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
> oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *Kevin Closson
> *Sent:* Friday, November 12, 2010 12:26 PM
> *To:* John Smith
>
>
> Cc: andrew.kerber@xxxxxxxxx; harish.kumar.kalra@xxxxxxxxx;
> oracle-l@xxxxxxxxxxxxx
>
> Subject: Re: Really Strange Problem
>
> Wow..re-reading my email...massive "typos." Actually, the contacts on this
> old keyboard are nearly g...
>
