Re: Using Diagwait on Oracle Clusterware

  • From: LS Cheng <exriscer@xxxxxxxxx>
  • To: vishal@xxxxxxxxxxxxxxx
  • Date: Tue, 24 Nov 2009 08:08:48 +0100

one of the reasons I use diagwait is that it makes oprocd less sensitive :-)

the other reasons are the ones the note states, but when there are evictions
on Solaris, for example, it is still quite hard to find the root cause
(because CRSD sends some eviction messages to the system console, and those
are usually not written to files unless logging is configured for it, and
many Solaris admins do not know how to do that!)



Thanks

--
LSC


On Mon, Nov 23, 2009 at 6:08 PM, Vishal Gupta <vishal@xxxxxxxxxxxxxxx> wrote:

>  Hello List,
>
> What is the general consensus among RAC users regarding the use of diagwait
> on Oracle Clusterware?
>
> Metalink Note - 559365.1
>
>  Symptoms
>
> Oracle Clusterware evicts the node from the cluster when
>
>    - Node is not pinging via the network heartbeat
>    - Node is not pinging the Voting disk
>    - Node is hung/busy and is unable to perform either of the earlier
>    tasks
>
> In most cases when the node is evicted, there is information written to the
> logs to analyze the cause of the node eviction. However, in certain cases
> this may be missing. The steps documented in this note are to be used for
> those cases where there is not enough information, or no information, to
> diagnose the cause of the eviction.
>
> Changes
>
> None
>
> Cause
>
> When the node is evicted and the node is extremely busy in terms of CPU (or
> lack of it), it is possible that the OS did not get time to flush the
> logs/traces to the file system. It may be useful to set the diagwait
> attribute to delay the node reboot and give the OS additional time to write
> the traces. This setting will provide more time for diagnostic data to be
> collected safely and will *NOT* increase the probability of corruption.
> After setting diagwait, the Clusterware will wait an additional 10 seconds
> (Diagwait - reboottime). Customers can unset diagwait by following the steps
> documented below after fixing their OS scheduling issues.
>
>
>
>
>  Regards,
> Vishal Gupta
> http://www.vishalgupta.com
>
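
For what it is worth, the "additional 10 seconds" in the quoted note is just
the difference between the two CSS settings (Diagwait - reboottime). A minimal
sketch of that arithmetic follows. The values 13 and 3 are my assumption of
the typical settings (they are not stated in the quoted text), and the
function name is made up for illustration.

```python
def extra_wait_seconds(diagwait: int, reboottime: int) -> int:
    """Extra delay before the node reboot, using the note's formula
    (diagwait - reboottime). Both arguments are in seconds."""
    if diagwait < reboottime:
        # A diagwait smaller than reboottime would give a negative delay,
        # which makes no sense for this formula.
        raise ValueError("diagwait must not be smaller than reboottime")
    return diagwait - reboottime


# Assumed values: diagwait=13 and reboottime=3 are not given in the quoted
# note; they are chosen so the result matches the 10 seconds it mentions.
print(extra_wait_seconds(13, 3))
```

Check the actual values on your own cluster before relying on the figure.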
