Re: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms

  • From: Marko Sutic <marko.sutic@xxxxxxxxx>
  • To: "D'Hooge Freek" <Freek.DHooge@xxxxxxxxx>
  • Date: Thu, 25 Aug 2011 12:42:37 +0200

Freek,

you are correct - the heartbeat fatal messages are there due to the missing
voting disk.

I have another database up and running on the second node, and this database
is using the same OCFS2 volume for Oracle database files as the first one.
That database is running without any errors, so I suppose the other OCFS2
volumes were accessible at the time of the failure.

In this configuration there are 3 voting disk files, located on 3 different
LUNs and separate OCFS2 volumes. When the failure occurs, two of the three
voting devices hang.

It is also worth mentioning that nothing else is running on that node except
the import.


I simply can't figure out why two of the three voting disks hang.
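For what it's worth, one way to narrow this down on the surviving node is to confirm which voting disks the cluster knows about and whether their OCFS2 volumes are still mounted and readable. A minimal sketch (assuming `crsctl` is in the path; the voting disk path below is a placeholder, not the actual location):

```shell
# List the voting disks configured for the cluster and their paths
crsctl query css votedisk

# Show which OCFS2 volumes are currently mounted on this node
mount -t ocfs2

# Check that a given voting disk file is still readable from this node
# (replace the path with an actual voting disk location from crsctl output)
dd if=/ocfs2_vote1/votedisk1 of=/dev/null bs=1M count=1
```

If the `dd` read hangs on the same two volumes, that would point at the I/O path or OCFS2 rather than at Clusterware itself.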


Regards,
Marko


On Thu, Aug 25, 2011 at 11:08 AM, D'Hooge Freek <Freek.DHooge@xxxxxxxxx>wrote:

> Marko,
>
> I don't know the error timings for the other node, but I think the
> heartbeat fatal messages are coming after the first node has terminated due
> to the missing voting disk.
>
> This would indicate that there is no general problem with the voting disk
> itself, but that the problem is specific to the first node.
> Either the connection itself, the load, or an OCFS2 bug would then be the
> cause of the error.
>
> Do you know if, at the time of the failure, the other OCFS2 volumes were
> still accessible?
> Are your voting disks placed on the same LUNs as your database files, or are
> they on a separate OCFS2 volume?
>
> Regards,
>
>
> Freek D'Hooge
> Uptime
> Oracle Database Administrator
> email: freek.dhooge@xxxxxxxxx
> tel +32(0)3 451 23 82
> http://www.uptime.be
> disclaimer: www.uptime.be/disclaimer
> ---
> From: Marko Sutic [mailto:marko.sutic@xxxxxxxxx]
> Sent: donderdag 25 augustus 2011 10:51
> To: D'Hooge Freek
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: Re: CRS-1615:voting device hang at 50% fatal, termination in 99620
> ms
>
> Errors messages from another node:
>
> 2011-08-25 10:38:33.563
> [cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in
> 14.000 seconds
> 2011-08-25 10:38:40.558
> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in
> 7.010 seconds
> 2011-08-25 10:38:41.560
> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in
> 6.010 seconds
> 2011-08-25 10:38:45.558
> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in
> 2.010 seconds
> 2011-08-25 10:38:46.560
> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in
> 1.010 seconds
> 2011-08-25 10:38:47.562
> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in
> 0.010 seconds
> 2011-08-25 10:38:47.574
> [cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in
> /u01/app/crs/log/l01ora4/cssd/ocssd.log.
> 2011-08-25 10:39:01.579
> [cssd(18117)]CRS-1601:CSSD Reconfiguration complete. Active nodes are
> l01ora4 .
>
>
> Regards,
> Marko
>



-- 
Marko Sutic, dipl.ing.rač.
My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>
