Re: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms

  • From: David Barbour <david.barbour1@xxxxxxxxx>
  • To: marko.sutic@xxxxxxxxx
  • Date: Thu, 25 Aug 2011 17:14:42 -0500

Anything in /var/log/messages?
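
If it helps, here is a rough Python sketch for pulling storage- or OCFS2-related kernel messages out of /var/log/messages around the failure. It assumes the classic syslog "Mon DD HH:MM:SS" prefix. The keyword list, the five-minute window and the function name `suspicious_lines` are my own guesses, not anything authoritative:

```python
import re
from datetime import datetime, timedelta

# Hypothetical keyword list: kernel subsystems that typically log storage,
# multipath or OCFS2 heartbeat trouble. Adjust to the drivers actually in use.
KEYWORDS = re.compile(r"ocfs2|o2hb|o2net|scsi|multipath|i/o error", re.IGNORECASE)


def suspicious_lines(lines, center, window_s=300, year=2011):
    """Yield syslog lines within +/- window_s seconds of `center` that match KEYWORDS.

    Assumes the classic syslog prefix 'Mon DD HH:MM:SS', which carries no year,
    so the caller supplies it.
    """
    lo = center - timedelta(seconds=window_s)
    hi = center + timedelta(seconds=window_s)
    for line in lines:
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no parsable timestamp at the start of this line
        if lo <= ts <= hi and KEYWORDS.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Centre the search on the eviction time logged by node l01ora4.
    failure = datetime(2011, 8, 25, 10, 38, 47)
    path = "/var/log/messages"
    try:
        with open(path, errors="replace") as fh:
            for hit in suspicious_lines(fh, failure):
                print(hit)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

The window is centred on the eviction time l01ora4 reported; if the two nodes' clocks are not in sync, widen it.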

On Thu, Aug 25, 2011 at 5:42 AM, Marko Sutic <marko.sutic@xxxxxxxxx> wrote:

> Freek,
>
> you are correct: the heartbeat fatal messages are there due to the missing
> voting disk.
>
> I have another database up and running on the second node, and it uses
> the same OCFS2 volume for its Oracle database files as the first one.
> This database is running without any errors, so I suppose that the other
> OCFS2 volumes were accessible at the time of the failure.
>
> In this configuration, the 3 voting disk files are located on 3 different
> LUNs and separate OCFS2 volumes. When the failure occurs, two of the three
> voting devices hang.
>
> It is also worth mentioning that nothing except the import is running on
> that node.
>
>
> I simply can't figure out why two of the three voting disks hang.
>
>
> Regards,
> Marko
>
>
> On Thu, Aug 25, 2011 at 11:08 AM, D'Hooge Freek <Freek.DHooge@xxxxxxxxx> wrote:
>
>> Marko,
>>
>> I don't know the error timings for the other node, but I think the
>> heartbeat fatal messages appear after the first node has terminated due
>> to the missing voting disk.
>>
>> This would indicate that there is no general problem with the voting disk
>> itself, but that the problem is specific to the first node.
>> The cause of the error would then be the connection itself, the load, or
>> an OCFS2 bug.
>>
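
Freek's ordering argument can be checked directly once node l01ora3's own CRS-1615 line is at hand (its timestamp is not quoted in this thread). A small Python sketch; `termination_precedes` and its arguments are hypothetical names, and it assumes both nodes' clocks are in sync (NTP), which is worth confirming first:

```python
from datetime import datetime, timedelta


def termination_precedes(crs1615_stamp, termination_ms, first_warning_elsewhere):
    """Project when node 1's CSSD terminates after a CRS-1615 message and report
    whether that happens no later than node 2's first heartbeat-fatal warning.

    crs1615_stamp:           datetime of node 1's CRS-1615 line (its own alert log)
    termination_ms:          the 'termination in N ms' value from that line
    first_warning_elsewhere: datetime of node 2's first CRS-1612 line
    """
    termination = crs1615_stamp + timedelta(milliseconds=termination_ms)
    return termination, termination <= first_warning_elsewhere


if __name__ == "__main__":
    # Only node l01ora4's first warning is known from this thread; node
    # l01ora3's CRS-1615 timestamp has to be taken from its own alert log.
    first_warning = datetime(2011, 8, 25, 10, 38, 33, 563000)
    print("l01ora4 first CRS-1612 warning:", first_warning)
```

For scale: 99620 ms is just under 100 s, which would fit the 50% point of a 200-second voting-disk I/O timeout (the CSS disktimeout default in 10.2/11.1, if I remember right; the thread does not say what is configured here).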
>> Do you know whether the other OCFS2 volumes were still accessible at the
>> time of the failure?
>> Are your voting disks placed on the same LUNs as your database files, or
>> are they on a separate OCFS2 volume?
>>
>> Regards,
>>
>>
>> Freek D'Hooge
>> Uptime
>> Oracle Database Administrator
>> email: freek.dhooge@xxxxxxxxx
>> tel +32(0)3 451 23 82
>> http://www.uptime.be
>> disclaimer: www.uptime.be/disclaimer
>> ---
>> From: Marko Sutic [mailto:marko.sutic@xxxxxxxxx]
>> Sent: Thursday 25 August 2011 10:51
>> To: D'Hooge Freek
>> Cc: oracle-l@xxxxxxxxxxxxx
>> Subject: Re: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms
>>
>> Error messages from the other node:
>>
>> 2011-08-25 10:38:33.563
>> [cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
>> 2011-08-25 10:38:40.558
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 7.010 seconds
>> 2011-08-25 10:38:41.560
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 6.010 seconds
>> 2011-08-25 10:38:45.558
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 2.010 seconds
>> 2011-08-25 10:38:46.560
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 1.010 seconds
>> 2011-08-25 10:38:47.562
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 0.010 seconds
>> 2011-08-25 10:38:47.574
>> [cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in /u01/app/crs/log/l01ora4/cssd/ocssd.log.
>> 2011-08-25 10:39:01.579
>> [cssd(18117)]CRS-1601:CSSD Reconfiguration complete. Active nodes are l01ora4 .
>>
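
The countdown lines in that excerpt are internally consistent, and that is easy to check mechanically. A minimal Python sketch, assuming the alert-log layout shown (a timestamp line followed by a `[cssd(pid)]CRS-161x:` message line); the regex and the name `projected_evictions` are my own, not anything Oracle ships:

```python
import re
from datetime import datetime, timedelta

# Matches countdown messages such as:
# [cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
COUNTDOWN = re.compile(
    r"CRS-(\d{4}):node (\S+) \((\d+)\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)


def projected_evictions(log_text):
    """Return (logged_at, node, percent, projected_eviction) per countdown message.

    Assumes each message line directly follows its own timestamp line,
    as in the l01ora4 alert-log excerpt.
    """
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    results = []
    for stamp, msg in zip(lines, lines[1:]):
        m = COUNTDOWN.search(msg)
        if not m:
            continue  # not a countdown message (or msg is itself a timestamp)
        logged_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        remaining = timedelta(seconds=float(m.group(5)))
        results.append((logged_at, m.group(2), int(m.group(4)), logged_at + remaining))
    return results


if __name__ == "__main__":
    excerpt = """\
2011-08-25 10:38:33.563
[cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
2011-08-25 10:38:47.562
[cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 0.010 seconds
"""
    for logged_at, node, pct, eviction in projected_evictions(excerpt):
        print(f"{logged_at.time()} {node} {pct:>2}% -> eviction at {eviction.time()}")
```

Run over the full excerpt, every countdown line projects the eviction between 10:38:47.563 and 10:38:47.572, just before the CRS-1607 line at 10:38:47.574. If the percentages are of one fixed interval, the 50% warning 14 s before eviction suggests l01ora4 stopped seeing l01ora3's heartbeat roughly 28 s before the eviction, around 10:38:19 to 10:38:20 by l01ora4's clock; that is the time to compare with l01ora3's CRS-1615 message.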
>>
>> Regards,
>> Marko
>>
>
>
>
> --
> Marko Sutic, dipl.ing.rač.
> My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>
>
>
