Anything in /var/log/messages?

On Thu, Aug 25, 2011 at 5:42 AM, Marko Sutic <marko.sutic@xxxxxxxxx> wrote:
> Freek,
>
> you are correct - the heartbeat fatal messages are there due to the missing
> voting disk.
>
> I have another database up and running on the second node, and this
> database is using the same OCFS2 volume for Oracle database files as the
> first one.
> That database is running without any error, so I suppose the other OCFS2
> volumes were accessible at the time of the failure.
>
> In this configuration the 3 voting disk files are located on 3 different
> LUNs and separate OCFS2 volumes. When the failure occurs, two of the three
> voting devices hang.
>
> It is also worth mentioning that nothing else is running on that node
> except the import.
>
> I simply can't figure out why two of the three voting disks hang.
>
> Regards,
> Marko
>
> On Thu, Aug 25, 2011 at 11:08 AM, D'Hooge Freek <Freek.DHooge@xxxxxxxxx> wrote:
>> Marko,
>>
>> I don't know the error timings for the other node, but I think the
>> heartbeat fatal messages came after the first node had terminated due to
>> the missing voting disk.
>>
>> This would indicate that there is no general problem with the voting disk
>> itself, but that the problem is specific to the first node.
>> Either the connection itself, the load, or an OCFS2 bug would then be the
>> cause of the error.
>>
>> Do you know whether, at the time of the failure, the other OCFS2 volumes
>> were still accessible?
>> Are your voting disks placed on the same LUNs as your database files, or
>> are they on a separate OCFS2 volume?
>>
>> Regards,
>>
>> Freek D'Hooge
>> Uptime
>> Oracle Database Administrator
>> email: freek.dhooge@xxxxxxxxx
>> tel: +32 (0)3 451 23 82
>> http://www.uptime.be
>> disclaimer: www.uptime.be/disclaimer
>> ---
>> From: Marko Sutic [mailto:marko.sutic@xxxxxxxxx]
>> Sent: Thursday, 25 August 2011 10:51
>> To: D'Hooge Freek
>> Cc: oracle-l@xxxxxxxxxxxxx
>> Subject: Re: CRS-1615: voting device hang at 50% fatal, termination in 99620 ms
>>
>> Error messages from the other node:
>>
>> 2011-08-25 10:38:33.563
>> [cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
>> 2011-08-25 10:38:40.558
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 7.010 seconds
>> 2011-08-25 10:38:41.560
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 6.010 seconds
>> 2011-08-25 10:38:45.558
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 2.010 seconds
>> 2011-08-25 10:38:46.560
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 1.010 seconds
>> 2011-08-25 10:38:47.562
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 0.010 seconds
>> 2011-08-25 10:38:47.574
>> [cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in /u01/app/crs/log/l01ora4/cssd/ocssd.log.
>> 2011-08-25 10:39:01.579
>> [cssd(18117)]CRS-1601:CSSD Reconfiguration complete. Active nodes are l01ora4.
>>
>> Regards,
>> Marko
>
> --
> Marko Sutic, dipl.ing.rač.
> My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>
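[Editor's note] To see the eviction sequence above at a glance, the CRS-16xx message codes can be pulled out of the CSS alert log with a quick grep. This is a minimal sketch that uses two of the log lines quoted in the thread as sample input; the /tmp sample file is illustrative, and on a real node you would grep the actual alert log under the CRS home (e.g. /u01/app/crs/log/<hostname>/) instead.

```shell
# Build a small sample from the alert log entries quoted in the thread;
# on a live node, grep the real alert log under the CRS home instead.
cat > /tmp/crs_alert_sample.log <<'EOF'
2011-08-25 10:38:33.563
[cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
2011-08-25 10:38:47.574
[cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in /u01/app/crs/log/l01ora4/cssd/ocssd.log.
EOF

# List the distinct CRS message codes involved in the eviction.
grep -o 'CRS-16[0-9][0-9]' /tmp/crs_alert_sample.log | sort -u
```

On the affected node, `crsctl query css votedisk` lists the configured voting disk paths, so each of the three OCFS2 mounts holding a voting file can then be checked for accessibility individually.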