Re: Failover testing with 10g RAC

  • From: "Bradd Piontek" <piontekdd@xxxxxxxxx>
  • To: jeffthomas24@xxxxxxxxx
  • Date: Fri, 30 May 2008 10:48:35 -0500

Jeff,
  Are the components you are failing redundant in nature? For example,
multiple HBAs, switches, etc.? We hit some issues in our failover testing
with Service Processor failover that turned out to be a Linux kernel issue
involving the nmi watchdog processes. Without redundancy in the components
you mentioned, I would expect CRS to reboot the node. What are you using for
the OCR and Voting Disk?
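
If it helps, here's a minimal sketch of how to pull that information on
10.2 (assuming a standard CRS home in $ORA_CRS_HOME; adjust the path for
your install):

    # Report the OCR location(s) and check its integrity
    $ORA_CRS_HOME/bin/ocrcheck

    # List the configured voting disk(s)
    $ORA_CRS_HOME/bin/crsctl query css votedisk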
-- 
Bradd Piontek
Twitter: http://www.twitter.com/piontekdd
Oracle Blog: http://piontekdd.blogspot.com
Linked In: http://www.linkedin.com/in/piontekdd
Last.fm: http://www.last.fm/user/piontekdd/

On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas <jeffthomas24@xxxxxxxxx>
wrote:

> Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
>
> We've been conducting failover testing -- disabling an HBA port, powering
> off a switch, yanking an interconnect (IC) link, etc.
>
> In every case, CRS rebooted the server where the dire deed was performed,
> and when the server came back up, the repair was successful -- e.g., it
> had failed over to the secondary HBA port, or the physical IP for the
> IPMP group had floated to the standby NIC, and so forth.
>
> The other server stayed up and all Oracle components remained
> available. During the switch power-off test, the physical IP for the
> IC actually floated over to the standby NIC with no outage on that
> server.
>
> Is this the expected behavior? Will CRS always reboot a server to repair
> itself when an underlying hardware failure is detected?
>
> Thanks,
> Jeff
> --
> //www.freelists.org/webpage/oracle-l
>
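On the reboot question: when CSS decides to evict a node, it generally logs
the reason before the reset, so the CSS daemon log is the first place I'd
look. A rough sketch for 10.2 (again assuming $ORA_CRS_HOME is the CRS
home; <node> is a placeholder for the local node name):

    # CSS heartbeat timeout, in seconds, before a node is evicted
    $ORA_CRS_HOME/bin/crsctl get css misscount

    # CSS daemon log -- eviction and reboot reasons are usually
    # recorded here
    less $ORA_CRS_HOME/log/<node>/cssd/ocssd.log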
