Re: Oracle RAC nodes eviction question

  • From: Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>
  • To: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • Date: Wed, 13 Aug 2014 13:31:59 -0700

Hello Amir
   Losing the binaries can, and most probably will, lead to node eviction.
When there is a page fault for an executable page, that page needs to be
paged in from the binary. If the binary is not available, the GI processes
will be killed. The death of GI processes leads to events such as missed
heartbeats and, finally, to node eviction. From 11gR2 onwards, a GI restart
is tried before restarting the node. Possibly that file system was still
unavailable during the GI restart attempt, so it would have led to the
eventual node restart.

   This is analogous to the scenario of removing the oracle binary while
the database is up (in that case too, the database will eventually crash).

   I guess one option to avoid node eviction due to the loss of binaries
mounted through NFS is to keep the GI and RDBMS homes local, though that has
its own risks. Of course, in a big cluster environment, it is easier said
than done.
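
   As a rough way to see which homes are exposed to this, the sketch below
checks whether a given path sits on an NFS mount by reading mount-table
text. It assumes the layout shared by Solaris /etc/mnttab and Linux
/proc/mounts (device, mount point, and file system type as the first three
whitespace-separated fields). The sample table, the NAS host name, and the
paths are hypothetical, and mount points containing escaped spaces are not
handled.

```python
import posixpath

# File system type names treated as NFS (an assumption; adjust per platform).
NFS_TYPES = {"nfs", "nfs3", "nfs4"}


def fstype_for(path, mount_table):
    """Return (mount_point, fstype) for the longest mount point containing path.

    mount_table is the text of a mount table whose first three fields are
    device, mount point, and file system type. Returns None if nothing matches.
    """
    path = posixpath.normpath(path)
    best = None
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        mount_point = posixpath.normpath(fields[1])
        fstype = fields[2]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        if contains and (best is None or len(mount_point) > len(best[0])):
            best = (mount_point, fstype)
    return best


def is_on_nfs(path, mount_table):
    """True if the file system holding path has an NFS type."""
    hit = fstype_for(path, mount_table)
    return hit is not None and hit[1] in NFS_TYPES


# Hypothetical mount table in /etc/mnttab style.
SAMPLE = """\
/dev/dsk/c0t0d0s0 / ufs rw 1407900000
nas1:/export/grid /u01/app/grid nfs rw,hard 1407900001
/dev/dsk/c0t0d0s5 /u02 ufs rw 1407900002
"""

if __name__ == "__main__":
    for home in ("/u01/app/grid", "/u02/app/oracle"):
        print(home, "on NFS:", is_on_nfs(home, SAMPLE))
```

On a real node, the text of /etc/mnttab (or /proc/mounts on Linux) would be
passed in place of SAMPLE, along with the actual GI and RDBMS home paths.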

Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals -  http://www.orainternals.com - Specialists in Performance,
RAC and EBS
Blog: http://orainternals.wordpress.com/
Oracle ACE Director and OakTable member <http://www.oaktable.com/>

Co-author of the books: Expert Oracle Practices
<http://tinyurl.com/book-expert-oracle-practices/>, Pro Oracle SQL
<http://tinyurl.com/ahpvms8>, Expert RAC Practices 12c
<http://tinyurl.com/expert-rac-12c>, Expert PL/SQL Practices
<http://tinyurl.com/book-expert-plsql-practices>



On Wed, Aug 13, 2014 at 12:57 PM, Hameed, Amir <Amir.Hameed@xxxxxxxxx>
wrote:

>  Folks,
>
> I am trying to understand the behavior of an Oracle RAC Cluster if the
> Grid and RAC binary homes become unavailable while the Cluster and Oracle
> RAC are running. The Grid version is 11.2.0.3 and the platform is Solaris
> 10. The Oracle Grid and the Oracle RAC environments are on NAS with the
> database configured with dNFS. The storage for the Grid and RAC binaries
> comes from one NAS head whereas the OCR and Voting Disks (three of each)
> are spread over three NAS heads so that in the event that one NAS head
> becomes unavailable, the cluster can still access two voting disks. The
> recommendation for this configuration came from the storage vendor and
> Oracle. What we observed was that last weekend when the NAS head where the
> Grid and RAC binaries were mounted from went down for a few minutes, all
> RAC nodes were rebooted even though two voting disks were still accessible.
> In my destructive testing about a year ago, one of the tests run was to
> pull all cables of NICs that were used for kernel NFS on one of the RAC
> nodes but the cluster did not evict that node. Any feedback will be
> appreciated.
>
>
>
> Thanks,
>
> Amir
>
