Re: OS upgrade for RAC

  • From: Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>
  • To: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • Date: Wed, 3 Jun 2015 09:18:35 -0700

Hi Amir,
I am *guess*ing here: the seemingly arbitrary (as Mark correctly pointed
out) insistence on 1-2 days to complete the rolling upgrade was quite
possibly triggered by the following concerns:

(a) Support may not have tested this configuration, so they are being
cautious rather than giving you a firmer answer.

(b) They are concerned about OS bug fixes. For example, in the olden days,
there was an NTP bug that was fixed in a later release. The upgraded
servers were catching up time quickly, but the not-yet-upgraded servers
were not, so the servers were slowly drifting from the mean cluster time.
We (accidentally) noticed that in the GI/CSSD logs and had to apply a
manual fix to correct the time. Had we let the servers keep drifting, they
would eventually have restarted, as this leads to a condition similar to
the missing-heartbeat issue. But that was many years ago. So, the 1-2 days
may have come from such a bad experience.
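To make the drift concern concrete, here is a rough sketch (not from the
original incident; the hosts, offsets, and 500 ms threshold are all made up)
of how you might flag a node whose NTP offset is running away from the rest
of the cluster:

```shell
# Hedged sketch: flag NTP peers whose clock offset exceeds a threshold, to
# catch the slow-drift condition described above. Sample `ntpq -p` output is
# embedded so the parsing is self-contained; on a real node you would pipe
# live output instead. Hosts and offsets are made up.
THRESHOLD_MS=500

ntpq_sample() {
cat <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.0.5        .GPS.            1 u   32   64  377    0.412    3.210   0.091
 10.0.0.6        10.0.0.5         2 u   15   64  377    0.380  812.447   0.120
EOF
}

# offset (in ms) is the 9th column; the first two lines are headers.
warnings=$(ntpq_sample | awk -v t="$THRESHOLD_MS" 'NR > 2 {
    off = $9; if (off < 0) off = -off
    if (off > t) printf "WARN: %s offset %.1f ms exceeds %d ms\n", $1, off, t
}')
echo "$warnings"
```

Run periodically on each node, anything it prints is a candidate for the
manual time correction mentioned above, before CSSD notices the drift.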

(c) Database files on an NFS file system: the NFS client software might
have a bit more optimization in the later release, so having cluster nodes
with different NFS client versions *may* cause issues; again, this is
untested from the support point of view. If you don't use NFS for the
database files, ignore this point. (Just to be clear, I am not against NFS,
and I do agree that Direct NFS is highly performant. Having different NFS
clients on different nodes, for the same database, is probably just not
certified.)

(d) A different OS release inevitably brings different firmware releases
too. So, quite possibly, mixed firmware levels are not certified by the
hardware vendors either.

Setting aside the above items, let's review the communication between the
nodes:

(1) The network heartbeat and other RAC messages flow over the UDP
protocol. I doubt that this upgrade will change any functionality at that
low a level. So, having two different OS versions should be fine.

(2) Cluster nodes should not drift too far from the mean cluster time. I am
almost positive that you use NTP, and that is a rock-solid product at this
point. CTSSD is an Oracle product. So, in this case too, having two
different OS versions should be fine.
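If you want to confirm that NTP (not CTSSD) is the active time source, a
quick check is the CTSS mode. A minimal sketch, with the `crsctl check
ctss` output embedded as a sample rather than run live:

```shell
# Hedged sketch: confirm CTSS is in Observer mode, i.e. NTP owns time
# synchronization. The CRS-4700 message is embedded as a sample; on a real
# node you would run `crsctl check ctss` instead.
ctss_sample() {
cat <<'EOF'
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
EOF
}

# Pull the mode word out of the message.
mode=$(ctss_sample | sed -n 's/.*is in \([A-Za-z]*\) mode.*/\1/p')
echo "CTSS mode: $mode"
```

If it reports Active mode instead, CTSSD itself is adjusting the clocks and
the NTP concern above applies differently.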

(3) The CSSD-based disk heartbeat should not cause issues either, as it is
an Oracle binary. Further, I doubt that there will be a big functional
change in the low-level I/O path either.
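As a belt-and-braces check during the mixed-version window, you could also
confirm that the CSS timeout settings agree across nodes. A sketch, with
placeholder values standing in for what `crsctl get css misscount` and
`crsctl get css disktimeout` would report on each node:

```shell
# Hedged sketch: sanity-check that CSS timeouts agree across nodes during
# the rolling window. Node names and values below are placeholders for real
# `crsctl get css misscount` / `crsctl get css disktimeout` output.
settings() {   # columns: node misscount disktimeout
cat <<'EOF'
node1 30 200
node2 30 200
node3 30 200
node4 30 200
EOF
}

# Compare every node against the first; print any node that disagrees.
mismatches=$(settings | awk '
NR == 1 { mc = $2; dt = $3 }
$2 != mc || $3 != dt { print $1 }')

if [ -z "$mismatches" ]; then
    echo "CSS timeouts consistent across nodes"
else
    echo "WARN: inconsistent CSS timeouts on: $mismatches"
fi
```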

If I were you, this is what I would recommend as a plan:
Say you have n nodes in the cluster; then:
(i) Upgrade the first node, bring it back into the cluster, and test it for
a day or two.
(ii) Upgrade ~half of the nodes; test for a day.
(iii) Upgrade all the remaining nodes.

This plan should reduce the concerns. However, I don't know whether this
approach is feasible for you or not.
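For what it's worth, the staged plan could be sketched as a driver script;
`upgrade_node` and `validate_cluster` here are placeholders for your real
stop-instance / patch-OS / start-instance and soak-test procedures, and the
node names are made up:

```shell
# Hedged sketch of the staged rollout: one node first, then ~half of the
# rest, then the remainder. The two functions are stand-ins; in reality
# each validate step means observing the cluster for a day or two.
NODES="node1 node2 node3 node4"

log=""
upgrade_node()     { log="$log $1"; }              # placeholder for real upgrade
validate_cluster() { echo "soak-test with:$log"; } # placeholder soak test

set -- $NODES
first=$1; shift

# Stage (i): first node, then watch it for a day or two.
upgrade_node "$first"
validate_cluster

# Stage (ii): roughly half of the remaining nodes, then test for a day.
half=$(( $# / 2 ))
i=0
for n in "$@"; do
    [ "$i" -lt "$half" ] || break
    upgrade_node "$n"
    i=$(( i + 1 ))
done
validate_cluster

# Stage (iii): whatever is left.
for n in "$@"; do
    case "$log" in *" $n"*) ;; *) upgrade_node "$n" ;; esac
done
echo "final order:$log"
```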

Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals - http://www.orainternals.com - Specialists in Performance,
RAC and EBS
Blog: http://orainternals.wordpress.com/
Oracle ACE Director and OakTable member <http://www.oaktable.com/>

Co-author of the books: Expert Oracle Practices
<http://tinyurl.com/book-expert-oracle-practices/>, Pro Oracle SQL
<http://tinyurl.com/ahpvms8>, Expert RAC Practices 12c
<http://tinyurl.com/expert-rac-12c>, Expert PL/SQL Practices
<http://tinyurl.com/book-expert-plsql-practices>



On Tue, Jun 2, 2015 at 9:04 AM, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:

Thanks Mark.

What you have stated is exactly what I was wondering as well, especially
since it is a revision upgrade and not a Sol10-to-Sol11 upgrade.



*From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *Mark J. Bobak
*Sent:* Tuesday, June 02, 2015 11:56 AM
*To:* djeday84@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
*Subject:* Re: OS upgrade for RAC



1-2 days seems pretty arbitrary to me. What, exactly, will break on day 3
that didn't break on day 1 or day 2?

Obviously, you probably shouldn't run that way indefinitely, but doing one
node per weekend for 4 weekends seems reasonable to me.

-Mark



On Tue, Jun 2, 2015, 11:20 AM Anton <djeday84@xxxxxxxxx> wrote:

It is Linux, but because of bugs, this is what we had to work with:

[root@pk7db01 ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Kernel \r on an \m

[oracle@pk7db02 /home/oracle]$cat /etc/issue
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Kernel \r on an \m

seems to work fine, at least this week.




On 06/02/2015 05:05 PM, Hameed, Amir wrote:

We are running a four-node RAC (both Grid and RDBMS are 11.2.0.4) on
Solaris 10 Update 10. We need to upgrade our database and grid to 12c, and
that requires the OS to be at a minimum of Solaris 10 Update 11. What I
would like to find out is: if we do a rolling OS upgrade where we upgrade
one RAC node at a time from *Solaris10/Update10* to *Solaris10/Update11*,
for how long can these RAC nodes stay out of sync in terms of OS revision
level? Is it possible to upgrade the OS on one node every week and spread
the entire process over four weeks? Oracle's response was that the maximum
these hosts can stay out of sync is 1-2 days, but I would like to validate
that with the list.



