RE: Network interconnect traffic on RAC

  • From: "Crisler, Jon" <Jon.Crisler@xxxxxxx>
  • To: <karlarao@xxxxxxxxx>, "K Gopalakrishnan" <kaygopal@xxxxxxxxx>
  • Date: Tue, 16 Feb 2010 12:48:59 -0500

You need to drill down into OCFS2 issues.  There are some Metalink notes
about gcc versions, and an issue where high CPU can starve the OCFS2
processes.  Changing the grub.conf file to use the deadline I/O
scheduler may help (CFQ is the default).  The messages about O2NET
correspond to what we have found.
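
For what it's worth, this is roughly what that change looks like (the
kernel line and device name below are just examples, not your actual
entries):

# /boot/grub/grub.conf - append elevator=deadline to the kernel line
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 elevator=deadline

# or switch a single device on the fly first, to test without a reboot
echo deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler    # the active scheduler shows in brackets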

Also, the ocfs2.conf file can be tweaked: increase the I/O timeouts if
you are using multipathed SAN storage, and increase the network
timeouts if you are using bonded NICs.  Find out the failover time set
for both SAN multipathing and network bonding, and make sure that
OCFS2 is configured with timeout values higher than the SAN and NIC
timeouts.  What can happen is that if you have a multipath or NIC bond
timeout or failover, OCFS2 can trigger an outage before the OS
resource failover has completed.  Again, from the o2net messages this
seems likely in your case.
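
If your OCFS2 is recent enough, those timeouts usually end up in
/etc/sysconfig/o2cb (set via "service o2cb configure").  The values
below are only an illustration - the point is that the idle, keepalive
and heartbeat timeouts have to outlast the multipath and bonding
failover times:

# /etc/sysconfig/o2cb - example values only; size them above the SAN
# multipath and NIC bond failover times
O2CB_HEARTBEAT_THRESHOLD=31     # disk heartbeat = (threshold-1)*2 sec, so 31 = 60s
O2CB_IDLE_TIMEOUT_MS=60000      # o2net idle timeout; the 30000 default is the
                                # "idle for 30.0 seconds" message you are seeing
O2CB_KEEPALIVE_DELAY_MS=4000
O2CB_RECONNECT_DELAY_MS=2000

# restart o2cb for the new values to take effect (cluster-wide change)
service o2cb restart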

We have otherwise identical RAC clusters on 10.2 running ASM vs.
OCFS2, and in this regard the ASM clusters rarely have problems
compared to the OCFS2 ones.  Once you have a stable gcc / glibc and
stable OCFS2 timeout parameters, OCFS2 should be a lot more reliable.


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Karl Arao
Sent: Wednesday, February 10, 2010 4:38 PM
To: K Gopalakrishnan
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Network interconnect traffic on RAC

Thanks for the replies Andrew, Krishna, Aaron, Gopal...

I have this client running on a three-node RAC.  Just recently two of
the nodes got evicted, and I'm trying to diagnose whether it was a
CPU capacity, disk latency, or interconnect issue.  I've been
reading:
- Oracle Clusterware and Private Network Considerations
- Practical Performance Management for Oracle RAC
- RAC Performance Tuning best practices

BTW they are running 2 x 3.00GHz Xeon CPUs on each node with 4GB of
memory, connected to an EMC CX300.

Around the time of the eviction, the two nodes that got evicted were
at 60-65% CPU utilization (run queue was 5 and 2.5 respectively) and
the surviving node was only 30% utilized (I got the data from SAR).
Then the cluster evicted the two nodes.  BTW, the OCFS2 filesystem
(where the OCR & voting disk reside) was also on the interconnect
IPs, so it was also affected by the latency problem (shown in the OS
logs)...

Unfortunately, since the servers restarted, the data from the current
SNAP_ID at the time of the busy load was all lost.  So I just have
the SAR data and the prior & after SNAP_IDs for diagnosis:
- OS: 2 nodes at 60-65% (run queue was 5 & 2.5 respectively) CPU
utilization, the other was only 30%
- Disk: I don't have latency numbers, but from the SAR Disk data, the
2 evicted nodes had Block Transfer Read/Write/s of 450-500 and TPS/s
60-65... the surviving node had Block Transfer Read/Write/s of 60 and
TPS/s 10
- Network: On the interconnect interface, from the SAR Network data,
the 2 evicted nodes had similar utilization to the surviving node...
txbytes/rxbytes/s of 3,000,000-4,000,000
- Database: on the prior & after SNAP_IDs all nodes have an AAS below
the CPU count, and the 2 evicted nodes only have about 7 MB/s of
read/write activity... Looking at the ASH data I can see "CPU" and
"gc cr multi block" as the top two events.
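
(For reference, the OS numbers above were pulled with sar roughly
like this - the sa file, time window, and the eth1 interface name are
just examples:)

sar -q -f /var/log/sa/sa25 -s 14:30:00 -e 15:00:00            # runq-sz, plist-sz
sar -u -f /var/log/sa/sa25 -s 14:30:00 -e 15:00:00            # %user, %system, %iowait
sar -b -f /var/log/sa/sa25 -s 14:30:00 -e 15:00:00            # tps, bread/s, bwrtn/s
sar -n DEV -f /var/log/sa/sa25 -s 14:30:00 -e 15:00:00 | grep eth1    # rxbyt/s, txbyt/s
sar -n EDEV -f /var/log/sa/sa25 -s 14:30:00 -e 15:00:00 | grep eth1   # errors and drops on the NIC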


Below is some of the output from one of the failing nodes:

-- OS log
Jan 25 14:47:57 rac1-3 kernel: o2net: connection to node rac1-2 (num
1) at 192.168.0.2:7777 has been idle for 30.0 seconds, shutting it
down.

-- Clusterware Alert log
2010-01-25 14:47:55.880
[cssd(13414)]CRS-1610:node rac1-1 (3) at 90% heartbeat fatal, eviction
in 0.130 seconds

-- CSS log
[    CSSD]2010-01-25 14:47:26.691 [1199618400] >WARNING:
clssnmPollingThread: node rac1-1 (3) at 90 3.123428e-317artbeat fatal,
eviction in 0.130 seconds
[    CSSD]2010-01-25 14:47:26.823 [1199618400] >TRACE:
clssnmPollingThread: Eviction started for node rac1-1 (3), flags
0x040d, state 3, wt4c 0
[    CSSD]2010-01-25 14:47:26.823 [1199618400] >TRACE:
clssnmDiscHelper: rac1-1, node(3) connection failed, con (0x785550),
probe((nil))
[    CSSD]2010-01-25 14:47:27.328 [1115699552] >TRACE:
clssnmReadDskHeartbeat: node(3) is down. rcfg(30) wrtcnt(519555)
LATS(534471324) Disk lastSeqNo(519555)


So from the data above, my initial finding is that the latency issue
could be caused either by sustained high CPU utilization on the OS
side, which affected the scheduling of critical RAC processes, or by
a congested interconnect switch.
I'd like to drill down into which of the two is the culprit, which is
the reason behind my asking.
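
In case anyone wants to sanity-check the approach, this is roughly
what I plan to look at next to separate the two (eth1 as the private
NIC is just an example):

# OS side: is the interconnect dropping or reassembling packets?
netstat -s | grep -iE 'reassembl|fragment'   # IP fragment reassembly failures
netstat -su                                  # UDP "packet receive errors"
sar -n EDEV 1 5 | grep eth1                  # rxerr/s, rxdrop/s on the private NIC

# Database side: blocks lost over the interconnect vs. plain CPU starvation
sqlplus -s "/ as sysdba" <<'EOF'
select inst_id, name, value
from   gv$sysstat
where  name in ('gc blocks lost',
                'gc cr blocks received',
                'gc current blocks received')
order  by inst_id, name;
EOF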




- Karl Arao
karlarao.wordpress.com
--
//www.freelists.org/webpage/oracle-l

