Re: RAC node "has a disk HB, but no network HB" but traceroute

From: "Yong Huang" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "yong321" for DMARC)
To: <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 5 Jan 2017 22:08:40 +0000 (UTC)

Thanks, Justin, Jure and Martin. Martin's article is great. Interpreting "no
network HB" as "there are 2 or more processes which missed to communicate"
instead of a network problem is the key. That's exactly what I meant in the SR
I opened by saying "We begin to doubt about the meaning of the "no network HB"
message". So far the SR hasn't gone anywhere, even after I uploaded various
types of logs.

Our log does show a fast increase in IP packets that need reassembly, and
essentially all of these new reassemblies failed:
$ egrep '^zzz|reassembl' <OSWatcher netstat log>
...
zzz Sun Dec 18 02:01:58 CST 2016
555539624 reassemblies required
100653307 packets reassembled ok
60026 packet reassembles failed
zzz Sun Dec 18 02:02:28 CST 2016
555545702 reassemblies required
100653307 packets reassembled ok
66103 packet reassembles failed
zzz Sun Dec 18 02:02:58 CST 2016
555551748 reassemblies required
100653307 packets reassembled ok
72149 packet reassembles failed
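
To make the trend easier to see, the counters can be turned into per-snapshot
increases. This is only a rough sketch: the function name reassembly_deltas is
made up for illustration, and it assumes the three counters appear in each
snapshot in the order shown above, with "packet reassembles failed" last.

```shell
#!/bin/sh
# Print per-snapshot increases in the IP reassembly counters of an
# OSWatcher netstat log (file argument, or stdin if none is given).
# Assumes "packet reassembles failed" is the last of the three counters
# in each snapshot, as in the excerpt above.
reassembly_deltas() {
    awk '
        /^zzz/                      { ts = $0; sub(/^zzz[ *]*/, "", ts) }
        /reassemblies required/     { req = $1 }
        /packets reassembled ok/    { ok = $1 }
        /packet reassembles failed/ {
            fail = $1
            # The first snapshot only sets the baseline; later ones print deltas.
            if (seen)
                printf "%s: required +%d, ok +%d, failed +%d\n",
                       ts, req - preq, ok - pok, fail - pfail
            preq = req; pok = ok; pfail = fail; seen = 1
        }
    ' "$@"
}
```

Fed the three snapshots above, it prints:
Sun Dec 18 02:02:28 CST 2016: required +6078, ok +0, failed +6077
Sun Dec 18 02:02:58 CST 2016: required +6046, ok +0, failed +6046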

Of all the documents I found, Red Hat "IP fragmentation fails and fragmented
packets get dropped" at
https://access.redhat.com/solutions/1498603
is a good one, but you have to log in to read it. In short, if I understand the
confusing Root Cause section correctly, kernel-2.6.32-477.el6 or RHEL6.6 has a
bug that incorrectly calculates IP fragmentation memory, which causes false
evictions (i.e. drops) of IP fragments on systems with many CPUs. (Our problem
server has 80 CPUs. Other servers have far fewer.) Upgrading the kernel or the
Red Hat release is the solution. An easy workaround is to increase the
fragmentation buffer size. The article says doubling the fragmentation
thresholds is enough, i.e. from the default 4M to 8M. We'll set the IP
fragmentation buffer low and high values (net.ipv4.ipfrag_low_thresh and
net.ipv4.ipfrag_high_thresh) to 15 and 16 MB per Oracle note 2008933.1. I think
the counter "fragments dropped after timeout" in `netstat -s' is related to
/proc/sys/net/ipv4/ipfrag_time. Ours seems to have been fairly stable even
before the crash, so I'll leave that parameter alone for now.
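
For the record, here is roughly how the 15 and 16 MB values translate into
sysctl settings. Treat it as a sketch, not the exact text of the Oracle note:
the function name ipfrag_sysctl_lines is made up, and I'm assuming 1 MB =
1048576 bytes. It only prints the lines and changes nothing by itself.

```shell
#!/bin/sh
# Print sysctl lines for the IP fragment reassembly buffer thresholds.
# Arguments are low and high sizes in MB (assumed 1 MB = 1048576 bytes);
# the defaults are the 15 MB / 16 MB values mentioned in this post.
ipfrag_sysctl_lines() {
    low_mb=${1:-15}
    high_mb=${2:-16}
    # High threshold first, so low never exceeds high while the values
    # are applied one after the other.
    echo "net.ipv4.ipfrag_high_thresh = $((high_mb * 1024 * 1024))"
    echo "net.ipv4.ipfrag_low_thresh = $((low_mb * 1024 * 1024))"
}
```

Called without arguments it prints 16777216 for the high threshold and
15728640 for the low one. As root, the output could be appended to
/etc/sysctl.conf and loaded with `sysctl -p'. The current values can be
checked with `sysctl net.ipv4.ipfrag_low_thresh net.ipv4.ipfrag_high_thresh'.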

Now I think I know why our OSWatcher did not report a traceroute problem at the
last crash: the default packet size used by traceroute is only 60 bytes. To
detect the problem, we should append a packet length parameter to the
traceroute command with a value greater than 1500, the Ethernet MTU.
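
Something along these lines is what I have in mind. It is a sketch under a few
assumptions: Linux traceroute, where the packet length is the positional
argument after the host; a probe size of 2000 bytes picked arbitrarily; and a
made-up function name frag_traceroute_cmd. It only prints the command. If the
existing traceroute line uses -F (don't fragment), that option would have to be
dropped, otherwise the oversized probes can't be fragmented at all.

```shell
#!/bin/sh
# Print a traceroute command whose probe packets are larger than the MTU,
# so every probe must be fragmented by the sender and reassembled by the peer.
MTU=1500    # standard Ethernet MTU, as in the post

frag_traceroute_cmd() {
    host=$1
    len=${2:-2000}    # assumed probe size; any value above the MTU will do
    if [ "$len" -le "$MTU" ]; then
        echo "packet length $len does not exceed MTU $MTU" >&2
        return 1
    fi
    # Linux traceroute syntax: traceroute [options] host [packet_len]
    echo "traceroute -n $host $len"
}
```

For example, `frag_traceroute_cmd racnode2-priv' (racnode2-priv being a
placeholder for the private interconnect name) prints
"traceroute -n racnode2-priv 2000".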

Yong Huang
--
//www.freelists.org/webpage/oracle-l
