RE: Linux NIC bonding

  • From: "Matthew Zito" <mzito@xxxxxxxxxxx>
  • To: <dannorris@xxxxxxxxxxxxx>, <a.piesk@xxxxxxx>
  • Date: Wed, 12 Dec 2007 17:54:17 -0500

In general, you always disable spanning tree on host-side ports (or, as
Cisco likes to look at it, enable "portfast"), because otherwise the
switch will prevent the host from passing traffic for a set period
(typically 30 seconds) after link-up, while it attempts to determine
whether the link that has just come up created an Ethernet loop (we can
discuss the magic of broadcast storms if people would like). This can
create all sorts of nastiness: on today's fast-booting x86 systems there
is very little time between link-up and network services trying to get
going, so they won't be able to DHCP or talk to their NTP server or
download their DNS zones, etc., until the spanning-tree discovery phase
ends, and badness results.  Setting portfast/disabling spanning tree
makes the switch "trust" that the host being plugged in does not create
an Ethernet loop.

 

It sounds like the link is failing back and forth between the different
interfaces, and potentially the updelay is causing the dropped packets.
When you are seeing 50% packet loss, what does cat
/proc/net/bonding/bondX look like (try cat'ing it several times over a
few seconds and see whether the active interface changes)?  What does
netstat -rn show?

 

It could also be that your cards don't support the MII polling mechanism
for determining link-up.  Again, /proc/net/bonding/bondX (for the
appropriate bond interfaces) can provide a pile of information about how
everything seems to be working.
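
To pull the interesting bits out of that file, something like the
following might help. It is a sketch that assumes the usual "key: value"
layout, with one "Slave Interface:" section per NIC and fields such as
"MII Status" and "Link Failure Count"; field names can differ between
bonding driver versions, and the helper names here are invented.

```python
def parse_bond_status(status_text):
    """Split bonding status text into bond-wide settings and per-slave fields."""
    bond, slaves = {}, []
    current = bond  # fields before the first slave section belong to the bond
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and anything not shaped like "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}  # start a new per-slave section
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


def suspicious_slaves(slaves):
    """Return names of slaves that are not up or have recorded link failures."""
    return [
        s["name"]
        for s in slaves
        if s.get("MII Status") != "up"
        or int(s.get("Link Failure Count", "0")) > 0
    ]
```

The bond-wide dictionary also carries the configured polling interval
and up/down delays (when the driver reports them), which is worth
comparing against what you think you set in the module options.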

 

Thanks,

Matt

 

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Dan Norris
Sent: Wednesday, December 12, 2007 2:53 PM
To: a.piesk@xxxxxxx
Cc: Oracle L
Subject: Re: Linux NIC bonding

 

>> back to your problem: have you disabled spanning-tree on both
>> switches?


I'm just the Oracle/Linux guy, so I don't know. They said that they
tried that, but I'm not sure how carefully it was tested. I think there
was some guessing going on at that point, so they may need to recheck
that. I suppose that's a requirement to use active-backup bonding? Is
that documented anywhere?

Dan

 
