Re: Linux NIC bonding

  • From: Dan Norris <dannorris@xxxxxxxxxxxxx>
  • To: mzito@xxxxxxxxxxx, a.piesk@xxxxxxx
  • Date: Wed, 12 Dec 2007 18:06:49 -0800 (PST)

Thanks, Matt. I didn't know where the bonding info was kept, so I'll take a 
look at /proc/net/bonding tomorrow. I suppose I should have guessed that I'd 
find the info there ;). The background information helps too--I'll share that 
with the network gang and see if they can verify and validate some of the 
switch configs. I'll check the servers and see if they're "ping-ponging" 
between interfaces, which might give us some more clues.

Thanks,
Dan

----- Original Message ----
From: Matthew Zito <mzito@xxxxxxxxxxx>
To: dannorris@xxxxxxxxxxxxx; a.piesk@xxxxxxx
Cc: Oracle L <oracle-l@xxxxxxxxxxxxx>
Sent: Wednesday, December 12, 2007 4:54:17 PM
Subject: RE: Linux NIC bonding





 
 








In general, you always disable spanning tree on host-side ports (or, as Cisco
likes to look at it, enable “portfast”), because otherwise the switch will
prevent the host from passing traffic for a period of time after link-up
(typically 30 seconds) while it attempts to determine whether the link that
has just come up created an Ethernet loop (we can discuss the magic of
broadcast storms if people would like). This can create all sorts of
nastiness: on today’s fast-booting x86 systems, there is often only a short
time between link-up and network services trying to get going, and they won’t
be able to DHCP, talk to their NTP server, download their DNS zones, etc.
until the spanning-tree discovery phase ends, and badness results. Setting
portfast/disabling spanning tree makes the switch “trust” that the host being
plugged in does not create an Ethernet loop.
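
To make the timing concrete, here is a rough sketch of where the "typically 30
seconds" comes from. It assumes the standard 802.1D default forward delay of
15 seconds, spent once in the listening state and once in the learning state;
the function names are made up for illustration, and real switches may be
configured with different timers.

```python
# Sketch only: models classic 802.1D timers, not any particular switch.
# Default forward delay is 15 s; a port spends that long in "listening"
# and again in "learning" before it starts forwarding traffic.

def seconds_until_forwarding(forward_delay=15, portfast=False):
    """Seconds after link-up before the port forwards host traffic."""
    if portfast:
        return 0  # portfast skips listening/learning on host-facing ports
    return 2 * forward_delay  # listening + learning


def blocked_at_boot(service_start, forward_delay=15, portfast=False):
    """True if a service starting `service_start` seconds after link-up
    would still find the port not forwarding."""
    return service_start < seconds_until_forwarding(forward_delay, portfast)
```

For example, a DHCP client that starts 10 seconds after link-up would be
blocked without portfast and not blocked with it.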
 

  
 

It sounds like the link is failing back and forth between the different
interfaces, and potentially the updelay is causing the dropped packets. When
you are seeing 50% packet loss, what does cat /proc/net/bonding/bondX look
like (try cat’ing it several times over a few seconds and seeing if it changes
active interfaces)? What does netstat -rn show?
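
The "cat it several times" check can be scripted. The sketch below reads the
bonding status file repeatedly and counts how often the "Currently Active
Slave" line changes. The path /proc/net/bonding/bond0, the sample count, and
the interval are assumptions; substitute your own bond name.

```python
import re
import time

# The bonding driver reports the active interface on a line like:
#   Currently Active Slave: eth0
ACTIVE_RE = re.compile(r"^Currently Active Slave:\s*(\S+)", re.MULTILINE)


def active_slave(text):
    """Return the active slave name from one status snapshot, or None."""
    match = ACTIVE_RE.search(text)
    return match.group(1) if match else None


def count_flips(samples):
    """Count changes of active slave between consecutive snapshots."""
    names = [active_slave(s) for s in samples]
    return sum(1 for a, b in zip(names, names[1:]) if a != b)


def poll(path="/proc/net/bonding/bond0", times=10, interval=0.5):
    """Read the status file `times` times, `interval` seconds apart."""
    samples = []
    for _ in range(times):
        with open(path) as f:
            samples.append(f.read())
        time.sleep(interval)
    return samples
```

Running count_flips(poll()) on the affected server during a period of packet
loss should return 0 if the bond is stable; anything higher suggests the
interfaces are ping-ponging.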
 

  
 

It could also be that your cards don’t support the MII polling mechanism for
determining link up. Again, /proc/net/bonding/bondX (for the appropriate
interfaces) can provide a pile of information about how everything seems to
be working.
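
As a rough illustration of what to pull out of that file, the sketch below
collects the per-interface "MII Status" and "Link Failure Count" fields. The
function name is made up, and the field names are those printed by the Linux
bonding driver as far as I know; check them against your kernel's output. The
file does not say outright whether a card supports MII polling, but a slave
that always reports "up" even with the cable pulled, or a failure count that
keeps climbing, is a useful clue.

```python
def slave_status(text):
    """Map each slave interface to its MII status and link failure count.

    `text` is the contents of a bonding status file. Bond-level lines that
    appear before the first "Slave Interface:" line are ignored.
    """
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or no "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key in ("MII Status", "Link Failure Count"):
            current[key] = value
    return slaves
```

Comparing two snapshots taken a minute apart shows whether any interface's
"Link Failure Count" is still rising.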
 

  
 

Thanks,
Matt
 

  
 










From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Dan Norris
Sent: Wednesday, December 12, 2007 2:53 PM
To: a.piesk@xxxxxxx
Cc: Oracle L
Subject: Re: Linux NIC bonding
 




  
 





>> back to your problem: have you disabled spanning-tree on both switches?
 





I'm just the Oracle/Linux guy, so I don't know. They said that they tried that,
but I'm not sure how carefully it was tested. I think there was some guessing
going on at that point, so they may need to recheck that. I suppose that's a
requirement to use active-backup bonding? Is that documented anywhere?



Dan
 




  
 












