[tor] Re: Migration to 2host? FreeBSD on current host?

  • From: Julian Wissmann <julianwissmann@xxxxxxxxx>
  • To: torservers@xxxxxxxxxxxxx
  • Date: Sat, 14 Aug 2010 15:17:27 +0200

On 14.08.2010 at 14:26, Moritz Bartl wrote:

> Hi,
> 
>> (I kinda missed the history of this. With the kind of throughput we're
>> talking about I'm surprised there is a problem to begin with. Is it
>> the specific driver that's the problem?)
> 
> Something is limiting the throughput of our main Tor exit node. Our plan 
> covers more bandwidth than we're currently pushing, and from talking to Tor 
> devs, there seem to be several problems, some - maybe - on the Tor side of 
> things. Another problem I think needs to be solved is that one of the CPU 
> cores is maxed out, nearly constantly.
> Fejk suggested looking into NIC IRQ handling, and it turned out that all 
> network interrupts hit one core only.
That's the expected behavior actually, but I did some more research on this and 
stumbled upon CONFIG_HOTPLUG_CPU, which seems to break APIC when turned on, so 
you might want to try turning that off.

> When I set smp_affinity of the public NIC to a different value (or use 
> irqbalance, which does the same), the 100% usage moves to a different core, 
> so to me it looks like a limiting element indeed.
> 
> On the hardware side of things, the Intel driver (e1000e) should be able to 
> distribute load, at least for some NIC models. On the software side of 
> things, the latest kernel should have at least helped distribute the load 
> (RPS, RFS; http://bit.ly/dwp2LA), but nobody seems to know how to enable this 
> - just installing the kernel didn't help.
> 
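
On enabling RPS: as far as I know it is off by default in 2.6.35 and has to be
switched on per receive queue through sysfs, by writing a hex CPU bitmask to
/sys/class/net/<dev>/queues/rx-<n>/rps_cpus. RFS additionally needs
/proc/sys/net/core/rps_sock_flow_entries and the per-queue rps_flow_cnt set to
non-zero values. A minimal sketch, assuming eth1 exposes rx-* queues and the
box has four cores; the helper name set_rps_mask and the SYSFS_NET override
are made up for illustration and are not part of any tool:

```shell
# Hypothetical helper: write a hex CPU mask to every RX queue of a device.
# SYSFS_NET is an illustrative override so the function can be dry-run
# against a scratch directory; on a real system it is /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

set_rps_mask() {
    dev="$1"
    mask="$2"
    for q in "$SYSFS_NET/$dev"/queues/rx-*; do
        # Skip queues without the file (e.g. kernel built without RPS).
        [ -e "$q/rps_cpus" ] || continue
        echo "$mask" > "$q/rps_cpus"
    done
}

# Example (as root): let CPUs 0-3 share receive processing for eth1.
# set_rps_mask eth1 f
```

Note that RPS only spreads the protocol processing in software. The hardware
interrupt itself would still land on whichever core smp_affinity selects.
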
> What I have tried so far: updating the NIC driver, and various kernels and kernel options.
> 
> Here are a few more technical details. This is how it looks at the moment:
> 
> # lspci -v
> see http://us1.torservers.net/lspci.txt
> 
> # cat /proc/interrupts | grep eth1
> 46:   3080885755 0   0          0   PCI-MSI-edge      eth1
> Rate: Around 10-15k per second.
> 
> # cat /proc/irq/46/smp_affinity
> f
> 
> When I use irqbalance, the 100% load gets moved to one of the other CPUs 
> every once in a while (it just writes different values to the smp_affinity 
> file). This does not help anything.
> 
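
For reference, smp_affinity is a hex bitmask in which bit n stands for CPU n,
so the "f" above allows CPUs 0-3 even though the /proc/interrupts counters
show only the first core taking the interrupts. Writing a single-bit mask
(for example "echo 2 > /proc/irq/46/smp_affinity" as root) pins the IRQ to one
core. A small sketch for decoding such a mask; mask_to_cpus is a made-up
helper name, and it assumes a plain mask without the comma-separated groups
used on machines with more than 32 CPUs:

```shell
# Hypothetical helper: print the CPU numbers selected by a hex affinity mask.
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Lowest bit set means the current CPU number is part of the mask.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

# mask_to_cpus f   prints: 0 1 2 3
# mask_to_cpus 4   prints: 2
```
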
> # ethtool -i eth1
> driver: e1000e
> version: 1.0.2-k4
> firmware-version: 0.5-7
> bus-info: 0000:0e:00.0
> 
> # modinfo e1000e | grep version
> version:        1.2.10-NAPI
> srcversion:     CA55700C9061B2DA1D58D4A
> vermagic:       2.6.35-custom SMP mod_unload modversions
> 
> # vnstat -i eth1 -l
> Monitoring eth1...    (press CTRL-C to stop)
> 
>   rx:   26853.76 kB/s 33748 p/s            tx:   28223.70 kB/s 34338 p/s
> 
> # dmesg | grep MSI
> [    3.226133] pcieport 0000:00:01.0: irq 40 for MSI/MSI-X
> [    3.226191] pcieport 0000:00:03.0: irq 41 for MSI/MSI-X
> [    3.226258] pcieport 0000:00:1c.0: irq 42 for MSI/MSI-X
> [    3.226348] pcieport 0000:00:1c.4: irq 43 for MSI/MSI-X
> [    3.226433] pcieport 0000:00:1c.5: irq 44 for MSI/MSI-X
> [    4.909612] e1000e 0000:0d:00.0: irq 45 for MSI/MSI-X
> [    5.038117] e1000e 0000:0e:00.0: irq 46 for MSI/MSI-X
> [   19.580209] e1000e 0000:0e:00.0: irq 46 for MSI/MSI-X
> [   19.636073] e1000e 0000:0e:00.0: irq 46 for MSI/MSI-X
> [   20.064217] e1000e 0000:0d:00.0: irq 45 for MSI/MSI-X
> [   20.120093] e1000e 0000:0d:00.0: irq 45 for MSI/MSI-X
> 
> # ethtool -k eth1
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: on
> large receive offload: off
> 
> # ethtool -c eth1
> Coalesce parameters for eth1:
> Adaptive RX: off  TX: off
> stats-block-usecs: 0
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
> 
> rx-usecs: 1000
> rx-frames: 0
> rx-usecs-irq: 0
> rx-frames-irq: 0
> 
> tx-usecs: 0
> tx-frames: 0
> tx-usecs-irq: 0
> tx-frames-irq: 0
> 
> rx-usecs-low: 0
> rx-frame-low: 0
> tx-usecs-low: 0
> tx-frame-low: 0
> 
> rx-usecs-high: 0
> rx-frame-high: 0
> tx-usecs-high: 0
> tx-frame-high: 0
> 
> Moritz
> 

