Slow network speed between VM and external network
I have the controller and network node on a single physical machine running Ubuntu 12.04 (3.8.0-36-generic). The problem is the bandwidth from my VM network to the outside network:
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.3 sec 46.9 MBytes 38.2 Mbits/sec
[ 4] 0.0-12.5 sec 896 KBytes 586 Kbits/sec
[ 5] local 172.100.0.20 port 5001 connected with 172.100.0.101 port 50791
I'm running Neutron with VLAN networking. Speeds between VMs are fine, ~450 Mb/s. Traffic from a VM to the qrouter is also slow:
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 218 MBytes 182 Mbits/sec
[ 4] 0.0-10.1 sec 29.9 MBytes 24.9 Mbits/sec
I have disabled GRO on eth0, the interface attached to br-ex, on my network/controller node:
$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
Please help me get this working at higher speeds.
When I disabled rx-checksumming and tx-checksumming, the transfer rates went up, but not as high as I expected:
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 266 MBytes 222 Mbits/sec
[ 5] 0.0-10.1 sec 27.5 MBytes 22.9 Mbits/sec
Please help. I've tried turning the offload settings off and on and it doesn't help. Maybe I need to turn off GRO on the node interface or something?
Run tcpdump on the qr-xxxxxxxx-xx interface in the qrouter namespace and check for packets much larger than 1500 bytes, e.g. larger than 1600. That would suggest offloading is happening somewhere.
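A rough sketch of that check, in case it helps. The `oversized` helper is hypothetical (not part of tcpdump or Neutron), and it assumes tcpdump's default text output, where TCP lines carry a "length N" field. The namespace and interface names are placeholders to replace with your own.

```shell
#!/bin/sh
# Hypothetical helper: read tcpdump text output on stdin and print only
# the lines whose "length N" field exceeds the given limit (default 1600).
oversized() {
    awk -v limit="${1:-1600}" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "length" && $(i + 1) + 0 > limit) { print; break }
        }'
}

# Usage (as root on the network node; names below are placeholders):
#   ip netns list        # find the qrouter-<router-uuid> namespace
#   ip netns exec qrouter-<router-uuid> \
#       tcpdump -n -l -i qr-xxxxxxxx-xx tcp | oversized 1600
```

If this prints lines while iperf is running, segments larger than the MTU are reaching the router interface, which points at GRO/LRO still being active somewhere on the path.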
From IRC: we found that the machine had been rebooted, so GRO was enabled on ethX again. After GRO was disabled again, there were no more big packets on ethX or gr-xxxxxxx-xx, but it was still slow. Tcpdump shows retransmissions. Turning off all offloading on ethX was suggested, but that didn't work either.
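Since the setting was lost on reboot, it may be worth checking which offloads are actually enabled before each test. A minimal sketch: `enabled_offloads` is a hypothetical helper that just filters `ethtool -k` output, and eth0 stands in for whatever ethX is on your node.

```shell
#!/bin/sh
# Hypothetical helper: read "ethtool -k <iface>" output on stdin and
# print the names of the features reported as "on".
enabled_offloads() {
    awk -F': *' 'NR > 1 && $2 ~ /^on/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage (as root; eth0 is an assumption, substitute your interface):
#   ethtool -k eth0 | enabled_offloads
#   ethtool -K eth0 gro off gso off tso off   # turn individual offloads off
#
# To survive a reboot on Ubuntu with ifupdown, one option (assuming eth0 is
# configured in /etc/network/interfaces) is a line in its stanza such as:
#   post-up ethtool -K eth0 gro off
```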
I think this issue is only seen with recent Ubuntu kernels - 3.5 and 3.8. A possible solution is to use the older 3.2 kernel on the node running the L3 agent.
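To see quickly whether a node is on one of those kernel series, something like the following could be used. `kernel_series` is a hypothetical helper, and the 3.5/3.8 list simply mirrors the versions mentioned above; it is not a confirmed list of affected kernels.

```shell
#!/bin/sh
# Hypothetical helper: print the major.minor series of a kernel release
# string (defaults to the running kernel as reported by uname -r).
kernel_series() {
    printf '%s\n' "${1:-$(uname -r)}" | cut -d. -f1,2
}

# Usage on the node running the L3 agent:
#   case "$(kernel_series)" in
#       3.5|3.8) echo "kernel series mentioned as possibly affected" ;;
#       *)       echo "different kernel series" ;;
#   esac
```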