DPDK: Inter-VM iperf3 TCP throughput on the same host is much lower than the non-DPDK throughput

asked 2016-12-22 00:41:05 -0600

Rajalakshmi

updated 2017-01-09 15:52:55 -0600

rbowen

Host: Ubuntu 16.04 with devstack stable/newton, which installs DPDK 16.07 and OVS 2.6, using the DPDK plugin and the following DPDK configuration.

Grub changes

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash default_hugepagesz=1G hugepagesz=1G hugepages=8 iommu=pt intel_iommu=on"

local.conf - changes for DPDK

enable_plugin networking-ovs-dpdk https://git.openstack.org/openstack/networking-ovs-dpdk master

before VM creation

#nova flavor-key m1.small set hw:mem_page_size=1048576

I was able to create two Ubuntu instances with the m1.small flavor.

With DPDK, the iperf3 TCP throughput between the two VMs was ~7.5 Gbps. I verified that the vhost ports were created and that hugepages were consumed once both VMs were running: 2 GB per VM, i.e. 4 GB for the VMs, plus 2 GB for the socket, 6 GB in total.

$ sudo cat /proc/meminfo |grep Huge
AnonHugePages: 0 kB
HugePages_Total: 8
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
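The 6 GB figure can be cross-checked against the counters above. This is only a minimal sketch: `hugepages_used_gib` is a hypothetical helper name, not part of any tool, and it assumes the usual /proc/meminfo field names shown in the output.

```shell
#!/bin/sh
# Hypothetical helper: report how many GiB of hugepages are in use,
# computed as (HugePages_Total - HugePages_Free) * Hugepagesize.
# Reads /proc/meminfo by default, or a file given as the first argument.
hugepages_used_gib() {
    awk '
        $1 == "HugePages_Total:" { total = $2 }
        $1 == "HugePages_Free:"  { free = $2 }
        $1 == "Hugepagesize:"    { size_kb = $2 }
        END { printf "%d\n", (total - free) * size_kb / 1024 / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

With the values above (8 total, 2 free, 1048576 kB per page) this gives 6, which matches 4 GB for the two VMs plus 2 GB for the socket.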

I repeated the same scenario on OpenStack without DPDK and got a higher throughput of ~19 Gbps, which is the opposite of what I expected. Please suggest what additional DPDK configuration is needed for higher throughput. I also tried CPU pinning and multi-queue with OpenStack DPDK, but the results did not improve.

The test PC has a single NUMA node. I am not binding any NIC, because I am only trying to validate inter-VM communication on the same host. My PC configuration is below.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 1212.000
CPU max MHz: 2400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4794.08
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

I am following INSTALL.DPDK.ADVANCED.md but have found no clue about the low throughput.


1 answer


answered 2017-01-10 06:04:42 -0600

darragh-oreilly

I see about the same numbers. ethtool -k on the non-DPDK VM NICs shows a lot of offloads that are not available on the DPDK VM NICs. One of these is TCP segmentation offload (TSO), and with it I see packets with an average size of ~60,000 bytes between VMs on the same host. With TSO turned off (ethtool -K ens3 tso off), the packet size is ~1500 bytes and the rate drops from about 17 to 2 Gbps.
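To compare the two kinds of VM quickly, the TSO state can be pulled out of the ethtool -k output. This is a rough sketch: `tso_state` is a made-up helper name, `ens3` is simply the interface name used in the command above, and it assumes ethtool prints the feature as a `tcp-segmentation-offload:` line.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the
# TSO state ("on" or "off"). A trailing "[fixed]" marker is ignored
# because only the second whitespace-separated field is printed.
tso_state() {
    awk '$1 == "tcp-segmentation-offload:" { print $2 }'
}

# Usage inside a VM (interface name is an assumption):
#   ethtool -k ens3 | tso_state
```

Running it in a non-DPDK VM and in a DPDK VM should show whether the offload differs between the two.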

It seems DPDK/vhostuser does not provide this offload, so the packet size stays at ~1500 bytes. But the packets-per-second rate is better.
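The packets-per-second point can be made concrete with rough arithmetic: rate in bits per second divided by packet size in bits. This is only an estimate under stated assumptions: `pps` is a hypothetical helper, the packet sizes are the approximate ones quoted in this thread, header overhead is ignored, and the 7.5 Gbps DPDK figure comes from the question rather than from my own run.

```shell
#!/bin/sh
# Hypothetical helper: approximate packets per second for a given
# throughput in Gbps and an average packet size in bytes.
pps() {
    awk -v gbps="$1" -v bytes="$2" \
        'BEGIN { printf "%d\n", gbps * 1e9 / (bytes * 8) }'
}

pps 17 60000    # non-DPDK, TSO on:  35416 packets/s
pps 2 1500      # non-DPDK, TSO off: 166666 packets/s
pps 7.5 1500    # DPDK, no TSO:      625000 packets/s
```

So by this estimate the DPDK path moves several times more packets per second than the non-DPDK path with TSO off, even though its throughput is below the TSO-enabled case.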



Thank you. Now I understand the throughput difference. Yes, TSO is enabled by default in the test VM in the non-DPDK case.

Rajalakshmi ( 2017-02-09 05:18:48 -0600 )



Asked: 2016-12-22 00:41:05 -0600

Seen: 443 times

Last updated: Jan 10 '17