This guide is a tutorial that walks you through all of the steps of installing DPDK and Open vSwitch from the packages built for Debian Linux. This guide assumes the use of Intel Niantic NIC cards. Mellanox ConnectX3-Pro cards will be supported but are right now largely untested with these packages. This guide builds on the basic one and hopefully provides all of the steps you'll need to set up a working OVS-DPDK with a running VM (also with its own DPDK).
Technical Setup of our system:
Using a single DL360gen9, running Debian Linux (currently running the 4.4.7 kernel)
In the BIOS, changed a couple of parameters:
Power Profile = Maximum Performance (so no C-states and no P-states)
Intel Turbo Boost Technology = Disabled
Note about cpu core masks:
Remember that the Linux kernel runs on core 0. Of the OVS-DPDK applications that are assigned cores, only the EAL may also run on core 0 (it doesn't have to).
The Environment Abstraction Layer (EAL) runs on only one core, as specified by the DPDK_OPTS variable in /etc/default/openvswitch-switch. For example, --dpdk -c 0x1 means the EAL will run on core 0.
The Poll Mode Drivers (PMDs) can run on a single core or on multiple cores. These should run on cores isolated from the kernel and are set with ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6. The pmd-cpu-mask=6 really means 0x6, which is binary 110, i.e. bits 1 and 2 are set (a calculator in programming mode helps to visualize this). So in this case the PMDs would run on cores 1 and 2, and we will isolate cores 1 and 2 along with their hyperthreaded siblings (on the system I used, these are cores 17 and 18). To see what the pairs are, issue this command:
for i in $(ls /sys/devices/system/cpu/ | grep ^cpu | grep -v freq | grep -v idle); do echo -e "$i" '\t' `cat /sys/devices/system/cpu/$i/topology/thread_siblings_list`; done
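If you don't want to work out a core mask by hand, a quick shell loop like the one below does the arithmetic for you (just a sketch; set cores to whatever list you plan to use):
# Sketch: build a hex core mask from a list of core IDs (cores 1 and 2 here)
cores="1 2"
mask=0
for c in $cores; do mask=$(( mask | (1 << c) )); done
printf '0x%x\n' "$mask"    # prints 0x6, the value used for pmd-cpu-mask above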
testpmd can run on a single core or multiple cores, like the PMDs. The core mask is set on the testpmd command line with the option -c 0x7. 0x7 is binary 111, covering cores 0, 1, and 2, and the master core is the lowest-numbered core in the mask. So in this example, core 0 is the master core that is not actually used for forwarding, and only cores 1 and 2 do the forwarding.
We will also need to isolate the cores that will be given to qemu to run the VM. For the VM we use cores 3, 4, and 5. Even though we will only use cores 3, 4, and 5 for the VM, we will also isolate their HT siblings to keep the kernel from using them (for better performance).
Note about Hyperthreading:
The goal with all of this testing is to compare with the results reported in the Intel ONP document, and they mention how they tested with HT turned off and on. I did not do any testing with HT explicitly turned off in the BIOS. The only BIOS changes I made are listed above. A full list of low latency tunings - including how to properly turn off HT in the BIOS - can be found in the Configuring and tuning HP ProLiant Servers for low-latency applications whitepaper.
We go ahead and isolate 4 logical cores (2 HT pairs): 1, 2, 17, 18. This is done so that a test can be run with 1, 2, or 4 logical cores as reported in the Intel ONP document.
Install some foundation packages on the system and set it up to use hugepages. Note that all of the cores you will use need to be on the same NUMA node and need to be isolated (except core 0 - Linux needs this one, so don't isolate it). You can get a better idea of which cores are on which NUMA node with lscpu.
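For example, both of these views show which logical CPU belongs to which node:
root@hlx:~# lscpu | grep -i numa                # per-node CPU lists, e.g. "NUMA node0 CPU(s): ..."
root@hlx:~# lscpu -p=CPU,NODE | grep -v '^#'    # one "cpu,node" pair per line, handy for scripting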
root@hlx:~# apt-get install linux-headers-amd64 openvswitch-switch-dpdk
root@hlx:~# update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
root@hlx:~# vim /etc/default/grub
# Add hugepages and isolcpus to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-5,17-21"
root@hlx:~# update-grub
root@hlx:~# reboot
echo 0 > /proc/sys/kernel/randomize_va_space
echo 0 > /proc/sys/net/ipv4/ip_forward
rmmod ipmi_si
rmmod ipmi_msghandler
rmmod ipmi_devintf
rmmod lpc_ich
rmmod bridge
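Before going further, it's worth a quick check that the kernel actually picked up the hugepage and isolcpus settings after the reboot (standard proc entries, nothing package-specific):
root@hlx:~# cat /proc/cmdline                      # should include the hugepage and isolcpus options from grub
root@hlx:~# grep HugePages_Total /proc/meminfo     # should report 16
root@hlx:~# grep hugetlbfs /proc/mounts            # hugepages also need to be mounted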
When the system comes back up, you can bind your network ports to the dpdk-compatible driver and run testpmd. Running testpmd at this point just verifies that DPDK has been set up properly - more of a quick sanity test - and doesn't need to be done every time.
root@hlx:~# systemctl stop openvswitch-switch    # Make sure openvswitch is not running.
root@hlx:~# modprobe igb_uio
root@hlx:~# ls -la /sys/class/net                # shows you the mapping of eth devices to PCI bus:device.function
root@hlx:~# dpdk_nic_bind --status
root@hlx:~# dpdk_nic_bind --bind=igb_uio 08:00.0 # you may need to take the device down from the linux kernel first: 'ip l set dev eth2 down'
root@hlx:~# dpdk_nic_bind --bind=igb_uio 08:00.1
root@hlx:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_ixgbe.so.1.1 -c 0x7 -n 4 -- -i --nb-cores=2    # note that the core mask 0x7 is for cpu cores 0, 1, and 2
...
testpmd>
If you get the testpmd prompt above, then most likely everything went well. You can check the forwarding paths for fun and then just quit. On the DPDK website, you can find a full list of testpmd options.
testpmd> show config fwd
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
testpmd> quit
Alright, so we're satisfied at this point that DPDK has been setup and we're ready to proceed onto the next step - configuring OVS.
root@hlx:~# cat /etc/default/openvswitch-switch
# export the DPDK_OPTS variable. Anytime you change this, you will need to restart openvswitch-switch
# This is a POSIX shell fragment                -*- sh -*-
# FORCE_COREFILES: If 'yes' then core files will be enabled.
# FORCE_COREFILES=yes
# OVS_CTL_OPTS: Extra options to pass to ovs-ctl.  This is, for example,
# a suitable place to specify --ovs-vswitchd-wrapper=valgrind.
# OVS_CTL_OPTS=
export DPDK_OPTS="--dpdk -c 0x1 -n 2 --socket-mem 4096"    # note that the core mask 0x1 is for the EAL to run on core 0
Then run:
root@hlx:~# systemctl restart openvswitch-switch
root@hlx:~# mkdir /etc/qemu    # Only need to create /etc/qemu and /etc/qemu/bridge.conf once after installing the packages.
root@hlx:~# touch /etc/qemu/bridge.conf
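If you want to confirm that vswitchd really came up with the DPDK EAL (rather than silently ignoring DPDK_OPTS), checking its running command line and its log is usually enough:
root@hlx:~# ps -ef | grep [o]vs-vswitchd                                  # the --dpdk options from DPDK_OPTS should show up here
root@hlx:~# grep -i -e EAL -e dpdk /var/log/openvswitch/ovs-vswitchd.log | tail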
Create the bridge and ports. For completeness' sake: if ovs-vsctl show does not contain the bridge or the following ports, you will have to create them manually (usually just once, after installing openvswitch-switch); the bridge and ports are persistent across reboots:
root@hlx:~# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev                          # bridge that the ports belong to
root@hlx:~# ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk                        # dpdk ports talk to the hardware
root@hlx:~# ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
root@hlx:~# ovs-vsctl add-port br0 vhost-user-0 -- set Interface vhost-user-0 type=dpdkvhostuser # vhost ports are virtual and connect
root@hlx:~# ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser # to the dpdk ports
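It's a good idea to verify that none of the ports ended up in an error state before moving on; depending on your OVS version, the Interface table carries an error column you can query directly:
root@hlx:~# ovs-vsctl show                                    # br0 and all four ports should be listed
root@hlx:~# ovs-vsctl --columns=name,error list Interface     # 'error' should be empty for every port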
I found that when using the Mellanox card, the instructions mentioned in Step 2 were not sufficient. Exporting DPDK_OPTS and restarting the openvswitch-switch service did not allow me to successfully create ports dpdk0 and dpdk1; the error indicated that they were not able to directly communicate with the Mellanox card. The only way I could add these two ports without hitting the error was to bind the ports and remove and reload the modules before the openvswitch-switch service started AND run testpmd after the openvswitch-switch service started. testpmd appears to trigger some kind of communication between OVS-DPDK and the Mellanox cards that allows the dpdk ports to be created correctly.
In short, if you experience issues adding the dpdk0 and dpdk1 ports on the Mellanox cards:
1) Remove the two ports using the ovs-vsctl del-port command.
2) Make the following additions to the /etc/init.d/openvswitch-switch script.
3) Reboot the system and add the ports again.
In my experience I have had to remove and re-add the dpdk ports after every reboot of the system. Sometimes the ovs-vsctl add and remove commands seem to stall on execution; Ctrl-C'ing out of them appears to work, as the ports are still added/removed anyway.
# At the beginning of the file (after the initial comment block):
modprobe uio_pci_generic
dpdk_nic_bind --bind=uio_pci_generic 08:00.0
rmmod mlx4_en
rmmod mlx4_ib
rmmod mlx4_core
modprobe -a mlx4_en mlx4_ib mlx4_core
# Note that this differs slightly from the command above. The system I was on did not have enough RAM to support 8GB of HugePages
export DPDK_OPTS="--dpdk -c f -n 4 --socket-mem 4096"
...
# At the end of the file (before the exit 0 command):
echo quit | testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_mlx4.so.1.1 -c 0x3 -n 4 -w 0000:${PCI_ID}.0 -w 0000:${PCI_ID}.1 -- --nb-cores=2 -i
We need to set up some flows on the bridge from ports dpdk0 ↔ vhost-user-0 and dpdk1 ↔ vhost-user-1. Note that the flows will have to be set up again (del-flows and all of the add-flow commands) every time openvswitch-switch is restarted; a small helper script for this follows the commands below.
root@hlx:~# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:00008cdcd4afe1ec
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(dpdk1): addr:8c:dc:d4:af:e1:ed
     config:     0
     state:      0
     current:    10GB-FD
     supported:  10MB-HD 100MB-HD
     speed: 10000 Mbps now, 100 Mbps max
 2(dpdk0): addr:8c:dc:d4:af:e1:ec
     config:     0
     state:      0
     current:    10GB-FD
     supported:  10MB-HD 100MB-HD
     speed: 10000 Mbps now, 100 Mbps max
 3(vhost-user-0): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 4(vhost-user-1): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:8c:dc:d4:af:e1:ec
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
root@hlx:~# ovs-ofctl del-flows br0    # You really do need to delete the flows first because there is a default flow that will cause problems.
root@hlx:~# ovs-ofctl add-flow br0 in_port=1,action=output:4
root@hlx:~# ovs-ofctl add-flow br0 in_port=4,action=output:1
root@hlx:~# ovs-ofctl add-flow br0 in_port=2,action=output:3
root@hlx:~# ovs-ofctl add-flow br0 in_port=3,action=output:2
root@hlx:~# ovs-ofctl dump-flows br0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=20.612s, table=0, n_packets=0, n_bytes=0, idle_age=20, in_port=1 actions=output:4
 cookie=0x0, duration=13.639s, table=0, n_packets=0, n_bytes=0, idle_age=13, in_port=4 actions=output:1
 cookie=0x0, duration=6.645s, table=0, n_packets=0, n_bytes=0, idle_age=6, in_port=2 actions=output:3
 cookie=0x0, duration=1.976s, table=0, n_packets=0, n_bytes=0, idle_age=1, in_port=3 actions=output:2
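Because these flows vanish every time openvswitch-switch restarts, it can be handy to keep the same commands in a small script (the port numbers match the ovs-ofctl show output above) and rerun it after each restart:
root@hlx:~# cat setup-flows.sh
#!/bin/sh
# re-create the dpdk0 <-> vhost-user-0 and dpdk1 <-> vhost-user-1 flows after an OVS restart
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 in_port=1,action=output:4
ovs-ofctl add-flow br0 in_port=4,action=output:1
ovs-ofctl add-flow br0 in_port=2,action=output:3
ovs-ofctl add-flow br0 in_port=3,action=output:2
root@hlx:~# chmod +x setup-flows.sh
root@hlx:~# ./setup-flows.sh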
We want to make sure to run the VM on the same NUMA node as ovs-vswitchd and as the backing NIC. Based on the export command we added to /etc/default/openvswitch-switch, there is just one socket-mem value of 4096 (export DPDK_OPTS="--dpdk -c 0x1 -n 2 --socket-mem 4096"), so NUMA node 0 has 4G of hugepage memory while NUMA node 1 has 0G. Ovs-vswitchd is running on the node that looks more heavily consumed per the numastat command. In this case, node 0 is being more heavily used, so we'll run our VM on that node. Also know that there is a known performance issue when running on NUMA node 1, so it is recommended that all testing be run on NUMA node 0.
To determine which NUMA node a PCI device (NIC) is on, you can cat /sys/class/net/eth<#>/device/numa_node to see either a 0 or 1.
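To check every interface in one go, a one-liner over sysfs works (virtual devices without a PCI backing simply won't have the file):
root@hlx:~# for i in /sys/class/net/*/device/numa_node; do echo -e "$(echo $i | cut -d/ -f5)" '\t' "$(cat $i)"; done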
root@hlx:~# ps -ef | grep ovs-vswitchd
root      2308  2307 96 14:02 ?        00:00:03 ovs-vswitchd --dpdk -c 0x1 -n 2 --socket-mem 4096 4096 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
root      2364  2176  0 14:02 pts/0    00:00:00 grep ovs-vswitchd
root@hlx:~# numastat 2308

Per-node process memory usage (in MBs) for PID 2308 (ovs-vswitchd)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                      4096.00         4096.00         8192.00
Heap                         9.01            0.00            9.01
Stack                      160.13            0.00          160.13
Private                      8.99            5.23           14.23
----------------  --------------- --------------- ---------------
Total                     4274.13         4101.23         8375.37
For my VM image, I just created a blank 10G qcow2 image on my workstation (called hlx-vm) and installed HPE Linux:
$ qemu-img create -f qcow2 hlx-vm 10G
$ qemu-system-x86_64 -enable-kvm -smp 4 -cdrom /work/vms/hLinux-cattleprod-amd64-guided-netinst-20160418.iso -boot d hlx-vm
Note that I created the VM first on my workstation because my system under test doesn't have a desktop environment, which is needed for the pop up qemu window to go through the Linux install.
root@hlx:~# cat ifup-my_tap    # create this little script to set up the linux bridge
#!/bin/sh
/sbin/brctl addif virbr0 $1
/sbin/ifconfig $1 0.0.0.0 promisc up
root@hlx:~# chmod +x ifup-my_tap
root@hlx:~# qemu-system-x86_64 -smp 4 -hda hlx-vm -device virtio-net-pci,netdev=net0,mac=ba:d5:ca:0c:a3:d5 -netdev type=tap,id=net0,ifname=my_tap,script=ifup-my_tap -snapshot
root@hlx:~# arp -an | grep virbr0    # to find out the ipv4 address of the VM so that you can ssh to it
? (192.168.122.168) at ba:d5:ca:0c:a3:d5 [ether] on virbr0
Here is the VM kernel command line, set in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"
After you set the kernel command line, don't forget to 'update-grub' so that it takes effect on next boot.
Be sure to set a static ip address, for example:
me@vm:~$ cat /etc/network/interfaces
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet static
    address 192.168.122.155
    gateway 192.168.122.1
    netmask 255.255.255.0
Run the VM
Ok, now we're ready to run the VM with qemu_linux.pl using the following command. Note that we're using node 0 because that is the node we found ovs-vswitchd to be running on. Also note the veth-addr: this can be whatever subnet you want to set; just make sure that subnet/IP isn't being used by any of your other devices.
root@hlx:~/scripts# perl scripts/qemu_linux.pl --isoloc hLinux-cattleprod-amd64-blaster-netinst-20160603-hos4.0_alpha5-hlm.iso --imgloc ../hlx-vm --memory-gb 4 --veth-addr 192.168.122.1/24 --veth-name-root myveth --mgmt-attach-to-bridge mgmt-br --vhostuser-sock vhost-user-0,00:00:00:00:00:01 --vhostuser-sock vhost-user-1,00:00:00:00:00:02 --use-hugepage-backend yes --cores 4 --numa-node 0 --core-list 3,4,5,6
Keep in mind that you may need to tailor the above command to match your VM's name, the bridge name, the vhost user socket names, and such.
Now, if all goes well, this command seems to hang your terminal and not return a prompt. If all does not go well, there are some troubleshooting things you can do. When the script is running successfully, you should see a few new devices:
root@hlx:~# ip a
...
14: ovs-netdev: mtu 1500 qdisc noop state DOWN group default qlen 500
    link/ether da:78:c3:ad:4b:6e brd ff:ff:ff:ff:ff:ff
15: br0: mtu 1500 qdisc noop state DOWN group default qlen 500
    link/ether 8c:dc:d4:af:e1:ec brd ff:ff:ff:ff:ff:ff
16: mgmt-br: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:97:5d:e1:cd:f0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::4097:5dff:fee1:cdf0/64 scope link
       valid_lft forever preferred_lft forever
17: myveth1@myveth0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:ee:49:2b:6d:a4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 scope global myveth1
       valid_lft forever preferred_lft forever
18: myveth0@myveth1: mtu 1500 qdisc noqueue master mgmt-br state LOWERLAYERDOWN group default qlen 1000
    link/ether 42:97:5d:e1:cd:f0 brd ff:ff:ff:ff:ff:ff
19: tap0: mtu 1500 qdisc noqueue master mgmt-br state UNKNOWN group default qlen 500
    link/ether fe:54:ea:fb:12:5c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:eaff:fefb:125c/64 scope link
       valid_lft forever preferred_lft forever
If mgmt-br or myveth1@myveth0 are down, you can up the interface:
root@hlx:~# ip l set dev myveth1 up
Also make sure that you have… (the quick check sketched after this list covers most of these)
loaded the igb_uio module
bound the host's network devices with dpdk_nic_bind
exported the Perl libraries (PERL5LIB) while in the scripts directory
allocated hugepages (cat /proc/meminfo) and mounted them (cat /proc/mounts)
created /etc/qemu/bridge.conf (it should exist, even if it's an empty file)
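A quick pass over those items looks something like this:
root@hlx:~# lsmod | grep igb_uio                   # DPDK-compatible module loaded?
root@hlx:~# dpdk_nic_bind --status                 # NICs bound to igb_uio?
root@hlx:~# grep HugePages_Total /proc/meminfo     # hugepages allocated?
root@hlx:~# grep hugetlbfs /proc/mounts            # ...and mounted?
root@hlx:~# ls -l /etc/qemu/bridge.conf            # must exist, even if empty
root@hlx:~# echo $PERL5LIB                         # should point at the scripts directory's libraries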
Once we have the VM up and running, we need to install DPDK so that we can use testpmd to forward packets. The problem is that your VM can't see the outside world to be able to add a repo and install from it. So what you can do is grab the ovs-dpdk-16.04 test repo tarball and get it onto the VM (scp it from the host). Once you unpack it on the VM, you can install from that ovs-dpdk-16.04 directory by adding a line to your /etc/apt/sources.list like this:
deb file:///home/me/ovs-dpdk-16.04 cattleprod main
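The exact copy steps depend on where the tarball lives on your host; something along these lines (host address, path, and tarball name are just examples) gets it into place and registers the repo:
me@vm:~$ scp me@192.168.122.1:/path/to/ovs-dpdk-16.04.tar.gz .    # host IP/path are examples; 192.168.122.1 is the veth address set up earlier
me@vm:~$ tar xzf ovs-dpdk-16.04.tar.gz                            # should unpack to /home/me/ovs-dpdk-16.04
me@vm:~$ echo 'deb file:///home/me/ovs-dpdk-16.04 cattleprod main' | sudo tee -a /etc/apt/sources.list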
Then you should be able to install DPDK (from the test repo) and install the DPDK dependencies (from the blaster iso):
root@vm:~# apt-cdrom add    # we need this so that apt recognizes the iso we loaded when we started the VM
root@vm:~# apt-get update
root@vm:~# apt-get install dpdk
At this point, you should have DPDK installed in the VM and you're ready to use it! Woo hoo!
You should have 3 devices visible in the VM: eth0 (tied to myveth1), eth1, and eth2. eth1 and eth2 are the two virtio devices that are tied to the vhost user sockets.
root@vm:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.115/24 brd 192.168.122.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe12:3456/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
4: eth2: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff
We need to load the DPDK-compatible driver and bind our devices:
root@vm:~# modprobe igb_uio
root@vm:~# dpdk_nic_bind --status    # to see what your devices are called
root@vm:~# dpdk_nic_bind --bind=igb_uio 00:04.0
root@vm:~# dpdk_nic_bind --bind=igb_uio 00:05.0
Now we're ready to run testpmd, making sure to point to the virtio driver:
root@vm:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_virtio.so.1.1 -c 0x7 -n 4 -w 0000:00:04.0 -w 0000:00:05.0 -- --burst=64 --disable-hw-vlan --txd=2048 --rxd=2048 --txqflags=0xf00 -i    # note that the cpu mask of 0x7 means the EAL will run on core 0 and the forwarding will be done with cores 1 and 2
...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
Done
testpmd> set fwd mac_retry
Set mac_retry packet forwarding mode
testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0xf00
This should be forwarding traffic from one port to the other and vice versa:
testpmd> show config fwd
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
testpmd> set fwd mac_retry
testpmd>
Make sure the testpmd process is running on a different core than all other qemu processes
I've had issues with all qemu processes running on one core, even though I gave qemu 4 cores: 3,4,5,6 (see the command where we actually run the VM). In order to obtain decent throughput with the packet forwarding in the VM, we need to manually change which host core the testpmd process (that is, the corresponding qemu thread) runs on. Simply giving taskset a core range is not good enough. For this task, the htop package is extremely valuable for seeing which of the host's cpu cores are being heavily used.
Once the VM is running, on the host take a look at all of the qemu parent and child processes running:
root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1   13400 16000 28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1    1084   724 26 11:00 pts/1    00:00:00 sh -c numactl --membind=0 -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1   12772  3704  3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 56548  3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 56548  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  2    7 1212308 56548  3 11:00 pts/1    00:00:09 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349  0    7 1212308 56548  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 56548  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 56548  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 56548  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4366  3431  4366  0    1    3183  2208  0 11:07 pts/5    00:00:00 grep qemu
The 9th column (PSR) shows which cpu core each thread is running on. All of the qemu-system-x86_64 threads are running on core 3.
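If counting columns in the ps -eLF output gets tedious, you can ask ps for just the fields of interest (psr is the processor each thread last ran on):
root@hlx:~# ps -eLo pid,tid,psr,pcpu,etime,comm | grep qemu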
Now start testpmd in the VM and begin the forwarding:
root@vm:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_virtio.so.1.1 -c 0x7 -n 4 -w 0000:00:04.0 -w 0000:00:05.0 -- --burst=64 --disable-hw-vlan --txd=2048 --rxd=2048 --txqflags=0xf00 -i
...
testpmd> start
Then, on the host, you can see which qemu thread has started accumulating CPU time:
root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1   13400 16000 28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1    1084   724 26 11:00 pts/1    00:00:00 sh -c numactl --membind=0 -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1   12772  3704  3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 51460  3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  1    7 1212308 51460  3 11:00 pts/1    00:00:10 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349  7    7 1212308 51460  3 11:00 pts/1    00:00:51 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4376  3431  4376  0    1    3183  2208  0 11:11 pts/5    00:00:00 grep qemu
Process 4349 has been running for 51 seconds, the same amount of time since I typed start at the testpmd prompt in the VM. Now that we know which child qemu process on the host corresponds to the packet forwarding in the VM, let's have it run on a core other than 3:
root@hlx:~# taskset -pc 4 4349
pid 4349's current affinity list: 3-6
pid 4349's new affinity list: 4
Now we can see that that very process is running on core 4:
root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1   13400 16000 28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1    1084   724 26 11:00 pts/1    00:00:00 sh -c numactl --membind=0 -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1   12772  3704  3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 51460  3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  1    7 1212308 51460  3 11:00 pts/1    00:00:10 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349 28    7 1212308 51460  4 11:00 pts/1    00:03:58 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 51460  3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4380  3431  4380  0    1    3183  2208 23 11:14 pts/5    00:00:00 grep qemu
Now the load is better distributed. In my case, I was doing throughput and latency testing using a network traffic generator (Spirent TestCenter) and setting the testpmd process to run on a different core was absolutely critical to getting decent throughput results.
At this point you're ready to configure Spirent to send and receive traffic over these dpdk-bound ports. To do this, continue onto Configuring Spirent.