Beginning-To-End DPDK Guide

Introduction

This guide is a tutorial that walks you through all of the steps of installing DPDK and Open vSwitch from the packages built for Debian Linux. It assumes the use of Intel Niantic NIC cards. Mellanox ConnectX3-Pro cards will be supported but are currently largely untested with these packages. This guide builds on the basic one and should provide all of the steps you'll need to set up a working OVS-DPDK host with a running VM (which also runs its own DPDK).

Technical Setup of our system:

  • Using a single DL360gen9, running Debian Linux (currently running the 4.4.7 kernel)

  • In the BIOS, changed a couple of parameters:

    • Power Profile = Maximum Performance (so no C-state and no P-states)

    • Intel Turbo Boost Technology = Disabled

Note about CPU core masks:

  • Remember that the Linux kernel runs on core 0. Of the OVS-DPDK components that are assigned cores, only the EAL should also run on core 0 (though it doesn't have to).

  • The Environment Abstraction Layer (EAL) runs on only one core, as specified by the DPDK_OPTS variable in /etc/default/openvswitch-switch. Example: --dpdk -c 0x1 means the EAL will run on core 0.

  • The Poll Mode Drivers (PMDs) can run on a single core or on multiple cores. These should run on cores isolated from the kernel and are set by ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6. The value 6 really means 0x6, i.e. binary 110, which has bits 1 and 2 set (a calculator in programmer mode makes this easy to see; a small mask-computation sketch also follows this list). So in this case the PMDs would run on cores 1 and 2. We will therefore isolate cores 1 and 2 along with their hyperthreaded siblings (on the system I used, these are cores 17 and 18). To see what the sibling pairs are, issue this command:

for i in $(ls /sys/devices/system/cpu/ | grep ^cpu | grep -v freq | grep -v idle); do echo -e "$i" '\t' `cat /sys/devices/system/cpu/$i/topology/thread_siblings_list`; done
  • testpmd can run on a single core or on multiple cores, like the PMDs. The core mask is set on the testpmd command line with an option such as -c 0x7. 0x7 is binary 111, i.e. cores 0, 1, and 2, and the lowest-numbered core in the mask is the master core. So in this example, core 0 is the master core, which is not actually used for forwarding; only cores 1 and 2 do the forwarding.

  • We will also need to isolate the cores that will be given to qemu to run the VM. For the VM we use cores 3, 4, and 5. Even though we will only use cores 3, 4, and 5 for the VM, we will also isolate their HT siblings to keep the kernel from using them (for better performance).
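
As an aside, here is a minimal bash sketch (not part of the packaged tooling) for turning a list of core numbers into the hex mask that these options expect:

# Set bit 2^core for each core in the list, then print the mask in hex.
cores="1 2"                       # cores you want in the mask
mask=0
for c in $cores; do mask=$((mask | (1 << c))); done
printf 'mask for cores %s: 0x%x\n' "$cores" "$mask"   # prints 0x6 for cores 1 and 2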

Note about Hyperthreading:
The goal with all of this testing is to compare against the results reported in the Intel ONP document, which mentions testing with HT both turned off and on. I did not do any testing with HT explicitly turned off in the BIOS. The only BIOS changes I made are listed above. A full list of low latency tunings - including how to properly turn off HT in the BIOS - can be found in the Configuring and tuning HP ProLiant Servers for low-latency applications whitepaper.
We go ahead and isolate 4 logical cores (2 HT pairs): 1, 2, 17, 18. This is done so that a test can be run with 1, 2, or 4 logical cores as reported in the Intel ONP document.

Step 1: Setup DPDK on the host

Setup Hugepages

Install some foundation packages on the system and set it up to use hugepages. Note that all of the cores you will use need to be on the same NUMA node and will need to be isolated (except core 0 - Linux needs this, so don't isolate it). You can get a better idea of which cores are on which NUMA node with lscpu.
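
For example, a quick filter (the exact layout varies by lscpu version):

root@hlx:~# lscpu | grep -i numa     # shows the NUMA node count and which CPU ranges belong to each node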

root@hlx:~# apt-get install linux-headers-amd64 openvswitch-switch-dpdk
root@hlx:~# update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
root@hlx:~# vim /etc/default/grub
# Add hugepages and isolcpus to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-5,17-21"
root@hlx:~# update-grub
root@hlx:~# reboot
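
After the reboot, a quick sanity check that the new parameters took effect (standard procfs paths):

root@hlx:~# cat /proc/cmdline        # should include the hugepage and isolcpus parameters
root@hlx:~# grep Huge /proc/meminfo  # HugePages_Total should read 16 with a 1 GB Hugepagesize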

Additional Setup

echo 0 > /proc/sys/kernel/randomize_va_space   # disable address space layout randomization
echo 0 > /proc/sys/net/ipv4/ip_forward         # disable kernel IPv4 forwarding
rmmod ipmi_si                                  # unload the IPMI management modules
rmmod ipmi_msghandler
rmmod ipmi_devintf
rmmod ipc_ich
rmmod bridge                                   # unload the Linux bridge module
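
A quick check that the modules above are really gone (the pattern just matches the names we removed):

root@hlx:~# lsmod | grep -E 'ipmi|ipc_ich|^bridge'   # should produce no output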

Bind devices and run testpmd

When the system comes back up, you can bind your network ports to the DPDK-compatible driver and run testpmd. Running testpmd at this point just verifies that DPDK has been set up properly - more as a quick sanity test - and doesn't need to be done every time.

root@hlx:~# systemctl stop openvswitch-switch   # Make sure openvswitch is not running.
root@hlx:~# modprobe igb_uio
root@hlx:~# ls -la /sys/class/net     # shows you mapping of eth devices to bus:function:device
root@hlx:~# dpdk_nic_bind --status
root@hlx:~# dpdk_nic_bind --bind=igb_uio 08:00.0    #  you may need to take the device down from the linux kernel first: 'ip l set dev eth2 down'
root@hlx:~# dpdk_nic_bind --bind=igb_uio 08:00.1
root@hlx:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_ixgbe.so.1.1 -c 0x7 -n 4 -- -i --nb-cores=2 # note that the core mask 0x7 is for cpu cores 0, 1, and 2
...
testpmd>

If you get the testpmd prompt above, then most likely everything went well. You can check the forwarding paths for fun and then just quit. On the DPDK website, you can find a full list of testpmd options.

testpmd> show config fwd
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
testpmd> quit

Step 2: Configure OVS

Alright, so we're satisfied at this point that DPDK has been setup and we're ready to proceed onto the next step - configuring OVS.

root@hlx:~# cat /etc/default/openvswitch-switch   # export the DPDK_OPTS variable. Anytime you change this, you will need to restart openvswitch-switch
# This is a POSIX shell fragment                -*- sh -*-

# FORCE_COREFILES: If 'yes' then core files will be enabled.
# FORCE_COREFILES=yes

# OVS_CTL_OPTS: Extra options to pass to ovs-ctl.  This is, for example,
# a suitable place to specify --ovs-vswitchd-wrapper=valgrind.
# OVS_CTL_OPTS=
export DPDK_OPTS="--dpdk -c 0x1 -n 2 --socket-mem 4096"   # note that the core mask 0x1 is for the EAL  to run on core 0

Then run:

root@hlx:~# systemctl restart openvswitch-switch
root@hlx:~# mkdir /etc/qemu  # Only need to create /etc/qemu and /etc/qemu/bridge.conf once after installing the packages.
root@hlx:~# touch /etc/qemu/bridge.conf

Adding Ports

Create the bridge and ports. For completeness' sake: if ovs-vsctl show does not already list the bridge or the following ports, you will have to create them manually (usually only once, just after installing openvswitch-switch); the bridge and its ports persist across reboots:

root@hlx:~# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev                      # bridge that the ports belong to
root@hlx:~# ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk                    # dpdk ports talk to the hardware
root@hlx:~# ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
root@hlx:~# ovs-vsctl add-port br0 vhost-user-0 -- set Interface vhost-user-0 type=dpdkvhostuser # vhost ports are virtual and connect
root@hlx:~# ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser # to the dpdk ports
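
Once the ports are added, a quick check that the bridge and all four ports are in place (output depends on your OVS version):

root@hlx:~# ovs-vsctl show             # should list br0 with dpdk0, dpdk1, vhost-user-0, and vhost-user-1
root@hlx:~# ovs-vsctl list-ports br0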

Notes from setting up OVS-DPDK using the Mellanox card

I found that when using the Mellanox card, the instructions mentioned in Step 2 were not sufficient. Exporting DPDK_OPTS and restarting the openvswitch-switch service did not allow me to successfully create ports dpdk0 and dpdk1. The error indicated that they were not able to directly communicate with the Mellanox card. The only way I could add these two ports without causing the error was to bind the ports and remove and reload the modules before the openvswitch-switch service started AND run testpmd after the openvswitch-switch service started. testpmd appears to trigger some kind of communication between OVS-DPDK and the Mellanox cards that allows the dpdk ports to be created correctly.

In short, if you experience issues adding the dpdk0 and dpdk1 ports on the Mellanox cards:
1) Remove the two ports using the ovs-vsctl del-port command
2) Make the following additions to the /etc/init.d/openvswitch-switch script
3) Reboot the system and add the ports again.
In my experience I have had to remove and re-add the dpdk ports after every reboot of the system. Sometimes the ovs-vsctl add and remove commands seem to stall on execution. Ctrl-C'ing out of them appears to work, as the ports are still added/removed anyway.

# At the beginning of the file (after the initial comment block):
modprobe uio_pci_generic
dpdk_nic_bind --bind=uio_pci_generic 08:00.0
rmmod mlx4_en
rmmod mlx4_ib
rmmod mlx4_core
modprobe -a mlx4_en mlx4_ib mlx4_core
# Note that this differs slightly from the command above. The system I was on did not have enough RAM to support 8 GB of hugepages
export DPDK_OPTS="--dpdk -c f -n 4 --socket-mem 4096"

...

# At the end of the file (before the exit 0 command):
echo quit | testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_mlx4.so.1.1 -c 0x3 -n 4 -w 0000:${PCI_ID}.0 -w 0000:${PCI_ID}.1 -- --nb-cores=2 -i

OpenFlow Configuration

Set up OpenFlow rules

We need to set up some flows on the bridge between ports dpdk0 ↔ vhost-user-0 and dpdk1 ↔ vhost-user-1. Note that the flows will have to be set up again (del-flows plus all of the add-flow commands) every time openvswitch-switch is restarted; a small helper script for that is sketched after the flow dump below.

root@hlx:~# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:00008cdcd4afe1ec
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(dpdk1): addr:8c:dc:d4:af:e1:ed
     config:     0
     state:      0
     current:    10GB-FD
     supported:  10MB-HD 100MB-HD
     speed: 10000 Mbps now, 100 Mbps max
 2(dpdk0): addr:8c:dc:d4:af:e1:ec
     config:     0
     state:      0
     current:    10GB-FD
     supported:  10MB-HD 100MB-HD
     speed: 10000 Mbps now, 100 Mbps max
 3(vhost-user-0): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 4(vhost-user-1): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:8c:dc:d4:af:e1:ec
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
root@hlx:~# ovs-ofctl del-flows br0   # You really do need to delete the flows first because there is a default flow that will cause problems.
root@hlx:~# ovs-ofctl add-flow br0 in_port=1,action=output:4
root@hlx:~# ovs-ofctl add-flow br0 in_port=4,action=output:1
root@hlx:~# ovs-ofctl add-flow br0 in_port=2,action=output:3
root@hlx:~# ovs-ofctl add-flow br0 in_port=3,action=output:2
root@hlx:~# ovs-ofctl dump-flows br0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=20.612s, table=0, n_packets=0, n_bytes=0, idle_age=20, in_port=1 actions=output:4
 cookie=0x0, duration=13.639s, table=0, n_packets=0, n_bytes=0, idle_age=13, in_port=4 actions=output:1
 cookie=0x0, duration=6.645s, table=0, n_packets=0, n_bytes=0, idle_age=6, in_port=2 actions=output:3
 cookie=0x0, duration=1.976s, table=0, n_packets=0, n_bytes=0, idle_age=1, in_port=3 actions=output:2
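
Because the flows vanish on every openvswitch-switch restart, a tiny helper script saves retyping. This is just a sketch of the commands above; the port numbers assume the same ofctl numbering as in this example:

#!/bin/sh
# Recreate the dpdk <-> vhost-user flows after an openvswitch-switch restart.
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 in_port=1,action=output:4
ovs-ofctl add-flow br0 in_port=4,action=output:1
ovs-ofctl add-flow br0 in_port=2,action=output:3
ovs-ofctl add-flow br0 in_port=3,action=output:2
ovs-ofctl dump-flows br0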

Which NUMA node should we run on?

We want to make sure to run the VM on the same NUMA node as ovs-vswitchd and as the backing NIC. Based on the export command we added to /etc/default/openvswitch-switch (export DPDK_OPTS="--dpdk -c 0x1 -n 2 --socket-mem 4096"), there is just a single 4096, so NUMA node 0 gets 4G of hugepage memory while NUMA node 1 gets 0G. ovs-vswitchd is running on the node that looks more heavily consumed according to the numastat command. In this case, node 0 is being used more heavily, so we'll run our VM on that node. Also note that there is a known performance issue when running on NUMA node 1, so it is recommended that all testing be run on NUMA node 0.

To determine which NUMA node a PCI device (NIC) is on, you can cat /sys/class/net/eth<#>/device/numa_node, which prints either 0 or 1.
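
For example, a one-liner over all interfaces (interfaces without a backing PCI device simply won't match the glob):

root@hlx:~# for n in /sys/class/net/*/device/numa_node; do echo "$n: $(cat $n)"; done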

root@hlx:~# ps -ef | grep ovs-vswitchd
root      2308  2307 96 14:02 ?        00:00:03 ovs-vswitchd --dpdk -c 0x1 -n 2 --socket-mem 4096 4096 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
root      2364  2176  0 14:02 pts/0    00:00:00 grep ovs-vswitchd
root@hlx:~# numastat 2308

Per-node process memory usage (in MBs) for PID 2308 (ovs-vswitchd)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                      4096.00         4096.00         8192.00
Heap                         9.01            0.00            9.01
Stack                      160.13            0.00          160.13
Private                      8.99            5.23           14.23
----------------  --------------- --------------- ---------------
Total                     4274.13         4101.23         8375.37

Step 3: Running a VM on the host

Initial VM Setup

For my VM image, I just created a blank 10G qcow2 image on my workstation (called hlx-vm) and installed HPE Linux:

$ qemu-img create -f qcow2 hlx-vm 10G
$ qemu-system-x86_64 -enable-kvm -smp 4 -cdrom /work/vms/hLinux-cattleprod-amd64-guided-netinst-20160418.iso -boot d hlx-vm

Note that I created the VM first on my workstation because my system under test doesn't have a desktop environment, which is needed for the pop-up qemu window used to go through the Linux install.

root@hlx:~# cat ifup-my_tap    #create this little script to setup the linux bridge 
#!/bin/sh
/sbin/brctl addif virbr0 $1
/sbin/ifconfig $1 0.0.0.0 promisc up
root@hlx:~# chmod +x ifup-my_tap
root@hlx:~# qemu-system-x86_64 -smp 4 -hda hlx-vm -device virtio-net-pci,netdev=net0,mac=ba:d5:ca:0c:a3:d5 -netdev type=tap,id=net0,ifname=my_tap,script=ifup-my_tap -snapshot
root@hlx:~# arp -an | grep virbr0   # to find out what the ipv4 address is of the VM so that you can ssh to it
? (192.168.122.168) at ba:d5:ca:0c:a3:d5 [ether] on virbr0

Here is the VM kernel command line, set in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"

After you set the kernel command line, don't forget to 'update-grub' so that it takes effect on next boot.
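
Inside the VM, the same quick check as on the host confirms the settings took effect:

me@vm:~$ cat /proc/cmdline            # should show the hugepage and isolcpus parameters above
me@vm:~$ grep Huge /proc/meminfo      # HugePages_Total should read 2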

Be sure to set a static ip address, for example:

me@vm:~$ cat /etc/network/interfaces
source /etc/network/interfaces.d/*
auto lo
iface lo inet loopback
allow-hotplug eth0
iface eth0 inet static
	address 192.168.122.155
	gateway 192.168.122.1
	netmask 255.255.255.0

Run the VM

Ok, now we're ready to run the VM with qemu_linux.pl using the following command. Note that we're using node 0 because that is the node we found ovs-vswitchd to be running on. Also note the veth-addr: this can be whatever subnet you want to set; just make sure that subnet IP isn't being used by any of your other devices.

root@hlx:~/scripts# perl scripts/qemu_linux.pl --isoloc hLinux-cattleprod-amd64-blaster-netinst-20160603-hos4.0_alpha5-hlm.iso --imgloc ../hlx-vm --memory-gb 4 --veth-addr 192.168.122.1/24 --veth-name-root myveth --mgmt-attach-to-bridge mgmt-br --vhostuser-sock vhost-user-0,00:00:00:00:00:01 --vhostuser-sock vhost-user-1,00:00:00:00:00:02 --use-hugepage-backend yes --cores 4 --numa-node 0 --core-list 3,4,5,6

Keep in mind that you may need to tailor the above command to match your VM's name, the bridge name, the vhost user socket names, and such.

Troubleshooting

Now, if all goes well, this command seems to hang your terminal and not return a prompt. If all does not go well, there are some troubleshooting things you can do. When the script is running successfully, you should see a few new devices:

root@hlx:~# ip a
...
14: ovs-netdev:  mtu 1500 qdisc noop state DOWN group default qlen 500
    link/ether da:78:c3:ad:4b:6e brd ff:ff:ff:ff:ff:ff
15: br0:  mtu 1500 qdisc noop state DOWN group default qlen 500
    link/ether 8c:dc:d4:af:e1:ec brd ff:ff:ff:ff:ff:ff
16: mgmt-br:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:97:5d:e1:cd:f0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::4097:5dff:fee1:cdf0/64 scope link
       valid_lft forever preferred_lft forever
17: myveth1@myveth0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:ee:49:2b:6d:a4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 scope global myveth1
       valid_lft forever preferred_lft forever
18: myveth0@myveth1:  mtu 1500 qdisc noqueue master mgmt-br state LOWERLAYERDOWN group default qlen 1000
    link/ether 42:97:5d:e1:cd:f0 brd ff:ff:ff:ff:ff:ff
19: tap0:  mtu 1500 qdisc noqueue master mgmt-br state UNKNOWN group default qlen 500
    link/ether fe:54:ea:fb:12:5c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:eaff:fefb:125c/64 scope link
       valid_lft forever preferred_lft forever

If mgmt-br or myveth1@myveth0 are down, you can up the interface:

root@hlx:~# ip l set dev myveth1 up

Also make sure that you have… (a quick-check sequence is sketched after this list)

  • loaded the igb_uio module

  • bound the host's network devices with dpdk_nic_bind

  • exported the Perl libraries (PERL5LIB) while in the scripts directory

  • allocated hugepages (cat /proc/meminfo) and mounted them (cat /proc/mounts)

  • /etc/qemu/bridge.conf should exist, even if it's an empty file
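
As a quick-check sequence covering the list above (the PERL5LIB export assumes you are in the scripts directory; adjust the path to your layout):

root@hlx:~# lsmod | grep igb_uio                 # igb_uio loaded?
root@hlx:~# dpdk_nic_bind --status               # ports bound to the dpdk-compatible driver?
root@hlx:~/scripts# export PERL5LIB=$(pwd)       # make the Perl libraries visible to qemu_linux.pl (assumed layout)
root@hlx:~# grep Huge /proc/meminfo              # hugepages allocated?
root@hlx:~# grep huge /proc/mounts               # hugetlbfs mounted?
root@hlx:~# ls -la /etc/qemu/bridge.conf         # exists, even if empty?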

Step 4: Running DPDK (testpmd) on the VM

Installing DPDK and its dependencies

Once we have the VM up and running, we need to install DPDK so that we can use testpmd to forward packets. The problem is that your VM can't see the outside world, so it can't add a remote repo and install from it. What you can do instead is grab the ovs-dpdk-16.04 test repo tarball and get it onto the VM (scp it from the host). Once you unpack it on the VM, you can install from that ovs-dpdk-16.04 directory by adding a line to your /etc/apt/sources.list like this:

deb file:///home/me/ovs-dpdk-16.04 cattleprod main
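
For example, a hedged sketch of getting the repo onto the VM (the tarball name and paths are assumptions; adjust to wherever you actually unpack it):

root@hlx:~# scp ovs-dpdk-16.04.tar.gz me@192.168.122.155:/home/me/    # copy the repo tarball from the host to the VM
root@vm:~# tar xf /home/me/ovs-dpdk-16.04.tar.gz -C /home/me/         # unpack it in place
root@vm:~# echo "deb file:///home/me/ovs-dpdk-16.04 cattleprod main" >> /etc/apt/sources.list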

Then you should be able to install DPDK (from the test repo) and install the DPDK dependencies (from the blaster iso):

root@vm:~# apt-cdrom add    # we need this so that apt recognizes the iso we loaded when we started the VM
root@vm:~# apt-get update
root@vm:~# apt-get install dpdk

Bind the eth devices

At this point, you should have DPDK installed in the VM and you're ready to use it! Woo hoo!

You should have 3 devices visible in the VM: eth0 (tied to myveth1), eth1, and eth2. eth1 and eth2 are the two virtio devices that are tied to the vhost user sockets.

root@vm:~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.115/24 brd 192.168.122.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe12:3456/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff

We need to load the DPDK-compatible driver and bind our devices:

root@vm:~# modprobe igb_uio
root@vm:~# dpdk_nic_bind --status   # to see what your devices are called
root@vm:~# dpdk_nic_bind --bind=igb_uio 00:04.0
root@vm:~# dpdk_nic_bind --bind=igb_uio 00:05.0

Testpmd in the VM

Now we're ready to run testpmd, making sure to point to the virtio driver:

root@vm:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_virtio.so.1.1 -c 0x7 -n 4 -w 0000:00:04.0 -w 0000:00:05.0 -- --burst=64 --disable-hw-vlan --txd=2048 --rxd=2048 --txqflags=0xf00 -i   # note that the cpu mask of 0x7 means the EAL will run on core 0 and the forwarding will be done with cores 1 and 2
...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
Done
testpmd> set fwd mac_retry
Set mac_retry packet forwarding mode
testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0xf00

This should be forwarding traffic from one port to the other and vice versa:

testpmd> show config fwd
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
testpmd> set fwd mac_retry
testpmd>

Setting the Core for the Testpmd Process

Make sure the testpmd process is running on a different core than all other qemu processes.
I've had issues with all qemu processes running on one core, even though I gave qemu 4 cores: 3,4,5,6 (see the command where we actually run the VM). In order to obtain decent throughput with the packet forwarding in the VM, we need to manually change which core (on the host) the testpmd process (i.e. its qemu thread) is running on. Simply giving taskset a core range is not good enough. For this task, the htop package is extremely valuable for seeing which of the host's cpu cores are being heavily used.

Once the VM is running, on the host take a look at all of the qemu parent and child processes running:

root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1 13400 16000  28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1  1084   724  26 11:00 pts/1    00:00:00 sh -c numactl --membind=0  -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1 12772  3704   3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 56548 3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 56548 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  2    7 1212308 56548 3 11:00 pts/1    00:00:09 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349  0    7 1212308 56548 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 56548 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 56548 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 56548 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4366  3431  4366  0    1  3183  2208   0 11:07 pts/5    00:00:00 grep qemu

The 9th column shows what cpu core each process is running on. All of the qemu-system-x86_64 processes are running on core 3.

Now start testpmd in the VM and begin the forwarding:

root@vm:~# testpmd -d /usr/lib/x86_64-linux-gnu/dpdk/librte_pmd_virtio.so.1.1 -c 0x7 -n 4 -w 0000:00:04.0 -w 0000:00:05.0 -- --burst=64 --disable-hw-vlan --txd=2048 --rxd=2048 --txqflags=0xf00 -i
...
testpmd> start

Then, on the host, you can see which process has started accumulating CPU time:

root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1 13400 16000  28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1  1084   724  26 11:00 pts/1    00:00:00 sh -c numactl --membind=0  -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1 12772  3704   3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 51460 3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  1    7 1212308 51460 3 11:00 pts/1    00:00:10 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349  7    7 1212308 51460 3 11:00 pts/1    00:00:51 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4376  3431  4376  0    1  3183  2208   0 11:11 pts/5    00:00:00 grep qemu

Process 4349 has been running for 51 seconds, the same amount of time since I typed start at the testpmd prompt in the VM. Now that we know which child qemu process on the host corresponds to the packet forwarding in the VM, let's move it to a core other than 3:

root@hlx:~# taskset -pc 4 4349
pid 4349's current affinity list: 3-6
pid 4349's new affinity list: 4

Now we can see that that very process is running on core 4:

root@hlx:~# ps -eLF | grep -i qemu | less -S
root      4308  3417  4308  0    1 13400 16000  28 11:00 pts/1    00:00:00 perl scripts/qemu_linux.pl --imgloc ../../hlx-gold-small --memory
root      4331  4308  4331  0    1  1084   724  26 11:00 pts/1    00:00:00 sh -c numactl --membind=0  -- taskset -c 3,4,5,6 sudo kvm -boot o
root      4335  4331  4335  0    1 12772  3704   3 11:00 pts/1    00:00:00 sudo kvm -boot order=cd -cpu host -vnc :5 -m 4096 -name hlinux qe
root      4338  4335  4338  0    7 1212308 51460 3 11:00 pts/1    00:00:01 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4339  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4348  1    7 1212308 51460 3 11:00 pts/1    00:00:10 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4349 28    7 1212308 51460 4 11:00 pts/1    00:03:58 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4350  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4351  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4338  4335  4353  0    7 1212308 51460 3 11:00 pts/1    00:00:00 qemu-system-x86_64 -enable-kvm -boot order=cd -cpu host -vnc :5 -
root      4380  3431  4380  0    1  3183  2208  23 11:14 pts/5    00:00:00 grep qemu

Now the load is better distributed. In my case, I was doing throughput and latency testing using a network traffic generator (Spirent TestCenter) and setting the testpmd process to run on a different core was absolutely critical to getting decent throughput results.

At this point you're ready to configure Spirent to send and receive traffic over these dpdk-bound ports. To do this, continue on to Configuring Spirent.
