Requirements the dpdk multi-process model places on pmd driver implementations

Preface

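The dpdk multi-process model lets multiple processes share NIC hardware resources. Typically the primary process initializes the NIC, while secondary processes skip that initialization and run only the process-local setup they need.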

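This feature rests on two things: data sharing through hugepage memory, and different initialization logic for different process types. In practice, pmd drivers differ in how well they support the multi-process model: some work and some do not. The drivers that fail mostly do so because they access process-local content, such as local data and function pointers, through shared data. Such content is scoped to a single process and is valid only there. In any other process it points to invalid memory, so using it directly through shared data can trigger segmentation faults or other abnormal behavior.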
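The split between shared data and per-process initialization described above can be sketched as follows. This is a minimal illustration, not dpdk code: `proc_type`, `shared_state`, `local_state`, and `dev_init` are hypothetical names, and the shared structure is assumed to live in hugepage-backed memory while the local one lives in ordinary process memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the EAL process type; not the real dpdk API. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Assumed to live in shared (hugepage) memory: plain data only. */
struct shared_state {
    uint16_t port_id;
    uint32_t max_queue_pairs;
    int      hw_ready;              /* set once by the primary */
};

/* Lives in ordinary process memory: anything address-dependent. */
struct local_state {
    void (*notify)(uint16_t queue); /* valid in this process only */
};

/* Placeholder for a real doorbell write. */
static void notify_queue(uint16_t queue) { (void)queue; }

/* Primary: full init of shared data. Secondary: local fixups only. */
static int dev_init(enum proc_type type, struct shared_state *sh,
                    struct local_state *lo)
{
    assert(sh != NULL && lo != NULL);

    if (type == PROC_PRIMARY) {
        sh->max_queue_pairs = 1;
        sh->hw_ready = 1;
    } else if (!sh->hw_ready) {
        return -1;                  /* primary has not initialized yet */
    }

    /* Every process fills in its own pointers, in its own address space. */
    lo->notify = notify_queue;
    return 0;
}
```

Each process would call `dev_init` once with its own `local_state`. Only the primary writes the shared fields, and no pointer is ever stored in `shared_state`.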

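The article "virtio NIC segmentation fault when sending and receiving packets on shared queues under the dpdk multi-process model" describes exactly this kind of problem: a function pointer stored in shared data is process-local and is an invalid address in other processes. Supporting the multi-process model, however, involves more than this one issue.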

This article uses the dpdk-16.11 virtio pmd driver as an example to describe what a driver must do to support multi-process access to NIC resources in dpdk.

Key changes for virtio modern NICs to support the dpdk multi-process model

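  1. Make the vtpci_ops function pointer table in the hw structure process-local
  2. Make the rte_pci_ioport structure in the hw structure process-local
  3. Make the rte_pci_device pointer in the rte_pci_ioport structure process-local
  4. The secondary process must mmap the pci resource bar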

Key changes for virtio legacy NICs to support the dpdk multi-process model

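  1. Make the vtpci_ops function pointer table in the hw structure process-local
  2. Make the rte_pci_ioport structure in the hw structure process-local
  3. Make the rte_pci_device pointer in the rte_pci_ioport structure process-local
  4. The secondary process must map the portio resource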

Related dpdk commits

Localizing vtpci_ops:

commit 553f45932fb797e9fbd6342016e0dd53e1f61fc7
Author: Yuanhan Liu 
Date:   Fri Jan 6 18:16:17 2017 +0800

    net/virtio: store PCI operators pointer locally
    
    We used to store the vtpci_ops at virtio_hw structure. The struct,
    however, is stored in shared memory. That means only one value is
    allowed. For the multiple process model, however, the address of
    vtpci_ops should be different among different processes.
    
    Take virtio PMD as example, the vtpci_ops is set by the primary
    process, based on its own process space. If we access that address
    from the secondary process, that would be an illegal memory access,
    A crash then might happen.
    
    To make the multiple process model work, we need store the vtpci_ops
    in local memory but not in a shared memory. This is what the patch
    does: a local virtio_hw_internal array of size RTE_MAX_ETHPORTS is
    allocated. This new structure is used to store all these kind of
    info in a non-shared memory. Current, we have:
    - vtpci_ops
    - rte_pci_ioport
    - virtio pci mapped memory, such as common_cfg.

The core change is:

-               hw->vtpci_ops = &modern_ops;
-               hw->modern    = 1;
+               virtio_hw_internal[hw->port_id].vtpci_ops = &modern_ops;
+               hw->modern = 1;
                *dev_flags |= RTE_ETH_DEV_INTR_LSC;
                return 0;
        }
@@ -755,7 +755,7 @@ vtpci_init(struct rte_pci_device *dev, struct virtio_hw *hw,
                return -1;
        }
 
-       hw->vtpci_ops = &legacy_ops;
+       virtio_hw_internal[hw->port_id].vtpci_ops = &legacy_ops;

This patch uses a process-local virtio_hw_internal array so that each process initializes its own vtpci_ops, which makes the pointer process-local.
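The indexing scheme can be sketched in isolation as follows. This is a simplified stand-in, not the driver source: `MAX_PORTS`, `struct hw`, `struct pci_ops`, `HW_OPS`, and `set_ops` are hypothetical names that mirror the shape of RTE_MAX_ETHPORTS, virtio_hw, virtio_pci_ops, and VTPCI_OPS. The shared structure carries only the port_id, and the ops pointer is looked up in an ordinary global array that each process owns separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 32  /* hypothetical; the patch sizes its array with RTE_MAX_ETHPORTS */

struct hw;

/* Table of device operations; its address differs between processes. */
struct pci_ops {
    uint8_t (*get_status)(const struct hw *hw);
};

/* Assumed to live in shared memory: holds no pointers, only plain data. */
struct hw {
    uint16_t port_id;
    uint8_t  modern;
    uint8_t  status;
};

/* Ordinary global, so every process has its own private copy. */
struct hw_internal {
    const struct pci_ops *ops;
};
static struct hw_internal hw_internal[MAX_PORTS];

/* Same idea as VTPCI_OPS(hw): shared port_id selects the local slot. */
#define HW_OPS(hw) (hw_internal[(hw)->port_id].ops)

static uint8_t modern_get_status(const struct hw *hw) { return hw->status; }
static uint8_t legacy_get_status(const struct hw *hw) { return hw->status; }

static const struct pci_ops modern_ops = { modern_get_status };
static const struct pci_ops legacy_ops = { legacy_get_status };

/* Called by every process, primary and secondary alike. */
static void set_ops(const struct hw *hw)
{
    assert(hw != NULL && hw->port_id < MAX_PORTS);
    HW_OPS(hw) = hw->modern ? &modern_ops : &legacy_ops;
}
```

Because the `modern` flag is plain data in the shared structure, a secondary process can choose the right table without probing the device again. Only the pointer itself has to be set per process.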

Localizing rte_pci_ioport:

commit 1ca893f11d1d47c13535805c3ec7ca11e26cbe03
Author: Yuanhan Liu 
Date:   Fri Jan 6 18:16:18 2017 +0800

    net/virtio: store IO port info locally

    Like vtpci_ops, the rte_pci_ioport has to store in local memory. This
    is basically for the rte_pci_device field is allocated from process
    local memory, but not from shared memory.

The core change is:

 struct virtio_hw {
        struct virtnet_ctl *cvq;
-       struct rte_pci_ioport io;
        uint64_t    req_guest_features;
        uint64_t    guest_features;
        uint32_t    max_queue_pairs;
@@ -275,9 +274,11 @@ struct virtio_hw {
  */
 struct virtio_hw_internal {
        const struct virtio_pci_ops *vtpci_ops;
+       struct rte_pci_ioport io;
 };
 
 #define VTPCI_OPS(hw)  (virtio_hw_internal[(hw)->port_id].vtpci_ops)
+#define VTPCI_IO(hw)   (&virtio_hw_internal[(hw)->port_id].io)

The approach is the same as the one used to localize vtpci_ops.

Remapping the pci resource bar in the secondary process:

commit 6d890f8ab51295045a53f41c4d2654bb1f01cf38
Author: Yuanhan Liu 
Date:   Fri Jan 6 18:16:19 2017 +0800

    net/virtio: fix multiple process support
    
    The introduce of virtio 1.0 support brings yet another set of ops, badly,
    it's not handled correctly, that it breaks the multiple process support.
    
    The issue is the data/function pointer may vary from different processes,
    and the old used to do one time set (for primary process only). That
    said, the function pointer the secondary process saw is actually from the
    primary process space. Accessing it could likely result to a crash.
    
    Kudos to the last patches, we now be able to maintain those info that may
    vary among different process locally, meaning every process could have its
    own copy for each of them, with the correct value set. And this is what
    this patch does:
    
    - remap the PCI (IO port for legacy device and memory map for modern
      device)
    
    - set vtpci_ops correctly
    
    After that, multiple process would work like a charm. (At least, it
    passed my fuzzy test)

The core change is:

+static int
+virtio_remap_pci(struct rte_pci_device *pci_dev, struct virtio_hw *hw)
+{
+       if (hw->modern) {
+               /*
+                * We don't have to re-parse the PCI config space, since
+                * rte_eal_pci_map_device() makes sure the mapped address
+                * in secondary process would equal to the one mapped in
+                * the primary process: error will be returned if that
+                * requirement is not met.
+                *
+                * That said, we could simply reuse all cap pointers
+                * (such as dev_cfg, common_cfg, etc.) parsed from the
+                * primary process, which is stored in shared memory.
+                */
+               if (rte_eal_pci_map_device(pci_dev)) {
+                       PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+                       return -1;
+               }
+       } else {
+               if (rte_eal_pci_ioport_map(pci_dev, 0, VTPCI_IO(hw)) < 0)
+                       return -1;
+       }
+
+   

This patch adds a virtio_remap_pci function, which is called when a secondary process runs in order to remap the pci resource bar.
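The comment in the patch relies on the secondary process getting the same mapped address as the primary, with an error returned otherwise. A minimal sketch of that "same address or fail" idea, using plain POSIX mmap, is shown below. It is not the implementation of rte_eal_pci_map_device; `map_at_expected` is a hypothetical helper, and the expected address is assumed to have been recorded by the primary.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Map `len` bytes and succeed only if the kernel places the mapping exactly
 * at `expected`. MAP_FIXED is deliberately not used, so an existing mapping
 * at that address is never overwritten.
 * Returns the mapped address, or NULL on any failure.
 */
static void *map_at_expected(void *expected, size_t len, int prot, int flags,
                             int fd, off_t offset)
{
    assert(len > 0);

    void *addr = mmap(expected, len, prot, flags, fd, offset);
    if (addr == MAP_FAILED)
        return NULL;

    if (addr != expected) {
        /* Shared pointers into this region would be wrong here: give up. */
        munmap(addr, len);
        return NULL;
    }
    return addr;
}
```

When the addresses match, pointers into the mapped region that the primary stored in shared memory, such as the parsed capability pointers, remain usable in the secondary as they are.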

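Legacy virtio accesses the NIC through port I/O (portio). When the NIC is bound to the igb_uio driver, the secondary process does not remap the pci resource bar. It only resolves the port address and then accesses the device through the ioport. Note that rte_eal_pci_ioport_map ends with the following code: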

        if (!ret)
               p->dev = dev;

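This makes the rte_pci_device pointer in the rte_pci_ioport structure process-local. Without this step the legacy virtio driver does not work: legacy calls rte_eal_pci_ioport_write to write NIC registers, and that function uses the rte_pci_device referenced by rte_pci_ioport to dispatch to the implementation for the bound kernel driver. If that pointer is NULL or invalid, virtio cannot write the registers and therefore cannot work.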

An excerpt from rte_eal_pci_ioport_write:

void
rte_eal_pci_ioport_write(struct rte_pci_ioport *p,
                         const void *data, size_t len, off_t offset)
{
        switch (p->dev->kdrv) {
#ifdef VFIO_PRESENT
        case RTE_KDRV_VFIO:
                pci_vfio_ioport_write(p, data, len, offset);
                break;
#endif
        case RTE_KDRV_IGB_UIO:
                pci_uio_ioport_write(p, data, len, offset);
                break;
        case RTE_KDRV_UIO_GENERIC:
...........................................................
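The dependency of the write path on p->dev can be reproduced with a small stand-alone model. The types and functions below (`kdrv`, `pci_device`, `pci_ioport`, `ioport_map`, `ioport_write`) are hypothetical simplifications, not the dpdk definitions, and the return codes exist only to make the chosen backend observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel driver kinds. */
enum kdrv { KDRV_UNKNOWN, KDRV_IGB_UIO, KDRV_VFIO };

/* Allocated in process-local memory, one per process. */
struct pci_device {
    enum kdrv kdrv;
};

/* After localization this also lives in process-local memory. */
struct pci_ioport {
    struct pci_device *dev;
    unsigned long      base;
};

/* Mirrors the tail of the map function: record the local device on success. */
static int ioport_map(struct pci_device *dev, struct pci_ioport *p)
{
    assert(dev != NULL && p != NULL);

    int ret = (dev->kdrv == KDRV_UNKNOWN) ? -1 : 0;
    if (!ret)
        p->dev = dev;
    return ret;
}

/* Returns 1 for the uio path, 2 for the vfio path, -1 if nothing usable. */
static int ioport_write(const struct pci_ioport *p)
{
    assert(p != NULL);

    /* Illustration only: a NULL check cannot catch a stale pointer
     * copied from another process. */
    if (p->dev == NULL)
        return -1;

    switch (p->dev->kdrv) {
    case KDRV_IGB_UIO:
        return 1;
    case KDRV_VFIO:
        return 2;
    default:
        return -1;
    }
}
```

A secondary process that never runs the map step has nothing valid to dispatch on, which is why each process needs to perform the mapping itself.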

Summary

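The dpdk multi-process model lets multiple processes access NIC hardware resources. This widens the range of use cases but also places further requirements on driver implementations. The core problem a driver must solve is making certain shared resources process-local, and the root cause is that on linux each process has its own isolated virtual address space. Today each driver has to be modified to support the multi-process model, so it may be worth considering a generic mechanism that avoids adapting every driver individually.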
