The number of use cases for virtualizing DMA devices that do not have built-in SR-IOV capability is increasing. Previously, to virtualize such devices, developers had to create their own management interfaces and APIs, and then integrate them with user space software. To simplify integration with user space software, we have identified common requirements and a unified management interface for such devices.
The VFIO driver framework provides unified APIs for direct device access. It is an IOMMU/device-agnostic framework for exposing direct device access to user space in a secure, IOMMU-protected environment. This framework is used for multiple devices, such as GPUs, network adapters, and compute accelerators. With direct device access, virtual machines or user space applications have direct access to the physical device. This framework is reused for mediated devices.
The mediated core driver provides a common interface for mediated device management that can be used by drivers of different devices. This module provides a generic interface to perform these operations:
The mediated core driver also provides an interface to register a bus driver. For example, the mediated VFIO mdev driver is designed for mediated devices and supports VFIO APIs. The mediated bus driver adds a mediated device to and removes it from a VFIO group.
The following high-level block diagram shows the main components and interfaces in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM devices as examples, because these were the first devices to use this module:
Registration Interfaces
The mediated core driver provides the following types of registration interfaces:
Registration Interface for a Mediated Bus Driver
The registration interface for a mediated device driver provides the following structure to represent a mediated device’s driver:
/*
 * struct mdev_driver [2] - Mediated device's driver
 * @probe: called when new device created
 * @remove: called when device removed
 * @get_available: returns the number of instances of a type that can be created
 * @show_description: prints a description of the type into buf
 * @driver: device driver structure
 */
struct mdev_driver {
        int (*probe)(struct mdev_device *dev);
        void (*remove)(struct mdev_device *dev);
        unsigned int (*get_available)(struct mdev_type *mtype);
        ssize_t (*show_description)(struct mdev_type *mtype, char *buf);
        struct device_driver driver;
};
A mediated bus driver for mdev should use this structure in the function calls to register and unregister itself with the core driver.
The mediated bus driver’s probe function should create a vfio_device on top of the mdev_device and connect it to an appropriate implementation of vfio_device_ops.
When a driver wants to add the GUID creation sysfs files to an existing device that it has probed, it should call:
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
                         struct mdev_driver *mdev_driver);
This provides the ‘mdev_supported_types/XX/create’ files, which can then be used to trigger the creation of an mdev_device. The created mdev_device is attached to the specified driver.
When the driver needs to remove itself, it calls:
void mdev_unregister_parent(struct mdev_parent *parent);
This call unbinds and destroys all the created mdevs and removes the sysfs files.
Mediated Device Management Interface Through sysfs
The management interface through sysfs enables user space software, such as libvirt, to query and configure mediated devices in a hardware-agnostic fashion. This management interface provides flexibility to the underlying physical device’s driver to support features such as:
Links in the mdev_bus Class Directory
The /sys/class/mdev_bus/ directory contains links to devices that are registered with the mdev core driver.
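As a rough illustration of how user space might enumerate these links, the following shell sketch prints the name of each entry in the class directory. The function name list_mdev_parents and its optional sysfs-root argument are assumptions introduced here (the argument exists only so the sketch can be tried against a scratch directory); neither is part of the mdev framework.

```shell
#!/bin/sh
# Print the name of every parent device registered with the mdev core.
# $1 (optional, hypothetical): alternative sysfs root, defaults to /sys.
list_mdev_parents() {
    class_dir="${1:-/sys}/class/mdev_bus"
    if [ ! -d "$class_dir" ]; then
        echo "$class_dir not found" >&2
        return 1
    fi
    for link in "$class_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$link" ] || [ -L "$link" ] || continue
        basename "$link"
    done
}
```

Calling list_mdev_parents with no argument reads /sys/class/mdev_bus/ on the running system.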
Directories and Files Under the sysfs for Each Physical Device
|- [parent physical device]
|--- Vendor-specific-attributes [optional]
|--- [mdev_supported_types]
| |--- [
| | |--- create
| | |--- name
| | |--- available_instances
| | |--- device_api
| | |--- description
| | |--- [devices]
| |--- [
| | |--- create
| | |--- name
| | |--- available_instances
| | |--- device_api
| | |--- description
| | |--- [devices]
| |--- [
| |--- create
| |--- name
| |--- available_instances
| |--- device_api
| |--- description
| |--- [devices]
The list of currently supported mediated device types and their details.
[
The [
sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
This attribute shows which device API is being created, for example, “vfio-pci” for a PCI device.
This attribute shows the number of devices of type
This directory contains links to the devices of type
This attribute shows a human-readable name.
This attribute can show brief features or a description of the type. This is an optional attribute.
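The attributes described above can be read with ordinary file operations. The sketch below walks the mdev_supported_types directory of one parent device and prints the type directory name, name, available_instances, and device_api for each type. The function name show_mdev_types is an assumption made for this example, and the optional description attribute is deliberately skipped.

```shell
#!/bin/sh
# Print one tab-separated line per supported mdev type:
#   <type directory> <name> <available_instances> <device_api>
# $1: sysfs directory of the parent physical device.
show_mdev_types() {
    types_dir="$1/mdev_supported_types"
    if [ ! -d "$types_dir" ]; then
        echo "$types_dir not found" >&2
        return 1
    fi
    for type_dir in "$types_dir"/*; do
        [ -d "$type_dir" ] || continue
        type_id=$(basename "$type_dir")
        name=$(cat "$type_dir/name")
        avail=$(cat "$type_dir/available_instances")
        api=$(cat "$type_dir/device_api")
        printf '%s\t%s\t%s\t%s\n' "$type_id" "$name" "$avail" "$api"
    done
}
```

For the sample driver discussed later, the parent directory would be /sys/devices/virtual/mtty/mtty.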
Directories and Files Under the sysfs for Each mdev Device
|- [parent phy device]
|--- [$MDEV_UUID]
|--- remove
|--- mdev_type {link to its type}
|--- vendor-specific-attributes [optional]
Writing ‘1’ to the ‘remove’ file destroys the mdev device. The vendor driver can fail the remove() callback if that device is active and the vendor driver doesn’t support hot unplug.
Example:
# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
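Because the vendor driver can fail the remove() callback, a script may want to check the result of the write instead of assuming success. The wrapper below is a minimal sketch of that idea; the function name remove_mdev, its return codes, and the optional devices-directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# Destroy an mdev device by UUID and report failure explicitly.
# $1: UUID of the mdev device.
# $2 (optional, hypothetical): devices directory,
#     defaults to /sys/bus/mdev/devices.
remove_mdev() {
    dev_dir="${2:-/sys/bus/mdev/devices}/$1"
    if [ ! -e "$dev_dir/remove" ]; then
        echo "mdev $1 not found" >&2
        return 1
    fi
    # The write fails if the vendor driver rejects the removal.
    if ! echo 1 > "$dev_dir/remove"; then
        echo "failed to remove mdev $1" >&2
        return 2
    fi
}
```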
Mediated Device Hot Plug
Mediated devices can be created and assigned at runtime. The procedure to hot plug a mediated device is the same as the procedure to hot plug a PCI device.
Translation APIs for Mediated Devices
The following APIs are provided for translating user pfn to host pfn in a VFIO driver:
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
                   int npage, int prot, struct page **pages);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
                      int npage);
These functions call back into the back-end IOMMU module by using the pin_pages and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently these callbacks are supported in the TYPE1 IOMMU module. To enable them for other IOMMU backend modules, such as the PPC64 sPAPR module, those modules need to provide these two callback functions.
Using the Sample Code
mtty.c in the samples/vfio-mdev/ directory is a sample driver program that demonstrates how to use the mediated device framework.
The sample driver creates an mdev device that simulates a serial port over a PCI card.
This step creates a dummy device, /sys/devices/virtual/mtty/mtty/
Files in this device directory in sysfs are similar to the following:
# tree /sys/devices/virtual/mtty/mtty/
/sys/devices/virtual/mtty/mtty/
|-- mdev_supported_types
| |-- mtty-1
| | |-- available_instances
| | |-- create
| | |-- device_api
| | |-- devices
| | `-- name
| `-- mtty-2
| |-- available_instances
| |-- create
| |-- device_api
| |-- devices
| `-- name
|-- mtty_dev
| `-- sample_mtty_dev
|-- power
| |-- autosuspend_delay_ms
| |-- control
| |-- runtime_active_time
| |-- runtime_status
| `-- runtime_suspended_time
|-- subsystem -> ../../../../class/mtty
`-- uevent
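Given the tree above, a mediated device is created by writing a UUID to the create file of one of the supported types. The following sketch wraps that write in a small helper; the function name create_mdev is an assumption for this example, and on a real system the write needs sufficient privileges.

```shell
#!/bin/sh
# Create a mediated device of a given type under a parent device.
# $1: sysfs directory of the parent device
# $2: type directory name, for example mtty-2
# $3: UUID for the new mdev device
create_mdev() {
    create_file="$1/mdev_supported_types/$2/create"
    if [ ! -w "$create_file" ]; then
        echo "no writable create file for type $2" >&2
        return 1
    fi
    printf '%s\n' "$3" > "$create_file"
}
```

With the sample driver loaded, a call might look like `create_mdev /sys/devices/virtual/mtty/mtty mtty-2 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001`.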
In the Linux guest VM, with no hardware on the host, the device appears as follows:
# lspci -s 00:05.0 -xxvv
00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
Subsystem: Device 4348:3253
Physical Slot: 5
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at c150 [size=8]
Region 1: I/O ports at c158 [size=8]
Kernel driver in use: serial
00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00
In the Linux guest VM, dmesg output for the device is as follows:
serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A
Data is looped back from the host's mtty driver.
To destroy the mediated device when it is no longer needed:
# echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove