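When a virtual machine is created, QEMU is configured with the BDF of the passthrough device. After the VM boots and the guest device driver initializes, QEMU programs the IOMMU's IRTE through VFIO and also sets up the device's MSI Data information. (Originally the MSI Data register carried only an 8-bit vector. With interrupt remapping enabled, the vector is stored in the IRTE, and the index into the IRTE table is placed in the MSI Address and Data registers; see spec 5.1.3.) Also at VM creation time, the posted descriptor base address is made visible in the VMCS.
In posted mode the whole process is carried out by hardware. If the device interrupt is an MSI, it reaches the IOMMU first. The IOMMU uses the MSI information as an index to look up the interrupt remapping table, then uses the posted descriptor base address to set the corresponding interrupt bit in the posted descriptor. After the IOMMU has updated the posted descriptor, it sends an interrupt to the physical CPU on which the target vCPU is running.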
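The posting step above can be sketched in C as a software model. This is only an illustration, not what the hardware literally executes: the field layout (PIR, ON, SN, NV, NDST) and the urgent flag follow my reading of the VT-d spec, the names pi_desc and pi_post are hypothetical, and real hardware performs the update as one atomic operation on the descriptor's cache line, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified 64-byte Posted Interrupt Descriptor (layout assumed from the VT-d spec). */
struct pi_desc {
    uint64_t pir[4];      /* bits 255:0: one pending bit per vector */
    uint32_t control;     /* bit 0 = ON, bit 1 = SN, bits 23:16 = NV */
    uint32_t ndst;        /* notification destination */
    uint32_t reserved[6]; /* pads the descriptor to 64 bytes */
};

#define PI_ON (1u << 0)   /* Outstanding Notification */
#define PI_SN (1u << 1)   /* Suppress Notification */

/*
 * Model of posting one interrupt: record the vector in PIR, then decide
 * whether a notification interrupt must be sent to the physical CPU.
 * Returns true when a notification should be generated.
 * NOTE: not atomic, unlike the hardware operation.
 */
bool pi_post(struct pi_desc *pd, uint8_t vector, bool urgent)
{
    /* Record the virtual vector as pending. */
    pd->pir[vector / 64] |= 1ULL << (vector % 64);

    /* A notification is already outstanding: nothing more to send. */
    if (pd->control & PI_ON)
        return false;

    /* Notifications are suppressed unless the interrupt is urgent. */
    if ((pd->control & PI_SN) && !urgent)
        return false;

    /* Mark the notification as outstanding and ask the caller to send it. */
    pd->control |= PI_ON;
    return true;
}
```

A caller that gets true back would deliver the notification vector stored in NV to the CPU identified by NDST. A second interrupt posted while ON is still set only updates PIR.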
VT-d spec:
5.1.3 Interrupt Remapping Table
Interrupt-remapping hardware utilizes a memory-resident single-level table, called the Interrupt Remapping Table. The interrupt remapping table is expected to be setup by system software, and its base address and size is specified through the Interrupt Remap Table Address Register. Each entry in the table is 128-bits in size and is referred to as Interrupt Remapping Table Entry (IRTE). Section 9.9 illustrates the IRTE format.
For interrupt requests in Remappable format, the interrupt-remapping hardware computes the ‘interrupt_index’ as below. The Handle, SHV and Subhandle are respective fields from the interrupt address and data per the Remappable interrupt format.
    if (address.SHV == 0) {
        interrupt_index = address.handle;
    } else {
        interrupt_index = (address.handle + data.subhandle);
    }
The Interrupt Remap Table Address Register is programmed by software to specify the number of IRTEs in the Interrupt Remapping Table (maximum number of IRTEs in an Interrupt Remapping Table is 64K). Remapping hardware units in the platform may be configured to share interrupt-remapping table or use independent tables. The interrupt_index is used to index the appropriate IRTE in the interrupt-remapping table. If the interrupt_index value computed is equal to or larger than the number of IRTEs in the remapping table, hardware treats the interrupt request as error.
Unlike the Compatibility interrupt format where all the interrupt attributes are encoded in the interrupt request address/data, the Remappable interrupt format specifies only the fields needed to compute the interrupt_index. The attributes of the remapped interrupt request is specified through the IRTE referenced by the interrupt_index. The interrupt-remapping architecture defines support for hardware to cache frequently used IRTEs for improved performance. For usages where software may need to dynamically update the IRTE, architecture defines commands to invalidate the IEC. Chapter 6 describes the caching constructs and associated invalidation commands.
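To make the index computation in 5.1.3 concrete, here is a small C sketch that decodes a remappable-format MSI address/data pair and applies the bounds check. The bit positions (handle[14:0] in address bits 19:5, handle[15] in bit 2, SHV in bit 3, the format bit in bit 4, subhandle in data bits 15:0) are my reading of the Remappable interrupt format and should be checked against the spec. The function names and the num_irtes parameter are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of the address selects the format: 1 = Remappable. */
static bool msi_is_remappable(uint32_t addr)
{
    return (addr >> 4) & 1u;
}

/* Bit 3 of the address is SHV (SubHandle Valid). */
static bool msi_shv(uint32_t addr)
{
    return (addr >> 3) & 1u;
}

/* Handle: bits 19:5 carry handle[14:0], bit 2 carries handle[15]. */
static uint32_t msi_handle(uint32_t addr)
{
    return ((addr >> 5) & 0x7fffu) | (((addr >> 2) & 1u) << 15);
}

/*
 * Compute interrupt_index as in the spec pseudocode and bounds-check it
 * against the number of IRTEs in the table (num_irtes is hypothetical;
 * real hardware derives it from the Interrupt Remap Table Address Register).
 * Returns 0 and stores the index on success, -1 if the request is an error.
 */
int ir_compute_index(uint32_t addr, uint32_t data,
                     uint32_t num_irtes, uint32_t *index)
{
    if (!msi_is_remappable(addr))
        return -1; /* Compatibility format: not handled by this sketch */

    uint32_t idx = msi_handle(addr);
    if (msi_shv(addr))
        idx += data & 0xffffu; /* add the subhandle from the data register */

    if (idx >= num_irtes)
        return -1; /* index beyond the table: treated as an error */

    *index = idx;
    return 0;
}
```

For example, address 0xFEE000B8 encodes handle 5 with SHV set, so with data 3 the computed index is 8.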
5.1.4 Interrupt-Remapping Hardware Operation
5.2 Interrupt Posting
Interrupt-posting capability is an extension of interrupt-remapping hardware for extended processing of remappable format interrupt requests. Interrupt-posting enables a remappable format interrupt request to be posted (recorded) in a coherent main memory resident data-structure, with an optional notification event to the CPU complex to signal pending posted interrupt.
5.2.5 Using Interrupt Posting for Virtual Interrupt Delivery
This section is informative and intended to illustrate a simplified example usage of how a Virtual Machine Monitor (VMM) software may use interrupt-posting hardware to support efficient delivery of virtual interrupts from assigned devices to virtual machines.
VMM software may enable interrupt-posting for a virtual machine as follows:
• For each virtual processor in the virtual machine, the VMM software may allocate a Posted Interrupt Descriptor. Each such descriptor is used for posting all interrupts that are to be delivered to the respective virtual processor...
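The per-vCPU allocation in the bullet above might look roughly like the following user-space C sketch. It assumes each descriptor is 64 bytes and 64-byte aligned. A real VMM would allocate from kernel memory and hand the physical address to the hardware, which is not shown. The names pi_desc_slot and alloc_vcpu_pi_descs are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Opaque 64-byte slot standing in for one Posted Interrupt Descriptor. */
struct pi_desc_slot {
    uint8_t raw[64];
};

/*
 * Allocate one zeroed, 64-byte-aligned descriptor per virtual processor.
 * Returns NULL on failure or when nr_vcpus is 0; the caller frees the
 * array with free().
 */
struct pi_desc_slot *alloc_vcpu_pi_descs(size_t nr_vcpus)
{
    if (nr_vcpus == 0)
        return NULL;

    /* Total size is a multiple of 64, as aligned_alloc requires. */
    size_t bytes = nr_vcpus * sizeof(struct pi_desc_slot);
    struct pi_desc_slot *descs = aligned_alloc(64, bytes);
    if (descs != NULL)
        memset(descs, 0, bytes); /* no pending vectors, ON and SN clear */
    return descs;
}
```

The descriptor for vCPU i is then descs[i], so all interrupts for one virtual processor are posted into a single descriptor.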