VFIO PCI passthrough with libvirt
2015-08-18
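I. Preparation
1. Boot parameters
Edit /etc/default/grub
and add intel_iommu=on to the GRUB_CMDLINE_LINUX parameter. Regenerate the GRUB configuration (for example with grub2-mkconfig) and reboot for the change to take effect.
2. Load the modules
modprobe vfio
modprobe vfio-pci
3. Prepare a virtual machine
e.g. a virtual machine named vfio_test
4. Enable the IOMMU in the BIOS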
Navigate through each BIOS screen using the "arrow" keys and find the "IOMMU," "I/O Memory Management Unit," or "Intel® VT-d" setting (usually located under the "Advanced" or "Chipset/Northbridge/Tylersburg IOH/Intel VT for Directed I/O Configuration" settings menu). Move the cursor over the setting selection box using the "arrow" keys and press the "Page Up" or "Page Down" or specified key to select "Enabled."
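Before moving on, it can help to confirm that the preparation steps took effect. The following is a minimal sketch, not part of the original procedure: the helper names and the overridable CMDLINE, GROUPS_DIR and MODULE_DIR variables are assumptions introduced for illustration.

```shell
#!/bin/sh
# Sketch: sanity checks for the preparation steps.
# All helper and variable names here are illustrative assumptions.
CMDLINE=${CMDLINE:-/proc/cmdline}
GROUPS_DIR=${GROUPS_DIR:-/sys/kernel/iommu_groups}
MODULE_DIR=${MODULE_DIR:-/sys/module}

# Succeeds if the kernel was booted with intel_iommu=on.
has_intel_iommu() {
    grep -qw 'intel_iommu=on' "$CMDLINE"
}

# Prints the number of IOMMU groups the kernel created (0 if none).
count_iommu_groups() {
    if [ -d "$GROUPS_DIR" ]; then
        ls "$GROUPS_DIR" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Succeeds if the vfio-pci module is loaded (sysfs uses underscores).
has_vfio_pci() {
    [ -d "$MODULE_DIR/vfio_pci" ]
}

# Prints one line per failed check; prints nothing if all checks pass.
check_prereqs() {
    has_intel_iommu || echo "intel_iommu=on is missing from the kernel command line"
    [ "$(count_iommu_groups)" -gt 0 ] || echo "no IOMMU groups found: check the BIOS setting"
    has_vfio_pci || echo "vfio-pci is not loaded: run modprobe vfio-pci"
}
```

Calling check_prereqs after the reboot prints nothing when the command line, the BIOS setting and the modules are all in place.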
II. An iommu_group with a single device
1. Choose the PCI device to pass through; here its PCI address is 0000:0b:00.0
[root@host192 libvirt]# lspci | grep -i fibre
0b:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
2. Look up the iommu_group the device belongs to. Here the iommu_group is 19, and the group contains only one device, 0000:0b:00.0
[root@host192 libvirt]# readlink /sys/bus/pci/devices/0000\:0b\:00.0/iommu_group
../../../../kernel/iommu_groups/19
[root@host192 libvirt]# ls /sys/bus/pci/devices/0000\:0b\:00.0/iommu_group/devices/
0000:0b:00.0
3. Unbind the device in the iommu_group from its current driver
echo 0000:0b:00.0 >>/sys/bus/pci/devices/0000\:0b\:00.0/driver/unbind
Note: after this command runs, the driver directory under /sys/bus/pci/devices/0000\:0b\:00.0/ disappears
4. Register the device's vendor ID and device ID with vfio-pci by writing them to new_id
[root@host192 libvirt]# lspci -n -s 0000:0b:00.0
0b:00.0 0c04: 1077:2432 (rev 03)
[root@host192 libvirt]# echo 1077 2432 >/sys/bus/pci/drivers/vfio-pci/new_id
5. Check whether the iommu_group was bound successfully: a new entry named after the group number, 19, appears under /dev/vfio
[root@host192 vfio]# ls /dev/vfio
19 vfio
6. Write the XML and attach the device to the virtual machine
[root@host192 ljl]# cat net2.xml
[root@host192 ljl]# virsh attach-device vfio_test net2.xml --config
Device attached successfully
7. Start the virtual machine
[root@host192 ljl]# virsh start vfio_test
Domain vfio_test started
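The contents of net2.xml in step 6 are not shown above. The following is a sketch of what such a file might contain, using libvirt's standard hostdev element for a PCI device. The helper name make_hostdev_xml is hypothetical, and managed='no' is an assumption that matches the manual unbind and new_id steps in this section.

```shell
#!/bin/sh
# Sketch: print a libvirt <hostdev> element for a PCI address of the
# form dddd:bb:ss.f. make_hostdev_xml is a hypothetical helper name.
make_hostdev_xml() {
    addr=$1
    domain=${addr%%:*}      # e.g. 0000
    rest=${addr#*:}         # e.g. 0b:00.0
    bus=${rest%%:*}         # e.g. 0b
    slotfn=${rest#*:}       # e.g. 00.0
    slot=${slotfn%%.*}      # e.g. 00
    func=${slotfn#*.}       # e.g. 0
    # managed='no' is an assumption: the host-side binding is done by hand.
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
}
```

For the device used in this section, `make_hostdev_xml 0000:0b:00.0 > net2.xml` would produce a file that can be passed to virsh attach-device.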
III. An iommu_group with multiple devices (for example, a dual-port NIC)
[root@host192 0000:02:00.0]# ls /sys/bus/pci/devices/0000\:02\:00.0/iommu_group/devices/
0000:02:00.0 0000:02:00.1
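The procedure for multiple devices is similar to the single-device case, except that every device in the iommu_group has to be unbound. For example, to pass through the device 0000:02:00.0:
echo 0000:02:00.1 >>/sys/bus/pci/devices/0000:02:00.0/driver/unbind
echo 0000:02:00.0 >>/sys/bus/pci/devices/0000:02:00.0/driver/unbind
Note: unbind 0000:02:00.1 first. Once 0000:02:00.0 is unbound, the directory …/devices/0000:02:00.0/driver disappears, so 0000:02:00.1 can no longer be unbound through it.
After this, 02:00.0 can be passed through as described earlier.
The binding process can be automated with a shell script: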
#!/bin/sh
# Unbind every device in the iommu_group of the given PCI address
# and register each one with vfio-pci.
if [ $# -ne 1 ]
then
    echo "usage: $(basename "$0") pciaddr"
    exit 1
fi
pciaddr=$1
prefullpath="/sys/bus/pci/devices/$pciaddr"
fullpath="$prefullpath/iommu_group/devices"
driverfullpath="$prefullpath/driver/unbind"
for i in $(ls "$fullpath")
do
    if [ "$i" != "$pciaddr" ]
    then
        # unbind through the driver of $pciaddr
        # (works when both devices share the same driver)
        echo "$i" >"$driverfullpath"
        # new_id: extract "vendor device" from the lspci -n output of $i
        prdmsg=$(/usr/sbin/lspci -n -s "$i")
        prdmsg=$(echo "$prdmsg" | /usr/bin/sed 's/:/ /g' | /usr/bin/awk '{ print $4 " " $5 }')
        echo "$prdmsg" >/sys/bus/pci/drivers/vfio-pci/new_id
    fi
done
# unbind and new_id for the device itself, last, because its
# driver directory disappears once it is unbound
echo "$pciaddr" >"$driverfullpath"
prdmsg=$(/usr/sbin/lspci -n -s "$pciaddr")
prdmsg=$(echo "$prdmsg" | /usr/bin/sed 's/:/ /g' | /usr/bin/awk '{ print $4 " " $5 }')
echo "$prdmsg" >/sys/bus/pci/drivers/vfio-pci/new_id
exit 0
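After the script has run, it is worth checking which driver each device in the group ended up with. The sketch below reads the driver symlinks from sysfs. The helper names group_drivers and group_ready and the overridable SYSFS variable are assumptions for illustration. It checks for the state the script aims for, namely every device in the group bound to vfio-pci.

```shell
#!/bin/sh
# Sketch: report the driver bound to each device in an IOMMU group.
# Helper names and the SYSFS override are illustrative assumptions.
SYSFS=${SYSFS:-/sys}

# Print "address driver" for every device in the IOMMU group of $1.
# Devices without a bound driver are reported as "none".
group_drivers() {
    for dev in "$SYSFS/bus/pci/devices/$1/iommu_group/devices"/*; do
        [ -e "$dev" ] || continue
        if [ -e "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv=none
        fi
        echo "$(basename "$dev") $drv"
    done
}

# Succeed only if the group is non-empty and every device in it
# is bound to vfio-pci.
group_ready() {
    report=$(group_drivers "$1")
    [ -n "$report" ] || return 1
    ! echo "$report" | grep -qv ' vfio-pci$'
}
```

For example, `group_drivers 0000:02:00.0` lists both ports of the dual-port NIC, and `group_ready 0000:02:00.0` can guard the virsh start command.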
IV. An iommu_group that contains not only multiple devices but also bridges
[root@host192 ljl]# ls /sys/bus/pci/devices/0000\:03\:00.0/iommu_group/devices/
0000:00:1c.0 0000:00:1c.4 0000:02:00.2 0000:03:00.0 0000:04:00.0
0000:00:1c.2 0000:02:00.0 0000:02:00.4 0000:03:00.1 0000:04:00.1
[root@host192 ljl]# lspci -s 0000:00:1c.0
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
[root@host192 ljl]# lspci -s 0000:00:1c.2
00:1c.2 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 3
[root@host192 ljl]# lspci -s 0000:03:00.0
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
The main difference lies in the unbind and new_id steps: each device in the iommu_group has to be unbound through its own driver/unbind file.
For example:
for i in $(ls /sys/kernel/iommu_groups/8/devices/)
do
    echo $i >>/sys/kernel/iommu_groups/8/devices/$i/driver/unbind
    ven=$(cat /sys/kernel/iommu_groups/8/devices/$i/vendor)
    dev=$(cat /sys/kernel/iommu_groups/8/devices/$i/device)
    echo $ven $dev >/sys/bus/pci/drivers/vfio-pci/new_id
done
V. Troubleshooting notes
1. Error when starting the virtual machine
error: internal error: process exited while connecting to monitor: 2015-08-11T06:11:06.627255Z qemu-kvm: -device vfio-pci,host=0e:00.0,id=hostdev0,bus=pci.2,addr=0x6: vfio: failed to open /dev/vfio/vfio: Operation not permitted
2015-08-11T06:11:06.627315Z qemu-kvm: -device vfio-pci,host=0e:00.0,id=hostdev0,bus=pci.2,addr=0x6: vfio: failed to setup container for group 19
2015-08-11T06:11:06.627331Z qemu-kvm: -device vfio-pci,host=0e:00.0,id=hostdev0,bus=pci.2,addr=0x6: vfio: failed to get group 19
2015-08-11T06:11:06.627351Z qemu-kvm: -device vfio-pci,host=0e:00.0,id=hostdev0,bus=pci.2,addr=0x6: Device initialization failed.
2015-08-11T06:11:06.627371Z qemu-kvm: -device vfio-pci,host=0e:00.0,id=hostdev0,bus=pci.2,addr=0x6: Device 'vfio-pci' could not be initialized
Solution:
In /etc/libvirt/qemu.conf, add the entry "/dev/vfio/vfio" to the cgroup_device_acl list, then restart libvirtd.
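2. After passing through a graphics card, the virtual machine fails to start
pci 0000:08:00.0 is not assignable
Cause: by default, passthrough is refused for devices that are not isolated by ACS (Access Control Services). To pass such devices through, the check has to be relaxed.
In /etc/libvirt/qemu.conf,
enable the option relaxed_acs_check = 1. These devices can then be passed through to the virtual machine.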
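3. The virtual machine fails to start: failed to set iommu for container
After enabling vfio_iommu_type1.allow_unsafe_interrupts, the virtual machine starts.
The option vfio_iommu_type1.allow_unsafe_interrupts can also be enabled at system boot.
4. The virtual machine fails to start: /dev/vfio/16 Operation not permitted
In /etc/libvirt/qemu.conf, add the entry "/dev/vfio/16" to the cgroup_device_acl list.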
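For problem 3, the option can be set either on the running kernel or persistently through a modprobe configuration file. The sketch below shows both. The helper names, the overridable PARAM and MODPROBE_DIR variables, and the file name vfio_iommu_type1.conf are assumptions for illustration. Both helpers need root when used on the real paths.

```shell
#!/bin/sh
# Sketch: enable allow_unsafe_interrupts for vfio_iommu_type1.
# Helper names, variable overrides and the .conf file name are assumptions.
PARAM=${PARAM:-/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts}
MODPROBE_DIR=${MODPROBE_DIR:-/etc/modprobe.d}

# Enable the option on the running kernel (lost on reboot).
enable_now() {
    echo 1 > "$PARAM"
}

# Persist the option so it is applied whenever the module is loaded.
enable_at_boot() {
    echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" \
        > "$MODPROBE_DIR/vfio_iommu_type1.conf"
}
```

The same setting can be passed on the kernel command line as vfio_iommu_type1.allow_unsafe_interrupts=1. As the option name suggests, it trades interrupt isolation for compatibility, so it is best limited to hosts where the guest is trusted.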
VI. References
http://www.redhat.com/archives/libvir-list/2013-March/msg00514.html
http://www.linux-kvm.org/images/b/b4/2012-forum-VFIO.pdf
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/chap-Guest_virtual_machine_device_configuration.html#sect-PCI_devices-PCI_passthrough