Requirements for RDMA on KVM

Requirements for VF passthrough on KVM

The CPU must support Intel VT-d or AMD-Vi (IOMMU)

The dmesg output must contain the following two lines:

  • DMAR: Intel(R) Virtualization Technology for Directed I/O
  • DMAR: IOMMU enabled

Check whether the CPU supports VT-d or AMD-Vi:

# dmesg |grep -e "DMAR" -e "IOMMU"|grep -e "Virtualization" -e enabled

[    0.000000] DMAR: IOMMU enabled

[    0.001068] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.

[    1.150702] DMAR: Intel(R) Virtualization Technology for Directed I/O
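
The check above can be scripted. The following is a minimal sketch, assuming an Intel host that prints the two DMAR lines quoted earlier; it does not look for the AMD-Vi messages. The function name check_iommu_log is hypothetical, and it reads the kernel log from stdin so that it can be tried on saved log text.

```shell
# check_iommu_log (hypothetical helper): reads kernel-log text on stdin and
# reports whether both DMAR lines required by the text are present.
check_iommu_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF 'DMAR: IOMMU enabled' &&
       printf '%s\n' "$log" | grep -qF 'DMAR: Intel(R) Virtualization Technology for Directed I/O'; then
        echo "iommu: ok"
        return 0
    fi
    echo "iommu: not detected"
    return 1
}

# Usage on a real host (reading the kernel log usually requires root):
#   dmesg | check_iommu_log
```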

The kernel must support the vfio, vfio_iommu_type1, vfio_pci, and related modules

Check that the kernel has loaded the IOMMU-related modules:

[root@stgExt1 qemu]# lsmod|grep -e vfio -e iommu

vfio_pci               61440  0

vfio_virqfd            16384  1 vfio_pci

vfio_iommu_type1       36864  0

vfio                   36864  2 vfio_iommu_type1,vfio_pci

irqbypass              16384  422 vfio_pci,kvm
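
As a sketch of how the module check could be automated, the helper below reads lsmod output on stdin and prints any of the three modules named above that is not listed. The name missing_vfio_modules is hypothetical, and the sketch assumes the modules are built as loadable modules (a module compiled into the kernel does not appear in lsmod).

```shell
# missing_vfio_modules (hypothetical helper): reads `lsmod` output on stdin
# and prints each required module that is absent; prints nothing if all
# three are loaded.
missing_vfio_modules() {
    loaded=$(awk '{print $1}')          # first column is the module name
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Usage on a real host:
#   lsmod | missing_vfio_modules
```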

QEMU must be version 2.0 or later

CentOS 8.4 ships QEMU 4.2.0; the BVT environment has been upgraded to 8.0.2, and QEMU must be rebuilt

Run configure from the QEMU source directory:

./configure --prefix=/usr/local/qemu_rdma/ --enable-debug --enable-kvm --enable-vnc --target-list=x86_64-softmmu --enable-spice --enable-spice-protocol --enable-usb-redir --enable-rdma

Steps to replace QEMU

Example:

ln -sf /usr/local/qemu_rdma/bin/qemu-system-x86_64 /usr/libexec/qemu-kvm

setenforce 0

libvirt must be version 1.2.9 or later

CentOS 8.4 ships libvirt 6.0.0
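
The two version requirements (QEMU 2.0 or later, libvirt 1.2.9 or later) can be checked with a small comparison helper. This is only a sketch: version_ge is a hypothetical name, it compares purely numeric dotted versions, and the commented usage lines assume the usual first-line format of the --version output.

```shell
# version_ge A B (hypothetical helper): succeeds when dotted version A is
# greater than or equal to B. Missing fields count as 0, so 2.0 == 2.0.0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Usage on a real host (field positions are an assumption about the output):
#   version_ge "$(qemu-system-x86_64 --version | awk 'NR==1 {print $4}')" 2.0
#   version_ge "$(libvirtd --version | awk '{print $3}')" 1.2.9
```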

SR-IOV support on KVM

The virtual NICs created by SR-IOV are called VFs. The following commands show that the physical ports ens4f0 and ens4f1 (the PFs) each support at most 8 VFs:

# cat /sys/class/net/ens4f0/device/sriov_totalvfs

8

# cat /sys/class/net/ens4f1/device/sriov_totalvfs

8

Create 6 VFs on the single port ens4f0:

# echo 6 > /sys/class/net/ens4f0/device/sriov_numvfs
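
The two steps above (read sriov_totalvfs, then write sriov_numvfs) can be combined into one guarded helper. This is a sketch under stated assumptions: set_vf_count is a hypothetical name, and the device directory is passed as an argument so the logic can be tried against a scratch directory before pointing it at /sys/class/net/ens4f0/device. It resets the count to 0 first because the kernel refuses to change directly from one non-zero VF count to another.

```shell
# set_vf_count DEVICE_DIR COUNT (hypothetical helper)
# DEVICE_DIR is the PF's sysfs device directory,
# e.g. /sys/class/net/ens4f0/device.
set_vf_count() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but the PF supports at most $total" >&2
        return 1
    fi
    # Reset to 0 before writing the new non-zero value.
    echo 0 > "$dev_dir/sriov_numvfs" || return 1
    echo "$want" > "$dev_dir/sriov_numvfs"
}

# Usage on a real host (as root):
#   set_vf_count /sys/class/net/ens4f0/device 6
```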

# lspci|grep Mellanox

b1:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

b1:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

b1:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

b1:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

b1:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

b1:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

b1:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

b1:00.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

# ip link |grep ens4

261: ens4f0v0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

262: ens4f0v1: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

263: ens4f0v2: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

264: ens4f0v3: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

265: ens4f0v4: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

266: ens4f0v5: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

18: ens4f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

19: ens4f1: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

# ip link show ens4f0v0

261: ens4f0v0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether 56:ba:79:b5:fb:3a brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v1

262: ens4f0v1: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether 42:f9:c8:62:be:fd brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v2

263: ens4f0v2: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether 2e:2b:21:22:a7:da brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v3

264: ens4f0v3: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether 22:cd:f8:8e:8b:39 brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v4

265: ens4f0v4: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether b6:b1:22:d5:28:46 brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v5

266: ens4f0v5: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether be:64:4f:36:e0:f7 brd ff:ff:ff:ff:ff:ff

lspci output:

# lspci -nn |grep Mellanox

b1:00.0 Ethernet controller [0200]: Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]

b1:00.1 Ethernet controller [0200]: Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]

b1:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]

b1:00.3 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]

b1:00.4 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]

b1:00.5 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]

b1:00.6 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]

b1:00.7 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
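
For passthrough, the PCI addresses of the VFs are usually what is needed next. As a sketch, the helper below filters lspci -nn output by vendor:device ID; the name list_devices_by_id is hypothetical, and the default ID 15b3:101e is simply the VF ID shown in the output above.

```shell
# list_devices_by_id [VENDOR:DEVICE] (hypothetical helper): reads
# `lspci -nn` output on stdin and prints the PCI address of every device
# whose line carries the given [vendor:device] ID.
list_devices_by_id() {
    id=${1:-15b3:101e}
    # index() does a literal substring match, so the brackets need no escaping.
    awk -v id="[$id]" 'index($0, id) {print $1}'
}

# Usage on a real host:
#   lspci -nn | list_devices_by_id            # VFs
#   lspci -nn | list_devices_by_id 15b3:101f  # PFs
```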

To make the configuration persistent, the following is also required

Create the file /etc/modprobe.d/mlx5.conf with the following content:

cat /etc/modprobe.d/mlx5.conf

options mlx5_core num_vfs=2

Create a udev rule, /etc/udev/rules.d/ens4f0.rules, for the VF interface so that the VFs are recreated persistently:

cat /etc/udev/rules.d/ens4f0.rules

ACTION=="add", SUBSYSTEM=="net", DRIVERS=="mlx5_core", ATTR{device/sriov_numvfs}="8"

Reload the mlx5_core kernel module for the configuration to take effect:

$ modprobe -r mlx5_core && modprobe mlx5_core

Once the configuration has taken effect, the VFs are visible, for example:

$ ip link show

Check the RDMA link state

$ rdma link show

0/1: mlx5_0/1: state ACTIVE physical_state LINK_UP netdev ens1f0np0

1/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev ens1f1np1
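
To pick out the usable links from this output in a script, something like the following sketch could be used. The name active_rdma_links is hypothetical; the parser keys on the state and netdev keywords rather than on column positions, because the prefix of each line is assumed to vary between iproute2 versions.

```shell
# active_rdma_links (hypothetical helper): reads `rdma link show` output on
# stdin and prints "<device>/<port> <netdev>" for every link in state ACTIVE.
active_rdma_links() {
    awk '{
        dev = ""; state = ""; netdev = ""
        for (i = 1; i <= NF; i++) {
            # The device/port token directly precedes the "state" keyword.
            if ($i == "state")  { dev = $(i - 1); state = $(i + 1) }
            if ($i == "netdev") { netdev = $(i + 1) }
        }
        sub(/:$/, "", dev)              # drop a trailing colon if present
        if (state == "ACTIVE" && netdev != "") print dev, netdev
    }'
}

# Usage on a real host:
#   rdma link show | active_rdma_links
```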

In the ibstat output, Link layer: Ethernet on a port indicates the RoCE protocol:

# ibstat

CA 'mlx5_0'

    CA type: MT4123

    Number of ports: 1

    Firmware version: 20.30.1004

    Hardware version: 0

    Node GUID: 0xb83fd20300d3e4c6

    System image GUID: 0xb83fd20300d3e4c6

    Port 1:

        State: Active

        Physical state: LinkUp

        Rate: 100

        Base lid: 0

        LMC: 0

        SM lid: 0

        Capability mask: 0x00010000

        Port GUID: 0xba3fd2fffed3e4c6

        Link layer: Ethernet

CA 'mlx5_1'

    CA type: MT4123

    Number of ports: 1

    Firmware version: 20.30.1004

    Hardware version: 0

    Node GUID: 0xb83fd20300d3e4c7

    System image GUID: 0xb83fd20300d3e4c6

    Port 1:

        State: Active

        Physical state: LinkUp

        Rate: 100

        Base lid: 0

        LMC: 0

        SM lid: 0

        Capability mask: 0x00010000

        Port GUID: 0xba3fd2fffed3e4c7

        Link layer: Ethernet
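
The ibstat output above is long, while only the link layer of each port matters here. A minimal sketch that reduces it to one line per port follows; ca_link_layers is a hypothetical name, and the parser assumes the field labels appear exactly as in the output above.

```shell
# ca_link_layers (hypothetical helper): reads `ibstat` output on stdin and
# prints "<CA name> <port number> <link layer>" for every port.
ca_link_layers() {
    awk -v q="'" '
        /^CA /               { ca = $2; gsub(q, "", ca) }          # strip quotes
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }  # not "Port GUID:"
        /Link layer:/        { print ca, port, $NF }
    '
}

# Usage on a real host:
#   ibstat | ca_link_layers
# A port reported as Ethernet is a RoCE port, per the note above.
```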

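In the output of ibv_devinfo -v, each port can have multiple GIDs (Global Identifiers). A GID is a globally unique identifier for a node or port on the InfiniBand (or RoCE) network, and each GID entry is tagged with a protocol version such as RoCE v1 or RoCE v2.

In the output of ibv_devinfo -v:

  • link_layer: Ethernet means the port runs over Ethernet (the transport field reads InfiniBand (0) even for RoCE devices, as in the output below);
  • GID entries tagged RoCE v1 or RoCE v2 mean the port uses the RoCE protocol.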

ibv_devinfo -v |grep GID

# ibv_devinfo -v

hca_id: mlx5_0

    transport:          InfiniBand (0)

    fw_ver:             20.30.1004

    node_guid:          b83f:d203:00d3:e4c6

    sys_image_guid:         b83f:d203:00d3:e4c6

    vendor_id:          0x02c9

    vendor_part_id:         4123

    hw_ver:             0x0

    board_id:           LNV0000000017

    phys_port_cnt:          1

    max_mr_size:            0xffffffffffffffff

    page_size_cap:          0xfffffffffffff000

    max_qp:             262144

    max_qp_wr:          32768

    device_cap_flags:       0x25321c36

                    BAD_PKEY_CNTR

                    BAD_QKEY_CNTR

                    AUTO_PATH_MIG

                    CHANGE_PHY_PORT

                    PORT_ACTIVE_EVENT

                    SYS_IMAGE_GUID

                    RC_RNR_NAK_GEN

                    MEM_WINDOW

                    XRC

                    MEM_MGT_EXTENSIONS

                    MEM_WINDOW_TYPE_2B

                    RAW_IP_CSUM

                    MANAGED_FLOW_STEERING

    max_sge:            30

    max_sge_rd:         30

    max_cq:             16777216

    max_cqe:            4194303

    max_mr:             16777216

    max_pd:             8388608

    max_qp_rd_atom:         16

    max_ee_rd_atom:         0

    max_res_rd_atom:        4194304

    max_qp_init_rd_atom:        16

    max_ee_init_rd_atom:        0

    atomic_cap:         ATOMIC_HCA (1)

    max_ee:             0

    max_rdd:            0

    max_mw:             16777216

    max_raw_ipv6_qp:        0

    max_raw_ethy_qp:        0

    max_mcast_grp:          2097152

    max_mcast_qp_attach:        240

    max_total_mcast_qp_attach:  503316480

    max_ah:             2147483647

    max_fmr:            0

    max_srq:            8388608

    max_srq_wr:         32767

    max_srq_sge:            31

    max_pkeys:          128

    local_ca_ack_delay:     16

    general_odp_caps:

                    ODP_SUPPORT

                    ODP_SUPPORT_IMPLICIT

    rc_odp_caps:

                    SUPPORT_SEND

                    SUPPORT_RECV

                    SUPPORT_WRITE

                    SUPPORT_READ

                    SUPPORT_SRQ

    uc_odp_caps:

                    NO SUPPORT

    ud_odp_caps:

                    SUPPORT_SEND

    xrc_odp_caps:

                    SUPPORT_SEND

                    SUPPORT_WRITE

                    SUPPORT_READ

                    SUPPORT_SRQ

    completion timestamp_mask:          0x7fffffffffffffff

    hca_core_clock:         156250kHZ

    raw packet caps:

                    C-VLAN stripping offload

                    Scatter FCS offload

                    IP csum offload

                    Delay drop

    device_cap_flags_ex:        0x3000005425321C36

                    RAW_SCATTER_FCS

                    PCI_WRITE_END_PADDING

                    Unknown flags: 0x3000004000000000

    tso_caps:

        max_tso:            262144

        supported_qp:

                    SUPPORT_RAW_PACKET

    rss_caps:

        max_rwq_indirection_tables:         1048576

        max_rwq_indirection_table_size:         2048

        rx_hash_function:               0x1

        rx_hash_fields_mask:                0x800000FF

        supported_qp:

                    SUPPORT_RAW_PACKET

    max_wq_type_rq:         8388608

    packet_pacing_caps:

        qp_rate_limit_min:  1kbps

        qp_rate_limit_max:  100000000kbps

        supported_qp:

                    SUPPORT_RAW_PACKET

    tag matching not supported

    cq moderation caps:

        max_cq_count:   65535

        max_cq_period:  4095 us

    maximum available device memory:    131072Bytes

    num_comp_vectors:       63

        port:   1

            state:          PORT_ACTIVE (4)

            max_mtu:        4096 (5)

            active_mtu:     1024 (3)

            sm_lid:         0

            port_lid:       0

            port_lmc:       0x00

            link_layer:     Ethernet

            max_msg_sz:     0x40000000

            port_cap_flags:     0x04010000

            port_cap_flags2:    0x0000

            max_vl_num:     invalid value (0)

            bad_pkey_cntr:      0x0

            qkey_viol_cntr:     0x0

            sm_sl:          0

            pkey_tbl_len:       1

            gid_tbl_len:        255

            subnet_timeout:     0

            init_type_reply:    0

            active_width:       4X (2)

            active_speed:       25.0 Gbps (32)

            phys_state:     LINK_UP (5)

            GID[  0]:       fe80:0000:0000:0000:ba3f:d2ff:fed3:e4c6, RoCE v1

            GID[  1]:       fe80::ba3f:d2ff:fed3:e4c6, RoCE v2

hca_id: mlx5_1

    transport:          InfiniBand (0)

    fw_ver:             20.30.1004

    node_guid:          b83f:d203:00d3:e4c7

    sys_image_guid:         b83f:d203:00d3:e4c6

    vendor_id:          0x02c9

    vendor_part_id:         4123

    hw_ver:             0x0

    board_id:           LNV0000000017

    phys_port_cnt:          1

    max_mr_size:            0xffffffffffffffff

    page_size_cap:          0xfffffffffffff000

    max_qp:             262144

    max_qp_wr:          32768

    device_cap_flags:       0x25321c36

                    BAD_PKEY_CNTR

                    BAD_QKEY_CNTR

                    AUTO_PATH_MIG

                    CHANGE_PHY_PORT

                    PORT_ACTIVE_EVENT

                    SYS_IMAGE_GUID

                    RC_RNR_NAK_GEN

                    MEM_WINDOW

                    XRC

                    MEM_MGT_EXTENSIONS

                    MEM_WINDOW_TYPE_2B

                    RAW_IP_CSUM

                    MANAGED_FLOW_STEERING

    max_sge:            30

    max_sge_rd:         30

    max_cq:             16777216

    max_cqe:            4194303

    max_mr:             16777216

    max_pd:             8388608

    max_qp_rd_atom:         16

    max_ee_rd_atom:         0

    max_res_rd_atom:        4194304

    max_qp_init_rd_atom:        16

    max_ee_init_rd_atom:        0

    atomic_cap:         ATOMIC_HCA (1)

    max_ee:             0

    max_rdd:            0

    max_mw:             16777216

    max_raw_ipv6_qp:        0

    max_raw_ethy_qp:        0

    max_mcast_grp:          2097152

    max_mcast_qp_attach:        240

    max_total_mcast_qp_attach:  503316480

    max_ah:             2147483647

    max_fmr:            0

    max_srq:            8388608

    max_srq_wr:         32767

    max_srq_sge:            31

    max_pkeys:          128

    local_ca_ack_delay:     16

    general_odp_caps:

                    ODP_SUPPORT

                    ODP_SUPPORT_IMPLICIT

    rc_odp_caps:

                    SUPPORT_SEND

                    SUPPORT_RECV

                    SUPPORT_WRITE

                    SUPPORT_READ

                    SUPPORT_SRQ

    uc_odp_caps:

                    NO SUPPORT

    ud_odp_caps:

                    SUPPORT_SEND

    xrc_odp_caps:

                    SUPPORT_SEND

                    SUPPORT_WRITE

                    SUPPORT_READ

                    SUPPORT_SRQ

    completion timestamp_mask:          0x7fffffffffffffff

    hca_core_clock:         156250kHZ

    raw packet caps:

                    C-VLAN stripping offload

                    Scatter FCS offload

                    IP csum offload

                    Delay drop

    device_cap_flags_ex:        0x3000005425321C36

                    RAW_SCATTER_FCS

                    PCI_WRITE_END_PADDING

                    Unknown flags: 0x3000004000000000

    tso_caps:

        max_tso:            262144

        supported_qp:

                    SUPPORT_RAW_PACKET

    rss_caps:

        max_rwq_indirection_tables:         1048576

        max_rwq_indirection_table_size:         2048

        rx_hash_function:               0x1

        rx_hash_fields_mask:                0x800000FF

        supported_qp:

                    SUPPORT_RAW_PACKET

    max_wq_type_rq:         8388608

    packet_pacing_caps:

        qp_rate_limit_min:  1kbps

        qp_rate_limit_max:  100000000kbps

        supported_qp:

                    SUPPORT_RAW_PACKET

    tag matching not supported

    cq moderation caps:

        max_cq_count:   65535

        max_cq_period:  4095 us

    maximum available device memory:    131072Bytes

    num_comp_vectors:       63

        port:   1

            state:          PORT_ACTIVE (4)

            max_mtu:        4096 (5)

            active_mtu:     1024 (3)

            sm_lid:         0

            port_lid:       0

            port_lmc:       0x00

            link_layer:     Ethernet

            max_msg_sz:     0x40000000

            port_cap_flags:     0x04010000

            port_cap_flags2:    0x0000

            max_vl_num:     invalid value (0)

            bad_pkey_cntr:      0x0

            qkey_viol_cntr:     0x0

            sm_sl:          0

            pkey_tbl_len:       1

            gid_tbl_len:        255

            subnet_timeout:     0

            init_type_reply:    0

            active_width:       4X (2)

            active_speed:       25.0 Gbps (32)

            phys_state:     LINK_UP (5)

            GID[  0]:       fe80:0000:0000:0000:ba3f:d2ff:fed3:e4c7, RoCE v1

            GID[  1]:       fe80::ba3f:d2ff:fed3:e4c7, RoCE v2
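
To see at a glance which RoCE versions each device exposes, the GID lines can be extracted from this output. The following is a sketch only: list_roce_gids is a hypothetical name, and it relies on the hca_id and GID line layout shown above.

```shell
# list_roce_gids (hypothetical helper): reads `ibv_devinfo -v` output on
# stdin and prints "<hca_id> <gid> RoCE <version>" for every RoCE GID entry.
list_roce_gids() {
    awk '
        $1 == "hca_id:" { hca = $2 }
        /GID\[/ && /RoCE v[12]/ {
            gid = $(NF - 2)             # the GID precedes "RoCE vN"
            sub(/,$/, "", gid)          # drop the trailing comma
            print hca, gid, "RoCE", $NF
        }
    '
}

# Usage on a real host:
#   ibv_devinfo -v | list_roce_gids
```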

Further reading:

QEMU official site: Download QEMU - QEMU
