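Practicing RDMA with Soft-RoCE

Introduction to RDMA

RDMA (Remote Direct Memory Access) lets a local node access the memory of a remote node "directly". "Directly" means that remote memory can be read and written much like local memory, bypassing the complex TCP/IP protocol stack of traditional Ethernet networking. The remote side is not aware of these accesses, and most of the read/write work is done by hardware rather than software.

RDMA itself is a technology. At the protocol level it comprises InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and internet Wide Area RDMA Protocol (iWARP). All three implement RDMA and expose the same upper-layer interface (Verbs); they differ in the layers underneath. RoCE runs over Ethernet, so it is cheaper to deploy than IB, and it generally performs better than iWARP.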


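In practice, RDMA relies on the NIC to do most of the work, so it needs a smart NIC with hardware support for an RDMA protocol. Fortunately there is Soft-RoCE, which uses software in place of hardware to carry IB transport-layer packets inside ordinary UDP packets, so that an ordinary NIC can also send RoCE packets. This gives us a very low-cost way to study the IB transport-layer protocol and to write and debug Verbs-based RDMA programs. The following sections describe how to install Soft-RoCE.

Soft-RoCE compares with RoCE supported by NIC hardware as follows: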

[Figure: Soft-RoCE compared with hardware RoCE]
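To make "IB transport-layer packets inside ordinary UDP packets" a little more concrete, here is a minimal Python sketch of the 12-byte IB Base Transport Header (BTH), the first thing in the UDP payload of a RoCEv2 packet. The helper names pack_bth and parse_bth are made up for this illustration, and the sketch leaves the SE/M/PadCnt/TVer bits at zero; it is not how the rxe driver itself is written.

```python
import struct

# UDP destination port registered for RoCEv2 traffic.
ROCEV2_UDP_PORT = 4791

def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Build a 12-byte Base Transport Header (hypothetical helper)."""
    flags = 0  # SE, M, PadCnt and TVer are all left at zero in this sketch
    return struct.pack(
        "!BBHII",
        opcode,                                   # 8-bit opcode
        flags,                                    # SE | M | PadCnt | TVer
        pkey,                                     # 16-bit partition key
        dest_qp & 0xFFFFFF,                       # reserved byte + 24-bit destination QP
        (int(ack_req) << 31) | (psn & 0xFFFFFF),  # AckReq bit + 24-bit PSN
    )

def parse_bth(udp_payload):
    """Decode the BTH found at the start of a RoCEv2 UDP payload."""
    opcode, flags, pkey, qp_word, psn_word = struct.unpack("!BBHII", udp_payload[:12])
    return {
        "opcode": opcode,
        "pad_count": (flags >> 4) & 0x3,
        "pkey": pkey,
        "dest_qp": qp_word & 0xFFFFFF,
        "ack_req": bool(psn_word >> 31),
        "psn": psn_word & 0xFFFFFF,
    }

if __name__ == "__main__":
    # 0x04 is the RC "SEND Only" opcode; QP number 0x13 is just a sample value.
    bth = pack_bth(opcode=0x04, dest_qp=0x13, psn=0, ack_req=True)
    print(len(bth), parse_bth(bth))
```

On the wire the full layout is Ethernet / IP / UDP (destination port 4791) / BTH / payload / ICRC, which is why a packet capture tool can show this traffic on an ordinary NIC.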

Now let's put RoCE into practice on Linux.

Installing Soft-RoCE

Install the required packages with apt-get

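$ sudo apt-get install libibverbs1 ibverbs-utils librdmacm1 libibumad3 ibverbs-providers rdma-core

Package            Main function
libibverbs1        ibverbs shared library
ibverbs-utils      ibverbs example programs
librdmacm1         rdmacm shared library
libibumad3         ibumad shared library
ibverbs-providers  user-space drivers from the ibverbs vendors (including RXE)
rdma-core          documentation and user-space configuration files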

Load the driver

$ sudo modprobe rdma_rxe

Create a logical NIC that supports the RDMA protocol

$ sudo rdma link add rxe0 type rxe netdev enp0s3

Inspect the RXE logical interface that was created

$ ibv_devices
    device                 node GUID
    ------              ----------------
    rxe0                0a0027fffe5ac323
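The node GUID above has the shape of a modified EUI-64 built from a MAC address: the universal/local bit of the first octet is flipped and ff:fe is inserted in the middle. The sketch below shows that derivation. Note the assumptions: the MAC 08:00:27:5a:c3:23 is inferred backwards from the GUID and is not shown anywhere in this article, and mac_to_eui64_guid is a name invented for the example.

```python
def mac_to_eui64_guid(mac: str) -> str:
    """Turn a 48-bit MAC into a modified EUI-64 string (hypothetical helper)."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit, then splice ff:fe between the two halves.
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return eui64.hex()

if __name__ == "__main__":
    # Assumed MAC of enp0s3, inferred from the GUID printed by ibv_devices.
    print(mac_to_eui64_guid("08:00:27:5a:c3:23"))
```

If the result matches what ibv_devices prints on your machine for the MAC reported by "ip link show", that is a quick sanity check that rxe0 is bound to the netdev you intended.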

$ ibv_devinfo -d rxe0
hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      0a00:27ff:fe5a:c323
        sys_image_guid:                 0a00:27ff:fe5a:c323
        vendor_id:                      0xffffff
        vendor_part_id:                 0
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
$ sudo rdma link show
link rxe0/1 state ACTIVE physical_state LINK_UP netdev enp0s3
$ sudo ibstat
CA 'rxe0'
        CA type:
        Number of ports: 1
        Firmware version:
        Hardware version:
        Node GUID: 0x0a0027fffe5ac323
        System image GUID: 0x0a0027fffe5ac323
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 2.5
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x0a0027fffe5ac323
                Link layer: Ethernet
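In the ibv_devinfo output above, active_mtu is 1024 (3) although max_mtu is 4096 (5). The number in parentheses is the IB MTU enum value. A plausible reading, assuming enp0s3 has the usual 1500-byte Ethernet MTU (not shown in this article), is that the active MTU is the largest IB MTU that still fits into one Ethernet frame after the headers. The sketch below illustrates that selection; the 80-byte header allowance is a stand-in value for this example, not a figure taken from the article.

```python
# IB MTU sizes in bytes mapped to their enum values, as printed by ibv_devinfo.
IB_MTU_ENUM = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}

def largest_ib_mtu(netdev_mtu: int, header_overhead: int = 80):
    """Pick the largest IB MTU that fits into the netdev MTU.

    header_overhead is an assumed allowance for the IP/UDP/IB headers.
    Returns (size_in_bytes, enum_value), or None if nothing fits.
    """
    budget = netdev_mtu - header_overhead
    fitting = [size for size in IB_MTU_ENUM if size <= budget]
    if not fitting:
        return None
    size = max(fitting)
    return size, IB_MTU_ENUM[size]

if __name__ == "__main__":
    print(largest_ib_mtu(1500))  # typical Ethernet MTU
    print(largest_ib_mtu(9000))  # jumbo frames
```

Under these assumptions a 1500-byte netdev lands on 1024, and raising the Ethernet MTU (jumbo frames) would be the way to reach a larger active MTU.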

Testing RDMA connectivity

Server side

$ sudo rping -s -a 192.168.31.79 -v -C 10
[sudo] password for wq:
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
server ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
server ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
server ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
server ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
server ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
server ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
server ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
server ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10

Client side

$ rping -c -a 192.168.31.79 -v -C 10
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
client DISCONNECT EVENT...
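The payloads in both outputs follow a regular pattern: each ping carries the prefix "rdma-ping-N: " followed by consecutive ASCII characters from 'A' (65) to 'z' (122), wrapping around, with the starting character advancing by one per ping. The sketch below reproduces that pattern so you can compare it with what you see on your own machines. It is inferred from the output above rather than from the rping source, and the 64-byte buffer size (including a terminating NUL) is an assumption that happens to match the line lengths shown.

```python
def rping_payload(ping: int, size: int = 64) -> str:
    """Rebuild the text of one rping message (pattern inferred from the output)."""
    prefix = f"rdma-ping-{ping}: "
    first, last = ord("A"), ord("z")        # the pattern cycles through 65..122
    span = last - first + 1
    start = ping % span                     # the start character moves by one per ping
    body_len = size - 1 - len(prefix)       # one byte assumed reserved for the NUL
    body = "".join(chr(first + (start + i) % span) for i in range(body_len))
    return prefix + body

if __name__ == "__main__":
    for n in range(3):
        print("ping data:", rping_payload(n))
```

If the lines printed by both sides match this pattern, the data really did make the round trip between the two buffers unchanged.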

A Wireshark capture confirms the traffic:

[Figure: Wireshark capture]

 

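RDMA programming user manual

Download link for the manual on CSDN

The APIs covered in the RDMA programming user manual include initialization, device operations, Verbs context operations, protection domain operations, Queue Pair bringup, active Queue Pair operations, event handling operations, and experimental APIs.

Writing a test program

Install the user-space development library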

$ sudo apt-get install libibverbs-dev

Clone the test examples with git

$ git clone https://gitee.com/wq897/RDMA-EXAMPLE.git
$ cd RDMA-EXAMPLE/01/ && make

Run the server program

$ sudo ./service
 ------------------------------------------------
 Device name : "(null)"
 IB port : 1
 TCP port : 19875
 ------------------------------------------------

waiting on port 19875 for TCP connection
TCP connection was established
searching for IB devices in host
found 1 device(s)
device not specified, using first one found: rxe0
going to send the message: 'SEND operation '
MR was registered with addr=0x55d57ea7c340, lkey=0x49b, rkey=0x49b, flags=0x7
QP was created, QP number=0x13

Local LID = 0x0
Remote address = 0x558d59dbb340
Remote rkey = 0x43d
Remote QP number = 0x13
Remote LID = 0x0
failed to modify QP state to RTR
failed to modify QP state to RTR
failed to connect QPs

test result is 1

Run the client program

$ sudo ./service 192.168.3.79
servername=192.168.3.79
 ------------------------------------------------
 Device name : "(null)"
 IB port : 1
 IP : 192.168.3.79
 TCP port : 19875
 ------------------------------------------------

TCP connection was established
searching for IB devices in host
found 1 device(s)
device not specified, using first one found: rxe0
MR was registered with addr=0x558d59dbb340, lkey=0x43d, rkey=0x43d, flags=0x7
QP was created, QP number=0x13

Local LID = 0x0
Remote address = 0x55d57ea7c340
Remote rkey = 0x49b
Remote QP number = 0x13
Remote LID = 0x0
Receive Request was posted
failed to modify QP state to RTR
failed to modify QP state to RTR
failed to connect QPs

test result is 1

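The runs above print the error "failed to modify QP state to RTR". Debugging shows that ibv_modify_qp() returns error code 22 (EINVAL), which means something in the struct ibv_qp_attr attr argument passed to ibv_modify_qp() is invalid.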
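To double-check what a numeric return code like 22 stands for, the standard errno table can be queried directly. A minimal sketch (describe_errno is a name made up for this example):

```python
import errno
import os

def describe_errno(code: int) -> str:
    """Map a numeric errno value to its symbolic name and message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name} ({os.strerror(code)})"

if __name__ == "__main__":
    # 22 is the value returned by ibv_modify_qp() in the runs above.
    print(describe_errno(22))
```

One thing worth checking, though it was not verified against this example's source: both outputs show LID = 0x0, which is normal for an Ethernet link layer, so the address information supplied for the RTR transition generally has to identify the peer by GID rather than by LID.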
