[RDMA] Installing the Intel RDMA driver and the libibverbs library | PFC flow control

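Contents

Key steps overview

Checking installed versions

Detailed steps and notes

Building and Installation

Confirm RDMA Functionality

iWARP/RoCEv2 Selection

Flow Control Settings (FC | PFC)

ECN Configuration

Memory Requirements

Resource Profile Limits

RDMA Statistics

perftest

Dynamic Tracing

Dynamic Debug

Capturing RDMA Traffic with tcpdump

Intel Chinese website

Troubleshooting errors

IP configuration

Mapping devices to network interfaces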


Author: bandaoyu  Source: https://blog.csdn.net/bandaoyu/article/details/116203690

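Update:

Intel now provides a complete driver package that includes both the ice LAN driver and the irdma kernel-mode RDMA driver, so they no longer need to be downloaded separately: "Intel® Ethernet Adapter Complete Driver Pack".

1. Install the firmware, located in the NVMUpdatePackage folder.

2. Install the LAN driver (ice), located in the PROCGB folder.

3. Install the RDMA kernel driver (irdma), located in the RDMA folder (ibv_devinfo only works once irdma is installed).

4. Install the RDMA user-space library (rdma-core).

5. Install the firmware, located in the NVMUpdatePackage folder (in practice, the firmware could only be installed after ice, or possibly ice plus irdma, had been reinstalled):

sudo ./nvmupdate64e -u -l -o update.xml -b -c nvmupdate.cfg

---------- Original tutorial: ----------------------------------------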

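Key steps overview

(Install ice first --> then irdma --> then rdma-core)

1. Install the Intel RDMA network adapter in the server.

2. Download the driver from the Intel website: "Linux* RDMA Driver for the E810 and X722 Intel® Ethernet Controllers".

3. Install the corresponding LAN driver. (Before installing irdma, the LAN driver for E810 or X722 (ice or i40e) must be built from the source code included in this release and installed on your system.) Search for "ICE" on intel.com; for E810, for example, the package is "Intel® Network Adapter Driver for E810 Series Devices under Linux*".

Download the ice package, extract it (read the README and follow it), change into src, and run make install. (If the installation failed because of an environment problem, run make clean after fixing the environment and before trying again.)

4. Install irdma.

(The irdma Linux* driver enables RDMA functionality on RDMA-capable Intel network devices.)

5. Install the firmware (if needed).

6. Install the dependencies (see the "Troubleshooting errors" section of this article), then install rdma-core (the user-space libibverbs library, which provides the programming interface for applications).

Note: when running patch -p2 < /path/to/irdma-/libirdma-27.0.patch, do not leave out the "<" character.

7. Set the driver load mode of the adapter to iWARP or RoCEv2.

    Use the ibv_devinfo command to check the adapter mode:

transport:                      iWARP (1)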

Checking installed versions

Check ice:   modinfo ice

[SDS_Admin@rdma65 data]$ modinfo ice
filename:       /lib/modules/5.10.38-21.hl05.el7.x86_64/updates/drivers/net/ethernet/intel/ice/ice.ko
firmware:       intel/ice/ddp/ice.pkg
version:        1.7.16
license:        GPL v2

Check irdma:   modinfo irdma

[SDS_Admin@rdma65 data]$ modinfo irdma
filename:       /lib/modules/5.10.38-21.hl05.el7.x86_64/updates/drivers/infiniband/hw/irdma/irdma.ko
version:        1.7.72
license:        Dual BSD/GPL
description:    Intel(R) Ethernet Protocol Driver for RDMA

Check ib_core:   modinfo ib-core

[SDS_Admin@rdma65 data]$ modinfo ib-core
filename:       /lib/modules/5.10.38-21.hl05.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz
alias:          rdma-netlink-subsys-4
license:        Dual BSD/GPL
description:    core kernel InfiniBand API
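
The three checks above can be gathered into one report. The following is a minimal sketch, not part of the Intel README: the function names modinfo_version and report_rdma_modules are hypothetical, and the parser simply looks for the "version:" line in the modinfo output shown above. Modules that carry no version field (ib_core in the output above) are reported as such.

```shell
#!/bin/sh
# Hypothetical helper: read modinfo-style output on stdin and print the
# value of the "version:" field, or nothing if the field is absent.
modinfo_version() {
    awk -F': *' '$1 == "version" { print $2; exit }'
}

# Hypothetical helper: print one line per module of interest.
# Not called automatically; run it on a host with the drivers installed.
report_rdma_modules() {
    for mod in ice irdma ib_core; do
        ver=$(modinfo "$mod" 2>/dev/null | modinfo_version)
        printf '%-8s %s\n' "$mod" "${ver:-no version field}"
    done
}
```

Calling report_rdma_modules after installation gives a quick view of which ice and irdma builds are actually loaded from /lib/modules. For a single module, "modinfo -F version ice" prints the same field directly.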

Detailed steps and notes

Source: readme.txt: https://downloadmirror.intel.com/30368/eng/README_irdma_1.4.22.txt

==============================================================================
irdma - Linux* RDMA Driver for the E810 and X722 Intel(R) Ethernet Controllers
==============================================================================

--------
Contents
--------
- Overview
- Prerequisites
- Supported OS List
- Building and Installation
- Confirm RDMA Functionality
- iWARP/RoCEv2 Selection
- iWARP Port Mapper (iwpmd)
- Flow Control Settings
- ECN Configuration
- Devlink Configuration
- Memory Requirements
- Resource Profile Limits
- Resource Limits Selector
- RDMA Statistics
- perftest
- MPI
- Performance
- Interoperability
- Dynamic Tracing
- Dynamic Debug
- Capturing RDMA Traffic with tcpdump
- Known Issues/Notes

--------
Overview
--------

The irdma Linux* driver enables RDMA functionality on RDMA-capable Intel
network devices. Devices supported by this driver:

   - Intel(R) Ethernet Controller E810
   - Intel(R) Ethernet Network Adapter X722

The E810 and X722 devices each support a different set of RDMA features.

    - E810 supports both iWARP and RoCEv2 RDMA transports, and also supports
      congestion management features like priority flow control (PFC) and
      explicit congestion notification (ECN).
    - X722 supports only iWARP and a more limited set of configuration
      parameters.

Differences between adapters are described in each section of this document.

For both E810 and X722, the corresponding LAN driver (ice or i40e) must be
built from source included in this release and installed on your system prior
to installing irdma.

-------------
Prerequisites
-------------

- Compile and install the E810 or X722 LAN PF driver from source included in
  this release. Refer to the ice or i40e driver README for installation
  instructions.
    * For E810 adapters, use the ice driver.
    * For X722 adapters, use the i40e driver.
- For best results, use a fully supported OS from the Supported OS List below.
- For server memory requirements, see the "Memory Requirements" section of this
  document.
- Install required packages. Refer to the "Building" section of the rdma-core
  README for required packages for your OS:
        https://github.com/linux-rdma/rdma-core/blob/v27.0/README.md
    * RHEL 7 and SLES:
        Install all required packages listed in the rdma-core README.
    * RHEL 8:
        Install the required packages for RHEL 7, then install the following
        additional packages:
            dnf install python3-docutils perl-generators
    * Ubuntu:
        Install the required packages listed in the rdma-core README, then
        install the following additional package:
            apt-get install python3-docutils libsystemd-dev

* Note:
The following are sample repo files that can be used to get the dependent packages
for rdma-core. However, these may not be all that is required.

- For SLES
    http://download.opensuse.org/distribution/leap/42.3/repo/oss

- For RHEL 8.1
    http://vault.centos.org/8.1.1911/PowerTools/x86_64/os/

-----------------
Supported OS List
-----------------

    Supported:
        * RHEL 8.3
        * RHEL 7.9
        * SLES 15 SP2
        * SLES 12 SP5
        * Ubuntu 18.04
        * Ubuntu 20.04

    Supported Not Validated:
        * RHEL 8.2
        * RHEL 8.1
        * RHEL 8
        * RHEL 7.8
        * RHEL 7.7
        * RHEL 7.6 + OFED 4.17-1
        * RHEL 7.5 + OFED 4.17-1
        * RHEL 7.4 + OFED 4.17-1
        * SLES 15 SP1
        * SLES 15 + OFED 4.17-1
        * SLES 12 SP 4 + OFED 4.17-1
        * SLES 12 SP 3 + OFED 4.17-1
        * Linux kernel stable 5.10.*
        * Linux kernel longterm 5.4.*, 4.19.*, 4.14.*

-------------------------
Building and Installation
-------------------------

To build and install the irdma driver and supporting rdma-core libraries:

1. Decompress the irdma driver archive:
        tar zxf irdma-.tgz

2. Build and install the RDMA driver:
        cd irdma-
        ./build.sh

   By default, the irdma driver is built using in-distro RDMA libraries and
   modules. Optionally, irdma may also be built using OFED modules. See the
   Supported OS List above for a list of OSes that support this option.
   * Note: Intel products are not validated on other vendors' proprietary
           software packages.
   To install irdma using OFED modules:
        - Download OFED-4.17-1.tgz from the OpenFabrics Alliance:
             wget http://openfabrics.org/downloads/OFED/ofed-4.17-1/OFED-4.17-1.tgz
        - Decompress the archive:
             tar xzvf OFED-4.17-1.tgz
        - Install OFED:
             cd OFED-4.17-1
             ./install --all
        - Reboot after installation is complete.
        - Build the irdma driver with the "ofed" option:
             cd /path/to/irdma-
             ./build.sh ofed
        - Continue with the installation steps below.

3. Load the driver:
    RHEL and Ubuntu:
        modprobe irdma

        To unload a previously loaded copy first and then load the new one:

        rmmod irdma; modprobe irdma

    SLES:
        modprobe irdma --allow-unsupported

    Notes:
        - This modprobe step is required only during installation. Normally,
          irdma is autoloaded via a udev rule when ice or i40e is loaded:
             /usr/lib/udev/rules.d/90-rdma-hw-modules.rules
        - For SLES, to automatically allow loading unsupported modules, add the
          following to /etc/modprobe.d/10-unsupported-modules.conf:
              allow_unsupported_modules 1

4. Uninstall any previous versions of the rdma-core user-space libraries.
   For example, in RHEL:
        yum erase rdma-core

    Note: "yum erase rdma-core" will also remove any packages that depend on
          rdma-core, such as perftest or fio. Please re-install them after
          installing rdma-core.

5. Patch, build, and install the rdma-core user-space libraries:

    RHEL:

        Install the build dependencies:
        sudo yum install -y rpm-build cmake libudev-devel libnl3-devel python-docutils valgrind-devel

        #1 Download rdma-core-27.0.tar.gz from GitHub
        wget https://github.com/linux-rdma/rdma-core/releases/download/v27.0/rdma-core-27.0.tar.gz
        #2 Apply patch libirdma-27.0.patch to rdma-core
        tar -xzvf rdma-core-27.0.tar.gz
        cd rdma-core-27.0
        patch -p2 < /path/to/irdma-/libirdma-27.0.patch   # do not leave out the "<"
        #3 Make sure the rdma-core/redhat directory and its contents belong to the "root" group
        cd ..
        chgrp -R root rdma-core-27.0/redhat
        #4 Repackage under the name the build expects ("tgz" extension instead of "tar.gz")
        tar -zcvf rdma-core-27.0.tgz rdma-core-27.0
        #5 Build rdma-core
        mkdir -p ~/rpmbuild/SOURCES
        mkdir -p ~/rpmbuild/SPECS
        cp rdma-core-27.0.tgz ~/rpmbuild/SOURCES/
        cd ~/rpmbuild/SOURCES
        tar -xzvf rdma-core-27.0.tgz
        cp ~/rpmbuild/SOURCES/rdma-core-27.0/redhat/rdma-core.spec ~/rpmbuild/SPECS/
        cd ~/rpmbuild/SPECS/
        rpmbuild -ba rdma-core.spec
        #6 Install the RPMs
        cd ~/rpmbuild/RPMS/x86_64
        yum install *27.0*.rpm

    Steps #3 to #6 as a single command line, run from inside the patched rdma-core source directory with VERSION set to the rdma-core version:

VERSION=27.0 && cd .. && chgrp -R root rdma-core-${VERSION}/redhat && tar -zcvf rdma-core-${VERSION}.tgz rdma-core-${VERSION} && mkdir -p ~/rpmbuild/SOURCES && mkdir -p ~/rpmbuild/SPECS && cp rdma-core-${VERSION}.tgz ~/rpmbuild/SOURCES/ && cd ~/rpmbuild/SOURCES && tar -xzvf rdma-core-${VERSION}.tgz && cp ~/rpmbuild/SOURCES/rdma-core-${VERSION}/redhat/rdma-core.spec ~/rpmbuild/SPECS/ && cd ~/rpmbuild/SPECS/ && rpmbuild -ba rdma-core.spec && cd ~/rpmbuild/RPMS/x86_64 && yum install *${VERSION}*.rpm

To confirm that everything was installed:

Installed: <---------------------------- packages that were installed
  ibacm.x86_64 0:35.0-1.el7                          infiniband-diags.x86_64 0:35.0-1.el7
  infiniband-diags-compat.x86_64 0:35.0-1.el7        iwpmd.x86_64 0:35.0-1.el7
  libibumad.x86_64 0:35.0-1.el7                      libibverbs.x86_64 0:35.0-1.el7
  libibverbs-utils.x86_64 0:35.0-1.el7               librdmacm.x86_64 0:35.0-1.el7
  librdmacm-utils.x86_64 0:35.0-1.el7                rdma-core-debuginfo.x86_64 0:35.0-1.el7
  srp_daemon.x86_64 0:35.0-1.el7

Complete!
[root@rdma59 x86_64]# ls <---------------------- packages that were to be installed
ibacm-35.0-1.el7.x86_64.rpm                    librdmacm-35.0-1.el7.x86_64.rpm
infiniband-diags-35.0-1.el7.x86_64.rpm         librdmacm-utils-35.0-1.el7.x86_64.rpm
infiniband-diags-compat-35.0-1.el7.x86_64.rpm  rdma-core-35.0-1.el7.x86_64.rpm
iwpmd-35.0-1.el7.x86_64.rpm                    rdma-core-debuginfo-35.0-1.el7.x86_64.rpm
libibumad-35.0-1.el7.x86_64.rpm                rdma-core-devel-35.0-1.el7.x86_64.rpm
libibverbs-35.0-1.el7.x86_64.rpm               srp_daemon-35.0-1.el7.x86_64.rpm
libibverbs-utils-35.0-1.el7.x86_64.rpm
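Compare the two lists to check that everything was installed.

(For the options of the patch command, see https://www.jb51.net/article/98185.htm. If the source has already been patched, delete it, extract a fresh copy of the source, and apply the patch again.)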

    SLES:
        # Download rdma-core-27.0.tar.gz from GitHub
        wget https://github.com/linux-rdma/rdma-core/releases/download/v27.0/rdma-core-27.0.tar.gz
        # Apply patch libirdma-27.0.patch to rdma-core
        tar -xzvf rdma-core-27.0.tar.gz
        cd rdma-core-27.0
        patch -p2 < /path/to/irdma-/libirdma-27.0.patch
        cd ..
        # Zip the rdma-core directory into a tar.gz archive
        tar -zcvf rdma-core-27.0.tar.gz rdma-core-27.0
        # Create an empty placeholder baselibs.conf file
        touch /usr/src/packages/SOURCES/baselibs.conf
        # Build rdma-core
        cp rdma-core-27.0.tar.gz /usr/src/packages/SOURCES
        cp rdma-core-27.0/suse/rdma-core.spec /usr/src/packages/SPECS/
        cd /usr/src/packages/SPECS/
        rpmbuild -ba rdma-core.spec --without=curlmini
        cd /usr/src/packages/RPMS/x86_64
        rpm -ivh --force *27.0*.rpm

    Ubuntu:
        To create Debian packages from rdma-core:
        # Download rdma-core-27.0.tar.gz from GitHub
        wget https://github.com/linux-rdma/rdma-core/releases/download/v27.0/rdma-core-27.0.tar.gz
        # Apply patch libirdma-27.0.patch to rdma-core
        tar -xzvf rdma-core-27.0.tar.gz
        cd rdma-core-27.0
        patch -p2 < /path/to/irdma-/libirdma-27.0.patch
        # Build rdma-core
        dh clean --with python3,systemd --builddirectory=build-deb
        dh build --with systemd --builddirectory=build-deb
        sudo dh binary --with python3,systemd --builddirectory=build-deb
        # This creates .deb packages in the parent directory
        # To install the .deb packages
        sudo dpkg -i ../*.deb

6. Add the following to /etc/security/limits.conf:
        * soft memlock unlimited
        * hard memlock unlimited
        * soft nofile 1048000
        * hard nofile 1048000
   This avoids any limits on user mode applications as far as pinned memory and number of open files used.

7. After installing the irdma driver and the rdma-core packages, reboot the server.
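
The limits.conf entries listed above can be added without creating duplicates when the step is repeated. This is a minimal sketch under stated assumptions: the function name add_rdma_limits is hypothetical, and the file path is a parameter so the function can be tried on a scratch copy before touching /etc/security/limits.conf (which requires root).

```shell
#!/bin/sh
# Hypothetical helper: append each memlock/nofile entry from the README to a
# limits file, skipping any entry that is already present as an exact line.
add_rdma_limits() {
    limits_file=${1:-/etc/security/limits.conf}
    for entry in \
        '* soft memlock unlimited' \
        '* hard memlock unlimited' \
        '* soft nofile 1048000' \
        '* hard nofile 1048000'
    do
        # -x: whole-line match, -F: fixed string (so "*" is literal)
        grep -qxF -- "$entry" "$limits_file" 2>/dev/null ||
            printf '%s\n' "$entry" >> "$limits_file"
    done
}
```

Running add_rdma_limits twice leaves exactly four entries in the file, which keeps the configuration readable on hosts where the installation is redone several times.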

--------------------------
Confirm RDMA Functionality
--------------------------

After successful installation, RDMA devices are listed in the output of
"ibv_devices". For example:
    # ibv_devices
    device                 node GUID
    ------              ----------------
    rdmap175s0f0        40a6b70b6f300000
    rdmap175s0f1        40a6b70b6f310000

Notes:
    - Device names may differ depending on OS or kernel.
    - Node GUID is different for the same device in iWARP vs. RoCEv2 mode.

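Each RDMA device is associated with a network interface. The sysfs filesystem
can help show how these devices are related. For example:

- To show the RDMA devices associated with the "ens801f0" network interface: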
         # ls /sys/class/net/ens801f0/device/infiniband/
         rdmap175s0f0

- To show the network interface associated with the "rdmap175s0f0" RDMA device:
         # ls /sys/class/infiniband/rdmap175s0f0/device/net/
         ens801f0
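
The two sysfs lookups above can be combined into one listing of every RDMA device and its network interface. This is a minimal sketch: the function name list_rdma_netdevs is hypothetical, and the sysfs root is a parameter (default /sys) only so that the walk can be tried on a test directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print "rdma_device netdev" pairs by walking
# <sysfs_root>/class/infiniband/<device>/device/net/, the same path used above.
list_rdma_netdevs() {
    sysfs_root=${1:-/sys}
    for ibdev in "$sysfs_root"/class/infiniband/*; do
        [ -d "$ibdev/device/net" ] || continue
        for netdev in "$ibdev"/device/net/*; do
            [ -e "$netdev" ] || continue
            # Strip the directory part to keep only the device names.
            printf '%s %s\n' "${ibdev##*/}" "${netdev##*/}"
        done
    done
}
```

On a host with the example devices above, list_rdma_netdevs would be expected to pair rdmap175s0f0 with ens801f0; if no RDMA devices are registered it prints nothing.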

Before running RDMA applications, ensure that all hosts have IP addresses
assigned to the network interface associated with the RDMA device. The RDMA
device uses the IP configuration from the corresponding network interface.
There is no additional configuration required for the RDMA device.

To confirm RDMA functionality, run rping:

    1) Start the rping server:
          rping -sdvVa [server IP address]

    2) Start the rping client:
          rping -cdvVa [server IP address] -C 10

    3) rping will run for 10 iterations (-C 10) and print payload data on the
       console.

    Notes:
        - Confirm rping functionality both from each host to itself and between
          hosts. For example:
            * Run rping server and client both on Host A.
            * Run rping server and client both on Host B.
            * Run rping server on Host A and rping client on Host B.
            * Run rping server on Host B and rping client on Host A.
        - When connecting multiple rping clients to a persistent rping server,
          older kernels may experience a crash related to the handling of cm_id
          values in the kernel stack. With E810, this problem typically appears
          in the system log as a kernel oops and stack trace pointing to
          irdma_accept. The issue has been fixed in kernels 5.4.61 and later.
          For patch details, see:
          https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/infiniband/core/ucma.c?h=v5.9-rc2&id=7c11910783a1ea17e88777552ef146cace607b3c

----------------------
iWARP/RoCEv2 Selection
----------------------

X722:
The X722 adapter supports only the iWARP transport.

E810:
The E810 controller supports both iWARP and RoCEv2 transports. By default, the
irdma driver is loaded in iWARP mode. RoCEv2 may be selected either globally
(for all ports) using the module parameter "roce_ena=1" or for individual ports
using the devlink interface.

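--- Global selection

Temporary:

RoCEv2

  - If the irdma driver is currently loaded, first unload it:
        rmmod irdma
  - Reload the driver in RoCEv2 mode:
        modprobe irdma roce_ena=1

iWARP

  - If the irdma driver is currently loaded, first unload it:
        rmmod irdma
  - Reload the driver in iWARP mode:
        modprobe irdma roce_ena=0

Permanent:

  To apply the setting automatically from a configuration file, add the following line to /etc/modprobe.d/irdma.conf: options irdma roce_ena=1

echo "options irdma roce_ena=1" >> /etc/modprobe.d/irdma.conf

Manual load:
  - If the irdma driver is currently loaded, first unload it:
        rmmod irdma
  - Reload the driver in RoCEv2 mode:
        modprobe irdma roce_ena=1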

--- Per-port selection
E810 interfaces may be configured per interface for iWARP mode (default) or
RoCEv2 via devlink parameter configuration. See the "Devlink Configuration"
section below for instructions on per-port iWARP/RoCEv2 selection.
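
For the permanent global setting, appending with ">>" adds another "options irdma roce_ena=..." line each time it is run, which can leave contradictory lines in the file. The sketch below replaces any earlier roce_ena line instead. It is an illustration only: the function name set_irdma_roce_ena is hypothetical, and it assumes roce_ena sits on its own "options irdma roce_ena=N" line (a line that combines several irdma options is left untouched).

```shell
#!/bin/sh
# Hypothetical helper: write "options irdma roce_ena=<0|1>" to a modprobe
# config file, dropping earlier stand-alone roce_ena lines and keeping the rest.
set_irdma_roce_ena() {
    value=$1
    conf=${2:-/etc/modprobe.d/irdma.conf}
    case $value in
        0|1) ;;
        *) echo "roce_ena must be 0 (iWARP) or 1 (RoCEv2)" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Copy every existing line except previous roce_ena settings.
        grep -v '^options irdma roce_ena=' "$conf" > "$tmp"
    fi
    printf 'options irdma roce_ena=%s\n' "$value" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

After set_irdma_roce_ena 1 (run as root), the mode takes effect the next time irdma is loaded, for example after "rmmod irdma; modprobe irdma".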

-------------------------
iWARP Port Mapper (iwpmd)
-------------------------
The iWARP port mapper service (iwpmd) coordinates with the host network stack
and manages TCP port space for iWARP applications.

iwpmd is automatically loaded when ice or i40e is loaded via udev rules in
/usr/lib/udev/rules.d/90-iwpmd.rules.

To verify iWARP port mapper status:
    systemctl status iwpmd

---------------------
Flow Control Settings
---------------------

Reference document: "Intel® Ethernet 800 Series Linux Flow Control"

X722:
The X722 adapter supports only link-level flow control (LFC).

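E810:
The E810 controller supports both link-level flow control (LFC) and priority
flow control (PFC). Enabling flow control is strongly recommended when using
E810 in RoCEv2 mode.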

--- Link Level Flow Control (LFC) (E810 and X722)

To enable link-level flow control on E810 or X722, use "ethtool -A".
For example, to enable LFC in both directions (rx and tx):
    ethtool -A DEVNAME rx on tx on

Confirm the setting with "ethtool -a":
    ethtool -a DEVNAME

Sample output:
    Pause parameters for interface:
    Autonegotiate: on
    RX: on
    TX: on
    RX negotiated:  on
    TX negotiated:  on

Full enablement of LFC requires the switch or link partner be configured for
rx and tx pause frames. Refer to switch vendor documentation for more details.

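--- Priority Flow Control (PFC) (E810 only)

Priority flow control (PFC) is supported on E810 in both willing and
non-willing modes.

E810 also has two Data Center Bridging (DCB) modes: software and firmware.

For more background on software and firmware modes, refer to the E810 ice
driver README.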

- For PFC willing mode, firmware DCB is recommended.
- For PFC non-willing mode, software DCB must be used.

Note: E810 supports a maximum of 4 traffic classes (TCs), one of which may
      have PFC enabled.

*** PFC willing mode

In willing mode, E810 is "willing" to accept DCB settings from its link
partner. DCB is configured on the link partner (typically a switch), and the
E810 automatically discovers the DCB settings and applies them to its own port.
This simplifies DCB configuration in larger clusters and removes the need to
configure DCB independently on both sides of the link.

To enable PFC in willing mode on E810, use ethtool to enable firmware DCB.
Enabling firmware DCB automatically places the NIC in willing mode:
    ethtool --set-priv-flags DEVNAME fw-lldp-agent on

To confirm the setting, use the following command:
    ethtool --show-priv-flags DEVNAME

Expected output:
    fw-lldp-agent     : on

Note: When firmware DCB is enabled, the E810 NIC may experience an adapter-wide
      reset when the DCBX willing configuration change propagated from the link
      partner removes an RDMA-enabled traffic class (TC). This typically occurs
      when removing a TC associated with priority 0 (the default priority for
      RDMA). The reset results in a temporary loss of connectivity as the
      adapter re-initializes.

Switch DCB and PFC configuration syntax varies by vendor. Consult your switch
manual for details. Sample Arista switch configuration commands:

- Example: Enable PFC for priority 0 on switch port 21
    * Enter configuration mode for switch port 21:
         switch#configure
         switch(config)#interface ethernet 21/1
    * Turn PFC on:
         switch(config-if-Et21/1)#priority-flow-control mode on
    * Set priority 0 to "no-drop" (that is, PFC enabled):
         switch(config-if-Et21/1)#priority-flow-control priority 0 no-drop
    * Verify the switch port PFC configuration:
         switch(config-if-Et21/1)#show priority-flow-control
- Example: Enable DCBX on switch port 21
    * Enable DCBX in IEEE mode:
         switch(config-if-Et21/1)#dcbx mode ieee
    * Show DCBX settings (including neighbor port settings):
         switch(config-if-Et21/1)#show dcbx

*** PFC non-willing mode

In non-willing mode, DCB settings must be configured on both E810 and its link
partner. Non-willing mode is software-based. OpenLLDP (lldpad and lldptool) is
recommended.

To enable non-willing PFC on E810:
  1. Disable firmware DCB. Firmware DCB is always willing. If enabled, it
     will override any software settings.
         ethtool --set-priv-flags DEVNAME fw-lldp-agent off
  2. Install OpenLLDP
         yum install lldpad
  3. Start the Open LLDP daemon:
        lldpad -d
  4. Verify functionality by showing current DCB settings on the NIC:
        lldptool -ti
  5. Configure your desired DCB settings, including traffic classes,
     bandwidth allocations, and PFC.
     The following example enables PFC on priority 0, maps all priorities to
     traffic class (TC) 0, and allocates all bandwidth to TC0.
     This simple configuration is suitable for enabling PFC for all traffic,
     which may be useful for back-to-back benchmarking. Datacenters will
     typically use a more complex configuration to ensure quality-of-service
     (QoS).
     a. Enable PFC for priority 0:
           lldptool -Ti -V PFC willing=no enabled=0
     b. Map all priorities to TC0 and allocate all bandwidth to TC0:
           lldptool -Ti -V ETS-CFG willing=no \
           up2tc=0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
           tsa=0:ets,1:strict,2:strict,3:strict,4:strict,5:strict,6:strict,7:strict \
           tcbw=100,0,0,0,0,0,0,0
  6. Verify output of "lldptool -ti ":
        Chassis ID TLV
            MAC: 68:05:ca:a3:89:78
        Port ID TLV
            MAC: 68:05:ca:a3:89:78
        Time to Live TLV
            120
        IEEE 8021QAZ ETS Configuration TLV
            Willing: no
            CBS: not supported
            MAX_TCS: 8
            PRIO_MAP: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
            TC Bandwidth: 100% 0% 0% 0% 0% 0% 0% 0%
            TSA_MAP: 0:ets 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
        IEEE 8021QAZ PFC TLV
            Willing: no
            MACsec Bypass Capable: no
            PFC capable traffic classes: 8
            PFC enabled: 0
        End of LLDPDU LTV
  7. Configure the same settings on the link partner.

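Full enablement of PFC requires the switch or link partner be configured for
PFC pause frames. Refer to switch vendor documentation for more details.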

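--- Directing RDMA traffic to a traffic class

When using PFC, traffic may be directed to one or more traffic classes (TCs).
Because RDMA traffic bypasses the kernel, Linux traffic control methods such as
tc, cgroups, or egress-qos-map cannot be used. Instead, set the Type of Service
(ToS) field on your application command line. (Author's question: is this
L2-level flow control?) ToS-to-priority mappings are hardcoded in Linux as
follows: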


  ToS   Priority
  ---   --------
   0       0
   8       2
  24       4
  16       6

Priorities are then mapped to traffic classes with ETS, using lldptool or switch utilities.

Examples of setting ToS 16 in an application:
  ucmatose -t 16
  ib_write_bw -t 16
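
The ToS-to-priority table above can be written as a small lookup, which is handy when scripting benchmark runs for a given priority. This is a minimal sketch: the function name tos_to_priority is hypothetical, and it covers only the four ToS values listed in the table.

```shell
#!/bin/sh
# Hypothetical helper: map a ToS value to the Linux priority from the table
# above. Values outside the table are rejected rather than guessed.
tos_to_priority() {
    case $1 in
        0)  echo 0 ;;
        8)  echo 2 ;;
        24) echo 4 ;;
        16) echo 6 ;;
        *)  echo "ToS $1 is not in the table" >&2; return 1 ;;
    esac
}
```

For example, tos_to_priority 16 prints 6, so traffic started with "ib_write_bw -t 16" is carried on priority 6, and PFC and ETS need to be configured for the traffic class that priority 6 maps to.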

Alternatively, for RoCEv2, ToS may be set for all RoCEv2 traffic using configfs. For example, to set ToS 16 on device rdma, port 1:
  mkdir /sys/kernel/config/rdma_cm/rdma
  echo 16 > /sys/kernel/config/rdma_cm/rdma/ports/1/default_roce_tos


-----------------
ECN Configuration
-----------------
X722:
Congestion control settings are not supported on X722 adapters.

E810:
The E810 controller supports the following congestion control algorithms:
    - iWARP DCTCP
    - iWARP TCP New Reno plus ECN
    - iWARP TIMELY
    - RoCEv2 DCQCN
    - RoCEv2 DCTCP
    - RoCEv2 TIMELY

Congestion control settings are accessed through configfs. Additional DCQCN
tunings are available through the devlink interface. See the "Devlink
Configuration" section for details.

--- Configuration in configfs

To access congestion control settings:

1. After driver load, change to the irdma configfs directory:
        cd /sys/kernel/config/irdma

2. Create a new directory for each RDMA device you want to configure.
   Note: Use "ibv_devices" for a list of RDMA devices.
   For example, to create configfs entries for the rdmap device:
        mkdir rdmap

3. List the new directory to get its dynamic congestion control knobs and
   values:
        cd rdmap
        for f in *; do echo -n "$f: "; cat "$f"; done;

    If the interface is in iWARP mode, the files have a "iw_" prefix:
        - iw_dctcp_enable
        - iw_ecn_enable
        - iw_timely_enable

    If the interface is in RoCEv2 mode, the files have a "roce_" prefix:
        - roce_dcqcn_enable
        - roce_dctcp_enable
        - roce_timely_enable

4. Enable or disable the desired algorithms.

   To enable an algorithm: echo 1 >
   For example, to add ECN marker processing to the default TCP New Reno iWARP
   congestion control algorithm:
        echo 1 > /sys/kernel/config/irdma/rdmap/iw_ecn_enable

    To disable an algorithm: echo 0 >
    For example:
        echo 0 > /sys/kernel/config/irdma/rdmap/iw_ecn_enable

    To read the current status: cat

    Default values:
        iwarp_dctcp_en: off
        iwarp_timely_en: off
        iwarp_ecn_en: ON

        roce_timely_en: off
        roce_dctcp_en: off
        roce_dcqcn_en: off

5. Remove the configfs directory created above. Without removing these
   directories, the driver will not unload.
          rmdir /sys/kernel/config/irdma/rdmap
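
Because the knob names depend on the mode (iw_ prefix in iWARP mode, roce_ prefix in RoCEv2 mode), a plain "echo 1 >" to a name from the wrong mode may fail or target a file that is not there. The sketch below checks that the knob exists before writing and then reads the value back. It is an illustration only: the function name set_cc_knob is hypothetical, and the configfs directory created in step 2 is passed in as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: enable (1) or disable (0) one congestion control knob
# in an irdma configfs directory, refusing names that do not exist there.
set_cc_knob() {
    knob_dir=$1
    knob=$2
    value=$3
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    if [ ! -f "$knob_dir/$knob" ]; then
        echo "no such knob in $knob_dir: $knob" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$knob_dir/$knob"
    # Read the value back so the caller sees what is now stored.
    printf '%s: %s\n' "$knob" "$(cat "$knob_dir/$knob")"
}
```

With the directory from step 2 as the first argument, "set_cc_knob <dir> iw_ecn_enable 1" corresponds to the echo example in step 4, while a roce_ knob name is rejected while the interface is in iWARP mode.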

---------------------
Devlink Configuration
---------------------
X722:
Devlink parameter configuration is not supported on the X722 adapters.

E810:
The E810 controller supports devlink configuration for the following controls:
  - iWARP/RoCEv2 per-port selection
  - DCQCN congestion control tunings
  - Fragment count limit

--- Devlink OS support

Devlink dev parameter configuration is a recent Linux capability that requires
both iproute2 tool support as well as kernel support.

The following OS/Kernel versions support devlink dev parameters:
    - RHEL 8 or later
    - SLES 15 SP1 or later
    - Ubuntu 18.04 or later
    - Linux kernel 4.19 or later

iproute2 may need to be updated to add parameter capability to the devlink
configuration. The latest released version can be downloaded and installed
from: https://github.com/shemminger/iproute2/releases

--- Devlink parameter configuration (E810 only)

1.  Get PCIe bus-info of the desired interface using "ethtool -i":
        ethtool -i DEVNAME

    Example:
        # ethtool -i enp175s0f0
        driver: ice
        version: 0.11.7
        firmware-version: 0.50 0x800019de 1.2233.0
        expansion-rom-version:
        bus-info: 0000:af:00.0
        supports-statistics: yes
        supports-test: yes
        supports-eeprom-access: yes
        supports-register-dump: yes
        supports-priv-flags: yes

    bus-info is 0000:af:00.0

2.  Find the devlink name 'ice_rdma.x' in the /sys/devices folder:
        ls /sys/devices/*/*// | grep ice_rdma

    Example:
        ls /sys/devices/*/*/0000:af:00.0/ | grep ice_rdma
        ice_rdma.16

3.  To display available parameters:
        devlink dev param show

    RDMA devlink parameters for E810:
        roce_enable
            Selects RDMA transport: RoCEv2 (true) or iWARP (false)
        resource_limits_selector
            Limits available queue pairs (QPs). See "Resource Limits Selector"
            section for details and values.
        dcqcn_enable
            Enables the DCQCN algorithm for RoCEv2.
            Note: "roce_enable" must also be set to "true".
        dcqcn_cc_cfg_valid
            Indicates that all DCQCN parameters are valid and should be updated
            in registers or QP context.
        dcqcn_min_dec_factor
            The minimum factor by which the current transmit rate can be
            changed when processing a CNP. Value is given as a percentage
            (1-100).
        dcqcn_min_rate
            The minimum value, in Mbits per second, for rate to limit.
        dcqcn_F
            The number of times to stay in each stage of bandwidth recovery.
        dcqcn_T
            The number of microseconds that should elapse before increasing the
            CWND in DCQCN mode.
        dcqcn_B
            The number of bytes to transmit before updating CWND in DCQCN mode.
        dcqcn_rai_factor
            The number of MSS to add to the congestion window in additive
            increase mode.
        dcqcn_hai_factor
            The number of MSS to add to the congestion window in hyperactive
            increase mode.
        dcqcn_rreduce_mperiod
            The minimum time between 2 consecutive rate reductions for a single
            flow. Rate reduction will occur only if a CNP is received during
            the relevant time interval.
        fragment_count_limit
            Set fragment count limit to adjust maximum values for queue depth
            and inline data size.

4.  To set a parameter:
        devlink dev param set platform/ name value cmode driverinit

    Example: Enable RoCEv2, enable DCQCN, and set min_dec_factor=5 on ice_rdma.17:
        devlink dev param set platform/ice_rdma.17 name roce_enable value true cmode driverinit
        devlink dev param set platform/ice_rdma.17 name dcqcn_enable value true cmode driverinit
        devlink dev param set platform/ice_rdma.17 name dcqcn_min_dec_factor value 5 cmode driverinit

5.  Reload the device port with new mode:
        devlink dev reload platform/

    Example:
        devlink dev reload platform/ice_rdma.16

    Note: This does not reload the driver, so other ports are unaffected.
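
Steps 1 and 2 above (bus-info to devlink name) can be scripted so that the ice_rdma.N name does not have to be copied by hand. This is a minimal sketch: the function name find_ice_rdma is hypothetical, the glob mirrors the "ls /sys/devices/*/*/0000:af:00.0/" example from step 2, and the sysfs root is a parameter only so the lookup can be tried on a test tree.

```shell
#!/bin/sh
# Hypothetical helper: given a PCIe bus-info string (from "ethtool -i"),
# print the matching ice_rdma.N devlink name found under <sysfs_root>/devices.
find_ice_rdma() {
    bus_info=$1
    sysfs_root=${2:-/sys}
    for dir in "$sysfs_root"/devices/*/*/"$bus_info"/ice_rdma.*; do
        [ -e "$dir" ] || continue
        printf '%s\n' "${dir##*/}"
    done
}

# Example use with the commands from steps 4 and 5 (not executed here):
#   name=$(find_ice_rdma 0000:af:00.0)
#   devlink dev param set "platform/$name" name roce_enable value true cmode driverinit
#   devlink dev reload "platform/$name"
```

If the adapter sits at a different depth in the PCIe hierarchy than the two wildcard levels assumed here, the function prints nothing and the path has to be located manually.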

-------------------
Memory Requirements
-------------------

Default irdma initialization requires a minimum of ~210 MB (for E810) or
~160 MB (for X722) of memory per port.

For servers where the amount of memory is constrained, you can decrease the
required memory by lowering the resources available to E810 or X722 by loading
the driver with the following resource profile setting:

    modprobe irdma rsrc_profile=2

To automatically apply the setting when the driver is loaded, add the following
to /etc/modprobe.d/irdma.conf:

    options irdma rsrc_profile=2

Note: This can have performance and scaling impacts as the number of queue
pairs and other RDMA resources are decreased in order to lower memory usage to
approximately 55 MB (for E810) or 51 MB (for X722) per port.
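
Since the figures above are per port, the total for an adapter is the per-port figure multiplied by the number of ports. The sketch below does that arithmetic with the approximate values quoted in this section. The function name irdma_min_memory_mb and its argument convention ("default" or "2" for the resource profile) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the minimum irdma memory in MB for an adapter,
# using the approximate per-port figures quoted in this section.
irdma_min_memory_mb() {
    adapter=$1   # E810 or X722
    profile=$2   # "default" or "2" (rsrc_profile=2)
    ports=$3     # number of ports
    case "$adapter:$profile" in
        E810:default) per_port=210 ;;
        X722:default) per_port=160 ;;
        E810:2)       per_port=55 ;;
        X722:2)       per_port=51 ;;
        *) echo "unknown adapter/profile: $adapter/$profile" >&2; return 1 ;;
    esac
    echo $((per_port * ports))
}
```

For a 2-port E810 this gives about 420 MB with the default profile and about 110 MB with rsrc_profile=2.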

-----------------------
Resource Profile Limits
-----------------------

In the default resource profile, the RDMA resources configured for each
adapter are as follows:

    E810 (2 ports):
        Queue Pairs: 4092
        Completion Queues: 8189
        Memory Regions: 4194302
    X722 (4 ports):
        Queue Pairs: 1020
        Completion Queues: 2045
        Memory Regions: 2097150

For resource profile 2, the configuration is:

    E810 (2 ports):
        Queue Pairs: 508
        Completion Queues: 1021
        Memory Regions: 524286

    X722 (4 ports):
        Queue Pairs: 252
        Completion Queues: 509
        Memory Regions: 524286

------------------------
Resource Limits Selector
------------------------

In addition to resource profile, you can further limit resources via the
"limits_sel" module parameter:

E810:
    modprobe irdma limits_sel=<0-6>
X722:
    modprobe irdma gen1_limits_sel=<0-5>

To automatically apply this setting when the driver is loaded, add the
following to /etc/modprobe.d/irdma.conf:


    options irdma limits_sel=

The values below apply to a 2-port E810 NIC.
        0 - Default, up to 4092 QPs
        1 - Minimum, up to 124 QPs
        2 - Up to 1020 QPs
        3 - Up to 2044 QPs
        4 - Up to 16380 QPs
        5 - Up to 65532 QPs
        6 - Maximum, up to 131068 QPs

For X722, the resource limit selector defaults to a value of 2. A single port
supports a maximum of 64k QPs, and a 4-port X722 supports up to 16k QPs per
port.
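
Because the selector values are not ordered by capacity (0 is the default, 1 is the minimum), choosing one by hand is error-prone. The sketch below picks the smallest value from the 2-port E810 table above that covers a required number of queue pairs. The function name e810_limits_sel_for_qps is hypothetical, and the numbers are copied from that table only; they do not apply to X722 or to other port counts.

```shell
# Pick the smallest E810 limits_sel value whose QP ceiling covers the
# requested number of queue pairs, using the 2-port E810 table above.
# e810_limits_sel_for_qps is a hypothetical helper name.
e810_limits_sel_for_qps() {
    want=$1
    # "selector:max_qps" pairs from the table, sorted by capacity.
    for entry in 1:124 2:1020 3:2044 0:4092 4:16380 5:65532 6:131068; do
        sel=${entry%%:*}
        max=${entry##*:}
        if [ "$want" -le "$max" ]; then
            echo "$sel"
            return 0
        fi
    done
    echo "no limits_sel value supports $want QPs" >&2
    return 1
}
```

Example: modprobe irdma limits_sel=$(e810_limits_sel_for_qps 5000)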

---------------


RDMA Statistics


---------------
RDMA protocol statistics for E810 or X722 are found in sysfs. To display all
counters and values:
    cd /sys/class/infiniband/rdmap/hw_counters;
    for f in *; do echo -n "$f: "; cat "$f"; done;

The following counters will increment when RDMA applications are transferring
data over the network in iWARP mode:
    - tcpInSegs
    - tcpOutSegs

Available counters:
    ip4InDiscards       IPv4 packets received and discarded.
    ip4InReasmRqd       IPv4 fragments received by Protocol Engine.
    ip4InMcastOctets    IPv4 multicast octets received.
    ip4InMcastPkts      IPv4 multicast packets received.
    ip4InOctets         IPv4 octets received.
    ip4InPkts           IPv4 packets received.
    ip4InTruncatedPkts  IPv4 packets received and truncated due to insufficient
                          buffering space in UDA RQ.
    ip4OutSegRqd        IPv4 fragments supplied by Protocol Engine to the lower
                          layers for transmission
    ip4OutMcastOctets   IPv4 multicast octets transmitted.
    ip4OutMcastPkts     IPv4 multicast packets transmitted.
    ip4OutNoRoutes      IPv4 datagrams discarded due to routing problem (no hit
                          in ARP table).
    ip4OutOctets        IPv4 octets supplied by the PE to the lower layers for
                           transmission.
    ip4OutPkts          IPv4 packets supplied by the PE to the lower layers for
                          transmission.
    ip6InDiscards       IPv6 packets received and discarded.
    ip6InReasmRqd       IPv6 fragments received by Protocol Engine.
    ip6InMcastOctets    IPv6 multicast octets received.
    ip6InMcastPkts      IPv6 multicast packets received.
    ip6InOctets         IPv6 octets received.
    ip6InPkts           IPv6 packets received.
    ip6InTruncatedPkts  IPv6 packets received and truncated due to insufficient
                          buffering space in UDA RQ.
    ip6OutSegRqd        IPv6 fragments supplied by Protocol Engine to the lower
                          layers for transmission
    ip6OutMcastOctets   IPv6 multicast octets transmitted.
    ip6OutMcastPkts     IPv6 multicast packets transmitted.
    ip6OutNoRoutes      IPv6 datagrams discarded due to routing problem (no hit
                           in ARP table).
    ip6OutOctets        IPv6 octets supplied by the PE to the lower layers for
                           transmission.
    ip6OutPkts          IPv6 packets supplied by the PE to the lower layers for
                           transmission.
    iwInRdmaReads       RDMAP total RDMA read request messages received.
    iwInRdmaSends       RDMAP total RDMA send-type messages received.
    iwInRdmaWrites      RDMAP total RDMA write messages received.
    iwOutRdmaReads      RDMAP total RDMA read request messages sent.
    iwOutRdmaSends      RDMAP total RDMA send-type messages sent.
    iwOutRdmaWrites     RDMAP total RDMA write messages sent.
    iwRdmaBnd           RDMA verbs total bind operations carried out.
    iwRdmaInv           RDMA verbs total invalidate operations carried out.
    RxECNMrkd           Number of packets that have the ECN bits set to
                           indicate congestion
    cnpHandled          Number of Congestion Notification Packets that have
                           been handled by the reaction point.
    cnpIgnored          Number of Congestion Notification Packets that have
                           been ignored by the reaction point.
    rxVlanErrors        Ethernet received packets with incorrect VLAN_ID.
    tcpRetransSegs      Total number of TCP segments retransmitted.
    tcpInOptErrors      TCP segments received with unsupported TCP options or
                           TCP option length errors.
    tcpInProtoErrors    TCP segments received that are dropped by TRX due to
                           TCP protocol errors.
    tcpInSegs           TCP segments received.
    tcpOutSegs          TCP segments transmitted.
    cnpSent             Number of Congestion Notification Packets that have
                           been sent by the reaction point.
    RxUDP               UDP segments received without errors
    TxUDP               UDP segments transmitted without errors
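
The counters are cumulative, so it is usually the change across a test run that matters. The following is a minimal sketch that snapshots a hw_counters directory before and after a run and prints only the counters that moved. The function names snapshot_counters and diff_counters are hypothetical, and the directory is passed in explicitly because the device name differs per system.

```shell
# Print "name value" for every counter file in a hw_counters directory.
# $1 = directory, e.g. /sys/class/infiniband/<device>/hw_counters
snapshot_counters() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        echo "$(basename "$f") $(cat "$f")"
    done
}

# Compare two snapshot files and print only the counters that changed,
# as "name delta".
# $1 = snapshot taken before the run, $2 = snapshot taken after it
diff_counters() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && $2 != before[$1] { print $1, $2 - before[$1] }' "$1" "$2"
}
```

Typical use: save one snapshot to a file, run the RDMA workload, save a second snapshot, then call diff_counters on the two files. In iWARP mode tcpInSegs and tcpOutSegs should be among the counters reported.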

--------


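perftest Test Tool

The perftest package is a set of RDMA microbenchmarks that measure bandwidth and latency over the verbs API. Source code: https://github.com/linux-rdma/perftest

perftest-4.4-0.29 is recommended.

For installation steps, see: https://blog.csdn.net/bandaoyu/article/details/115798045

Earlier perftest versions had known issues with iWARP, so versions 4.4-0.4 through 4.4-0.18 are not recommended; these issues have since been resolved.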

To run a basic ib_write_bw test:
    1. Start server
           ib_write_bw -R
    2. Start client:
           ib_write_bw -R
    3. Benchmark will run to completion and print performance data on both
       client and server consoles.

Basic usage:

server:

ib_write_bw -R -d iwp175s0f0

client:

ib_write_bw -R  -d iwp175s0f0  -i 1 192.169.31.164 -n 1000 -s 4K

When the run completes, the performance data is printed.

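Notes:

  - iWARP requires the "-R" option; it is optional for RoCEv2.
  - Use "-d" on the perftest command line to select a specific RDMA device.
  - For ib_read_bw, use "-o 1" for testing with 3rd-party link partners.
  - For ib_send_lat and ib_write_lat, use "-I 96" to limit inline data size
    to the supported value.
  - iWARP supports only RC connections. RoCEv2 supports RC and UD.
    Connection types XRC, UC, and DC are not supported.
  - Atomic operations are not supported on E810 or X722.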
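
The "-R" and "-d" rules above can be captured in a small command builder. This is a sketch under stated assumptions: perftest_cmd is a hypothetical helper name, it only prints the command, and for RoCEv2 it simply leaves "-R" out (a GID index option may still be needed on some setups).

```shell
# Build an ib_write_bw command line from the notes above: "-R" is always
# added for iWARP and left out for RoCEv2 (where it is optional), and "-d"
# selects the RDMA device. With no server address, the server-side command
# is printed. perftest_cmd is a hypothetical helper name.
perftest_cmd() {
    mode=$1          # "iwarp" or "rocev2"
    device=$2        # RDMA device name, e.g. iwp175s0f0
    server=${3:-}    # server IP address; empty on the server side
    case "$mode" in
        iwarp)  opts="-R" ;;
        rocev2) opts="" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "ib_write_bw${opts:+ $opts} -d $device${server:+ $server}"
}
```

Example: perftest_cmd iwarp iwp175s0f0 on the server, and perftest_cmd iwarp iwp175s0f0 192.169.31.164 on the client.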

-----------
MPI Testing
-----------
--- Intel MPI
Intel MPI uses the OpenFabrics Interfaces (OFI) framework and libfabric user
space libraries to communicate with network hardware.

* Recommended Intel MPI versions:
    Single-rail: Intel MPI 2019u8
    Multi-rail:  Intel MPI 2019u3

  Note: Intel MPI 2019u4 is not recommended due to known incompatibilities with
        iWARP.

* Recommended libfabric version: libfabric-1.11.0

  The Intel MPI package includes a version of libfabric. This "internal"
  version is automatically installed along with Intel MPI and used by default.
  To use a different ("external") version of libfabric with Intel MPI:
      1. Download libfabric from https://github.com/ofiwg/libfabric.
      2. Build and install it according to the libfabric documentation.
      3. Configure Intel MPI to use a non-internal version of libfabric:
             export I_MPI_OFI_LIBRARY_INTERNAL=0
         or  source /intel64/bin/mpivars.sh -ofi_internal=0
      4. Verify your libfabric version by using the I_MPI_DEBUG environment
         variable on the mpirun command line:
             -genv I_MPI_DEBUG=1
         The libfabric version will appear in the mpirun output.

* Sample command line for a 2-process pingpong test:

     mpirun -l -n 2 -ppn 1 -host myhost1,myhost2 -genv I_MPI_DEBUG=5 \
     -genv FI_VERBS_MR_CACHE_ENABLE=1 -genv FI_VERBS_IFACE= \
     -genv FI_OFI_RXM_USE_SRX=0 -genv FI_PROVIDER='verbs;ofi_rxm' \
     /path/to/IMB-MPI1 Pingpong

  Notes:
   - Example is for libfabric 1.8 or greater. For earlier versions, use
     "-genv FI_PROVIDER='verbs'"
   - For Intel MPI 2019u6, use "-genv MPIR_CVAR_CH4_OFI_ENABLE_DATA=0".
   - When using Intel MPI, it's recommended to enable only one interface on
     your networking device to avoid MPI application connectivity issues or
     hangs. This issue affects all Intel MPI transports, including TCP and
     RDMA. To avoid the issue, use "ifdown " or "ip link set down
     " to disable all network interfaces on your adapter except for
     the one used for MPI.
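
The last note (keep a single interface up while running Intel MPI) can be sketched as follows. The helper name mpi_single_iface_cmds is hypothetical, and the interface list has to be supplied by the caller. It only prints the commands, so nothing is taken down by accident.

```shell
# Print "ip link set dev <iface> down" for every listed interface except
# the one kept for MPI. Nothing is executed.
# $1 = interface to keep, remaining arguments = all interfaces on the adapter
# mpi_single_iface_cmds is a hypothetical helper name.
mpi_single_iface_cmds() {
    keep=$1
    shift
    for iface in "$@"; do
        [ "$iface" = "$keep" ] && continue
        echo "ip link set dev $iface down"
    done
}
```

Example: mpi_single_iface_cmds ens5f0 ens5f0 ens5f1 ens5f2 ens5f3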

--- OpenMPI

* OpenMPI version 4.0.3 is recommended.

-----------
Performance
-----------
RDMA performance may be optimized by adjusting system, application, or driver
settings.

- Flow control is required for best performance in RoCEv2 mode and is optional
  in iWARP mode. Both link-level flow control (LFC) and priority flow control
  (PFC) are supported, but PFC is recommended. See the "Flow Control Settings"
  section of this document for configuration details.

- For bandwidth applications, multiple queue pairs (QPs) are required for best
  performance. For example, in the perftest suite, use "-q 8" on the command
  line to run with 8 QP.

- For best results, configure your application to use CPUs on the same NUMA
  node as your adapter. For example:
    * To list CPUs local to your NIC:
        cat /sys/class/infiniband//device/local_cpulist
    * To specify CPUs (e.g., 24-47) when running a perftest application:
        taskset -c 24-47 ib_write_bw
    * To specify CPUs when running an Intel MPI application:
        mpirun -genv I_MPI_PIN_PROCESSOR_LIST=24-47 ./my_prog

- System and BIOS tunings may also improve performance. Settings vary by
  platform - consult your OS and BIOS documentation for details.
  In general:
    * Disable power-saving features such as P-states and C-states
    * Set BIOS CPU power policies to "Performance" or similar
    * Set BIOS CPU workload configuration to "I/O Sensitive" or similar
    * On RHEL 7.*/8.*, use the "latency-performance" tuning profile:
         tuned-adm profile latency-performance
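
The NUMA advice above (read local_cpulist, then pin with taskset) can be combined into one step. This is a minimal sketch: numa_pinned_cmd is a hypothetical helper name, the sysfs class directory is passed in explicitly, and the command is printed instead of executed.

```shell
# Print a taskset command that pins a program to the CPUs local to an RDMA
# device, read from <class dir>/<device>/device/local_cpulist.
# $1 = sysfs class directory (normally /sys/class/infiniband)
# $2 = RDMA device name, remaining arguments = the command to pin
# numa_pinned_cmd is a hypothetical helper name.
numa_pinned_cmd() {
    root=$1
    device=$2
    shift 2
    cpus=$(cat "$root/$device/device/local_cpulist") || return 1
    echo "taskset -c $cpus $*"
}
```

Example: numa_pinned_cmd /sys/class/infiniband rocep24s0f0 ib_write_bw -q 8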

----------------
Interoperability
----------------

--- Mellanox

E810 and X722 support interop with Mellanox RoCEv2-capable adapters.

In tests like ib_send_bw, use -R option to select rdma_cm for connection
establishment. You can also use gid-index with -x option instead of -R:

Example:
    On E810 or X722:  ib_send_bw -F -n 5 -x 0
    On Mellanox:      ib_send_bw -F -n 5 -x

    ...where x specifies the gid index value for RoCEv2.

Look in /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types directory for
port 1.

Note: Using RDMA reads with Mellanox may result in poor performance if there is
      packet loss.
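
To find a usable value for "-x", the gid_attrs/types directory mentioned above can be scanned for RoCEv2 entries. A sketch, assuming each file in that directory holds the type string of one GID index (for example "RoCE v2" or "IB/RoCE v1") and that unused slots fail to read. rocev2_gid_indexes is a hypothetical helper name.

```shell
# Print the index of every GID table entry whose type is "RoCE v2".
# $1 = a gid_attrs/types directory, e.g.
#      /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types
# rocev2_gid_indexes is a hypothetical helper name.
rocev2_gid_indexes() {
    for f in "$1"/*; do
        # Unused table slots cannot be read; skip them quietly.
        type=$(cat "$f" 2>/dev/null) || continue
        [ "$type" = "RoCE v2" ] && echo "$(basename "$f")"
    done
    return 0
}
```

Example: rocev2_gid_indexes /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types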

--- Chelsio

X722 supports interop with Chelsio iWARP devices.

Load Chelsio T4/T5 RDMA driver (iw_cxgb4) with parameter "dack_mode" set to 0.

    modprobe iw_cxgb4 dack_mode=0

To automatically apply this setting when the iw_cxgb4 driver is loaded, add the
following to /etc/modprobe.d/iw_cxgb4.conf:
    options iw_cxgb4 dack_mode=0


---------------


Dynamic Tracing


---------------
Dynamic tracing is available for irdma's connection manager.
Turn on tracing with the following command:
    echo 1 > /sys/kernel/debug/tracing/events/irdma_cm/enable

To retrieve the trace:
    cat /sys/kernel/debug/tracing/trace
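
Tracing is easiest to read when it covers only the operation under test. The sketch below enables the irdma_cm events, runs one command, disables the events again and saves the trace. trace_irdma_cm is a hypothetical helper name, and the tracing directory is a parameter so that a non-default debugfs mount point can be used.

```shell
# Enable irdma_cm tracing, run a command, then disable tracing and save the
# trace buffer to a file. Returns the exit status of the traced command.
# $1 = tracing directory (normally /sys/kernel/debug/tracing)
# $2 = output file, remaining arguments = command to run while tracing
# trace_irdma_cm is a hypothetical helper name.
trace_irdma_cm() {
    tracing=$1
    out=$2
    shift 2
    echo 1 > "$tracing/events/irdma_cm/enable" || return 1
    status=0
    "$@" || status=$?
    echo 0 > "$tracing/events/irdma_cm/enable"
    cat "$tracing/trace" > "$out"
    return "$status"
}
```

Example (as root): trace_irdma_cm /sys/kernel/debug/tracing /tmp/irdma_cm.trace ib_write_bw -R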

-------------


Dynamic Debug


-------------
irdma supports Linux dynamic debug.

To enable all dynamic debug messages upon irdma driver load, use the "dyndbg"
module parameter:
    modprobe irdma dyndbg='+p'

Debug messages will then appear in the system log or dmesg.

Enabling dynamic debug can be extremely verbose and is not recommended for
normal operation. For more info on dynamic debug, including tips on how to
refine the debug output, see:
   https://www.kernel.org/doc/html/v4.11/admin-guide/dynamic-debug-howto.html
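
Dynamic debug can also be switched at runtime without reloading the driver, by writing a query to the dynamic debug control file described in the howto linked above. A minimal sketch: irdma_dyndbg is a hypothetical helper name, and the default path assumes debugfs is mounted at /sys/kernel/debug.

```shell
# Turn irdma dynamic debug messages on or off at runtime by writing a
# "module irdma +p" or "module irdma -p" query to the control file.
# $1 = on|off
# $2 = control file (default assumes debugfs at /sys/kernel/debug)
# irdma_dyndbg is a hypothetical helper name.
irdma_dyndbg() {
    control=${2:-/sys/kernel/debug/dynamic_debug/control}
    case "$1" in
        on)  flag="+p" ;;
        off) flag="-p" ;;
        *) echo "usage: irdma_dyndbg on|off [control-file]" >&2; return 1 ;;
    esac
    echo "module irdma $flag" > "$control"
}
```

Example (as root): irdma_dyndbg on, reproduce the problem, then irdma_dyndbg off.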

-----------------------------------


Capturing RDMA Traffic with tcpdump


-----------------------------------

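RDMA traffic bypasses the kernel, so it is not normally visible to the Linux tcpdump utility. You can capture RDMA traffic with tcpdump by using port mirroring on a switch.

1. Connect 3 hosts to a switch:
 -  2 compute nodes to run RDMA traffic.
 -  1 host to monitor traffic.


2. Configure the switch to mirror traffic from one compute node's switch port
   to the monitoring host's switch port. Consult your switch documentation
   for syntax.


3. Unload the irdma driver on the monitoring host:
    # rmmod irdma
   Traffic may not be captured correctly if the irdma driver is loaded.


4. Start tcpdump on the monitoring host. For example:
    # tcpdump -nXX -i


5. Run RDMA traffic between the 2 compute nodes. RDMA packets will appear in
   tcpdump on the monitoring host.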
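
On a busy mirror port it helps to narrow the capture. The sketch below builds the tcpdump command with a protocol filter: RoCEv2 packets are UDP with destination port 4791, while iWARP runs over TCP. rdma_tcpdump_cmd is a hypothetical helper name, the interface name must be supplied by the caller, and the command is only printed.

```shell
# Print a tcpdump command for the monitoring host. For RoCEv2 the capture is
# narrowed to UDP destination port 4791 (the RoCEv2 port); for iWARP, which
# runs over TCP, the filter is just "tcp".
# $1 = interface on the monitoring host, $2 = rocev2|iwarp
# rdma_tcpdump_cmd is a hypothetical helper name.
rdma_tcpdump_cmd() {
    case "$2" in
        rocev2) filter="udp dst port 4791" ;;
        iwarp)  filter="tcp" ;;
        *) echo "usage: rdma_tcpdump_cmd <interface> rocev2|iwarp" >&2; return 1 ;;
    esac
    echo "tcpdump -nXX -i $1 $filter"
}
```

Example: rdma_tcpdump_cmd ens5f0 rocev2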

-------------------
Known Issues/Notes
-------------------

X722:
* Support for the Intel(R) Ethernet Connection X722 iWARP RDMA VF driver
(i40iwvf) has been discontinued.

* There may be incompatible drivers in the initramfs image. You can either
update the image or remove the drivers from initramfs.

Specifically, look for i40e, ib_addr, ib_cm, ib_core, ib_mad, ib_sa, ib_ucm,
ib_uverbs, iw_cm, rdma_cm, rdma_ucm in the output of the following command:
  lsinitrd |less
If you see any of those modules, rebuild initramfs with the following command
and include the name of the module in the "" list. For example:
  dracut --force --omit-drivers "i40e ib_addr ib_cm ib_core ib_mad ib_sa
  ib_ucm ib_uverbs iw_cm rdma_cm rdma_ucm"
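
Paging through lsinitrd output by eye is tedious. The following sketch filters it for the modules listed above. initrd_conflicting_modules is a hypothetical helper name, and it assumes module files appear in the listing as paths ending in <name>.ko, optionally with a compression suffix.

```shell
# Read "lsinitrd" output on stdin and print the names of the modules listed
# above that are present in the image, one per line.
# initrd_conflicting_modules is a hypothetical helper name.
initrd_conflicting_modules() {
    input=$(cat)
    for mod in i40e ib_addr ib_cm ib_core ib_mad ib_sa ib_ucm ib_uverbs \
               iw_cm rdma_cm rdma_ucm; do
        # Module files appear as .../<name>.ko, optionally compressed.
        if printf '%s\n' "$input" | grep -Eq "/$mod\.ko(\.[a-z0-9]+)?\$"; then
            echo "$mod"
        fi
    done
}
```

Example: lsinitrd | initrd_conflicting_modules
Any names it prints are the ones to pass to dracut --force --omit-drivers.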


E810:
* Linux SRIOV for RDMA on E810 is currently not supported.

* RDMA is not supported when E810 is configured for more than 4 ports.

* E810 is limited to 4 traffic classes (TCs), one of which may be enabled for
  priority flow control (PFC).

* When using RoCEv2 on Linux kernel version 5.9 or earlier, some iSER operations
may experience errors related to iSER's handling of work requests. To work
around this issue, set the E810 fragment_count_limit devlink parameter to 13.
Refer to the "Devlink Configuration" section for details on setting devlink
parameters.

X722 and E810:
* Some commands (such as 'tc qdisc add' and 'ethtool -L') will cause the ice
driver to close the associated RDMA interface and reopen it. This will disrupt
RDMA traffic for a few seconds until the RDMA interface is available again.

* NOTE: Installing the ice driver, on RHEL, currently installs ice into initrd.
The implication is that the ice driver will be loaded on boot. The installation
process will also install any currently installed version of irdma into initrd.
This might result in an unintended version of irdma being installed. Depending
on your desired configuration and behavior of ice and irdma please look at the
following instructions to ensure the desired drivers are installed correctly.

    A. Desired that both ice and irdma are loaded on boot (default)
        1. Follow installation procedure for the ice driver
        2. Follow installation procedure for the irdma driver

    B. Desired that only ice driver is loaded on boot
        1. Untar ice driver
        2. Follow installation procedure for ice driver
        3. Untar irdma driver
        4. Follow installation procedure for irdma driver
        5. % dracut --force --omit-drivers "irdma"

    C. Desired that neither ice nor irdma is loaded on boot
        1. Perform all steps in B
        2. % dracut --force --omit-drivers "ice irdma"

-------
Support
-------
For general information, go to the Intel support website at:
http://www.intel.com/support/ or the Intel Wired Networking project
hosted by Sourceforge at: http://sourceforge.net/projects/e1000

If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to [email protected]

-------
License
-------
This software is available to you under a choice of one of two
licenses. You may choose to be licensed under the terms of the GNU
General Public License (GPL) Version 2, available from the file
COPYING in the main directory of this source tree, or the
OpenFabrics.org BSD license below:

  Redistribution and use in source and binary forms, with or
  without modification, are permitted provided that the following
  conditions are met:

  - Redistributions of source code must retain the above
    copyright notice, this list of conditions and the following
    disclaimer.

  - Redistributions in binary form must reproduce the above
    copyright notice, this list of conditions and the following
    disclaimer in the documentation and/or other materials
    provided with the distribution.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

----------
Trademarks
----------
Intel is a trademark or registered trademark of Intel Corporation
or its subsidiaries in the United States and/or other countries.

* Other names and brands may be claimed as the property of others

Intel Chinese website

Ethernet product support

Linux* RDMA Driver for the E810 and X722 Intel® Ethernet Controllers

Troubleshooting

  • rpmbuild command not found

yum install rpm-build

  • xxxx is needed by rdma-core-27.0-1.el7.x86_64

 cmake >= 2.8.11 is needed by rdma-core-27.0-1.el7.x86_64

Running cmake --version shows that cmake is not installed; install it with:

yum install cmake 

        libudev-devel is needed by rdma-core-27.0-1.el7.x86_64

yum install libudev-devel

        pkgconfig(libnl-3.0) is needed by rdma-core-27.0-1.el7.x86_64
        pkgconfig(libnl-route-3.0) is needed by rdma-core-27.0-1.el7.x86_64

  yum install libnl3-devel

        /usr/bin/rst2man is needed by rdma-core-27.0-1.el7.x86_64

yum install python-docutils

        valgrind-devel is needed by rdma-core-27.0-1.el7.x86_64

yum install  valgrind-devel

        systemd-devel is needed by rdma-core-27.0-1.el7.x86_64

yum install systemd-devel

/PROCGB/Linux/ice-1.3.2

/25_6/RDMA/Linux  
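
Rather than fixing the missing build dependencies one at a time, the error text can be turned into a single install command. This is a sketch, not part of rpm or yum: missing_deps_to_yum is a hypothetical helper name, the package mapping covers only the cases listed above, and any other capability is assumed to be named after its package (as with cmake and the -devel packages).

```shell
# Read rpmbuild "X is needed by Y" lines on stdin and print one yum command
# that installs the matching packages. The Chinese-locale wording of the
# same message is accepted as well.
# missing_deps_to_yum is a hypothetical helper name.
missing_deps_to_yum() {
    pkgs=""
    while read -r cap rest; do
        case "$rest" in *"needed by"*|*"需要"*) ;; *) continue ;; esac
        case "$cap" in
            "pkgconfig(libnl-3.0)"|"pkgconfig(libnl-route-3.0)") pkg=libnl3-devel ;;
            /usr/bin/rst2man) pkg=python-docutils ;;
            *) pkg=$cap ;;   # assumed: capability named after its package
        esac
        # Add each package only once.
        case " $pkgs " in *" $pkg "*) ;; *) pkgs="$pkgs $pkg" ;; esac
    done
    [ -n "$pkgs" ] && echo "yum install -y$pkgs"
    return 0
}
```

Example: save the rpmbuild error output to a file and run missing_deps_to_yum < that file, then review the printed command before running it.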

  • Makefile:71: *** Kernel header files not in any of the expected locations.

See: "Fixing NIC driver build errors on CentOS 6.5" (aino, Sina blog)

Cannot allocate memory

The firmware and the irdma driver do not match.

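IP Configuration

[linux] Configuring interface IP addresses on CentOS 7 | RDMA IP configuration | Differences between ens, eno, and enp interfaces (bandaoyu's notes, CSDN blog)

Reference: Adding a NIC with nmcli, renaming the device, and adding an IP address (运维Carl's blog, CSDN)

Mapping RDMA Devices to Network Interfaces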

Mellanox:

 ibdev2netdev

Intel:

From the README: check one device at a time

Network interface associated with the "rdmap175s0f0" RDMA device:
         # ls /sys/class/infiniband/rdmap175s0f0/device/net/
         ens801f0

Script: list all devices at once:

ibv_devices|awk '{system("echo "$1"\"-->\"`ls /sys/class/infiniband/"$1"/device/net`")}' |& grep -v '/device/net'

ibv_devices|awk '{system("echo "$1"\"-->\"`ls /sys/class/infiniband/"$1"/device/net`")}'

rocep24s0f3-->ens2f3
rocep24s0f1-->ens2f1
rocep24s0f0-->ens2f0
rocep24s0f2-->ens2f2
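
The same mapping can be produced by walking sysfs directly, which avoids having to filter out the ibv_devices header lines. A minimal sketch: rdma_dev_to_netdev is a hypothetical helper name, and the class directory is a parameter with the usual location as its default.

```shell
# Print "<rdma device>--><netdev>" for every RDMA device by walking sysfs
# directly instead of parsing ibv_devices output.
# $1 = sysfs class directory (defaults to /sys/class/infiniband)
# rdma_dev_to_netdev is a hypothetical helper name.
rdma_dev_to_netdev() {
    root=${1:-/sys/class/infiniband}
    for dev in "$root"/*; do
        [ -d "$dev/device/net" ] || continue
        for net in "$dev"/device/net/*; do
            [ -e "$net" ] || continue
            echo "$(basename "$dev")-->$(basename "$net")"
        done
    done
}
```

Calling rdma_dev_to_netdev with no arguments reads /sys/class/infiniband.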

Or: device -- eth -- IP

#!/bin/bash 

# Build "<rdma device>  <netdev>" pairs; the ibv_devices header lines make ls
# fail, so those lines are filtered out.
ibv_devices|awk '{system("echo "$1"\"  \"`ls /sys/class/infiniband/"$1"/device/net`")}' |& grep -Ev '/device/net|device|------'>/tmp/.showdev
echo -e "\033[34mauthor:lcx\033[0m"
for i in `cat /tmp/.showdev|awk '{print $2}'`
do
dev=`cat /tmp/.showdev|grep $i`
ethip=`ip route|grep 'link src'|grep $i|awk '{print $9}'`
if [ -z "$ethip" ] ; then
    ethip="No IP address"
    echo -e $dev":"$ethip 
else
    # white text on a blue background
    echo -e "\033[44;37m$dev":"$ethip\033[0m"
fi

done

Result:

[root@localhost ~]# ibdev2netdev
rocep175s0f2 ens5f2:No IP address
rocep175s0f3 ens5f3:No IP address
rocep175s0f1 ens5f1:No IP address
rocep175s0f0  ens5f0:192.169.31.166

Viewing GIDs

function show_gid()
{
        # Walk every RDMA device, port, and GID table entry in sysfs
        for device in `ls /sys/class/infiniband/`
        do
          echo "****************"
          echo "Device:"${device}
          for port in `ls /sys/class/infiniband/${device}/ports/`
          do
            echo "IB port:"${port}
            for gid in `ls /sys/class/infiniband/${device}/ports/${port}/gids`
            do
              GID=`cat /sys/class/infiniband/${device}/ports/${port}/gids/${gid}`
              # Skip empty (all-zero) entries; print only populated GIDs
              if [[ $GID != *0000:0000:0000:0000:0000:0000:0000:0000* ]]
              then
                echo "GID"${gid}":"$GID
              fi
            done
          done
        done
}

show_gid

Viewing RDMA statistics:

From the README:

    cd /sys/class/infiniband/rdmap/hw_counters;
    for f in *; do echo -n "$f: "; cat "$f"; done;

Script:

#!/bin/bash 
set -o errexit
function ibdev2netdev()
{
ibv_devices|awk '{system("echo "$1"\"  \"`ls /sys/class/infiniband/"$1"/device/net`")}' |& grep -Ev '/device/net|device|------'>/tmp/.showdev

for i in `cat /tmp/.showdev|awk '{print $2}'`
do
dev=`cat /tmp/.showdev|grep $i`
ethip=`ip route|grep 'link src'|grep $i|awk '{print $9}'`
if [ -z "$ethip" ] ; then
    ethip="No IP address"
fi
echo $dev":"$ethip

done
}


if [ -z "$1" ]; then

echo -e "\033[31mError:\033[0m"
echo -e "\033[31mPlease input an RDMA device name, example: show_stat rocep24s0f2 \033[0m"
echo "Select one device:"
ibdev2netdev

else

cd /sys/class/infiniband/$1/hw_counters
for f in *
do echo -n "$f: " 
cat "$f"
done
fi

Viewing RDMA Discards

From the README: one device at a time

# cd /sys/class/infiniband/irdma-enp175s0f0/hw_counters
# for f in *Discards; do echo -n "$f: "; cat "$f"; done

Script: all devices at once

#!/bin/bash
function show_drop()
{
        for device in `ls /sys/class/infiniband/`
        do
          echo "                               "
          echo ${device}":"
          cd  /sys/class/infiniband/${device}/hw_counters

          for f in *Discards
          do
             echo -n "$f: "
             cat "$f"
          done
          # Return to the previous directory without printing its path
          cd - > /dev/null
        done
}

show_drop

[root@localhost ~]# show_drop

rocep175s0f0
ip4InDiscards: 80
ip6InDiscards: 364

rocep175s0f1
ip4InDiscards: 0
ip6InDiscards: 0

rocep175s0f2
ip4InDiscards: 0
ip6InDiscards: 0

rocep175s0f3
ip4InDiscards: 0

No such file or directory 8356:error:02001003:system library:fopen:No such process:crypto\bio\bss_file.c:7 4:fopen

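When generating certificates with OpenSSL, a random-number file (.rand) must be created before the root certificate is built. The commands are:

[root]openssl 
Enter the OpenSSL prompt, then type rand -out private\.rand 1000: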
OpenSSL> rand -out private\.rand 1000

 

The following error is reported:

OpenSSL> rand -out private\.rand 1000
Can't open private\.rand for writing, No such file or directory
8356:error:02001003:system library:fopen:No such process:crypto\bio\bss_file.c:7
4:fopen('private\.rand','wb')
8356:error:2006D080:BIO routines:BIO_new_file:no such file:crypto\bio\bss_file.c
:81:
error in rand
 

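Surprisingly, the path cannot be found.

The OpenSSL cfg file was checked and is correct.

As a workaround, an absolute path was used directly:

OpenSSL> rand -out \etc\pki\CA\private\.rand 1000 

(Setting up a CA with openssl on Linux [illustrated], 就是这么范's blog, 51CTO)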

Notes on patch options

bluespacezero, 2019-09-03 16:38:45
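When patch is run again after a patch has already been applied, it prompts: Reversed (or previously applied) patch detected! Assume -R? [n]. The following options control what happens in that case:

-t: reverts the already patched file to its original, unpatched state without asking
-f: goes ahead and applies the patch anyway; this usually fails because the content no longer matches
-N: skips the patch that appears to be already applied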
————————————————
Original link: https://blog.csdn.net/Q_AN1314/article/details/100521766

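Done!

Root cause analysis:

Because openssl had been added to PATH, typing openssl at the cmd prompt and pressing Enter starts the OpenSSL command line, which by default runs in the %OPENSSL_HOME%\bin\ directory.

Inside that prompt, the leading "openssl" of an "openssl rand -out XXX" command is not needed; "rand -out XX" is enough.

The problem is that my first command, rand -out private\.rand 1000, used a relative path (the intended openssl working directory is C:\CA), so the path could not be found under %OPENSSL_HOME%\bin\.

Hence, solution 2:

cd C:\CA

openssl rand -out private\.rand 1000

Done! (Note that the command must be prefixed with openssl.)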

OpenSSL command reference:

Building digital certificates with OpenSSL
