RSS in DPDK


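RSS (Receive Side Scaling) is a load-distribution technique proposed by Microsoft. It computes a hash over the network-layer and transport-layer 2/3/4-tuple of each packet, uses the least significant bits (LSBs) of that hash to index the redirection table (RETA), and uses the index stored in the selected RETA entry to assign the packet to a CPU for receive processing. RSS is now almost always implemented in hardware. It spreads network traffic across multiple CPUs and so reduces the load on any single CPU.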

Redirection Table (RETA)

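The Intel 82576/82599 RETA is an index mapping table with 128 entries, each 4 bits wide. The low 7 bits (LSBs) of the hash always select the RETA entry. The output indices stored in the table can be updated at run time to rebalance traffic dynamically, but an update is not guaranteed to take effect in sync with packet arrival.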

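Note:
A 4-bit RSS output index means RSS on the 82576/82599 can distribute traffic across at most 16 queues; any queues beyond the first 16 receive no RSS traffic.
Intel XL710 PF: RETA size 256, entries 6 bits wide, up to 64 queues.
Intel XL710 VF: RETA size 64, entries 4 bits wide, up to 16 queues.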
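
As a rough illustration of the lookup described above, the following C sketch models a 128-entry, 4-bit RETA in software. The helper names (reta_fill_round_robin, reta_lookup) and the round-robin fill policy are assumptions made for this example; they are not taken from the 82599 datasheet or from DPDK.

```c
#include <assert.h>
#include <stdint.h>

#define RETA_SIZE       128   /* 82576/82599: 128 entries */
#define RETA_INDEX_MASK 0x7F  /* low 7 bits of the hash select an entry */
#define RETA_ENTRY_MASK 0x0F  /* each entry holds a 4-bit queue index */

/* Hypothetical helper: spread the entries round-robin over nb_queues. */
static void reta_fill_round_robin(uint8_t reta[RETA_SIZE], unsigned nb_queues)
{
    /* A 4-bit entry cannot name more than 16 queues. */
    assert(nb_queues >= 1 && nb_queues <= 16);
    for (unsigned i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint8_t)((i % nb_queues) & RETA_ENTRY_MASK);
}

/* Hypothetical helper: map a 32-bit RSS hash to a receive queue. */
static unsigned reta_lookup(const uint8_t reta[RETA_SIZE], uint32_t hash)
{
    return reta[hash & RETA_INDEX_MASK] & RETA_ENTRY_MASK;
}
```

Rebalancing traffic at run time then amounts to rewriting entries of reta; the hash itself does not change.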

HASH Function

RSS hash functions generally use the Microsoft Toeplitz-based hash. Microsoft (MSFT) RSS defines hash computations for IPv4, TcpIPv4, TcpIPv6 and IPv6; Intel later extended these with support for UdpIPv4, UdpIPv6, STcpIPv4 and STcpIPv6.

Hash for IPv4 with TCP

Input[12] = @12-15, @16-19, @20-21, @22-23
Result = ComputeHash(Input, 12);

Hash for IPv4 with UDP

Input[12] = @12-15, @16-19, @20-21, @22-23
Result = ComputeHash(Input, 12);

Hash for IPv4 without TCP

Input[8] = @12-15, @16-19
Result = ComputeHash(Input, 8);

Hash for IPv6 with TCP

Input[36] = @8-23, @24-39, @40-41, @42-43
Result = ComputeHash(Input, 36);

Hash for IPv6 with UDP

Input[36] = @8-23, @24-39, @40-41, @42-43
Result = ComputeHash(Input, 36);

Hash for IPv6 without TCP

Input[32] = @8-23, @24-39
Result = ComputeHash(Input, 32);
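
In the listings above, @m-n appears to denote the byte range m through n counted from the start of the IP header. For IPv4 that is source address, destination address, source port and destination port, in that order. The C sketch below assembles the IPv4 input under that reading. The function name rss_input_ipv4 is hypothetical, and the code assumes an IPv4 header without options (IHL = 5), so that the L4 ports start at offset 20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the RSS hash input out of a raw IPv4 header.
 * Assumes no IP options, so the TCP/UDP ports sit at offsets 20-23.
 * Returns the number of input bytes written to out (8 or 12).
 */
static size_t rss_input_ipv4(const uint8_t *ip_hdr, int with_l4, uint8_t out[12])
{
    assert((ip_hdr[0] & 0x0F) == 5);   /* IHL must be 5 (20-byte header) */

    memcpy(out, ip_hdr + 12, 8);       /* @12-15 src addr, @16-19 dst addr */
    if (!with_l4)
        return 8;                      /* IPv4 without TCP/UDP */

    memcpy(out + 8, ip_hdr + 20, 4);   /* @20-21 src port, @22-23 dst port */
    return 12;                         /* IPv4 with TCP or UDP */
}
```

The bytes are copied as they appear on the wire (network byte order), and the hash is computed over them in that order.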

Note:
For packet types not covered by the rules above, the RSS output index defaults to 0.

RSS Random Key

The hash computation uses a 320-bit (40-byte) random secret key. On the Intel 82599 this key is stored in the RSS Random Key Register (RSSRK).
Default RK:

0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa

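The default RK can be combined with the hash function to verify RSS results.
The table below gives verification data for the IPv4 version of the Toeplitz hash function: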

Destination Address:Port   Source Address:Port    IPv4 only    IPv4 with TCP
161.142.100.80:1766        66.9.149.187:2794      0x323e8fc2   0x51ccc178
65.69.140.83:4739          199.92.111.2:14230     0xd718262a   0xc626b0ea
12.22.207.184:38024        24.19.198.95:12898     0xd2d0a5de   0x5c2b394a
209.142.163.6:2217         38.27.205.30:48228     0x82989176   0xafc7327f
202.188.127.2:1303         153.39.163.191:44251   0x5d1809c5   0x10e828a2

The table below gives verification data for the IPv6 version of the Toeplitz hash function:

Destination Address (Port)         Source Address (Port)                         IPv6 only    IPv6 with TCP
3ffe:2501:200:3::1 (1766)          3ffe:2501:200:1fff::7 (2794)                  0x2cc18cd5   0x40207d3d
ff02::1 (4739)                     3ffe:501:8::260:97ff:fe40:efab (14230)        0x0f0c461c   0xdde51bbf
fe80::200:f8ff:fe21:67cf (38024)   3ffe:1900:4545:3:200:f8ff:fe21:67cf (44251)   0x4b61e985   0x02d1feef


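The default RK is an asymmetric key: swapping the source and destination of a flow generally yields a different hash, so the two directions of a connection may land on different queues.
An implementation of the Toeplitz algorithm can be found in the BSD source tree, and recent DPDK releases include a software implementation (rte_thash.h).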
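
To check the verification tables by hand, a minimal bit-by-bit Toeplitz hash can be sketched in C as follows. This is an illustrative software version, not the BSD or DPDK code; the names toeplitz_hash and rss_default_key are assumptions made for this example. The input is expected in the order source address, destination address, source port, destination port, in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The default 40-byte RK listed above. */
static const uint8_t rss_default_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/* Hypothetical helper: Toeplitz hash of input[0..len) under key. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *input, size_t len)
{
    /* The 32-bit window needs 32 key bits beyond the last input bit. */
    assert(key_len >= len + 4);

    uint32_t result = 0;
    /* The window starts as the leftmost 32 bits of the key. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {      /* MSB first */
            if (input[i] & (1u << bit))
                result ^= window;
            /* Slide the window one bit left and pull in the next key bit. */
            size_t next = 32 + i * 8 + (size_t)(7 - bit);
            window <<= 1;
            if (key[next / 8] & (0x80u >> (next % 8)))
                window |= 1u;
        }
    }
    return result;
}
```

For the first row of the IPv4 table (source 66.9.149.187:2794, destination 161.142.100.80:1766), hashing the first 8 input bytes gives 0x323e8fc2 and hashing all 12 bytes gives 0x51ccc178, matching the "IPv4 only" and "IPv4 with TCP" columns.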

DPDK

To enable RSS on a port in DPDK, set the mq_mode and rss_hf fields of rte_eth_conf and configure at least 2 RX queues:

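#include <rte_ethdev.h>

/* Port configuration that enables RSS on the receive side. */
struct rte_eth_conf port_conf_default = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,   /* distribute RX traffic with RSS */
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,        /* NULL: use the driver's default key */
            /* packet types that take part in hashing */
            .rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
        },
    },
};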

testpmd

Test RSS in rxonly forwarding mode:

testpmd -c 0xffff --socket-mem=8192,8192 -w 81:00.0 -n 2 -r 2 -- --enable-rx-cksum --rss-ip --rss-udp --rxq=2 --txq=2 -i
...
testpmd > set fwd rxonly
testpmd > set verbose 8
testpmd > start

Send test packets with scapy:

sendp(Ether()/Dot1Q()/IP(src=RandIP(), dst='192.168.4.2'), iface='eth8', count=5)

The testpmd terminal shows output similar to the following:

port 0/queue 0: received 1 packets
src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - length=60 - nb_segs=1 - RSS hash=0x6c8d4e08 - RSS queue=0x0 - VLAN tci=0x1 - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_UDP - sw ptype: L2_ETHER L3_IPV4 L4_UDP - l2_len=14 - l3_len=20 - l4_len=8 - Receive queue=0x0
ol_flags: PKT_RX_VLAN PKT_RX_RSS_HASH PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_VLAN_STRIPPED

port 0/queue 0: received 1 packets
src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - length=60 - nb_segs=1 - RSS hash=0x68b84a3d - RSS queue=0x0 - VLAN tci=0x1 - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_UDP - sw ptype: L2_ETHER L3_IPV4 L4_UDP - l2_len=14 - l3_len=20 - l4_len=8 - Receive queue=0x0
ol_flags: PKT_RX_VLAN PKT_RX_RSS_HASH PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_VLAN_STRIPPED


Reference

Introduction to Receive Side Scaling
Intel® 82599 10 Gigabit Ethernet Controller Datasheet - 7.1.2.8 Receive-Side Scaling (RSS)
