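RSS (Receive Side Scaling) is a load-distribution technique proposed by Microsoft. It computes a hash over the network-layer and transport-layer 2-, 3-, or 4-tuple of each packet, uses the least significant bits (LSBs) of that hash to index a redirection table (RETA), and uses the index stored in the selected RETA entry to assign the packet to a CPU for receive processing. Most NICs now implement RSS in hardware. The technique spreads network traffic across multiple CPUs and so lowers the load on any single CPU.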
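On the Intel 82576/82599, the RETA is a 128-entry index-mapping table whose entries are 4 bits wide. The low 7 bits (LSBs) of the hash select a RETA entry. The output indices can be updated at run time to rebalance traffic dynamically, but an update is not guaranteed to take effect in sync with packet reception.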
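The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not driver code: the names `build_reta` and `select_queue` are hypothetical, and the round-robin fill of the table is an assumption (the real default contents depend on the driver).

```python
RETA_SIZE = 128        # 82576/82599: 128 entries, indexed by 7 hash bits
RETA_ENTRY_BITS = 4    # each entry stores a 4-bit queue index


def build_reta(num_queues, size=RETA_SIZE, entry_bits=RETA_ENTRY_BITS):
    """Fill the table round-robin (an assumed, illustrative policy)."""
    max_queues = 1 << entry_bits  # 4-bit entries can name 16 queues
    if not 1 <= num_queues <= max_queues:
        raise ValueError(f"num_queues must be in 1..{max_queues}")
    return [i % num_queues for i in range(size)]


def select_queue(rss_hash, reta):
    """Index the table with the low bits of the 32-bit hash."""
    index = rss_hash & (len(reta) - 1)  # low 7 bits for a 128-entry table
    return reta[index]


reta = build_reta(num_queues=4)
print(select_queue(0x68B84A3D, reta))  # looks up entry 0x3d

# Rebalancing at run time: repoint one entry at another queue.
reta[0x3D] = 3
print(select_queue(0x68B84A3D, reta))
```

Because the hash only selects a table entry, rewriting entries is enough to move flows between queues without touching the hash function or the key.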
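Notes:
The 4-bit RSS output index in the 82576/82599 RETA means that RSS can distribute traffic across at most 16 queues; queues beyond the first 16 receive no RSS traffic.
Intel XL710 PF: RETA size 256, 6-bit entries, up to 64 queues.
Intel XL710 VF: RETA size 64, 4-bit entries, up to 16 queues.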
The RSS hash function is usually the Microsoft Toeplitz-based hash. Microsoft (MSFT) RSS defines hash calculations for IPv4, TcpIPv4, IPv6, and TcpIPv6; Intel later extended these with UdpIPv4, UdpIPv6, STcpIPv4, and STcpIPv6.
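In the rules below, @x-y denotes bytes x through y of the packet, counted from the start of the IP header.
IPv4 with TCP (TcpIPv4):
Input[12] = @12-15, @16-19, @20-21, @22-23
Result = ComputeHash(Input, 12)
IPv4 with UDP (UdpIPv4):
Input[12] = @12-15, @16-19, @20-21, @22-23
Result = ComputeHash(Input, 12)
IPv4 only:
Input[8] = @12-15, @16-19
Result = ComputeHash(Input, 8)
IPv6 with TCP (TcpIPv6):
Input[36] = @8-23, @24-39, @40-41, @42-43
Result = ComputeHash(Input, 36)
IPv6 with UDP (UdpIPv6):
Input[36] = @8-23, @24-39, @40-41, @42-43
Result = ComputeHash(Input, 36)
IPv6 only:
Input[32] = @8-23, @24-39
Result = ComputeHash(Input, 32)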
Note:
For packet types not covered by the rules above, the RSS output index defaults to 0.
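The byte ranges above can be applied directly to a raw packet. The following is a minimal sketch: `RSS_FIELDS` and `rss_input` are hypothetical names, the mapping of type names to byte ranges is my reading of the rules above, and the offsets assume an IPv4 header without options (or an IPv6 header without extension headers).

```python
import ipaddress
import struct

# Inclusive byte ranges, relative to the start of the IP header.
RSS_FIELDS = {
    "TcpIPv4": [(12, 15), (16, 19), (20, 21), (22, 23)],
    "UdpIPv4": [(12, 15), (16, 19), (20, 21), (22, 23)],
    "IPv4": [(12, 15), (16, 19)],
    "TcpIPv6": [(8, 23), (24, 39), (40, 41), (42, 43)],
    "UdpIPv6": [(8, 23), (24, 39), (40, 41), (42, 43)],
    "IPv6": [(8, 23), (24, 39)],
}


def rss_input(ip_packet, kind):
    """Concatenate the hashed fields of a packet that starts at the IP header."""
    return b"".join(ip_packet[first:last + 1] for first, last in RSS_FIELDS[kind])


# Hand-built IPv4 header (20 bytes, no options) followed by the TCP ports.
src = ipaddress.ip_address("66.9.149.187").packed
dst = ipaddress.ip_address("161.142.100.80").packed
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0, src, dst)
tcp_ports = struct.pack("!HH", 2794, 1766)  # source port, destination port
packet = ip_header + tcp_ports + bytes(16)  # rest of the TCP header zeroed

print(rss_input(packet, "TcpIPv4").hex())
print(len(rss_input(packet, "IPv4")))
```

The resulting byte string is source address, destination address, source port, destination port, which is the order the hash function consumes.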
The hash calculation uses a 320-bit (40-byte) random secret key. The Intel 82599 stores this key in the RSS Random Key (RSSRK) registers.
Default RK:
0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
The default RK can be combined with the hash function to verify RSS results.
The table below gives verification data for the IPv4 variants of the Toeplitz hash:
Destination Address :Port | Source Address :Port | IPv4 only | IPv4 with TCP |
---|---|---|---|
161.142.100.80 :1766 | 66.9.149.187 :2794 | 0x323e8fc2 | 0x51ccc178 |
65.69.140.83 :4739 | 199.92.111.2 :14230 | 0xd718262a | 0xc626b0ea |
12.22.207.184 :38024 | 24.19.198.95 :12898 | 0xd2d0a5de | 0x5c2b394a |
209.142.163.6 :2217 | 38.27.205.30 :48228 | 0x82989176 | 0xafc7327f |
202.188.127.2 :1303 | 153.39.163.191 :44251 | 0x5d1809c5 | 0x10e828a2 |
The table below gives verification data for the IPv6 variants of the Toeplitz hash:
Destination Address (Port) | Source Address (Port) | IPv6 only | IPv6 with TCP |
---|---|---|---|
3ffe:2501:200:3::1 (1766) | 3ffe:2501:200:1fff::7 (2794) | 0x2cc18cd5 | 0x40207d3d |
ff02::1 (4739) | 3ffe:501:8::260:97ff:fe40:efab (14230) | 0x0f0c461c | 0xdde51bbf |
fe80::200:f8ff:fe21:67cf (38024) | 3ffe:1900:4545:3:200:f8ff:fe21:67cf (44251) | 0x4b61e985 | 0x02d1feef |
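The verification data can be checked with a small software Toeplitz hash. The sketch below is a straightforward bit-by-bit version written for clarity rather than speed; `toeplitz_hash` and `rss_hash` are hypothetical names, and the input order (source address, destination address, source port, destination port) is assumed from the byte ranges given earlier. With the default RK it should reproduce the table values.

```python
import ipaddress
import struct

# Default RK from the text above (40 bytes).
DEFAULT_RK = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])


def toeplitz_hash(data, key=DEFAULT_RK):
    """XOR a sliding 32-bit window of the key for every set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("input too long for this key")
    key_int = int.from_bytes(key, "big")
    result = 0
    bit_index = 0
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            if (byte >> shift) & 1:
                # 32 key bits starting at the current input bit position
                window = (key_int >> (key_bits - 32 - bit_index)) & 0xFFFFFFFF
                result ^= window
            bit_index += 1
    return result


def rss_hash(src, dst, sport=None, dport=None):
    """Hash the addresses, plus the ports when both are given."""
    data = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return toeplitz_hash(data)


# First row of each table, without and with ports.
print(hex(rss_hash("66.9.149.187", "161.142.100.80")))
print(hex(rss_hash("66.9.149.187", "161.142.100.80", 2794, 1766)))
print(hex(rss_hash("3ffe:2501:200:1fff::7", "3ffe:2501:200:3::1")))
print(hex(rss_hash("3ffe:2501:200:1fff::7", "3ffe:2501:200:3::1", 2794, 1766)))
```

Note that the tables list the destination first, while the hash input starts with the source address.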
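Notes:
The default RK is an asymmetric key: the two directions of the same flow generally hash to different values.
An implementation of the Toeplitz algorithm can be found in the BSD source code, and recent DPDK releases include a software implementation.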
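To illustrate what asymmetric means here, the sketch below hashes one TCP flow in both directions. `toeplitz_hash` and `flow_hash` are hypothetical helper names. `SYMMETRIC_KEY`, the 16-bit pattern 0x6d5a repeated 20 times, is not from the original text; it is included as an assumed example of a key whose 16-bit period makes the hash independent of direction.

```python
import ipaddress
import struct

DEFAULT_RK = bytes.fromhex(
    "6d5a56da255b0ec2"
    "4167253d43a38fb0"
    "d0ca2bcbae7b30b4"
    "77cb2da38030f20c"
    "6a42b73bbeac01fa"
)
SYMMETRIC_KEY = bytes.fromhex("6d5a") * 20  # assumed example key, 40 bytes


def toeplitz_hash(data, key):
    """Bit-by-bit Toeplitz hash over data using a sliding 32-bit key window."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    data_int = int.from_bytes(data, "big")
    data_bits = len(data) * 8
    result = 0
    for i in range(data_bits):  # i = 0 is the most significant input bit
        if (data_int >> (data_bits - 1 - i)) & 1:
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_hash(key, src, sport, dst, dport):
    """Hash a TCP/UDP over IPv4 4-tuple in RSS input order."""
    data = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!HH", sport, dport)
    )
    return toeplitz_hash(data, key)


forward = ("66.9.149.187", 2794, "161.142.100.80", 1766)
reverse = ("161.142.100.80", 1766, "66.9.149.187", 2794)

for name, key in (("default", DEFAULT_RK), ("symmetric", SYMMETRIC_KEY)):
    print(name, hex(flow_hash(key, *forward)), hex(flow_hash(key, *reverse)))
```

With a key that repeats every 16 bits, swapping the two addresses (a 32-bit shift) or the two ports (a 16-bit shift) leaves every key window unchanged, so both directions land in the same queue.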
To enable RSS on a port in DPDK, set the mq_mode and rss_hf fields in rte_eth_conf, and configure at least 2 RX queues:
#include <rte_ethdev.h>

struct rte_eth_conf port_conf_default = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,  /* distribute RX traffic with RSS */
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,       /* NULL: keep the driver's default key */
            .rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
        },
    },
};
Test RSS in rxonly mode:
testpmd -c 0xffff --socket-mem=8192,8192 -w 81:00.0 -n 2 -r 2 -- --enable-rx-cksum --rss-ip --rss-udp --rxq=2 --txq=2 -i
...
testpmd > set fwd rxonly
testpmd > set verbose 8
testpmd > start
Send test packets with scapy:
sendp(Ether()/Dot1Q()/IP(src=RandIP(), dst='192.168.4.2'), iface='eth8', count=5)
The testpmd terminal then shows output similar to the following:
port 0/queue 0: received 1 packets
src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - length=60 - nb_segs=1 - RSS hash=0x6c8d4e08 - RSS queue=0x0 - VLAN tci=0x1 - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_UDP - sw ptype: L2_ETHER L3_IPV4 L4_UDP - l2_len=14 - l3_len=20 - l4_len=8 - Receive queue=0x0
ol_flags: PKT_RX_VLAN PKT_RX_RSS_HASH PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_VLAN_STRIPPED
port 0/queue 0: received 1 packets
src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - length=60 - nb_segs=1 - RSS hash=0x68b84a3d - RSS queue=0x0 - VLAN tci=0x1 - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_UDP - sw ptype: L2_ETHER L3_IPV4 L4_UDP - l2_len=14 - l3_len=20 - l4_len=8 - Receive queue=0x0
ol_flags: PKT_RX_VLAN PKT_RX_RSS_HASH PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_VLAN_STRIPPED
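When many packets are captured, it can help to tally the queue distribution from the verbose output. The following is a minimal sketch: `parse_rss` is a hypothetical helper, the regular expression assumes the `RSS hash=... - RSS queue=...` layout shown above, and the sample lines are abbreviated copies of that output.

```python
import re
from collections import Counter

# Matches the "RSS hash=0x... - RSS queue=0x..." part of a verbose line.
RSS_PATTERN = re.compile(
    r"RSS hash=(0x[0-9a-fA-F]+) - RSS queue=(0x[0-9a-fA-F]+)"
)


def parse_rss(lines):
    """Yield (hash, queue) pairs from testpmd verbose output lines."""
    for line in lines:
        match = RSS_PATTERN.search(line)
        if match:
            yield int(match.group(1), 16), int(match.group(2), 16)


sample = [
    "port 0/queue 0: received 1 packets",
    "src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - "
    "length=60 - nb_segs=1 - RSS hash=0x6c8d4e08 - RSS queue=0x0 - VLAN tci=0x1",
    "port 0/queue 0: received 1 packets",
    "src=00:1E:67:3E:CB:D1 - dst=00:22:AA:EA:4B:B3 - type=0x0800 - "
    "length=60 - nb_segs=1 - RSS hash=0x68b84a3d - RSS queue=0x0 - VLAN tci=0x1",
]

per_queue = Counter(queue for _, queue in parse_rss(sample))
for queue, count in sorted(per_queue.items()):
    print(f"queue {queue}: {count} packets")
```

A heavily skewed tally over a large sample with random source addresses would suggest checking which header fields are actually being hashed.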
Introduction to Receive Side Scaling
Intel® 82599 10 Gigabit Ethernet Controller Datasheet - 7.1.2.8 Receive-Side Scaling (RSS)