Reference
InfiniBand Architecture Specification Volume 1, Release 1.3
LRH (Local Route Header) 8 Bytes
present in all packets of a message
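LRH.PktLen = len(LRH ~ ICRC) / 4B, i.e. the length in 4-byte words from the first byte of the LRH through the ICRC (the VCRC is not counted)
MAX(all headers + ICRC + VCRC) = 126B
==> MAX(LRH.PktLen) = (MTU + 126 - 2) / 4
When MTU = 4096: MAX(LRH.PktLen) = 1055
When MTU = 1024: MAX(LRH.PktLen) = 287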
…
For IBA packets: MIN(LRH.PktLen) = 6, i.e. an actual length of 24B TODO: ??
For raw packets: MIN(LRH.PktLen) = 5, i.e. an actual length of 20B TODO: ?
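The PktLen arithmetic above can be checked with a short sketch. The function names are hypothetical, and the constants (126 bytes of maximum header plus CRC overhead, a 2-byte VCRC excluded from PktLen) are taken from these notes rather than re-derived from the spec.

```python
MAX_OVERHEAD = 126  # max(all headers + ICRC + VCRC) in bytes, per the notes above
VCRC_LEN = 2        # the VCRC is not counted in LRH.PktLen


def max_pktlen(mtu):
    """Largest LRH.PktLen (in 4-byte words) for a given MTU in bytes."""
    return (mtu + MAX_OVERHEAD - VCRC_LEN) // 4


def pktlen_to_bytes(pktlen):
    """Bytes covered by LRH.PktLen (LRH through ICRC, VCRC excluded)."""
    return pktlen * 4


if __name__ == "__main__":
    for mtu in (1024, 4096):
        print(f"MTU={mtu}: MAX(LRH.PktLen)={max_pktlen(mtu)}")
```

With MTU = 4096 and MTU = 1024 this reproduces the values 1055 and 287 listed above.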
Invariant CRC- ICRC - 32b = 4B
Present in all packets of message, if indicated by Link Next Header field (i.e.,not a raw packet).
please refer to Section 7.8.1, “Invariant CRC (ICRC) - 4 Bytes,” on page 207.
Variant CRC- VCRC - 16b = 2B
Present in all packets of message.
please refer to Section 7.8.2, “Variant CRC (VCRC) - 2 Bytes,” on page 209
GRH (Global Route Header) 40 Bytes
present in all packets of a message, if indicated by LRH LINK Next Header.
The GRH layout is identical to the IPv6 header (RFC 2460), but IBA SGID/DGID do not mean the same thing as the IPv6 source/destination IP addresses, and there is no real mapping between the two.
BTH (Base Transport Header) 12 Bytes
Present in all packets of message, if indicated by Link Next Header field in GRH (i.e.,not a raw packet)
1. opcode
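opcode[7:5] selects the transport service (RC, UC, etc.); opcode[4:0] selects the operation type within that service.
The IB spec gives the detailed opcode definitions and packet layouts for each case.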
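As a minimal sketch of the bit split described above, the following separates an 8-bit BTH opcode into its two fields. The helper names are hypothetical, and the small transport-service table is written from memory of the IBA opcode table, so treat it as an assumption to check against the spec.

```python
# Transport-service codes for opcode[7:5]. Assumption: recalled from the
# IBA opcode table, not taken from these notes; verify before relying on it.
TRANSPORT = {0b000: "RC", 0b001: "UC", 0b010: "RD", 0b011: "UD"}


def split_opcode(opcode):
    """Return (opcode[7:5], opcode[4:0]) for an 8-bit BTH opcode."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("BTH opcode is an 8-bit field")
    return opcode >> 5, opcode & 0x1F


def describe(opcode):
    """Human-readable form: transport service plus the raw operation code."""
    service, operation = split_opcode(opcode)
    return f"{TRANSPORT.get(service, 'other')} operation 0x{operation:02x}"


if __name__ == "__main__":
    print(describe(0x04))
    print(describe(0x24))
```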
2. PSN
For SEND / RDMA WRITE / ATOMIC, next_psn = (curr_psn + 1) modulo 2^24
For RDMA READ, the PSN of the next request after an RDMA READ request is not simply +1; it must leave room for the expected number of read response packets.
curr_psn: PSN of the RDMA READ request
n = the number of expected RDMA READ response packets
==> next_psn = (curr_psn + n) modulo 2^24
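The two PSN rules can be expressed as one small function. This is a sketch under the rules stated above; the function name and the optional-argument convention are hypothetical.

```python
PSN_MODULUS = 1 << 24  # the PSN is a 24-bit field


def next_psn(curr_psn, read_response_packets=None):
    """PSN to use for the packet/request that follows curr_psn.

    For SEND / RDMA WRITE / ATOMIC packets, leave read_response_packets unset
    (step of 1). For an RDMA READ request, pass n, the number of response
    packets expected, so that their PSNs are skipped.
    """
    step = 1 if read_response_packets is None else read_response_packets
    if step < 1:
        raise ValueError("at least one PSN is always consumed")
    return (curr_psn + step) % PSN_MODULUS


if __name__ == "__main__":
    print(next_psn(0xFFFFFF))     # wraps around modulo 2^24
    print(next_psn(100, 4))       # read request expecting 4 response packets
```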
1. RDETH
Reliable Datagram Extended Transport Header - RDETH - 4 Bytes
Present in every packet of reliable datagram message.
2. DETH
Datagram Extended Transport Header - DETH - 8 Bytes
Present in every packet of datagram request messages
3. XRCETH
XRC Extended Transport Header- XRCETH- 4 Bytes
Present in XRC send, XRC RDMA Read/Write or XRC Atomic Request
4. RETH
RDMA Extended Transport Header - RETH - 16Bytes
Present in first packet of RDMA request message
5. AtomicETH
Atomic Extended Transport Header - AtomicETH - 28 Bytes
Present in Atomic request message
6. AETH
ACK Extended Transport Header - AETH - 4Bytes;
Present in all ACK packets, including first and last packet of message for RDMA Read Response packets.
7. AtomicAckETH
Atomic ACK Extended Transport Header -AtomicAckETH - 8Bytes
Present in all AtomicACK packets
8. ImmDt
Immediate Data - ImmDt - 4 Bytes
Present in last packet of request with immediate data
9. IETH
Invalidate Extended Transport Header - IETH - 4 Bytes
Present in last packet of SEND with Invalidate request
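The header sizes listed so far can be collected into a table and summed to get a packet's on-wire size and its LRH.PktLen. This is only a sketch: the names are hypothetical, the sizes are the ones given in these notes, and `payload_len` is assumed to already include any padding to a 4-byte boundary.

```python
# Sizes in bytes, as listed in these notes.
HEADER_BYTES = {
    "LRH": 8, "GRH": 40, "BTH": 12,
    "RDETH": 4, "DETH": 8, "XRCETH": 4, "RETH": 16, "AtomicETH": 28,
    "AETH": 4, "AtomicAckETH": 8, "ImmDt": 4, "IETH": 4,
    "ICRC": 4, "VCRC": 2,
}


def wire_bytes(headers, payload_len=0):
    """Total packet size in bytes for the given header list plus payload."""
    return sum(HEADER_BYTES[name] for name in headers) + payload_len


def lrh_pktlen(headers, payload_len=0):
    """LRH.PktLen in 4-byte words; the VCRC is excluded from the count."""
    covered = wire_bytes([h for h in headers if h != "VCRC"], payload_len)
    if covered % 4:
        raise ValueError("payload_len must be padded to a multiple of 4 bytes")
    return covered // 4


if __name__ == "__main__":
    ack = ["LRH", "BTH", "AETH", "ICRC", "VCRC"]
    print(wire_bytes(ack), lrh_pktlen(ack))
```

For the smallest IBA packet in this model (LRH + BTH + ICRC + VCRC, no payload) the function yields a PktLen of 6, which is consistent with the minimum noted in the LRH section.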
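When a SEND message is longer than the PMTU it must be segmented; BTH.OPCODE marks the first, middle, and last packets.
When segmenting, only the last packet may be shorter than the PMTU!
The response to a SEND is an "ACK", which must carry LRH + BTH + AETH + ICRC + VCRC.
Each packet consumes one PSN; the peer uses the PSN to tell whether packets arrived in order or whether any were lost.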
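The segmentation rule can be sketched as follows: every packet except the last carries exactly PMTU bytes of payload, and a message that fits in one packet is sent as a single "only" packet. The function name and the string labels are hypothetical; the real encoding is the BTH opcode.

```python
def segment_message(length, pmtu):
    """Split a message of `length` bytes into (kind, payload_len) packets."""
    if length < 0 or pmtu <= 0:
        raise ValueError("length must be >= 0 and pmtu must be > 0")
    if length <= pmtu:
        return [("only", length)]  # includes the zero-length message
    full, tail = divmod(length, pmtu)
    sizes = [pmtu] * full + ([tail] if tail else [])
    kinds = ["first"] + ["middle"] * (len(sizes) - 2) + ["last"]
    return list(zip(kinds, sizes))


if __name__ == "__main__":
    for kind, size in segment_message(2500, 1024):
        print(kind, size)
```

Since each packet consumes one PSN, the length of the returned list is also the number of PSNs the message uses.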
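The first packet of an RDMA WRITE must carry a RETH. The RETH contains {VA, R-KEY, length}, which identify the destination buffer.
When segmenting, only the last packet may be shorter than the PMTU!
The response to an RDMA WRITE is an "ACK", which must carry LRH + BTH + AETH + ICRC + VCRC.
Each packet consumes one PSN; the peer uses the PSN to tell whether packets arrived in order or whether any were lost.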
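As an illustration of the 16-byte RETH, the sketch below packs and unpacks {VA, R-KEY, length} with Python's `struct` module. The field widths (64-bit VA, 32-bit R_Key, 32-bit length), their order, and big-endian byte order are assumptions consistent with the 16-byte size given above; the function names are hypothetical.

```python
import struct

# Assumed layout: 64-bit VA, 32-bit R_Key, 32-bit DMA length, big-endian.
RETH_FORMAT = ">QII"


def pack_reth(va, rkey, length):
    """Serialize the RETH fields into 16 bytes."""
    return struct.pack(RETH_FORMAT, va, rkey, length)


def unpack_reth(data):
    """Parse 16 bytes back into the RETH fields."""
    va, rkey, length = struct.unpack(RETH_FORMAT, data)
    return {"va": va, "rkey": rkey, "length": length}


if __name__ == "__main__":
    raw = pack_reth(0x7F0000001000, 0x1234, 8192)
    print(len(raw), unpack_reth(raw))
```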
read request:
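Always a single, standalone packet; it must carry a RETH containing {VA, R-KEY, length}, which identify the remote buffer to read from.
A single RDMA READ request can read at most 2^31 bytes of data, as given by RETH.length.
After sending an RDMA READ request, the requester may send other requests without waiting for all of the read responses to come back.
Note: in that case the PSNs of the subsequent requests need special handling, because the PSNs of the read responses must be skipped. Please refer to "Section 9.7.3.1 Requester Side - Generating PSN" on page 302.
Note: a segmented SEND / RDMA WRITE must not be interrupted; the previous request must be sent completely before the next request starts.
The number of outstanding RDMA READ requests on a connection (a particular QP) is negotiated during connection establishment.
RDMA READ requests do not support immediate data and never carry ImmDt.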
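To make the PSN reservation concrete, the sketch below derives the number of expected response packets from RETH.length and the PMTU, then computes the PSN of the request that follows the read. The function names are hypothetical, and two points are assumptions rather than statements from these notes: a zero-length read is counted as one response packet, and the first response packet reuses the PSN of the read request.

```python
PSN_MODULUS = 1 << 24  # the PSN is a 24-bit field


def read_response_packets(length, pmtu):
    """Expected number of RDMA READ response packets for RETH.length bytes."""
    if length < 0 or pmtu <= 0:
        raise ValueError("length must be >= 0 and pmtu must be > 0")
    # Ceiling division; assumption: a zero-length read still gets one response.
    return max(1, -(-length // pmtu))


def psn_after_read(read_psn, length, pmtu):
    """PSN for the next request sent after an RDMA READ request."""
    return (read_psn + read_response_packets(length, pmtu)) % PSN_MODULUS


if __name__ == "__main__":
    # A 10000-byte read at PSN 100 with PMTU 4096 reserves three PSNs.
    print(read_response_packets(10000, 4096), psn_after_read(100, 10000, 4096))
```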
read rsp:
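Every packet except the middle packets must carry an AETH.
When a read response is segmented, only the last packet may be shorter than the PMTU!
Each packet consumes one PSN; the PSN is used to tell whether packets arrived in order or whether any were lost.
If packets are lost, the RDMA READ request can be re-sent, but not everything has to be re-read: a new RDMA READ request can be built that re-reads only the missing part of the data. "The PSN of the retried RDMA READ must be in the duplicate PSN region. See Section 9.7.1 Packet Sequence Numbers (PSN) on page 294" ??? To be confirmed.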
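One way to picture the partial re-read is the sketch below, which builds a new request covering only the data after the response packets that already arrived in order. It rests on unconfirmed assumptions (the same open question as above): that the retried request starts at the PSN of the first missing response packet, and that every received packet carried exactly PMTU bytes. The function name and dictionary keys are hypothetical.

```python
PSN_MODULUS = 1 << 24  # the PSN is a 24-bit field


def build_read_retry(va, length, psn, pmtu, packets_received):
    """Fields for an RDMA READ request that re-reads only the missing tail.

    va, length, psn describe the original request; packets_received is the
    number of response packets that arrived in order before the loss.
    """
    total = max(1, -(-length // pmtu))  # expected response packets
    if not 0 <= packets_received < total:
        raise ValueError("nothing to retry, or invalid packet count")
    consumed = packets_received * pmtu  # bytes already delivered in order
    return {
        "va": va + consumed,
        "length": length - consumed,
        # Assumption: the retry starts at the PSN of the first missing response.
        "psn": (psn + packets_received) % PSN_MODULUS,
    }


if __name__ == "__main__":
    print(build_read_retry(0x1000, 10000, 100, 4096, 2))
```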