iRDMA Flow Control Introduction

1.0  Introduction

This document introduces Ethernet flow control on Intel® Ethernet 800 Series Network Adapters with the RDMA driver (iRDMA), with a focus on best practices for Linux RDMA traffic.

It includes:

  • Background on Ethernet flow control (FC) and Data Center Bridging (DCB).
  • Differences between Link-level Flow Control (LFC) and Priority Flow Control (PFC).
  • Configuration steps for each type on 800 Series Linux hosts.
  • Verification tips.

1.1  QoS/Flow Control Limitations on the 800 Series

  • Although the 800 Series hardware supports eight Traffic Classes (TCs), the maximum supported configuration is four TCs per port. Only one TC can have Priority Flow Control enabled per port.

Number of Adapter Ports   Traffic Class Recommendation                         RDMA
-----------------------   --------------------------------------------------   -------------
1, 2, or 4                Up to four TCs, with one of them enabled with PFC.   Supported
More than 4               No DCB Support                                       Not Supported

  • In RoCEv2 mode, if the driver detects no flow control (neither LFC nor PFC), it automatically de-tunes. This behavior is intentional: it allows RoCEv2 to operate without flow control, but with lower performance.
  • When the 800 Series is in firmware Link Layer Discovery Protocol (LLDP) mode, only three application priorities are supported. Software LLDP supports 32. These limits refer to the LLDP APP TLV; see man lldptool-app for more information.
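The limits in this section can be collected into a small pre-flight check. The following Python sketch is illustrative only: the `PortQosPlan` class, its field names, and `check_plan` are hypothetical and are not part of the irdma driver or any Intel tool. It also assumes that port counts other than 1, 2, or 4 (for example, 3) should be flagged as not covered by the table.

```python
from dataclasses import dataclass, field

MAX_TCS_PER_PORT = 4       # maximum supported TCs per port
MAX_PFC_TCS_PER_PORT = 1   # only one TC per port may have PFC enabled
MAX_APP_PRIORITIES = {"firmware": 3, "software": 32}  # LLDP APP TLV entries


@dataclass
class PortQosPlan:
    """Hypothetical description of a planned QoS setup for one port."""
    adapter_ports: int                # number of ports on the adapter
    traffic_classes: int              # TCs planned for this port
    pfc_enabled_tcs: set = field(default_factory=set)
    lldp_mode: str = "firmware"       # "firmware" or "software"
    app_priorities: int = 0           # planned LLDP APP TLV entries


def check_plan(plan):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []

    if plan.adapter_ports > 4:
        problems.append("adapters with more than 4 ports have no DCB support")
    elif plan.adapter_ports not in (1, 2, 4):
        # Assumption: the table only lists 1, 2, or 4 ports.
        problems.append("port count is not covered by the table")

    if not 1 <= plan.traffic_classes <= MAX_TCS_PER_PORT:
        problems.append("at most four TCs are supported per port")

    if len(plan.pfc_enabled_tcs) > MAX_PFC_TCS_PER_PORT:
        problems.append("only one TC per port can have PFC enabled")

    limit = MAX_APP_PRIORITIES.get(plan.lldp_mode)
    if limit is None:
        problems.append("unknown LLDP mode")
    elif plan.app_priorities > limit:
        problems.append(f"{plan.lldp_mode} LLDP supports at most {limit} application priorities")

    return problems


if __name__ == "__main__":
    plan = PortQosPlan(adapter_ports=2, traffic_classes=4, pfc_enabled_tcs={3})
    print(check_plan(plan) or "no limit violations found")
```

A check like this only catches planning mistakes before configuration; it does not query the adapter or confirm what the driver actually negotiated.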

2.0  Background

2.1  Ethernet Flow Control

By design, Ethernet is an unreliable protocol with no guarantee that packets arrive at their destination correctly and in order. Instead, Ethernet relies on upper-layer protocols (such as TCP) or applications to provide reliable service and error correction.

The 802.3x standard introduced flow control to the Ethernet protocol, defining a mechanism for throttling the flow of data between two directly connected full-duplex network devices. If the sender transmits data faster than the receiver can accept it, the overwhelmed receiver can send a pause signal (Xoff or transmit off) to the sender, requesting that the sender stop transmitting data for a specified period of time. The sender resumes transmission either after the timeout period expires or if the receiver indicates that it is ready to accept more data by sending an Xon (transmit on) signal.
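To make the Xoff/Xon mechanism concrete, the following Python sketch builds the bytes of an 802.3x PAUSE frame and converts a pause time into seconds. It relies on the standard frame layout: the reserved MAC Control destination address 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0001, and a 16-bit pause time expressed in quanta of 512 bit times. The function names and the source MAC address are made up for illustration, and the frame is only constructed in memory, not transmitted.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BITS = 512          # one pause quantum equals 512 bit times
MIN_FRAME_NO_FCS = 60       # minimum Ethernet frame size, excluding the 4-byte FCS


def build_pause_frame(src_mac, quanta):
    """Build a PAUSE frame; quanta > 0 acts as Xoff, quanta == 0 acts as Xon."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the minimum frame size.
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_seconds(quanta, link_bps):
    """Convert a pause time in quanta to seconds for a given link speed."""
    return quanta * QUANTUM_BITS / link_bps


if __name__ == "__main__":
    src = bytes.fromhex("020000000001")  # hypothetical locally administered address
    xoff = build_pause_frame(src, 0xFFFF)
    xon = build_pause_frame(src, 0)
    print(xoff[:18].hex(), xon[:18].hex())
    print(pause_seconds(0xFFFF, 100e9))  # longest single pause on a 100 Gb/s link
```

Because the pause time is counted in bit times, the same quanta value corresponds to a shorter wall-clock pause on a faster link, which is one reason a receiver may need to refresh Xoff while it stays congested.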

Without flow control, data might be lost and need to be retransmitted by an upper-layer protocol (ULP) or application, which can significantly affect performance.

2.2  Flow Control in RDMA Networks

The 800 Series supports both iWARP and RoCEv2 RDMA transports. Flow control is strongly recommended for RoCEv2, but iWARP also benefits.

Protocol   Base Transport   Flow Control Requirements
--------   --------------   ------------------------------------------------------------
iWARP      TCP              iWARP runs over TCP, a reliable protocol that implements its
                            own flow control.
                            TCP's flow control might be relatively slow to respond in a
                            high-performance, low-latency RDMA environment, especially
                            under bursty traffic patterns.
                            Ethernet flow control is optional, but can be beneficial for
                            iWARP.
                            iWARP mode requires a VLAN to be fully configured before PFC
                            can be enabled.
RoCEv2     UDP              RoCEv2 runs over UDP, an unreliable protocol with no
                            built-in flow control.
                            RoCEv2 therefore requires a lossless Ethernet network to
                            ensure packet delivery.
                            If the irdma driver is in RoCEv2 mode and detects no flow
                            control, it automatically de-tunes, causing lower
                            performance.
                            Flow control is always recommended for RoCEv2.
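Because the driver's behavior depends on whether flow control is present, it helps to confirm the link-level pause settings on a port before running RoCEv2 traffic. The Python sketch below parses text in the format printed by `ethtool -a <interface>`. The sample output, the interface name `eth0`, and the helper names are illustrative assumptions. Treating LFC as enabled only when both RX and TX pause are on is also an assumption of this sketch, not a description of how the irdma driver detects flow control. Note that `ethtool -a` reports link-level pause settings only, not PFC.

```python
# Illustrative output in the format of `ethtool -a eth0`; real values will differ.
SAMPLE_OUTPUT = """Pause parameters for eth0:
Autonegotiate:  off
RX:             on
TX:             on
"""


def parse_pause_params(text):
    """Return the on/off fields of `ethtool -a` style output as a dict of booleans."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only "name: on|off" lines; the header line has no value after the colon.
        if sep and value in ("on", "off"):
            params[key.strip().lower()] = (value == "on")
    return params


def lfc_enabled(params):
    """Assumption: count LFC as enabled only if both RX and TX pause are on."""
    return params.get("rx", False) and params.get("tx", False)


if __name__ == "__main__":
    params = parse_pause_params(SAMPLE_OUTPUT)
    print(params)
    print("LFC enabled:", lfc_enabled(params))
```

On a live system, the same parser could be fed the captured output of the command instead of the sample string.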

2.3  Types of Flow Control: LFC vs. PFC

Ethernet standards define two types of flow control:

  • Link-level Flow Control (LFC)
  • Priority Flow Control (PFC)

Both types use Xon/Xoff pause frames to control data transmission. The primary difference is that LFC pauses all traffic on a link, whereas PFC supports Quality-of-Service (QoS) by defining different traffic priorities that can be paused individually.
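The difference between the two types is visible in the frame layout. An LFC PAUSE frame carries a single pause time for the whole link, while a PFC frame (opcode 0x0101, defined in IEEE 802.1Qbb) carries a priority-enable vector and eight separate pause timers, one per priority. The Python sketch below builds a PFC frame in memory under those layout rules. The function name, the source MAC address, and the choice of priority 3 are illustrative assumptions, not recommended settings.

```python
import struct

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_NO_FCS = 60  # minimum Ethernet frame size, excluding the 4-byte FCS


def build_pfc_frame(src_mac, quanta_by_priority):
    """Build a PFC frame that sets a pause timer for each listed priority (0-7)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in quanta_by_priority.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in the range 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority   # mark this priority's timer as valid
        timers[priority] = quanta        # a timer of 0 resumes (Xon) that priority
    header = MAC_CONTROL_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


if __name__ == "__main__":
    src = bytes.fromhex("020000000001")  # hypothetical locally administered address
    frame = build_pfc_frame(src, {3: 0xFFFF})  # pause only priority 3
    print(frame[14:34].hex())
```

Priorities whose bit is clear in the enable vector are unaffected by the frame, which is what allows one traffic class to be paused while the others keep flowing.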
