iRDMA Flow Control Implementations

5.0       Priority Flow Control - Planning and Guidelines

This section covers planning, considerations, and general configuration guidelines for enabling PFC on a network.

5.1         Steps

The steps for enabling PFC on your network include the following:

  1. Set up your network hosts and switches. (Network Host and Switch Setup)
  2. Decide whether to use willing or non-willing DCB mode on the 800 Series adapter. (Willing vs. Non-willing DCB Mode)
  3. Choose firmware or software DCB. (Firmware vs. Software DCB)
  4. Decide how to separate and prioritize traffic streams. (Separating and Prioritizing Traffic Streams)
  5. Configure ETS: Map priorities to traffic classes and allocate bandwidth. (Configuring ETS: Map Priorities to TCs/Allocate Bandwidth)
  6. Configure PFC: Set priorities for drop or no-drop. (Configuring PFC)
  7. Run your application with the right priority. (Run Applications with the Right Priority)

5.1.1        Network Host and Switch Setup

NOTE

PFC can be used with or without a switch in the network.

  • If using a switch, you must configure PFC on both the adapter ports and the switch ports. Consult the appropriate switch manual for command syntax.
  • If using adapters back-to-back, configure PFC on both hosts.

Host prerequisites for RDMA are outside the scope of this guide, but in general you need, at a minimum:

  • Two Linux hosts with 800 Series adapters.
  • Compatible 800 Series firmware and software: an NVM image with RDMA support, the ice (Intel® Ethernet) driver, and the irdma driver.

If using software DCB mode, you also need OpenLLDP, which includes the lldpad daemon and the lldptool configuration utility.

  • On RHEL, install it with yum (zypper or apt-get might work on other operating systems):

# yum install lldpad

  • Alternatively, build and install it from source, available from the intel/openlldp repository on GitHub.
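Before configuring DCB, it can help to confirm that the pieces listed above are actually present on each host. The following POSIX shell sketch is a hypothetical helper (check_module, check_command, and check_prereqs are not part of the Intel driver package or OpenLLDP). It assumes that loaded kernel modules appear under /sys/module, which is standard on Linux, and that lldptool is on the PATH once OpenLLDP is installed.

```shell
#!/bin/sh
# Hypothetical prerequisite check for the host requirements listed above.
# Assumes loaded kernel modules are visible under /sys/module (standard Linux).

# Print whether kernel module $1 is loaded; return 1 if it is not.
check_module() {
    if [ -d "/sys/module/$1" ]; then
        echo "ok: kernel module $1 is loaded"
    else
        echo "missing: kernel module $1 (try: modprobe $1)"
        return 1
    fi
}

# Print whether command $1 is on the PATH; return 1 if it is not.
check_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1 (install OpenLLDP / lldpad)"
        return 1
    fi
}

# Check everything; keep going after a failure so all problems are reported.
check_prereqs() {
    status=0
    check_module ice || status=1
    check_module irdma || status=1
    # lldptool is only needed when using software DCB.
    check_command lldptool || status=1
    return "$status"
}
```

Run check_prereqs on each host. A nonzero return status means at least one item is missing, and the printed lines say which one.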

5.1.2          Willing vs. Non-willing DCB Mode

DCB standards like PFC and ETS must be set to either willing or non-willing mode, which determines whether the port is willing to accept configuration settings from its link partner.

  • Willing mode: Use when you want to configure DCB on your switch and let the adapters accept settings from the switch ports. This is the preferred and most common setup.
  • Non-willing mode: Use for back-to-back configurations; for troubleshooting, testing, and manually tweaking the configuration; or, if preferred, to configure DCB on all hosts and set the neighboring switch ports to willing (this is somewhat uncommon and might not be supported by all switches).

5.1.3          Firmware vs. Software DCB

The 800 Series has two options for using DCB: firmware and software.

  • Software DCB runs on the Linux host using OpenLLDP. It supports both willing and non-willing modes.
  • Firmware DCB runs on the 800 Series adapter firmware. It only supports willing mode.

If you plan on using willing mode, software DCB is recommended but not required.

NOTE

Only one type of DCB can be active at a time. Enabling firmware DCB overrides the software DCB setting.

Firmware DCB
  • When to use: willing mode.
  • Willing mode setup:

# ethtool --set-priv-flags <iface> fw-lldp-agent on

  • Non-willing mode setup: not supported in firmware DCB.

Software DCB
  • When to use: willing or non-willing mode.
  • Willing mode setup (can be configured in IEEE or CEE mode; refer to Software DCB Willing Mode for details):

# ethtool --set-priv-flags <iface> fw-lldp-agent off
# lldptool -Ti <iface> -V PFC willing=yes
# lldptool -Ti <iface> -V ETS-CFG willing=yes

  • Non-willing mode setup:

# ethtool --set-priv-flags <iface> fw-lldp-agent off
# lldptool -Ti <iface> -V PFC willing=no
# lldptool -Ti <iface> -V ETS-CFG willing=no
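The setups above can be captured in a small script so that both ends of a link are configured the same way. This is a sketch under stated assumptions: run, DRY_RUN, set_dcb_mode, and fw_lldp_state are hypothetical names, and fw_lldp_state assumes the "name : value" layout that ethtool --show-priv-flags prints. Because enabling firmware DCB overrides software DCB, reading the fw-lldp-agent flag first tells you which agent currently owns DCBX.

```shell
#!/bin/sh
# Hypothetical helpers for switching a port between the DCB setups above.

# Print a command, then run it unless DRY_RUN=1 (review before touching a port).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Usage: set_dcb_mode <iface> fw-willing|sw-willing|sw-nonwilling
set_dcb_mode() {
    iface=$1
    case "$2" in
        fw-willing)
            run ethtool --set-priv-flags "$iface" fw-lldp-agent on
            ;;
        sw-willing|sw-nonwilling)
            if [ "$2" = sw-willing ]; then w=yes; else w=no; fi
            # Turn off the firmware LLDP agent so that lldpad owns DCBX.
            run ethtool --set-priv-flags "$iface" fw-lldp-agent off
            run lldptool -Ti "$iface" -V PFC willing="$w"
            run lldptool -Ti "$iface" -V ETS-CFG willing="$w"
            ;;
        *)
            echo "usage: set_dcb_mode <iface> fw-willing|sw-willing|sw-nonwilling" >&2
            return 2
            ;;
    esac
}

# Read `ethtool --show-priv-flags <iface>` output on stdin and print the
# fw-lldp-agent state: "on" means firmware DCB, "off" means software DCB.
fw_lldp_state() {
    awk -F: '$1 ~ /^fw-lldp-agent[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

For example, after setting DRY_RUN=1, set_dcb_mode eth0 sw-nonwilling prints the three commands without running them, and ethtool --show-priv-flags eth0 | fw_lldp_state prints on or off.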

5.1.4        Software DCB Willing Mode

Software DCB can be configured in either IEEE or CEE mode.

For IEEE mode

  1. Disable CEE transmission.

# lldptool -Ti $interface -V CEE-DCBX enableTx=no

  2. Reset the DCBX mode to auto (starting in IEEE DCBX mode) after the next lldpad restart.

# lldptool -Ti $interface -V IEEE-DCBX mode=reset

  3. Configure the interface's ETS configuration as willing.

# lldptool -Ti $interface -V ETS-CFG enableTx=yes willing=yes

  4. Enable transmission of the ETS recommendation for the interface.

# lldptool -Ti $interface -V ETS-REC enableTx=yes

Setting willing=yes for ETS-REC makes no sense, because ETS-REC is, by definition, a recommendation for a willing link partner.

  5. Configure willing PFC for the interface.

# lldptool -Ti $interface -V PFC enable=yes willing=yes enableTx=yes

  6. Terminate the first instance of lldpad that was started (for example, from initrd). Once lldpad -k has been invoked and lldpad has been restarted, subsequent invocations of lldpad -k will not terminate lldpad.

# lldpad -k

  7. Remove the lldpad state records from shared memory.

# lldpad -s

  8. Restart the lldpad service.

# systemctl restart lldpad.service

  9. Verify that CEE-DCBX enableTx is set to no.

# lldptool -ti $interface -V CEE-DCBX -c

Output:

enableTx=no

  10. Verify that the DCBX mode is set to auto.

# lldptool -ti $interface -V IEEE-DCBX -c

Output:

mode=auto
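Steps 1 to 10 can be collected into two functions so that the ordering (configure first, then restart lldpad, then verify) is not lost. This is a minimal sketch, assuming it runs as root on a host where lldpad is managed by systemd; run, DRY_RUN, sw_dcb_ieee_willing, and sw_dcb_ieee_check are hypothetical names, and the lldptool and lldpad arguments are copied from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper for software DCB in IEEE willing mode (steps 1-10 above).

# Print a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Steps 1-8: configure the TLVs, then restart lldpad so the mode reset applies.
sw_dcb_ieee_willing() {
    iface=$1
    run lldptool -Ti "$iface" -V CEE-DCBX enableTx=no
    run lldptool -Ti "$iface" -V IEEE-DCBX mode=reset
    run lldptool -Ti "$iface" -V ETS-CFG enableTx=yes willing=yes
    run lldptool -Ti "$iface" -V ETS-REC enableTx=yes
    run lldptool -Ti "$iface" -V PFC enable=yes willing=yes enableTx=yes
    run lldpad -k
    run lldpad -s
    run systemctl restart lldpad.service
}

# Steps 9-10: succeed only if both values match the expected output.
sw_dcb_ieee_check() {
    iface=$1
    lldptool -ti "$iface" -V CEE-DCBX -c | grep -q 'enableTx=no' &&
        lldptool -ti "$iface" -V IEEE-DCBX -c | grep -q 'mode=auto'
}
```

A typical sequence is sw_dcb_ieee_willing eth0 followed by sw_dcb_ieee_check eth0 && echo "IEEE willing mode active".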

For CEE mode

In CEE mode, successful negotiation requires the link partner to also be in CEE mode.

  1. Enable CEE transmission.

# lldptool -T -i $interface -V CEE-DCBX enableTx=yes

  2. Reset the DCBX mode to auto (starting in IEEE DCBX mode) after the next lldpad restart.

# lldptool -Ti $interface -V IEEE-DCBX mode=reset

  3. Clear the priority group (PG) configuration of the interface: set willing off, disable the PG feature, and set advertise off.

# dcbtool sc $interface pg w:0 e:0 a:0

  4. Clear the PFC configuration of the interface: set willing off, disable the PFC feature, and set advertise off.

# dcbtool sc $interface pfc w:0 e:0 a:0

  5. Set willing, enable, and advertise on the priority group configuration of the interface.

# dcbtool sc $interface pg w:1 e:1 a:1

  6. Set willing, enable, and advertise on the PFC configuration of the interface.

# dcbtool sc $interface pfc w:1 e:1 a:1

  7. Terminate the first instance of lldpad that was started (for example, from initrd). Once lldpad -k has been invoked and lldpad has been restarted, subsequent invocations of lldpad -k will not terminate lldpad.

# lldpad -k

  8. Remove the lldpad state records from shared memory.

# lldpad -s

  9. Restart the lldpad service.

# systemctl restart lldpad.service

  10. Verify that CEE-DCBX enableTx is set to yes.

# lldptool -ti $interface -V CEE-DCBX -c

Output:

enableTx=yes

  11. Verify that the DCBX mode is set to cee.

# lldptool -ti $interface -V IEEE-DCBX -c

Output:

mode=cee
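The CEE procedure can be scripted the same way. Note that the PG and PFC features are cleared before they are re-enabled as willing and advertised. This sketch assumes root access, a systemd-managed lldpad, and dcbtool from the same OpenLLDP package; run, DRY_RUN, sw_dcb_cee_willing, and sw_dcb_cee_check are hypothetical names, and the command arguments are copied from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper for software DCB in CEE willing mode (steps 1-11 above).

# Print a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Steps 1-9.
sw_dcb_cee_willing() {
    iface=$1
    run lldptool -T -i "$iface" -V CEE-DCBX enableTx=yes
    run lldptool -Ti "$iface" -V IEEE-DCBX mode=reset
    # Steps 3-4: clear PG, then PFC (willing, enable, advertise all off).
    for feature in pg pfc; do
        run dcbtool sc "$iface" "$feature" w:0 e:0 a:0
    done
    # Steps 5-6: willing, enabled, and advertised for PG, then PFC.
    for feature in pg pfc; do
        run dcbtool sc "$iface" "$feature" w:1 e:1 a:1
    done
    run lldpad -k
    run lldpad -s
    run systemctl restart lldpad.service
}

# Steps 10-11: succeed only if both values match the expected output.
sw_dcb_cee_check() {
    iface=$1
    lldptool -ti "$iface" -V CEE-DCBX -c | grep -q 'enableTx=yes' &&
        lldptool -ti "$iface" -V IEEE-DCBX -c | grep -q 'mode=cee'
}
```

If sw_dcb_cee_check fails even though the commands succeeded, confirm that the link partner is also in CEE mode, as noted above.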

5.1.5        Separating and Prioritizing Traffic Streams

For networks carrying multiple traffic types, you typically want:

  • One loss-less (no-drop) TC for RDMA traffic.
  • One or more lossy (drop) TCs for LAN traffic.

The exact layout can vary depending on your applications.

Example configuration:

Traffic Stream       Loss-less  TC       Priority         Bandwidth
RDMA Traffic         Yes        0        0                50%
LAN Application #1   No         1        2                25%
LAN Application #2   No         2        4                25%
Unused               No         Any [1]  All others [1]   None

[1] Unused priorities can be mapped to any TC, because no traffic is steered to them. Leaving them mapped to TC 0 is acceptable.
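If the adapters run software DCB in non-willing mode (for example, back-to-back), the example table translates into a priority-to-TC map, per-TC bandwidth, and a PFC priority list. The sketch below assumes lldptool's ETS-CFG attributes up2tc, tsa, and tcbw and its PFC attribute enabled; build_up2tc, apply_example_qos, run, and DRY_RUN are hypothetical helpers. Setting every TC's transmission selection algorithm to ets and listing all eight TCs (unused ones at 0%) is an assumption, since the table does not specify either; check the values against your switch and driver.

```shell
#!/bin/sh
# Hypothetical translation of the example table into lldptool settings.

# Print a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Build an up2tc list for all eight priorities from "prio:tc" arguments;
# priorities that are not listed stay in TC0 (see note [1] of the table).
build_up2tc() {
    out=""
    for prio in 0 1 2 3 4 5 6 7; do
        tc=0
        for pair in "$@"; do
            if [ "${pair%%:*}" = "$prio" ]; then tc=${pair#*:}; fi
        done
        out="${out:+$out,}$prio:$tc"
    done
    echo "$out"
}

# Usage: apply_example_qos <iface>
apply_example_qos() {
    iface=$1
    up2tc=$(build_up2tc 2:1 4:2)    # LAN #1 on priority 2, LAN #2 on priority 4
    tsa="0:ets,1:ets,2:ets,3:ets,4:ets,5:ets,6:ets,7:ets"
    tcbw="50,25,25,0,0,0,0,0"       # TC0 (RDMA) 50%, TC1 and TC2 25% each
    run lldptool -Ti "$iface" -V ETS-CFG willing=no tsa="$tsa" up2tc="$up2tc" tcbw="$tcbw"
    # No-drop only for priority 0, which carries the RDMA traffic.
    run lldptool -Ti "$iface" -V PFC willing=no enabled=0
}
```

Apply the same settings on both hosts so that their ETS and PFC configurations match.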

NOTES

  • The 800 Series supports a maximum of four TCs per port, one of which can have PFC enabled.
  • Traffic classes must start at zero and must be contiguous (0, 1, 2, and so on).
  • ETS bandwidth allocations must total 100%.
  • Multiple priorities can map to the same TC. For example, TC0 can contain prio=0,1,2,3,4,5,6,7. However, a given priority can map to only one TC (for example, prio 0 cannot be in both TC0 and TC1).
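The rules in these notes can be checked mechanically before you configure a switch or host. The following POSIX shell function is a hypothetical helper, not part of OpenLLDP or the ice driver. It takes the planned layout in the comma-separated notation lldptool uses for up2tc and tcbw, plus a list of PFC-enabled priorities (or none). It prints the first rule that is violated, or ok.

```shell
#!/bin/sh
# Hypothetical checker for the 800 Series TC rules in the notes above.
#   $1  up2tc list, e.g. 0:0,1:0,2:1,3:0,4:2,5:0,6:0,7:0
#   $2  bandwidth per TC starting at TC0, e.g. 50,25,25 (trailing 0s allowed)
#   $3  PFC-enabled priorities, e.g. 0, or "none"
validate_qos() {
    up2tc=$1; tcbw=$2; pfc=$3
    old_ifs=$IFS
    seen=""; used=""; max_tc=-1

    # Rule: a given priority maps to only one TC.
    IFS=,
    for pair in $up2tc; do
        prio=${pair%%:*}; tc=${pair#*:}
        case " $seen " in
            *" $prio "*)
                IFS=$old_ifs
                echo "priority $prio is mapped to more than one TC"
                return 1 ;;
        esac
        seen="$seen $prio"; used="$used $tc"
        if [ "$tc" -gt "$max_tc" ]; then max_tc=$tc; fi
    done
    IFS=$old_ifs
    ntc=$((max_tc + 1))

    # Rules: at most four TCs, numbered contiguously from TC0.
    if [ "$ntc" -gt 4 ]; then echo "$ntc TCs requested; at most 4 are supported"; return 1; fi
    t=0
    while [ "$t" -lt "$ntc" ]; do
        case " $used " in
            *" $t "*) ;;
            *) echo "TCs are not contiguous: TC$t has no priorities"; return 1 ;;
        esac
        t=$((t + 1))
    done

    # Rule: ETS bandwidth totals 100%; TCs with no priorities get 0%.
    idx=0; sum=0
    IFS=,
    for bw in $tcbw; do
        if [ "$idx" -ge "$ntc" ] && [ "$bw" -ne 0 ]; then
            IFS=$old_ifs
            echo "TC$idx has no priorities but $bw% bandwidth"
            return 1
        fi
        sum=$((sum + bw)); idx=$((idx + 1))
    done
    IFS=$old_ifs
    if [ "$idx" -lt "$ntc" ]; then echo "bandwidth given for $idx TCs, but $ntc are in use"; return 1; fi
    if [ "$sum" -ne 100 ]; then echo "ETS bandwidth totals $sum%, not 100%"; return 1; fi

    # Rule: PFC (no-drop) on at most one TC.
    pfc_tcs=""
    if [ "$pfc" != none ]; then
        IFS=,
        for p in $pfc; do
            for pair in $up2tc; do
                [ "${pair%%:*}" = "$p" ] || continue
                case " $pfc_tcs " in
                    *" ${pair#*:} "*) ;;
                    *) pfc_tcs="$pfc_tcs ${pair#*:}" ;;
                esac
            done
        done
        IFS=$old_ifs
    fi
    set -- $pfc_tcs
    if [ "$#" -gt 1 ]; then echo "PFC is enabled on $# TCs; at most one is supported"; return 1; fi
    echo ok
}
```

For the example table, validate_qos 0:0,1:0,2:1,3:0,4:2,5:0,6:0,7:0 50,25,25 0 prints ok.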
