Some questions about ASPM

 

  1. What is ASPM

 

ASPM stands for Active State Power Management. It is a feature that saves power when a PCIe link is idle.


  2. Related Bits about ASPM

  • ASPM Support and Compliance Bits

The first related bits are in the PCI Express Capability Structure, in the Link Capabilities Register (offset 0CH):

[Figure 1]

ASPM Optionality Compliance: indicates whether the device conforms to the current specification.

ASPM Support: indicates whether ASPM is supported and at which level (L0s, L1, or both).
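To make the two fields concrete, here is a minimal sketch in C of decoding them from a raw 32-bit Link Capabilities value. The bit positions (ASPM Support in bits 11:10, ASPM Optionality Compliance in bit 22) follow the PCI Express Base Specification; the macro and function names are made up for illustration and are not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Capabilities Register, PCIe capability offset 0x0C.
 * Names below are illustrative, not kernel identifiers. */
#define LNKCAP_ASPM_SUPPORT_SHIFT   10            /* ASPM Support: bits 11:10 */
#define LNKCAP_ASPM_SUPPORT_MASK    0x3u
#define LNKCAP_ASPM_OPT_COMPLIANCE  (1u << 22)    /* ASPM Optionality Compliance */

/* Encodings of the 2-bit ASPM Support field. */
enum aspm_support {
    ASPM_NONE   = 0,   /* no ASPM support */
    ASPM_L0S    = 1,   /* L0s only */
    ASPM_L1     = 2,   /* L1 only */
    ASPM_L0S_L1 = 3    /* L0s and L1 */
};

/* Extract the ASPM Support field from a raw register value. */
static inline enum aspm_support lnkcap_aspm_support(uint32_t lnkcap)
{
    return (enum aspm_support)((lnkcap >> LNKCAP_ASPM_SUPPORT_SHIFT) &
                               LNKCAP_ASPM_SUPPORT_MASK);
}

/* True if the ASPM Optionality Compliance bit is set. */
static inline bool lnkcap_aspm_optionality_compliant(uint32_t lnkcap)
{
    return (lnkcap & LNKCAP_ASPM_OPT_COMPLIANCE) != 0;
}
```

The raw value could come, for example, from the LnkCap line of `lspci -vv` output or from a config-space dump.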

 

The second relevant bit is in the PCIe Root Complex Internal Link Control capability, in the Root Complex Link Control Register:

[Figure 2]

It is used to enable or disable ASPM.
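As a rough illustration of what "enable/disable" means at the register level, the sketch below edits the ASPM Control field of a 16-bit Link Control value. It assumes the field sits in bits 1:0, as the PCIe specification defines for Link Control registers; the helper names are hypothetical and the code only computes the new value, it does not touch hardware.

```c
#include <stdint.h>

/* ASPM Control field of a Link Control register (bits 1:0).
 * 00b = disabled, 01b = L0s, 10b = L1, 11b = L0s and L1. */
#define LNKCTL_ASPM_CTL_MASK  0x0003u

/* Return the register value with ASPM Control set to 'state' (0..3),
 * leaving all other bits unchanged. */
static inline uint16_t lnkctl_aspm_set(uint16_t lnkctl, unsigned state)
{
    return (uint16_t)((lnkctl & ~LNKCTL_ASPM_CTL_MASK) |
                      (state & LNKCTL_ASPM_CTL_MASK));
}

/* Return the register value with ASPM fully disabled. */
static inline uint16_t lnkctl_aspm_disable(uint16_t lnkctl)
{
    return lnkctl_aspm_set(lnkctl, 0);
}
```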

 

  • Link Bandwidth Management Bit

This bit indicates that the PCIe link width or speed has changed, or that a link re-training has occurred.

[Figure 3]
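For reference, a small sketch of how this status bit can be read out of a raw Link Status value (PCIe capability offset 12H). The layout used here (speed in bits 3:0, negotiated width in bits 9:4, Link Bandwidth Management Status in bit 14) is the one given in the PCIe specification; the names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status Register fields (names are illustrative). */
#define LNKSTA_SPEED_MASK   0x000Fu   /* current link speed, bits 3:0 */
#define LNKSTA_WIDTH_MASK   0x03F0u   /* negotiated link width, bits 9:4 */
#define LNKSTA_WIDTH_SHIFT  4
#define LNKSTA_LBMS         0x4000u   /* Link Bandwidth Management Status, bit 14 */

/* True if the Link Bandwidth Management Status bit is set. */
static inline bool lnksta_lbms(uint16_t lnksta)
{
    return (lnksta & LNKSTA_LBMS) != 0;
}

/* Negotiated link width in lanes (e.g. 8 for x8). */
static inline unsigned lnksta_width(uint16_t lnksta)
{
    return (lnksta & LNKSTA_WIDTH_MASK) >> LNKSTA_WIDTH_SHIFT;
}

/* Current link speed code (1 = 2.5 GT/s, 2 = 5 GT/s, ...). */
static inline unsigned lnksta_speed(uint16_t lnksta)
{
    return lnksta & LNKSTA_SPEED_MASK;
}
```

Note that the bit alone does not say which of the causes (width/speed change or re-training) set it, which is the root of the concern raised later in these notes.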

  3. Software behavior about ASPM

 

After scanning the PCI buses and devices, the kernel checks whether ASPM is both supported and enabled, and if so begins ASPM initialization. During ASPM initialization, the Retrain Link bit is set to trigger a PCIe link re-training. This in turn causes the Link Bandwidth Management Status bit of the Link Status Register (offset 12H) in the PCIe Capability Structure to be set. On the Eos platform, this bit causes pci_link_bandwidth_changed_status (Vendor Specific Information Capabilities, offset 30H) to be set according to the following rule:

[Figure 4]
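The cause-and-effect chain on the standard registers can be sketched with a tiny software model. This is not kernel code and not the Eos rule from the figure: `struct model_link` and both functions are hypothetical, and the model collapses "re-training completes" into the register write itself. It only captures two facts from the PCIe specification: Retrain Link is bit 5 of Link Control and always reads back as 0, and a software-requested re-training ends with Link Bandwidth Management Status (bit 14 of Link Status) set.

```c
#include <stdint.h>

#define LNKCTL_RETRAIN_LINK  0x0020u   /* Link Control bit 5 */
#define LNKSTA_LBMS          0x4000u   /* Link Status bit 14 */

/* Hypothetical software model of one port's link registers. */
struct model_link {
    uint16_t lnkctl;   /* Link Control */
    uint16_t lnksta;   /* Link Status  */
};

/* Model of a write to Link Control: a write with Retrain Link set
 * re-trains the link, which leaves LBMS set in Link Status. The
 * Retrain Link bit itself is not stored, since it reads as 0. */
static inline void model_write_lnkctl(struct model_link *link, uint16_t val)
{
    if (val & LNKCTL_RETRAIN_LINK)
        link->lnksta = (uint16_t)(link->lnksta | LNKSTA_LBMS);
    link->lnkctl = (uint16_t)(val & ~LNKCTL_RETRAIN_LINK);
}

/* What ASPM initialization does in this model: request a re-train. */
static inline void model_retrain(struct model_link *link)
{
    model_write_lnkctl(link, (uint16_t)(link->lnkctl | LNKCTL_RETRAIN_LINK));
}
```

Calling `model_retrain()` on a link whose Link Status was 0x0082 leaves it at 0x4082: width and speed are unchanged, yet the status bit is set.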

Once an SMI is triggered, the SMI handler polls the pci_link_bandwidth_changed_status bit and, if the bit is set, posts a warning SEL such as the following:

   1 | 03/04/2015 | 01:06:38 | PCI-e Device Errors CPU Integrated I/O 0 | Non-Fatal Error Detected | Asserted | bus:0x00 dev:0x01 func:0x00  // Root port of SLOT 3
      ELOG(65) PCI link bandwidth changed status. Bus:00H Dev:01H Fn:00H PS:C0H

   2 | 03/04/2015 | 01:06:38 | PCI-e Device Errors CPU Integrated I/O 0 | Non-Fatal Error Detected | Asserted | bus:0x00 dev:0x02 func:0x00  // Root port of SLOT 0
      ELOG(65) PCI link bandwidth changed status. Bus:00H Dev:02H Fn:00H PS:C0H

   3 | 03/04/2015 | 01:06:38 | PCI-e Device Errors CPU Integrated I/O 0 | Non-Fatal Error Detected | Asserted | bus:0x00 dev:0x02 func:0x02  // Root port of on-board PMC SAS
      ELOG(65) PCI link bandwidth changed status. Bus:00H Dev:02H Fn:02H PS:C0H

The bit is then cleared by the SMI handler. On the older platform, although the Link Bandwidth Management Status bit is also set, we never see any SEL, warning, or alert for it.
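The poll-then-clear pattern can be sketched as follows. The real SMI handler is firmware and is not shown in these notes, so everything here is an assumption: the sketch works on the standard Link Bandwidth Management Status bit, which the PCIe specification defines as write-1-to-clear (RW1C), whereas the vendor-specific pci_link_bandwidth_changed_status bit may be cleared differently. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKSTA_LBMS       0x4000u   /* Link Bandwidth Management Status */
#define LNKSTA_LABS       0x8000u   /* Link Autonomous Bandwidth Status */
#define LNKSTA_RW1C_MASK  (LNKSTA_LBMS | LNKSTA_LABS)

/* Model of a write to Link Status: writing 1 to an RW1C bit clears it,
 * writing 0 leaves it alone; the other bits are read-only here. */
static inline void model_write_lnksta(uint16_t *lnksta, uint16_t val)
{
    *lnksta = (uint16_t)(*lnksta & ~(val & LNKSTA_RW1C_MASK));
}

/* Hypothetical handler step: if LBMS is set, report it (a real handler
 * would post the SEL here) and clear the bit. Returns true if an event
 * was found. */
static inline bool poll_and_clear_lbms(uint16_t *lnksta)
{
    if (!(*lnksta & LNKSTA_LBMS))
        return false;
    model_write_lnksta(lnksta, LNKSTA_LBMS);
    return true;
}
```

Because the bit is cleared once it has been reported, each re-training produces one event per port, which matches the one-SEL-per-root-port pattern in the log above.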

 

If ASPM is not supported or is disabled, ASPM initialization should be skipped after PCI scanning during the kernel boot phase.

 

  4. Concerns about ASPM


  • Do we need to enable the ASPM feature?

Currently, ASPM is enabled and running, which is why every root port with a SLIC inserted has been re-trained. However, neither the older nor the new platform has slots on which the ASPM feature is supported, although I did see some PCIe/Intel devices that have ASPM support.

 

  • Why does disabling ASPM in the kernel fail?

Per the code in drivers/pci/pcie/aspm.c, ASPM can be forced off by appending the “pcie_aspm=off” option to the kernel command line, after which there should be no PCIe link re-training. However, I still find the Link Bandwidth Management bit set with this option on the kernel command line. The “pcie_aspm=off” option did not work until I changed the code in pcie_aspm_sanity_check() as follows:

    /*
     * If ASPM is disabled then we're not going to change
     * the BIOS state. It's safe to continue even if it's a
     * pre-1.1 device
     */

    if (aspm_disabled)
      return -EINVAL;
      //continue;
    ……………………………….


This appears to be a Linux kernel bug; we have filed a bug report for it.
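To show why the one-line change matters, here is a simplified user-space model of the check's control flow. It is not the kernel function: `struct model_child`, `model_sanity_check()` and the `patched` flag are invented for illustration, and the remaining per-device checks are reduced to a single `pre_1_1` flag. How the caller reacts to the return value is not shown in these notes; the assumption here is that a non-zero result makes the caller leave the link alone.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device below the port being checked. */
struct model_child {
    bool is_pcie;   /* has a PCIe capability */
    bool pre_1_1;   /* pre-1.1 device, considered unsafe for ASPM */
};

/* Simplified model of the sanity check.
 * patched == false: original behavior ("continue")
 * patched == true : modified behavior ("return -EINVAL") */
static inline int model_sanity_check(const struct model_child *children,
                                     size_t n, bool aspm_disabled,
                                     bool patched)
{
    for (size_t i = 0; i < n; i++) {
        if (!children[i].is_pcie)
            return -EINVAL;
        if (aspm_disabled) {
            if (patched)
                return -EINVAL;   /* report "do not set up this link" */
            continue;             /* skip the remaining checks */
        }
        if (children[i].pre_1_1)
            return -EINVAL;
    }
    return 0;   /* sanity check passed */
}
```

In this model, with ASPM disabled the original code falls through the loop and returns 0, the same result as when ASPM is enabled and every device is fine, so the caller cannot tell the two cases apart. The modified code returns -EINVAL on the first child instead.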


  • If ASPM needs to be enabled, the SEL on the new platform is not expected, correct? If ASPM is not required, do we need (or have) another daemon to monitor the related Link Bandwidth bit?


Take the Link Bandwidth Management bit for example: the bit is set if the link width/speed has changed (this is already monitored by sms on both the older and new platforms) or if a link re-training occurs. Should system management software take care of the re-training case?