First I would like to briefly introduce ACPI PCI device enumeration, then dig into the details of resource
allocation during the initial PCI phases, although these two topics are unrelated.
ACPI PCI device enumeration:
As we know, ACPI namespace enumeration and PCI enumeration are done separately.
Because a PCI device usually has more information probed from the low-level
hardware, the ACPI device (aka firmware device) may want to acquire information from the PCI device (aka physical
device), so there must be a link between the ACPI device and the PCI device. This link is called
glue.
pci device path:
chenyu@chenyu-Surface-Pro-3:/$ ls -l /sys/devices/pci0000\:00/0000\:00\:02.0/firmware_node
lrwxrwxrwx 1 root root 0 3月 28 12:44 /sys/devices/pci0000:00/0000:00:02.0/firmware_node -> ../../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00
acpi device path:
chenyu@chenyu-Surface-Pro-3:/$ ls -l /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/LNXVIDEO\:00/physical_node
lrwxrwxrwx 1 root root 0 3月 28 15:32 /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/physical_node -> ../../../../pci0000:00/0000:00:02.0
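The glue links above can also be followed from a program. The following is a minimal sketch, assuming a Linux sysfs layout like the one shown; the function names firmware_node_path and read_glue_link are hypothetical helpers, not kernel or libc APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "/sys/devices/pci<root_bus>/<bdf>/firmware_node".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int firmware_node_path(char *buf, size_t len, const char *root_bus, const char *bdf)
{
    int n;

    assert(buf && root_bus && bdf);
    n = snprintf(buf, len, "/sys/devices/pci%s/%s/firmware_node", root_bus, bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Resolve a glue symlink (firmware_node or physical_node).
 * readlink() does not NUL-terminate, so terminate the result here.
 * Returns 0 on success, -1 on error.
 */
int read_glue_link(const char *path, char *target, size_t len)
{
    ssize_t n;

    if (len == 0)
        return -1;
    n = readlink(path, target, len - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';
    return 0;
}
```

Calling firmware_node_path with "0000:00" and "0000:00:02.0" yields the same path that was passed to ls above; read_glue_link then returns the relative link target.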
OK, let's come to today's topic: how the resources are allocated.
Why do I care about resource allocation? Because there is a
poweroff bug on Mac Pro 11 which hangs the system immediately
after the user types 'poweroff'.
https://bugzilla.kernel.org/attachment.cgi?id=208961
https://patchwork.kernel.org/patch/9143637/
https://bugzilla.kernel.org/show_bug.cgi?id=103211
Here's the answer from Yinghai:
On Thu, Apr 7, 2016 at 8:55 PM, Chen, Yu
> Currently someone on Bugzilla reported that
> he can not poweroff(S5) nor suspend to S3 after boot up, so I did some
> test on his machine that, it looks like after
> pcibios_assign_resources, we can not access ACPI PM sleep
> register(PM_SLP) caused the problem, that is, once outw(0x1804), the
> system hangs, however before pcibios_assign_resources, everything is
> OK. So I checked the boot log, it seems that there are many resource allocation failure during this phase, such as:
>
> [ 0.865437] pci 0000:06:06.0: BAR 13: no space for [io size 0x1000]
> [ 0.865439] pci 0000:06:06.0: BAR 13: failed to assign [io size 0x1000]
>
> I don't know if this is related to above issue, and would the pci
> device iospace be reset/invalid if above failure comes out. Could you
> give me some advices on why this would probably happened? Thank you.
>
> Related Bugzilla thread:
> https://bugzilla.kernel.org/show_bug.cgi?id=103211
That allocation failed, as parent bridge only have [0x4000,0x5fff].
and /proc/ioports does not report any overlapping...
0d00-ffff : PCI Bus 0000:00
1800-187f : pnp 00:01
1800-1803 : ACPI PM1a_EVT_BLK
1804-1805 : ACPI PM1a_CNT_BLK
1808-180b : ACPI PM_TMR
1820-182f : ACPI GPE0_BLK
1830-1833 : iTCO_wdt.0.auto
1850-1850 : ACPI PM2_CNT_BLK
1860-187f : iTCO_wdt.0.auto
2000-2fff : PCI Bus 0000:02
3000-303f : 0000:00:02.0
4000-6fff : PCI Bus 0000:05
4000-5fff : PCI Bus 0000:06
4000-4fff : PCI Bus 0000:08
4000-4fff : PCI Bus 0000:09
4000-4fff : PCI Bus 0000:0a
5000-5fff : PCI Bus 0000:3a
efa0-efbf : 0000:00:1f.3
ffff-ffff : pnp 00:01
allocation does assign io port to 00:1c.0
[ 0.273741] pci 0000:00:1c.0: BAR 8: assigned [mem 0x7fa00000-0x7fbfffff]
[ 0.273751] pci 0000:00:1c.0: BAR 9: assigned [mem
0x7fc00000-0x7fdfffff 64bit pref]
[ 0.273756] pci 0000:00:1c.0: BAR 7: assigned [io 0x2000-0x2fff]
so can you use setpci to clear that before access 0x1804 ?
Before starting, I'd like to summarise the overall process of PCI resource allocation.
First of all, PCI devices must declare the resources they need in order to operate. These
include I/O resources and memory resources. Both are filled in by the BIOS when it probes
the PCI devices, and the requirements are stored in the PCI device's registers. Later, Linux checks
each of these requirements: if there is no conflict, it allocates the region and writes it to the PCI device registers;
if there is a conflict, it adjusts the region and then writes it. In the end, all these resources are maintained in
ioport_resource and iomem_resource. These resources are accessed not only by PCI devices, but
possibly also by other components such as ACPI.
There are 3 phases:
1. Scan the PCI tree and probe each PCI device's config into the pci_dev.resource array, via
acpi_init->acpi_scan_init->pci_scan_child_bus->pci_scan_slot. The start address of each resource region
is probed from the PCI device's config base_address registers [0~5], and the size from the device's initial BAR
register value (which may be incorrect). Each of these 6 regions may be either mem or io (prefetchable or not).
2. Sort each pci_dev's resources into a resource tree with request_resource, stored
in each pci_bus.resource, in pci_subsys_init->pcibios_init->pcibios_resource_survey.
3. Finally, assign resources according to step 2 and rewrite the pci_dev's BAR registers
to claim its resources, in pcibios_assign_resources.
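The size probing in phase 1 is conventionally done by writing all ones to a BAR and reading the value back. Below is a minimal sketch of decoding such a readback, assuming 32-bit BARs only. The macro and function names are hypothetical; the masks mirror the kernel's PCI_BASE_ADDRESS_SPACE_IO, PCI_BASE_ADDRESS_IO_MASK and PCI_BASE_ADDRESS_MEM_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_SPACE_IO  0x1u      /* bit 0 set: this is an I/O BAR */
#define BAR_IO_MASK   (~0x3u)   /* address bits of an I/O BAR */
#define BAR_MEM_MASK  (~0xfu)   /* address bits of a 32-bit memory BAR */

/*
 * readback: value read from the BAR after writing 0xffffffff to it.
 * Returns the region size in bytes, or 0 if the BAR is not implemented.
 */
uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t mask = (readback & BAR_SPACE_IO) ? BAR_IO_MASK : BAR_MEM_MASK;
    uint32_t addr_bits = readback & mask;

    if (addr_bits == 0)
        return 0;
    /* The lowest writable address bit equals the (power of two) size. */
    return addr_bits & (~addr_bits + 1);
}
```

For example, a memory BAR that reads back 0xfffc0000 decodes to a size of 0x40000, and an I/O BAR that reads back 0xffffffc1 decodes to 0x40. The readback values here are assumed for illustration, not taken from the logs.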
First, here is a picture of the whole PCI structure:
(Copied from http://www.tldp.org/LDP/tlk/dd/pci.html)
Next come the PCI allocation logs during boot, which
are mostly printed in pcibios_assign_resources:
pcibios_assign_resources:pci_assign_unassigned_root_bus_resources
void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
{
__pci_bus_size_bridges(bus, add_list);
/* Depth last, allocate resources and update the hardware. */
__pci_bus_assign_resources(bus, add_list, &fail_head);
}
__pci_bus_size_bridges adjusts the size of the conflicting resources,
while __pci_bus_assign_resources modifies the PCI config for the conflicting resources.
pci_assign_unassigned_root_bus_resources:__pci_bus_size_bridges:pbus_size_io
static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
resource_size_t add_size, struct list_head *realloc_head)
{
struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
IORESOURCE_IO);
if (!b_res)
return;
//so, this function only concerns the ioport which have conflict
size0 = calculate_iosize(size, min_size, size1,
resource_size(b_res), min_align);
size1 = calculate_iosize(size, min_size, 1000 + size1,
resource_size(b_res), min_align);
b_res->start = min_align;
b_res->end = b_res->start + size0 - 1;
b_res->flags |= IORESOURCE_STARTALIGN;
if (size1 > size0 && realloc_head) {
dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx\n",
b_res, &bus->busn_res,
(unsigned long long)size1-size0);
}
}
We can infer from the message that "bridge window" means there is a conflict for
this resource, and the kernel tries to find a hole to insert this region into (we don't care about
the start address here, because only the size of the region matters):
[ 0.865300] pci 0000:06:00.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
[ 0.865302] pci 0000:06:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 07] add_size 200000 add_align 100000
[ 0.865312] pci 0000:06:04.0: bridge window [io 0x1000-0x0fff] to [bus 39] add_size 1000
[ 0.865313] pci 0000:06:04.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 39] add_size 200000 add_align 100000
[ 0.865314] pci 0000:06:04.0: bridge window [mem 0x00100000-0x000fffff] to [bus 39] add_size 200000 add_align 100000
[ 0.865324] pci 0000:06:06.0: bridge window [io 0x1000-0x0fff] to [bus 6b] add_size 1000
[ 0.865325] pci 0000:06:06.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 6b] add_size 200000 add_align 100000
[ 0.865326] pci 0000:06:06.0: bridge window [mem 0x00100000-0x000fffff] to [bus 6b] add_size 200000 add_align 100000
[ 0.865342] pci 0000:00:1c.0: bridge window [io 0x1000-0x0fff] to [bus 02] add_size 1000
[ 0.865343] pci 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 02] add_size 200000 add_align 100000
[ 0.865344] pci 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff] to [bus 02] add_size 200000 add_align 100000
[ 0.865356] pci 0000:00:1c.0: res[14]=[mem 0x00100000-0x000fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865357] pci 0000:00:1c.0: res[14]=[mem 0x00100000-0x002fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865358] pci 0000:00:1c.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865359] pci 0000:00:1c.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865360] pci 0000:00:1c.0: res[13]=[io 0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865361] pci 0000:00:1c.0: res[13]=[io 0x1000-0x1fff] res_to_dev_res add_size 1000 min_align 1000
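The odd-looking ranges such as [io 0x1000-0x0fff] follow directly from "b_res->start = min_align; b_res->end = b_res->start + size0 - 1" with size0 equal to 0. A minimal sketch of that arithmetic, where struct window is a hypothetical stand-in for the kernel's struct resource:

```c
#include <assert.h>
#include <stdint.h>

struct window {        /* hypothetical stand-in for struct resource */
    uint64_t start;
    uint64_t end;      /* inclusive */
};

/* Size of an inclusive range; end == start - 1 encodes an empty window. */
uint64_t window_size(const struct window *w)
{
    return w->end - w->start + 1;
}

/* Mirrors: b_res->start = min_align; b_res->end = b_res->start + size - 1; */
struct window make_window(uint64_t min_align, uint64_t size)
{
    struct window w = { min_align, min_align + size - 1 };
    return w;
}
```

With min_align 0x1000 and size 0 this gives 0x1000-0x0fff, and after adding 0x1000 it gives 0x1000-0x1fff, matching the two res[13] lines in the log above.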
Here's what the real allocation does:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:
void __pci_bus_assign_resources(const struct pci_bus *bus,
struct list_head *realloc_head,
struct list_head *fail_head)
{
pbus_assign_resources_sorted(bus, realloc_head, fail_head);
list_for_each_entry(dev, &bus->devices, bus_list) {
b = dev->subordinate;
if (!b)
continue;
__pci_bus_assign_resources(b, realloc_head, fail_head);//recursive
switch (dev->class >> 8) {
case PCI_CLASS_BRIDGE_PCI:
pci_setup_bridge(b);
}
}
}
From the above logic we know that __pci_bus_assign_resources behaves recursively:
whenever a bus is encountered, it first allocates resources for it, then goes down to the next level to
deal with its children. The actual allocation is done by pbus_assign_resources_sorted:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:pbus_assign_resources_sorted
static void pbus_assign_resources_sorted(const struct pci_bus *bus,
struct list_head *realloc_head,
struct list_head *fail_head)
{
struct pci_dev *dev;
LIST_HEAD(head);
list_for_each_entry(dev, &bus->devices, bus_list)
__dev_sort_resources(dev, &head);
__assign_resources_sorted(&head, realloc_head, fail_head);
}
Before we proceed, I want to emphasize that only conflicting resources
are considered by __dev_sort_resources, which prepares the resource head list
for __assign_resources_sorted:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:pbus_assign_resources_sorted:
__dev_sort_resources:pdev_sort_resources
static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
{
for (i = 0; i < PCI_NUM_RESOURCES; i++) {
r = &dev->resource[i];
//only conflict resources, thus no parent resources are interested.
if (!(r->flags) || r->parent)
continue;
tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
tmp->res = r;
//insert the tmp into proper position
//head is the list to be inserted
n = head;
r_align = pci_resource_alignment(dev, r);
//check each entry in the head list,
//find one entry
list_for_each_entry(dev_res, head, list) {
resource_size_t align;
align = pci_resource_alignment(dev_res->dev,
dev_res->res);
if (r_align > align) {
n = &dev_res->list;
break;
}
}
/* Insert it just before n*/
list_add_tail(&tmp->list, n);
}
}
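The effect of the loop above is that requests end up ordered by descending alignment, with equal alignments kept in insertion order. A minimal sketch of the same ordering rule, using a plain array instead of the kernel's list_head (struct req and insert_by_alignment are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct req {               /* hypothetical: one pending resource request */
    int      resno;
    uint64_t align;
};

/*
 * Insert r into list[0..*count) keeping alignments in descending order.
 * Entries with equal alignment keep their insertion order.
 * The caller guarantees room for one more element.
 */
void insert_by_alignment(struct req *list, size_t *count, struct req r)
{
    size_t pos = 0;

    /* Stop at the first entry with a strictly smaller alignment. */
    while (pos < *count && list[pos].align >= r.align)
        pos++;
    memmove(&list[pos + 1], &list[pos], (*count - pos) * sizeof(*list));
    list[pos] = r;
    (*count)++;
}
```

Inserting resources 13 (align 0x1000), 14 and 15 (align 0x100000) in that order yields 14, 15, 13, which is the order of the res_to_dev_res lines in the log shown earlier.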
Then we can deal with this resource list:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:pbus_assign_resources_sorted:
__assign_resources_sorted:assign_requested_resources_sorted:pci_assign_resource
int pci_assign_resource(struct pci_dev *dev, int resno)
{
find_resource(root, new, size, &constraint);
ret = __request_resource(root, new);
//failed?
if (ret < 0) {
dev_info(&dev->dev, "BAR %d: no space for %pR\n", resno, res);
ret = pci_revert_fw_address(res, dev, resno, size);
}
if (ret < 0) {
dev_info(&dev->dev, "BAR %d: failed to assign %pR\n", resno,
res);
return ret;
}
//succeed
dev_info(&dev->dev, "BAR %d: assigned %pR\n", resno, res);
pci_update_resource(dev, resno);
}
find_resource finds a 'hole' of the requested size under the root bus resource tree, then __request_resource tries
to insert this region into the resource tree by:
1. finding a parent resource that contains the new resource
2. inserting the resource into the parent's child resource list (children do not overlap with each other)
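The hole search can be pictured as a walk over the gaps between a parent's children. This is a simplified sketch, not the kernel's find_resource; struct range and find_hole are hypothetical names, and addresses are assumed to be far below UINT64_MAX so that end + 1 cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {             /* hypothetical: inclusive [start, end] */
    uint64_t start;
    uint64_t end;
};

/*
 * Look for a free, aligned hole of `size` bytes inside `parent`.
 * `kids` must be sorted by start and must not overlap.
 * `align` must be a power of two.
 * Returns 0 and fills *out on success, -1 when there is no space.
 */
int find_hole(const struct range *parent, const struct range *kids, size_t nkids,
              uint64_t size, uint64_t align, struct range *out)
{
    uint64_t cursor = parent->start;
    size_t i;

    for (i = 0; i <= nkids; i++) {
        /* Exclusive end of the current gap: next child or end of parent. */
        uint64_t limit = (i < nkids) ? kids[i].start : parent->end + 1;
        uint64_t start = (cursor + align - 1) & ~(align - 1);

        if (size > 0 && start < limit && limit - start >= size) {
            out->start = start;
            out->end = start + size - 1;
            return 0;
        }
        if (i < nkids)
            cursor = kids[i].end + 1;   /* continue after this child */
    }
    return -1;
}
```

With the parent 4000-5fff and the children 4000-4fff and 5000-5fff taken from the /proc/ioports listing quoted earlier, a request for 0x1000 bytes returns -1, which corresponds to the "no space for [io size 0x1000]" message.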
So if the resource is successfully allocated, the kernel prints something like
"BAR 7: assigned" and then the pci_dev that owns this resource updates its
PCI config registers. Otherwise, it warns "BAR 7: no space for" and returns
without doing anything:
[ 0.865365] pci 0000:00:1c.0: BAR 14: assigned [mem 0x7fa00000-0x7fbfffff]
[ 0.865372] pci 0000:00:1c.0: BAR 15: assigned [mem 0x7fc00000-0x7fdfffff 64bit pref]
[ 0.865375] pci 0000:00:1c.0: BAR 13: assigned [io 0x2000-0x2fff]
or
[ 0.865429] pci 0000:06:00.0: BAR 13: no space for [io size 0x1000]
[ 0.865431] pci 0000:06:00.0: BAR 13: failed to assign [io size 0x1000]
BTW, here is how pci device updates its pci configs:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:pbus_assign_resources_sorted:
__assign_resources_sorted:assign_requested_resources_sorted:pci_assign_resource:pci_update_resource
void pci_update_resource(struct pci_dev *dev, int resno)
{
new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
reg = pci_resource_bar(dev, resno, &type);
pci_write_config_dword(dev, reg, new);
}
This is weird: when we reach here, the bridge should also be able
to reach this point, as we already know from the above log:
[ 0.865375] pci 0000:00:1c.0: BAR 13: assigned [io 0x2000-0x2fff]
So the resource number (resno) is 13, but judging from the resno checks in
pci_resource_bar, it only handles numbers smaller than
PCI_BRIDGE_RESOURCES:
int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
{
int reg;
if (resno < PCI_ROM_RESOURCE) {
return PCI_BASE_ADDRESS_0 + 4 * resno;
} else if (resno == PCI_ROM_RESOURCE) {
return dev->rom_base_reg;
} else if (resno < PCI_BRIDGE_RESOURCES) {
reg = pci_iov_resource_bar(dev, resno);
}
return reg;
}
enum {
/* #0-5: standard PCI resources */
PCI_STD_RESOURCES,
PCI_STD_RESOURCE_END = 5,
/* #6: expansion ROM resource */
PCI_ROM_RESOURCE,
/* device specific resources */
#ifdef CONFIG_PCI_IOV
PCI_IOV_RESOURCES,
PCI_IOV_RESOURCE_END = PCI_IOV_RESOURCES + PCI_SRIOV_NUM_BARS - 1,
#endif
/* resources assigned to buses behind the bridge */
#define PCI_BRIDGE_RESOURCE_NUM 4
PCI_BRIDGE_RESOURCES,
PCI_BRIDGE_RESOURCE_END = PCI_BRIDGE_RESOURCES +
PCI_BRIDGE_RESOURCE_NUM - 1,
/* total resources associated with a PCI device */
PCI_NUM_RESOURCES,
/* preserve this for compatibility */
DEVICE_COUNT_RESOURCE = PCI_NUM_RESOURCES,
};
So we come to a conclusion: the assign phase has no effect on a PCI bridge's low-level
PCI config registers; it only allocates regions for them. Actually, we have:
if (resno < PCI_BRIDGE_RESOURCES)
pci_update_resource(dev, resno);
in pci_assign_resource, so only non-bridge conflicting resources touch
PCI_BASE_ADDRESS_0.
For ordinary
PCI devices this assign phase makes sense; for example, PCI_BASE_ADDRESS_0
is for standard PCI resources.
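To see why "BAR 13" names a bridge window, the enum can be evaluated by hand. The sketch below assumes CONFIG_PCI_IOV is enabled and uses 6 SR-IOV BARs (the value of the kernel's PCI_SRIOV_NUM_BARS); the shortened names and is_bridge_window are hypothetical. With CONFIG_PCI_IOV disabled the bridge windows would start at index 7 instead, which would be consistent with the "BAR 7/8/9" lines in the log quoted in the email, though that is an inference rather than something the logs state.

```c
#include <assert.h>

#define SRIOV_NUM_BARS      6   /* same value as PCI_SRIOV_NUM_BARS */
#define BRIDGE_RESOURCE_NUM 4

enum {
    STD_RESOURCES = 0,                                          /* 0..5 */
    STD_RESOURCE_END = 5,
    ROM_RESOURCE,                                               /* 6 */
    IOV_RESOURCES,                                              /* 7 */
    IOV_RESOURCE_END = IOV_RESOURCES + SRIOV_NUM_BARS - 1,      /* 12 */
    BRIDGE_RESOURCES,                                           /* 13 */
    BRIDGE_RESOURCE_END = BRIDGE_RESOURCES + BRIDGE_RESOURCE_NUM - 1, /* 16 */
    NUM_RESOURCES                                               /* 17 */
};

/* Nonzero when resno names a bridge window rather than a device BAR. */
int is_bridge_window(int resno)
{
    return resno >= BRIDGE_RESOURCES && resno <= BRIDGE_RESOURCE_END;
}
```

Under these assumptions index 13 is the first bridge window, and 14 and 15 are the next two, which lines up with the io, mem and prefetchable mem entries in the logs.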
Paste again:
[ 0.865365] pci 0000:00:1c.0: BAR 14: assigned [mem 0x7fa00000-0x7fbfffff]
[ 0.865372] pci 0000:00:1c.0: BAR 15: assigned [mem 0x7fc00000-0x7fdfffff 64bit pref]
[ 0.865375] pci 0000:00:1c.0: BAR 13: assigned [io 0x2000-0x2fff]
Then let's see how pci bridges are modified:
pci_assign_unassigned_root_bus_resources:__pci_bus_assign_resources:pci_setup_bridge:__pci_setup_bridge
__pci_setup_bridge:pci_setup_bridge_io
dev_info(&bridge->dev, "PCI bridge to %pR\n",
&bus->busn_res);
[ 0.865378] pci 0000:00:01.0: PCI bridge to [bus 01]
static void pci_setup_bridge_io(struct pci_dev *bridge)
{
dev_info(&bridge->dev, " bridge window %pR\n", res);
pci_write_config_word(bridge, PCI_IO_BASE, l);
}
So it seems the PCI bridge changes PCI_IO_BASE in the end.
So we finally come to a conclusion: a PCI bridge uses PCI_IO_BASE,
which controls the resources under this bridge, while an ordinary PCI device
uses PCI_BASE_ADDRESS_0 to declare what resource region it wants
to occupy.
Besides, I'd also like to mention that both pci_update_resource and pci_setup_bridge_io
use a trick to update PCI_BASE_ADDRESS_0 and PCI_IO_BASE, that is,
since the registers are adjacent in config space:
#define PCI_IO_BASE 0x1c /* I/O range behind the bridge */
#define PCI_IO_LIMIT 0x1d
a single config write (pci_write_config_word in pci_setup_bridge_io)
updates both the base and the limit register.
First, what is PCI_IO_BASE? As mentioned in this function's
comment, it is described in the PCI-to-PCI Bridge Architecture Specification rev. 1.1 (1998),
and the doc from TI is more precise:
http://www.ti.com.cn/general/cn/docs/lit/getliterature.tsp?genericPartNumber=pci2250&fileType=pdf
Let's take pci_setup_bridge_io as an example:
static void pci_setup_bridge_io(struct pci_dev *bridge)
{
struct resource *res;
struct pci_bus_region region;
unsigned long io_mask;
u8 io_base_lo, io_limit_lo;
u16 l;
u32 io_upper16;
io_mask = PCI_IO_RANGE_MASK;
if (bridge->io_window_1k)
io_mask = PCI_IO_1K_RANGE_MASK;
/* Set up the top and bottom of the PCI I/O segment for this bus. */
res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_IO) {
pci_read_config_word(bridge, PCI_IO_BASE, &l);
io_base_lo = (region.start >> 8) & io_mask;
io_limit_lo = (region.end >> 8) & io_mask;
l = ((u16) io_limit_lo << 8) | io_base_lo;
/* Set up upper 16 bits of I/O base/limit. */
io_upper16 = (region.end & 0xffff0000) | (region.start >> 16);
dev_info(&bridge->dev, " bridge window %pR\n", res);
} else {
/* Clear upper 16 bits of I/O base/limit. */
io_upper16 = 0;
l = 0x00f0;
}
/* Temporarily disable the I/O range before updating PCI_IO_BASE. */
pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
/* Update lower 16 bits of I/O base/limit. */
pci_write_config_word(bridge, PCI_IO_BASE, l);
/* Update upper 16 bits of I/O base/limit. */
pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16);
}
Why do we right-shift by 8 bits? Because the upper 4 bits of the 8-bit IO_BASE register correspond to address
bits 12 - 15, so we have ((addr >> 12) & 0xf) << 4, which is the same as (addr >> 8) & 0xf0.
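A worked example of this packing may help. The sketch below covers only the low 16 bits of the I/O address (the PCI_IO_BASE_UPPER16 part is ignored) and assumes the standard 4K granularity; encode_io_window and decode_io_window are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define IO_RANGE_MASK 0xf0u   /* upper nibble holds address bits 15:12 */

/* Pack a 4K-granular I/O window into the 16-bit value written at PCI_IO_BASE. */
uint16_t encode_io_window(uint32_t start, uint32_t end)
{
    uint8_t base_lo  = (start >> 8) & IO_RANGE_MASK;
    uint8_t limit_lo = (end   >> 8) & IO_RANGE_MASK;

    /* Low byte lands in PCI_IO_BASE (0x1c), high byte in PCI_IO_LIMIT (0x1d). */
    return (uint16_t)((limit_lo << 8) | base_lo);
}

/* Unpack: the low 12 bits are implied, 0x000 for base and 0xfff for limit. */
void decode_io_window(uint16_t l, uint32_t *start, uint32_t *end)
{
    *start = (uint32_t)(l & IO_RANGE_MASK) << 8;
    *end   = ((uint32_t)((l >> 8) & IO_RANGE_MASK) << 8) | 0xfff;
}
```

For the window [io 0x2000-0x2fff] this produces 0x2020, and for [io 0x4000-0x6fff] it produces 0x6040. Decoding the value 0x00f0 used in the else branch gives base 0xf000 and limit 0x0fff; a base above the limit means the window is disabled.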
Here are the logs in detail:
[ 0.865381] pci 0000:00:01.0: bridge window [mem 0xa0b00000-0xa0bfffff]
[ 0.865387] pci 0000:06:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865388] pci 0000:06:00.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865389] pci 0000:06:04.0: res[14]=[mem 0x00100000-0x000fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865389] pci 0000:06:04.0: res[14]=[mem 0x00100000-0x002fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865390] pci 0000:06:04.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865391] pci 0000:06:04.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865392] pci 0000:06:06.0: res[14]=[mem 0x00100000-0x000fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865393] pci 0000:06:06.0: res[14]=[mem 0x00100000-0x002fffff] res_to_dev_res add_size 200000 min_align 100000
[ 0.865394] pci 0000:06:06.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865395] pci 0000:06:06.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[ 0.865396] pci 0000:06:00.0: res[13]=[io 0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865397] pci 0000:06:00.0: res[13]=[io 0x1000-0x1fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865397] pci 0000:06:04.0: res[13]=[io 0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865398] pci 0000:06:04.0: res[13]=[io 0x1000-0x1fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865399] pci 0000:06:06.0: res[13]=[io 0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865400] pci 0000:06:06.0: res[13]=[io 0x1000-0x1fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.865402] pci 0000:06:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865405] pci 0000:06:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865408] pci 0000:06:04.0: BAR 14: no space for [mem size 0x00200000]
[ 0.865410] pci 0000:06:04.0: BAR 14: failed to assign [mem size 0x00200000]
[ 0.865413] pci 0000:06:04.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865416] pci 0000:06:04.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865419] pci 0000:06:06.0: BAR 14: no space for [mem size 0x00200000]
[ 0.865421] pci 0000:06:06.0: BAR 14: failed to assign [mem size 0x00200000]
[ 0.865423] pci 0000:06:06.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865426] pci 0000:06:06.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865429] pci 0000:06:00.0: BAR 13: no space for [io size 0x1000]
[ 0.865431] pci 0000:06:00.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865433] pci 0000:06:04.0: BAR 13: no space for [io size 0x1000]
[ 0.865435] pci 0000:06:04.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865437] pci 0000:06:06.0: BAR 13: no space for [io size 0x1000]
[ 0.865439] pci 0000:06:06.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865442] pci 0000:06:06.0: BAR 14: no space for [mem size 0x00200000]
[ 0.865443] pci 0000:06:06.0: BAR 14: failed to assign [mem size 0x00200000]
[ 0.865446] pci 0000:06:06.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865449] pci 0000:06:06.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865452] pci 0000:06:06.0: BAR 13: no space for [io size 0x1000]
[ 0.865454] pci 0000:06:06.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865456] pci 0000:06:04.0: BAR 14: no space for [mem size 0x00200000]
[ 0.865458] pci 0000:06:04.0: BAR 14: failed to assign [mem size 0x00200000]
[ 0.865460] pci 0000:06:04.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865463] pci 0000:06:04.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865466] pci 0000:06:04.0: BAR 13: no space for [io size 0x1000]
[ 0.865468] pci 0000:06:04.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865470] pci 0000:06:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[ 0.865473] pci 0000:06:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[ 0.865476] pci 0000:06:00.0: BAR 13: no space for [io size 0x1000]
[ 0.865477] pci 0000:06:00.0: BAR 13: failed to assign [io size 0x1000]
[ 0.865480] pci 0000:06:00.0: PCI bridge to [bus 07]
[ 0.865484] pci 0000:06:00.0: bridge window [mem 0xa0d00000-0xa0dfffff]
[ 0.865490] pci 0000:06:03.0: PCI bridge to [bus 08-38]
[ 0.865493] pci 0000:06:03.0: bridge window [io 0x4000-0x4fff]
[ 0.865497] pci 0000:06:03.0: bridge window [mem 0xa0e00000-0xa4dfffff]
[ 0.865501] pci 0000:06:03.0: bridge window [mem 0xace00000-0xb0dfffff 64bit pref]
[ 0.865506] pci 0000:06:04.0: PCI bridge to [bus 39]
[ 0.865515] pci 0000:06:05.0: PCI bridge to [bus 3a-6a]
[ 0.865518] pci 0000:06:05.0: bridge window [io 0x5000-0x5fff]
[ 0.865522] pci 0000:06:05.0: bridge window [mem 0xa4e00000-0xa8dfffff]
[ 0.865525] pci 0000:06:05.0: bridge window [mem 0xb0e00000-0xb4dfffff 64bit pref]
[ 0.865531] pci 0000:06:06.0: PCI bridge to [bus 6b]
[ 0.865539] pci 0000:05:00.0: PCI bridge to [bus 06-6b]
[ 0.865542] pci 0000:05:00.0: bridge window [io 0x4000-0x5fff]
[ 0.865546] pci 0000:05:00.0: bridge window [mem 0xa0d00000-0xa8dfffff]
[ 0.865549] pci 0000:05:00.0: bridge window [mem 0xace00000-0xb4dfffff 64bit pref]
[ 0.865555] pci 0000:00:01.1: PCI bridge to [bus 05-9b]
[ 0.865557] pci 0000:00:01.1: bridge window [io 0x4000-0x6fff]
[ 0.865560] pci 0000:00:01.1: bridge window [mem 0xa0d00000-0xacdfffff]
[ 0.865562] pci 0000:00:01.1: bridge window [mem 0xace00000-0xb8dfffff 64bit pref]
[ 0.865566] pci 0000:00:1c.0: PCI bridge to [bus 02]
[ 0.865575] pci 0000:00:1c.0: bridge window [io 0x2000-0x2fff]
[ 0.865580] pci 0000:00:1c.0: bridge window [mem 0x7fa00000-0x7fbfffff]
[ 0.865584] pci 0000:00:1c.0: bridge window [mem 0x7fc00000-0x7fdfffff 64bit pref]
[ 0.865591] pci 0000:00:1c.2: PCI bridge to [bus 03]
[ 0.865596] pci 0000:00:1c.2: bridge window [mem 0xa0400000-0xa08fffff]
[ 0.865603] pci 0000:00:1c.3: PCI bridge to [bus 04]
[ 0.865607] pci 0000:00:1c.3: bridge window [mem 0xa0900000-0xa0afffff]
[ 0.865612] pci 0000:00:1c.3: bridge window [mem 0x80000000-0x8fffffff 64bit pref]
We can see from the above that there are quite a lot of allocation failures. Take the following for example:
[ 0.865433] pci 0000:06:04.0: BAR 13: no space for [io size 0x1000]
[ 0.865435] pci 0000:06:04.0: BAR 13: failed to assign [io size 0x1000]
So why does it fail? Because there is not enough space to insert a region of 0x1000 into
the parent region node of PCI device 0000:06:04.0. So what are 0000:06:04.0 and its parent resource?
Let's infer from /proc/ioports:
0000-0cf7 : PCI Bus 0000:00
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0062-0062 : PNP0C09:00
0062-0062 : EC data
0064-0064 : keyboard
0066-0066 : PNP0C09:00
0066-0066 : EC cmd
0070-0077 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
00f0-00f0 : PNP0C04:00
0300-031f : APP0001:00
0300-031f : applesmc
0410-0415 : ACPI CPU throttle
0800-087f : pnp 00:01
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
1800-187f : pnp 00:01
1800-1803 : ACPI PM1a_EVT_BLK
1804-1805 : ACPI PM1a_CNT_BLK
1808-180b : ACPI PM_TMR
1820-182f : ACPI GPE0_BLK
1830-1833 : iTCO_wdt.0.auto
1850-1850 : ACPI PM2_CNT_BLK
1860-187f : iTCO_wdt.0.auto
2000-2fff : PCI Bus 0000:02
3000-303f : 0000:00:02.0
4000-6fff : PCI Bus 0000:05
4000-5fff : PCI Bus 0000:06
4000-4fff : PCI Bus 0000:08
5000-5fff : PCI Bus 0000:3a
efa0-efbf : 0000:00:1f.3
ffff-ffff : pnp 00:01
We can see that PCI Bus 0000:06 holds the resource
4000-5fff, PCI Bus 0000:08 has occupied 4000-4fff and
PCI Bus 0000:3a has occupied 5000-5fff. Since bridge 0000:06:04.0
is connected to neither bus 0000:08 nor bus 0000:3a, it is impossible
for PCI bridge 0000:06:04.0 to claim a window under PCI Bus 0000:06,
because all of its resources are exhausted. The same goes for 0000:06:06.0 and
0000:06:00.0.
06:04.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge] (prog-if 00 [Normal decode])
06:06.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge] (prog-if 00 [Normal decode])
06:00.0 PCI bridge: Intel Corporation DSL5520 Thunderbolt [Falcon Ridge] (prog-if 00 [Normal decode])
Memory behind bridge: a0d00000-a0dfffff
Here we see that no "bridge window [io]" line is shown for these bridges; such a line would indicate
a successful I/O region allocation. BTW, the backtrace of the bridge setup is
as follows:
[ 0.310266] pci 0000:06:03.0: PCI bridge to [bus 08-38]
[ 0.310270] pci 0000:06:03.0: bridge window [io 0x4000-0x4fff]
[ 0.310303] ... [ 0.310345] (call trace of 14 frames, frame contents not preserved)
BTW, we also saw the initial I/O resource allocation (during PCI device probing in acpi_init),
which happens before the actual resource allocation in pcibios_assign_resources:
[ 0.281883] pci 0000:06:05.0: supports D1 D2
[ 0.281885] pci 0000:06:05.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.281961] pci 0000:06:06.0: [8086:156d] type 01 class 0x060400
[ 0.281991] ... [ 0.282059] (call trace of 23 frames, frame contents not preserved)
and the backtrace of the PCI bridge resource claim:
[ 0.280885] pci 0000:00:1c.4: bridge window [io 0x4000-0x6fff]
[ 0.280915] ... [ 0.280966] (call trace of 17 frames, frame contents not preserved)
So it looks like pci_scan_child_bus probes PCI devices and PCI bridges separately.
[ 0.865506] pci 0000:06:04.0: PCI bridge to [bus 39]
[ 0.865531] pci 0000:06:06.0: PCI bridge to [bus 6b]
[ 0.865480] pci 0000:06:00.0: PCI bridge to [bus 07]
[ 0.865484] pci 0000:06:00.0: bridge window [mem 0xa0d00000-0xa0dfffff]
Then let's look at a successful allocation:
[ 0.865375] pci 0000:00:1c.0: BAR 13: assigned [io 0x2000-0x2fff]
According to /proc/ioports, PCI bridge 0000:00:1c.0 must
be connected to PCI Bus 0000:02, and it is the only connection of that bus.
According to lspci and the I/O port logs such as the bridge window lines, we can deduce
the PCI I/O resource tree as follows. Green text stands for a successful allocation
of the bridge window, red stands for failure.
0000-0cf7 : PCI Bus 0000:00
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0062-0062 : PNP0C09:00
0062-0062 : EC data
0064-0064 : keyboard
0066-0066 : PNP0C09:00
0066-0066 : EC cmd
0070-0077 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
00f0-00f0 : PNP0C04:00
0300-031f : APP0001:00
0300-031f : applesmc
0410-0415 : ACPI CPU throttle
0800-087f : pnp 00:01
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
1800-187f : pnp 00:01
1800-1803 : ACPI PM1a_EVT_BLK
1804-1805 : ACPI PM1a_CNT_BLK
1808-180b : ACPI PM_TMR
1820-182f : ACPI GPE0_BLK
1830-1833 : iTCO_wdt.0.auto
1850-1850 : ACPI PM2_CNT_BLK
1860-187f : iTCO_wdt.0.auto
2000-2fff : PCI Bus 0000:02
3000-303f : 0000:00:02.0
4000-6fff : PCI Bus 0000:05
4000-5fff : PCI Bus 0000:06
4000-4fff : PCI Bus 0000:08
5000-5fff : PCI Bus 0000:3a
efa0-efbf : 0000:00:1f.3
ffff-ffff : pnp 00:01
During bootup, we found that some PCI bridges have an invalid I/O limit value, say,
base:fffff000,limit:0, on a Mac Pro 12. As a result, during PCI bridge resource probing in
pci_read_bridge_bases, no 'bridge window [io xxx]' line is displayed for such a
PCI bridge, for example 0000:06:00.0:
[ 0.385573] pci 0000:06:00.0: PCI bridge to [bus 07]
[ 0.385586] pci 0000:06:00.0: bridge window [mem 0xc1900000-0xc19fffff]
[ 0.385657] pci 0000:06:03.0: PCI bridge to [bus 08-38]
But later, in pcibios_assign_resources, we found an I/O request for this
PCI bridge, which wants to occupy some I/O resource:
[ 0.451679] pci 0000:06:00.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
which means this PCI bridge tries to claim an I/O resource region of size 0x1000 (4K).
So where does this 0x1000 come from? The simple answer is pbus_size_io.
This function checks every pci_dev under a specific PCI bus, which is
the child of the PCI bridge (BTW, the PCI bus is pointed to by pci_bridge->subordinate),
checks the I/O resources of each child PCI device, and sums them up to determine
how large an I/O resource this PCI bridge should contain. In the above Mac Pro 12 case,
PCI bridge 0000:06:00.0 has one child pci_dev connected to it,
0000:07:00.0. However, this device does not have any I/O resource, only
mem resources:
[ 0.378985] pci 0000:07:00.0: [8086:156c] type 00 class 0x088000
[ 0.379008] pci 0000:07:00.0: reg 0x10: [mem 0xc1900000-0xc193ffff]
[ 0.379022] pci 0000:07:00.0: reg 0x14: [mem 0xc1940000-0xc1940fff]
So in theory PCI bridge 0000:06:00.0 should not allocate any I/O resource, but according to the log,
0000:06:00.0 is actually trying to claim an I/O resource of size 0x1000. This is because
0000:06:00.0 is a hotplug PCI bridge, and the minimal I/O resource for this kind of PCI device
is 0x100:
case PCI_CLASS_BRIDGE_PCI:
pci_bridge_check_ranges(bus);
if (bus->self->is_hotplug_bridge) {
additional_io_size = pci_hotplug_io_size;
additional_mem_size = pci_hotplug_mem_size;
}
/* Fall through */
default:
pbus_size_io(bus, realloc_head ? 0 : additional_io_size,
additional_io_size, realloc_head);
Besides, in pbus_size_io, per spec, I/O windows are 4K-aligned:
static resource_size_t window_alignment(struct pci_bus *bus,
unsigned long type)
{
resource_size_t align = 1, arch_align;
if (type & IORESOURCE_MEM)
align = PCI_P2P_DEFAULT_MEM_ALIGN;
else if (type & IORESOURCE_IO) {
/*
* Per spec, I/O windows are 4K-aligned, but some
* bridges have an extension to support 1K alignment.
*/
if (bus->self->io_window_1k)
align = PCI_P2P_DEFAULT_IO_ALIGN_1K;
else
align = PCI_P2P_DEFAULT_IO_ALIGN;
}
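The two facts above, a 0x100 minimum for hotplug bridges and 4K window alignment, combine by rounding up. This is a simplified sketch of that arithmetic and not the kernel's calculate_iosize, which handles more cases; align_up and io_window_size are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * children: sum of the I/O sizes requested by devices below the bridge
 * extra:    additional space reserved for hotplug (0 if none)
 * align:    window granularity, 0x1000 for a standard bridge
 */
uint64_t io_window_size(uint64_t children, uint64_t extra, uint64_t align)
{
    return align_up(children + extra, align);
}
```

With no child I/O requests, 0x100 of hotplug space and 4K alignment, io_window_size returns 0x1000. A bridge supporting the 1K extension (align 0x400) would get 0x400 from the same inputs.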
So 0x100 rounded up to the 4K alignment gives a final requested io size of 0x1000, thus we see:
[ 0.451679] pci 0000:06:00.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
The resource for this pci bridge has been updated to start = 0x1000, end = 0xfff (an empty placeholder, since size0 is 0) with valid flags, and the extra 0x1000 is recorded as add_size.
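The numbers above can be reproduced with a small user-space sketch. This is only a simplified model of the sizing, not the kernel's pbus_size_io: the names align_up, struct io_window and size_io_window are made up for illustration, and min_size, the ISA handling and realloc_head are ignored.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical result type, only for this sketch. */
struct io_window {
    uint64_t start;    /* placeholder start, equal to the alignment */
    uint64_t end;      /* start + size0 - 1; below start when size0 == 0 */
    uint64_t add_size; /* optional extra requested on top of size0 */
};

/*
 * children_size: sum of the io resource sizes of the devices behind the bridge
 * hotplug_extra: extra io reserved for a hotplug bridge (0x100 in the log)
 * align:         window alignment (0x1000 for a 4K-aligned io window)
 */
static struct io_window size_io_window(uint64_t children_size,
                                       uint64_t hotplug_extra,
                                       uint64_t align)
{
    uint64_t size0 = align_up(children_size, align);                 /* must-have size */
    uint64_t size1 = align_up(children_size + hotplug_extra, align); /* size with extra */
    struct io_window w;

    w.start = align;
    w.end = w.start + size0 - 1;
    w.add_size = size1 - size0;
    return w;
}
```

With children_size = 0, hotplug_extra = 0x100 and align = 0x1000 this gives start 0x1000, end 0xfff and add_size 0x1000, which is the "[io 0x1000-0x0fff] ... add_size 1000" shape seen in the log.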
That is to say, when coming into pbus_size_io,
struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
IORESOURCE_IO);
will find an unassigned resource with the IO flag set, and try to fill it in with the computed region.
For a pci agent device it is resource[3], and for a bridge it becomes resource[13].
Then, after traversing each device under this bus, we finally store the probed total
resource length in resource[13]. Here is a log from boot which demonstrates this behavior:
[ 0.451922] pci 0000:06:00.0: Start scaning pci bridge
[ 0.451924] Before checking each sub device
[ 0.451927] pci 0000:06:00.0: dump pci device resource for BAR 0, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451931] pci 0000:06:00.0: dump pci device resource for BAR 1, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451935] pci 0000:06:00.0: dump pci device resource for BAR 2, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451940] pci 0000:06:00.0: dump pci device resource for BAR 3, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451944] pci 0000:06:00.0: dump pci device resource for BAR 4, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451949] pci 0000:06:00.0: dump pci device resource for BAR 5, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451953] pci 0000:06:00.0: dump pci device resource for BAR 6, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451957] pci 0000:06:00.0: dump pci device resource for BAR 7, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451962] pci 0000:06:00.0: dump pci device resource for BAR 8, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451966] pci 0000:06:00.0: dump pci device resource for BAR 9, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451971] pci 0000:06:00.0: dump pci device resource for BAR 10, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451975] pci 0000:06:00.0: dump pci device resource for BAR 11, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451979] pci 0000:06:00.0: dump pci device resource for BAR 12, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.451984] pci 0000:06:00.0: dump pci device resource for BAR 13, flags:100,start:0,end:0,[io 0x0000], parent: (null)
[ 0.451988] pci 0000:06:00.0: dump pci device resource for BAR 14, flags:200,start:c1900000,end:c19fffff,[mem 0xc1900000-0xc19fffff], parent:ffff880260a9e6f8
[ 0.451993] pci 0000:06:00.0: dump pci device resource for BAR 15, flags:102201,start:0,end:0,[mem 0x00000000 64bit pref], parent: (null)
[ 0.451997] pci 0000:06:00.0: dump pci device resource for BAR 16, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452001] pci 0000:06:00.0: End scaning pci bridge, size:0,size1:0, min_size:0,min_align:1000, add_size:100, children_add_size:0
[ 0.452006] pci 0000:06:00.0: size0:0, size1:1000
[ 0.452008] Before bridge window request io resource
[ 0.452010] pci 0000:06:00.0: dump pci device resource for BAR 0, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452015] pci 0000:06:00.0: dump pci device resource for BAR 1, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452019] pci 0000:06:00.0: dump pci device resource for BAR 2, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452024] pci 0000:06:00.0: dump pci device resource for BAR 3, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452028] pci 0000:06:00.0: dump pci device resource for BAR 4, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452032] pci 0000:06:00.0: dump pci device resource for BAR 5, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452037] pci 0000:06:00.0: dump pci device resource for BAR 6, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452041] pci 0000:06:00.0: dump pci device resource for BAR 7, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452045] pci 0000:06:00.0: dump pci device resource for BAR 8, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452050] pci 0000:06:00.0: dump pci device resource for BAR 9, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452054] pci 0000:06:00.0: dump pci device resource for BAR 10, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452058] pci 0000:06:00.0: dump pci device resource for BAR 11, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452063] pci 0000:06:00.0: dump pci device resource for BAR 12, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452067] pci 0000:06:00.0: dump pci device resource for BAR 13, flags:100,start:0,end:0,[io 0x0000], parent: (null)
[ 0.452072] pci 0000:06:00.0: dump pci device resource for BAR 14, flags:200,start:c1900000,end:c19fffff,[mem 0xc1900000-0xc19fffff], parent:ffff880260a9e6f8
[ 0.452076] pci 0000:06:00.0: dump pci device resource for BAR 15, flags:102201,start:0,end:0,[mem 0x00000000 64bit pref], parent: (null)
[ 0.452081] pci 0000:06:00.0: dump pci device resource for BAR 16, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452086] pci 0000:06:00.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
[ 0.452089] After bridge window request io resource
[ 0.452092] pci 0000:06:00.0: dump pci device resource for BAR 0, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452096] pci 0000:06:00.0: dump pci device resource for BAR 1, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452101] pci 0000:06:00.0: dump pci device resource for BAR 2, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452105] pci 0000:06:00.0: dump pci device resource for BAR 3, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452110] pci 0000:06:00.0: dump pci device resource for BAR 4, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452114] pci 0000:06:00.0: dump pci device resource for BAR 5, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452118] pci 0000:06:00.0: dump pci device resource for BAR 6, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452123] pci 0000:06:00.0: dump pci device resource for BAR 7, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452127] pci 0000:06:00.0: dump pci device resource for BAR 8, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452131] pci 0000:06:00.0: dump pci device resource for BAR 9, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452136] pci 0000:06:00.0: dump pci device resource for BAR 10, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452140] pci 0000:06:00.0: dump pci device resource for BAR 11, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452145] pci 0000:06:00.0: dump pci device resource for BAR 12, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.452149] pci 0000:06:00.0: dump pci device resource for BAR 13, flags:80100,start:1000,end:fff,[io 0x1000-0x0fff], parent: (null)
[ 0.452154] pci 0000:06:00.0: dump pci device resource for BAR 14, flags:200,start:c1900000,end:c19fffff,[mem 0xc1900000-0xc19fffff], parent:ffff880260a9e6f8
[ 0.452158] pci 0000:06:00.0: dump pci device resource for BAR 15, flags:102201,start:0,end:0,[mem 0x00000000 64bit pref], parent: (null)
[ 0.452163] pci 0000:06:00.0: dump pci device resource for BAR 16, flags:0,start:0,end:0,[??? 0x00000000 flags 0x0], parent: (null)
[ 0.455222] pci 0000:06:00.0: res[13]=[io 0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
[ 0.455286] pci 0000:06:00.0: BAR 13: no space for [io size 0x1000]
[ 0.455288] pci 0000:06:00.0: BAR 13: failed to assign [io size 0x1000]
Actually the initial pci bridge io resource[13] flag is set to zero because it has an invalid
base/limit pair, but later, during sizing in __pci_bus_size_bridges->pci_bridge_check_ranges,
IORESOURCE_IO is added to the resource[13] flags if this pci bridge can access the PCI_IO_BASE
register. So when we later check the io resource status of this pci bridge, we find a free (unassigned)
resource[13] and set resource[13].start to 0x1000 and end to 0xfff respectively. Finally we need to find an empty hole
in the resource tree and allocate this region for the pci bridge. Because Linux failed to find a free region
of size 0x1000 under the parent pci bus of 0000:06:00.0, it refused to assign an io resource to this device:
[ 0.453733] pci 0000:06:00.0: BAR 13: no space for [io size 0x1000]
[ 0.453736] pci 0000:06:00.0: BAR 13: failed to assign [io size 0x1000]
and this code is in __pci_bus_assign_resources->assign_requested_resources_sorted->
pci_assign_resource->_pci_assign_resource->pci_bus_alloc_resource->
err = find_resource(root, new, size, &constraint);
if (err >= 0 && __request_resource(root, new))
if (ret < 0) {
dev_info(&dev->dev, "BAR %d: no space for %pR\n", resno, res);
ret = pci_revert_fw_address(res, dev, resno, size);
}
if (ret < 0) {
dev_info(&dev->dev, "BAR %d: failed to assign %pR\n", resno,
res);
return ret;
}
So, if we want to bypass the resource re-allocation of one pci bridge device, maybe we can simply
add some quirk in pci_bridge_check_ranges.
The following is the resource dump on this platform:
[ 0.865618] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 0.865619] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 0.865620] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.865621] pci_bus 0000:00: resource 7 [mem 0x000c0000-0x000c3fff window]
[ 0.865622] pci_bus 0000:00: resource 8 [mem 0x000c4000-0x000c7fff window]
[ 0.865622] pci_bus 0000:00: resource 9 [mem 0x000c8000-0x000cbfff window]
[ 0.865623] pci_bus 0000:00: resource 10 [mem 0x000cc000-0x000cffff window]
[ 0.865624] pci_bus 0000:00: resource 11 [mem 0x000d0000-0x000d3fff window]
[ 0.865625] pci_bus 0000:00: resource 12 [mem 0x000d4000-0x000d7fff window]
[ 0.865625] pci_bus 0000:00: resource 13 [mem 0x000d8000-0x000dbfff window]
[ 0.865626] pci_bus 0000:00: resource 14 [mem 0x000dc000-0x000dffff window]
[ 0.865627] pci_bus 0000:00: resource 15 [mem 0x000e0000-0x000e3fff window]
[ 0.865627] pci_bus 0000:00: resource 16 [mem 0x000e4000-0x000e7fff window]
[ 0.865628] pci_bus 0000:00: resource 17 [mem 0x000e8000-0x000ebfff window]
[ 0.865629] pci_bus 0000:00: resource 18 [mem 0x000ec000-0x000effff window]
[ 0.865630] pci_bus 0000:00: resource 19 [mem 0x000f0000-0x000fffff window]
[ 0.865630] pci_bus 0000:00: resource 20 [mem 0x7fa00000-0xfeafffff window]
[ 0.865631] pci_bus 0000:00: resource 21 [mem 0xfed40000-0xfed44fff window]
[ 0.865632] pci_bus 0000:01: resource 1 [mem 0xa0b00000-0xa0bfffff]
[ 0.865633] pci_bus 0000:05: resource 0 [io 0x4000-0x6fff]
[ 0.865634] pci_bus 0000:05: resource 1 [mem 0xa0d00000-0xacdfffff]
[ 0.865635] pci_bus 0000:05: resource 2 [mem 0xace00000-0xb8dfffff 64bit pref]
[ 0.865636] pci_bus 0000:06: resource 0 [io 0x4000-0x5fff]
[ 0.865636] pci_bus 0000:06: resource 1 [mem 0xa0d00000-0xa8dfffff]
[ 0.865637] pci_bus 0000:06: resource 2 [mem 0xace00000-0xb4dfffff 64bit pref]
[ 0.865638] pci_bus 0000:07: resource 1 [mem 0xa0d00000-0xa0dfffff]
[ 0.865639] pci_bus 0000:08: resource 0 [io 0x4000-0x4fff]
[ 0.865639] pci_bus 0000:08: resource 1 [mem 0xa0e00000-0xa4dfffff]
[ 0.865640] pci_bus 0000:08: resource 2 [mem 0xace00000-0xb0dfffff 64bit pref]
[ 0.865641] pci_bus 0000:3a: resource 0 [io 0x5000-0x5fff]
[ 0.865642] pci_bus 0000:3a: resource 1 [mem 0xa4e00000-0xa8dfffff]
[ 0.865643] pci_bus 0000:3a: resource 2 [mem 0xb0e00000-0xb4dfffff 64bit pref]
[ 0.865644] pci_bus 0000:02: resource 0 [io 0x2000-0x2fff]
[ 0.865644] pci_bus 0000:02: resource 1 [mem 0x7fa00000-0x7fbfffff]
[ 0.865645] pci_bus 0000:02: resource 2 [mem 0x7fc00000-0x7fdfffff 64bit pref]
[ 0.865646] pci_bus 0000:03: resource 1 [mem 0xa0400000-0xa08fffff]
[ 0.865647] pci_bus 0000:04: resource 1 [mem 0xa0900000-0xa0afffff]
[ 0.865648] pci_bus 0000:04: resource 2 [mem 0x80000000-0x8fffffff 64bit pref]
Above info is dumped by
pci_assign_unassigned_root_bus_resources:pci_bus_dump_resources
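To get a feeling for why "no space" can happen, here is a minimal sketch of a first-fit search for an aligned hole inside a parent window. It is not the kernel's find_resource; struct range and find_free_range are hypothetical names, and treating the dumped windows as parent and children of each other is my assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start;
    uint64_t end; /* inclusive, assumed to be below UINT64_MAX */
};

/*
 * Find the lowest 'align'-aligned free block of 'size' bytes inside 'parent',
 * skipping the 'busy' child ranges (sorted by start, non-overlapping).
 * 'align' must be a power of two.
 * Returns 0 and fills *out on success, -1 if there is no space.
 */
static int find_free_range(struct range parent,
                           const struct range *busy, size_t nbusy,
                           uint64_t size, uint64_t align,
                           struct range *out)
{
    uint64_t cursor = parent.start;
    size_t i;

    for (i = 0; i <= nbusy; i++) {
        /* The current gap is [cursor, limit): up to the next busy range or the parent end. */
        uint64_t limit = (i < nbusy) ? busy[i].start : parent.end + 1;
        uint64_t start = (cursor + align - 1) & ~(align - 1);

        if (start >= cursor && start < limit && limit - start >= size) {
            out->start = start;
            out->end = start + size - 1;
            return 0;
        }
        if (i < nbusy)
            cursor = busy[i].end + 1;
    }
    return -1;
}
```

For example, with a parent window of [io 0x4000-0x5fff] (bus 06 in the dump) already fully covered by [io 0x4000-0x4fff] and [io 0x5000-0x5fff] (bus 08 and bus 3a), a request for size 0x1000 with alignment 0x1000 returns -1, which corresponds to the "no space for [io size 0x1000]" message.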
OK, so actually there is a bug report that poweroff and suspend do not work on Mac Pro 11;
the problem is illustrated below:
https://patchwork.kernel.org/patch/9140867/
https://patchwork.kernel.org/patch/9143637/
So I posted a patch to work around it:
Currently there are many people reported that they can not
do a poweroff nor a suspend to memory on their Mac Pro 11.
After some investigations it was found that, once the PCI bridge
0000:00:1c.0 reassigns its mm windows([mem 0x7fa00000-0x7fbfffff]
and [mem 0x7fc00000-0x7fdfffff 64bit pref]), the region of ACPI
io resource 0x1804 becomes unaccessible immediately, where the
ACPI Sleep register is located, as a result neither poweroff(S5)
nor suspend to memory(S3) works.
I don't know why setting the base/limit of PCI bridge mem resource
would affect another io resource region, so this quirk just simply
bypass the assignment of these mm resources on 0000:00:1c.0, by
resetting the resource flag to 0 before updating the base/limit registers.
This patch also introduces a new pci fixup phase before the actual bridge
resource assignment.
And the PCI maintainer has some concerns:
Is this device *only* used on the Mac Pro 11? http://pci-ids.ucs.cz
says "8 Series/C220 Series Chipset Family PCI Express Root Port #1",
which sounds pretty generic.
And this is a good point, we might need a quirk. Besides, I also
illustrated the info in detail:
according to the boot logs, the pci bridge
in question has not declaimed any valid device/bridge resource
(both io and mem) during probe(because base=0xfff>limit=0x0),
so I think it has not hardware resource setting at that time(at least
in BIOS)
until it reaches pcibios_assign_resources and it has to allocate a
minimal io/mem resource, then it tries to assign them to
[mem 0x7fa00000 - 0x7fbfffff]
[mem 0x7fc00000-0x7fdfffff 64bit pref]
[io 0x2000-0x2fff],
so if we reset the flag to zero for these mem resource, the pci bridge
will not assign any pci mem windows for it
(in this way
find_free_bus_resource(bus, mask | IORESOURCE_PREFETCH, type)
will not return any free resource, thus bypass the assignment)
According to the boot log at
https://bugzilla.kernel.org/attachment.cgi?id=210141
, we can see there is no bridge windows assign for 0000:00:1c.0 during
early probe:
[ 0.807893] pci 0000:00:1c.0: [8086:8c10] type 01 class 0x060400
[ 0.807949] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[ 0.831281] pci 0000:00:1c.0: PCI bridge to [bus 02]
and then in pcibios_assign_resources, 0000:00:1c.0 tries to allocate minimal
resource window and then update related base/limit registers:
[ 0.865342] pci 0000:00:1c.0: bridge window [io 0x1000-0x0fff] to
[bus 02] add_size 1000
[ 0.865343] pci 0000:00:1c.0: bridge window [mem
0x00100000-0x000fffff 64bit pref] to [bus 02] add_size 200000 add_align
100000
[ 0.865344] pci 0000:00:1c.0: bridge window [mem
0x00100000-0x000fffff] to [bus 02] add_size 200000 add_align 100000
Bjorn helped to give some clues on how to further debug:
Here are some ideas for debugging this:
1) Apparently the problem is sensitive to programming the prefetchable
memory aperture of 00:1c.0 to [mem 0x7fc00000-0x7fdfffff 64bit
pref]. The only effect of that *should* be that the bridge will
now claim accesses in the aperture, when it didn't before.
We *think* there's nothing else at the address of that aperture,
but if there is an unreported device there, it may stop working. I
would pore over the E820 memory map, EFI memory map, all ACPI _CRS
methods, etc., looking for anything in that area that we haven't
accounted for.
There are lots of anomalies in this system, e.g., (these are from
Bastien's dmesg log at
https://bugzilla.kernel.org/attachment.cgi?id=208961)
BIOS-e820: [mem 0x00000000e00f8000-0x00000000e00f8fff] reserved
PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-9b] only partially covers this bridge
pci_bus 0000:00: root bus resource [mem 0x7fa00000-0xfeafffff window]
system 00:04: [mem 0xe0000000-0xefffffff] could not be reserved
ACPI Warning: SystemIO range 0x000000000000EFA0-0x000000000000EFBF conflicts with OpRegion 0x000000000000EFA0-0x000000000000EFAF (\_SB.PCI0.SBUS.SMBI)
The MMCONFIG area appears correctly described in the 00:04 _CRS,
but incorrectly in MCFG. The E820 region appears to be a chunk in
the middle of the MMCONFIG area. The host bridge window
[mem 0x7fa00000-0xfeafffff window] is clearly bogus -- it includes
the MMCONFIG area, which is definitely not a window. I doubt
anything above 0xdfffffff should be included.
I don't know what the ACPI conflict warning is about, but I'd try
to figure it out.
Either the firmware is badly broken, or we're not interpreting
something correctly.
I'd try assigning the 00:1c.0 aperture at the end of the actual
aperture instead of the beginning. I think Windows, and likely
MacOS uses a top-down allocation strategy instead of bottom-up like
Linux does. If that makes poweroff work, there is likely an
unreported device in the [mem 0x7fc00000-0x7fdfffff 64bit pref]
area.
This experimentation could all be done with setpci, without
requiring kernel patches.
2) I would explore exactly what the grub "halt" command does and how
it compares to what Linux is doing. I see the assertion that this
is related to [io 0x1804], but I don't know what that's based on.
Programming the 00:1c.0 prefetchable aperture shouldn't have
anything to do with I/O ports, so if it really is related, there
might be some SMM magic or something where SMM code is doing
something that relates to the memory aperture.
3) The lspci output at https://bugzilla.kernel.org/attachment.cgi?id=219321
(I think this is from MacOS) shows invalid data for devices
starting at 04:00.0. Why? Maybe this is an unrelated artifact,
but it doesn't smell right.
4) If you can hot-add devices under MacOS, look to see what address
space they get assigned. That may tell you what allocation
strategy MacOS uses.
5) 00:1c.0 claims to have a slot that supports hotplug. Is that
actually true? Could you add a device below it? If not, maybe the
problem is that the BIOS should have configured 00:1c.0 so it
doesn't report a slot. If it didn't report a slot, we shouldn't
assign resources to it, since there is no possibility of a device
below it.
Bjorn
> 5) 00:1c.0 claims to have a slot that supports hotplug. Is that
> actually true? Could you add a device below it? If not, maybe the
> problem is that the BIOS should have configured 00:1c.0 so it
> doesn't report a slot. If it didn't report a slot, we shouldn't
> assign resources to it, since there is no possibility of a device
> below it.
Of course, this would only be *part* of the problem, because a hot-added
device somewhere else could still be assigned the space at
[mem 0x7fc00000-0x7fdfffff].
This just smells like an unreported device in there somewhere.
OK, so Bjorn does not want this workaround merged upstream, and he gave a lot of debugging suggestions.
Let's look at them one by one:
First question:
The E820 region appears to be a chunk in
the middle of the MMCONFIG area
BIOS-e820: [mem 0x00000000e00f8000-0x00000000e00f8fff] reserved
PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
So what is MMCONFIG?
I have to confess that I've searched through many articles but haven't found any that covers it in detail; I only
got the impression that this is a newer mechanism to access a pci device's config space directly through memory.
As we all know, the legacy method to access pci config space is via the io ports 0xcf8/0xcfc, but if we have
mmconfig enabled, the config space is mapped at memory addresses, thus we can ioremap them and access it
directly. The condition is that the mmconfig space must be inside an e820 reserved region (this is the early check); otherwise
we still use the legacy method to access config space.
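To make the two access methods concrete, here is a small sketch of how the target register is addressed in each case. The helper names (make_devfn, conf1_address, mmcfg_address) are made up for this note; the bit layouts are the usual type 1 configuration address and the 1M-per-bus, 4K-per-function MMCONFIG layout, and extended registers above 0xff are not handled for the legacy method.

```c
#include <stdint.h>

/* Pack device (5 bits) and function (3 bits), in the same way PCI_DEVFN() does. */
static uint8_t make_devfn(uint8_t dev, uint8_t fn)
{
    return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Legacy method: value written to io port 0xcf8 before the data is read from 0xcfc. */
static uint32_t conf1_address(uint8_t bus, uint8_t devfn, uint8_t reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)devfn << 8) | (reg & 0xfcu);
}

/* MMCONFIG method: physical address of the same register inside the window. */
static uint64_t mmcfg_address(uint64_t base, uint8_t bus, uint8_t devfn, uint16_t reg)
{
    return base + ((uint64_t)bus << 20) + ((uint64_t)devfn << 12) + (reg & 0xfffu);
}
```

For the bridge 0000:00:1c.0 and an MMCONFIG base of 0xe0000000, make_devfn(0x1c, 0) is 0xe0, so the vendor ID register is selected by writing 0x8000e000 to 0xcf8 in the legacy case, or read directly from physical address 0xe00e0000 in the MMCONFIG case.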
arch_initcall(pci_arch_init);
static __init int pci_arch_init(void)
{
type = pci_direct_probe();
pci_mmcfg_early_init();
}
In pci_direct_probe, we try to set the ops used to access config space to
pci_direct_conf1 (legacy access via 0xcf8), then pci_mmcfg_early_init tries to change the ops to the mmcfg ops.
void __init pci_mmcfg_early_init(void)
{
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
else
acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
__pci_mmcfg_init(1);
set_apei_filter();
}
}
In the above function, after entering pci_mmcfg_check_hostbridge, we first check
whether there is any particular pci device which has a customized mmcfg probe callback;
this is done by matching against the pre-defined array pci_mmcfg_probes:
static const struct pci_mmcfg_hostbridge_probe pci_mmcfg_probes[] __initconst = {
{ 0, PCI_DEVFN(0, 0), PCI_VENDOR_ID_INTEL,
PCI_DEVICE_ID_INTEL_E7520_MCH, pci_mmcfg_e7520 },
{ 0, PCI_DEVFN(0, 0), PCI_VENDOR_ID_INTEL,
PCI_DEVICE_ID_INTEL_82945G_HB, pci_mmcfg_intel_945 },
{ 0, PCI_DEVFN(0x18, 0), PCI_VENDOR_ID_AMD,
0x1200, pci_mmcfg_amd_fam10h },
{ 0xff, PCI_DEVFN(0, 0), PCI_VENDOR_ID_AMD,
0x1200, pci_mmcfg_amd_fam10h },
{ 0, PCI_DEVFN(0, 0), PCI_VENDOR_ID_NVIDIA,
0x0369, pci_mmcfg_nvidia_mcp55 },
};
Since we don't have any of these special pci devices, we then check whether any entry is already registered in the list
pci_mmcfg_list; the answer is no, because no mmcfg has been probed yet.
Let's go back to pci_mmcfg_early_init: next we have to enumerate the ACPI tables to find the MCFG table,
which contains the config info. Thus in
acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
we first try the ACPI tables and, if that fails, fall back to SFI to find the "MCFG" table;
if it is found, pci_parse_mcfg is invoked.
static int __init pci_parse_mcfg(struct acpi_table_header *header)
{
struct acpi_table_mcfg *mcfg;
struct acpi_mcfg_allocation *cfg_table;
mcfg = (struct acpi_table_mcfg *)header;
cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
for (i = 0; i < entries; i++) {
cfg = &cfg_table[i];
if (acpi_mcfg_check_entry(mcfg, cfg)) {
free_all_mmcfg();
return -ENODEV;
}
if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
cfg->end_bus_number, cfg->address) == NULL) {
pr_warn(PREFIX "no memory for MCFG entries\n");
free_all_mmcfg();
return -ENOMEM;
}
}
}
So this function deals with our lovely "MCFG" table. It skips the table header, then parses
the payload starting at &mcfg[1]. Each entry in the payload is a
struct acpi_mcfg_allocation; if it is a valid entry, pci_mmconfig_add is invoked to allocate
a new internal structure pci_mmcfg_region and add this
entry to the list pci_mmcfg_list. Notice that the entries in pci_mmcfg_list
are sorted by start_bus, ascending from head to tail (list_add_tail(new, head) adds new before
head, which is also useful to implement a queue). And how do we determine the address range of each pci_mmcfg_region?
It is done in pci_mmconfig_alloc:
static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
int end, u64 addr)
{
struct pci_mmcfg_region *new;
struct resource *res;
if (addr == 0)
return NULL;
new = kzalloc(sizeof(*new), GFP_KERNEL);
if (!new)
return NULL;
new->address = addr;
new->segment = segment;
new->start_bus = start;
new->end_bus = end;
res = &new->res;
res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
"PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
res->name = new->name;
return new;
}
#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
OK, as pci bus numbers start from zero, we know the firmware reserves 1<<20 bytes, i.e. 1M bytes, for each pci bus number.
pr_info(PREFIX
"MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
"(base %#lx)\n",
segment, start, end, &new->res, (unsigned long)addr);
PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
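The range printed in this log line can be checked by redoing the arithmetic of pci_mmconfig_alloc in user space. A minimal sketch, where MMCFG_BUS_OFFSET mirrors PCI_MMCFG_BUS_OFFSET and the struct and function names are made up for illustration:

```c
#include <stdint.h>

#define MMCFG_BUS_OFFSET(bus) ((uint64_t)(bus) << 20) /* 1M per bus */

/* Hypothetical helper type: inclusive physical range of one MMCONFIG region. */
struct mmcfg_span {
    uint64_t start;
    uint64_t end;
};

static struct mmcfg_span mmcfg_region_span(uint64_t base,
                                           unsigned int start_bus,
                                           unsigned int end_bus)
{
    struct mmcfg_span s;

    s.start = base + MMCFG_BUS_OFFSET(start_bus);
    s.end = base + MMCFG_BUS_OFFSET(end_bus + 1) - 1;
    return s;
}
```

For base 0xe0000000 and [bus 00-9b] this gives start 0xe0000000 and end 0xe0000000 + (0x9c << 20) - 1 = 0xe9bfffff, i.e. 0x9c (156) buses times 1M, matching the log.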
OK, we have wandered a little far; let's go back to pci_mmcfg_early_init. After the payload entries (start_bus, end_bus, addr) are added
to pci_mmcfg_list, we need to process these list entries; that is to say, if we want to access the config space in these regions,
we need to ioremap these addresses and provide a virtual address range for them. This is what
__pci_mmcfg_init(1) does for us.
static void __init __pci_mmcfg_init(int early)
{
pci_mmcfg_reject_broken(early);
if (list_empty(&pci_mmcfg_list))
return;
if (pcibios_last_bus < 0) {
const struct pci_mmcfg_region *cfg;
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (cfg->segment)
break;
pcibios_last_bus = cfg->end_bus;
}
}
if (pci_mmcfg_arch_init())
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
else {
free_all_mmcfg();
pci_mmcfg_arch_init_failed = true;
}
}
First comes the sanity check,
static void __init pci_mmcfg_reject_broken(int early)
{
struct pci_mmcfg_region *cfg;
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (pci_mmcfg_check_reserved(NULL, cfg, early) == 0) {
pr_info(PREFIX "not using MMCONFIG\n");
free_all_mmcfg();
return;
}
}
}
For the early check, the e820 map is used to check whether this new mmcfg region lies inside an
E820_RESERVED region. If it does not, the mmcfg size is halved and the check is repeated
on (start, start + size/2), and so on, until the size drops below the minimal region size, that is,
16<<20 (16M); since a usable mmcfg region should not be smaller than 16M, we then return a failure.
In our example, the mmcfg region is much bigger than the e820 reserved region:
BIOS-e820: [mem 0x00000000e00f8000-0x00000000e00f8fff] reserved
PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
We return an error here. By the way, we know the start addr is 0xe0000000, but how does 0xe9bfffff come out?
It is addr + PCI_MMCFG_BUS_OFFSET(end_bus + 1) - 1 = 0xe0000000 + (0x9c << 20) - 1.
So we failed in pci_mmcfg_reject_broken, which freed all the mmcfg entries previously added to
pci_mmcfg_list, and we got a warning:
[ 0.218319] PCI: not using MMCONFIG
Thus __pci_mmcfg_init terminates, and so do pci_mmcfg_early_init and pci_arch_init;
we have to use the legacy config ops callbacks, i.e. via 0xcf8:
[ 0.218320] PCI: Using configuration type 1 for base access
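The shrink-and-retry check described above can be modelled like this. It is a sketch under simplifying assumptions, not the kernel code: only a single reserved region is considered, and struct region, inside_reserved and usable_mmcfg_size are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMCFG_MIN_SIZE (16ULL << 20) /* 16M */

struct region {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* True if [start, end) lies completely inside the reserved region. */
static bool inside_reserved(const struct region *rsv, uint64_t start, uint64_t end)
{
    return start >= rsv->start && end - 1 <= rsv->end;
}

/*
 * Halve the candidate size until the range fits into the reserved region.
 * Returns the usable size, or 0 if the region has to be rejected because
 * it shrank below 16M without ever fitting.
 */
static uint64_t usable_mmcfg_size(const struct region *rsv, uint64_t addr, uint64_t size)
{
    uint64_t old_size = size;

    while (!inside_reserved(rsv, addr, addr + size)) {
        size >>= 1;
        if (size < MMCFG_MIN_SIZE)
            break;
    }
    if (size < MMCFG_MIN_SIZE && size != old_size)
        return 0;
    return size;
}
```

With the 4K e820 reserved region [0xe00f8000-0xe00f8fff] and an mmcfg region of 0x9c00000 bytes at 0xe0000000, the size goes 156M, 78M, 39M, 19.5M, 9.75M and never fits, so the result is 0 and we end up with "not using MMCONFIG".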
Later we have another chance to probe mmcfg again, in:
subsys_initcall(acpi_init);
static int __init acpi_init(void)
{
pci_mmcfg_late_init();
acpi_scan_init();
...
}
void __init pci_mmcfg_late_init(void)
{
/* MMCONFIG disabled */
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;
if (known_bridge)
return;
/* MMCONFIG hasn't been enabled yet, try again */
if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
__pci_mmcfg_init(0);
}
}
As the comment above says, this path runs only if MMCONFIG hasn't been enabled yet (if mmcfg was initialized successfully,
pci_probe has PCI_PROBE_MMCONF set and the other access-method bits cleared).
In __pci_mmcfg_init, once pci_mmcfg_reject_broken passes and pci_mmcfg_arch_init succeeds, we have (however, we failed
in pci_mmcfg_reject_broken previously):
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
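The bit games on pci_probe are easier to follow with concrete values. In the sketch below the constant values are written as I recall them from arch/x86/include/asm/pci_x86.h, so please treat them as assumptions; the three helper functions are made up and only restate the conditions used in pci_mmcfg_late_init and __pci_mmcfg_init.

```c
#include <stdbool.h>

/* Assumed values, as I recall them from pci_x86.h. */
#define PCI_PROBE_BIOS   0x0001u
#define PCI_PROBE_CONF1  0x0002u
#define PCI_PROBE_CONF2  0x0004u
#define PCI_PROBE_MMCONF 0x0008u
#define PCI_PROBE_MASK   0x000fu

/* MMCONFIG is allowed at all (not disabled). */
static bool mmconf_allowed(unsigned int pci_probe)
{
    return (pci_probe & PCI_PROBE_MMCONF) != 0;
}

/* Other access methods are still selected, so MMCONFIG has not taken over yet. */
static bool mmconf_needs_retry(unsigned int pci_probe)
{
    return (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) != 0;
}

/* What pci_probe becomes after a successful MMCONFIG init. */
static unsigned int mmconf_enabled(unsigned int pci_probe)
{
    return (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
}
```

Starting from a pci_probe with all four method bits set (0xf), mmconf_needs_retry is true, so the late init parses MCFG again; after mmconf_enabled the value is 0x8 and mmconf_needs_retry becomes false, while bits outside the mask are preserved.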
Back to pci_mmcfg_late_init: acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg) will
enumerate the MCFG table again, and __pci_mmcfg_init is invoked with a different param, 0, indicating this
is not early init anymore:
static void __init __pci_mmcfg_init(int early)
{
pci_mmcfg_reject_broken(early);
if (list_empty(&pci_mmcfg_list))
return;
if (pcibios_last_bus < 0) {
const struct pci_mmcfg_region *cfg;
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (cfg->segment)
break;
pcibios_last_bus = cfg->end_bus;
}
}
if (pci_mmcfg_arch_init())
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
else {
free_all_mmcfg();
pci_mmcfg_arch_init_failed = true;
}
}
Thus while traversing pci_mmcfg_list, we now check for conflicts against
"ACPI motherboard resources" rather than "E820". The checking callback is
is_acpi_reserved:
static int is_acpi_reserved(u64 start, u64 end, unsigned not_used)
{
struct resource mcfg_res;
mcfg_res.start = start;
mcfg_res.end = end - 1;
mcfg_res.flags = 0;
acpi_get_devices("PNP0C01", find_mboard_resource, &mcfg_res, NULL);
if (!mcfg_res.flags)
acpi_get_devices("PNP0C02", find_mboard_resource, &mcfg_res,
NULL);
return mcfg_res.flags;
}
So this check is implemented by comparing against the motherboard resources owned by the
ACPI devices "PNP0C01" and "PNP0C02", that is, walking the resources under _CRS of "PNP0C01" (then "PNP0C02")
and checking whether mcfg_res lies strictly inside any of the _CRS resource regions.
After we find a PNP0C01 device, we check the _CRS inside it:
static acpi_status find_mboard_resource(acpi_handle handle, u32 lvl,
void *context, void **rv)
{
struct resource *mcfg_res = context;
acpi_walk_resources(handle, METHOD_NAME__CRS,
check_mcfg_resource, context);
if (mcfg_res->flags)
return AE_CTRL_TERMINATE;
return AE_OK;
}
For each resource in the _CRS, mcfg_res is compared with it; if one of the fixed memory _CRS regions
contains mcfg_res, the flags are set as the OK indicator and the resource walk terminates.
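The walk over the _CRS entries boils down to a containment test with early exit. A minimal sketch, assuming the _CRS has already been flattened into an array of memory ranges; struct mem_range and find_covering are hypothetical names, and the first sample entry in the usage note below is invented.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/*
 * Return the index of the first _CRS range that fully contains mcfg,
 * or -1 if none does (the walk stops at the first hit).
 */
static int find_covering(const struct mem_range *crs, size_t n,
                         const struct mem_range *mcfg)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (mcfg->start >= crs[i].start && mcfg->end <= crs[i].end)
            return (int)i;
    }
    return -1;
}
```

With a made-up first entry [mem 0xfed1c000-0xfed1ffff] and the [mem 0xe0000000-0xefffffff] range that the logs report for device 00:04, the MMCONFIG region [mem 0xe0000000-0xe9bfffff] is covered by the second entry, so this time the region is accepted.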
Thus, back in __pci_mmcfg_init, we have passed the check in pci_mmcfg_reject_broken, and
we need to ioremap these addresses in pci_mmcfg_arch_init:
cfg->virt = mcfg_ioremap(cfg);
How to remap? Now we have the start addr, start_bus and end_bus, thus we need to remap
(end_bus - start_bus + 1)*1M bytes starting from addr + start_bus*1M, with the nocache attribute:
static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg)
{
void __iomem *addr;
u64 start, size;
int num_buses;
start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
num_buses = cfg->end_bus - cfg->start_bus + 1;
size = PCI_MMCFG_BUS_OFFSET(num_buses);
addr = ioremap_nocache(start, size);
if (addr)
addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
return addr;
}
Notice that the addr mapped by ioremap_nocache must be accessed with readw/writew, etc., because
the address being mapped is likely to be a pci bus address, and although the pci bus address is the same
as the cpu address on x86, on other platforms this might not be the case.
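One detail in mcfg_ioremap is easy to miss: the returned address is biased by subtracting the start_bus offset, so that later lookups can always add (bus << 20) without caring where the region starts. Here is a sketch of that arithmetic using plain integers instead of real mappings; struct mmcfg_map, make_map, map_size and reg_addr are made-up names, and the "mapped" value stands in for whatever ioremap would return.

```c
#include <stdint.h>

#define MMCFG_BUS_OFFSET(bus) ((uint64_t)(bus) << 20) /* 1M per bus */

struct mmcfg_map {
    unsigned int start_bus;
    unsigned int end_bus;
    uint64_t virt; /* biased base: mapped address minus the start_bus offset */
};

/* 'mapped' models the address returned for the window beginning at start_bus. */
static struct mmcfg_map make_map(uint64_t mapped, unsigned int start_bus,
                                 unsigned int end_bus)
{
    struct mmcfg_map m;

    m.start_bus = start_bus;
    m.end_bus = end_bus;
    m.virt = mapped - MMCFG_BUS_OFFSET(start_bus);
    return m;
}

/* Number of bytes that have to be mapped for this bus range. */
static uint64_t map_size(const struct mmcfg_map *m)
{
    return MMCFG_BUS_OFFSET(m->end_bus - m->start_bus + 1);
}

/* Address of a config register; start_bus does not need to be subtracted again. */
static uint64_t reg_addr(const struct mmcfg_map *m, unsigned int bus,
                         unsigned int devfn, unsigned int reg)
{
    return m->virt + MMCFG_BUS_OFFSET(bus) + ((uint64_t)devfn << 12) + reg;
}
```

For our [bus 00-9b] region map_size is 0x9c00000 bytes. For a hypothetical region starting at bus 0x20 and mapped at 0x10000000, reg_addr for bus 0x20, devfn 0, reg 0 is exactly 0x10000000, the first byte of the mapping.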
After the remapping, the ops is updated to mmcfg ops:
raw_pci_ext_ops = &pci_mmcfg;
[ 0.238097] PCI: MMCONFIG for domain 0000 [bus 00-9b] at [mem 0xe0000000-0xe9bfffff] (base 0xe0000000)
[ 0.238420] PCI: MMCONFIG at [mem 0xe0000000-0xe9bfffff] reserved in ACPI motherboard resources
Then later we have:
[ 0.238426] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
So for some special platforms, we don't use _CRS, but in our case, we use _CRS by default.
Later we have:
[ 0.243396] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-9b] only partially covers this bridge
OK, this is printed while the acpi pci root device is being enumerated.
Let's go back to subsys_initcall(acpi_init);
static int __init acpi_init(void)
{
pci_mmcfg_late_init();
acpi_scan_init();
}
Above warning is printed in acpi_scan_init, and the code path is a little long:
int __init acpi_scan_init(void)
{
acpi_pci_root_init();
acpi_bus_scan(ACPI_ROOT_OBJECT);
}
acpi_pci_root_init :
acpi_scan_add_handler_with_hotplug(&pci_root_handler, "pci_root");
static const struct acpi_device_id root_device_ids[] = {
{"PNP0A03", 0},
{"", 0},
};
static struct acpi_scan_handler pci_root_handler = {
.ids = root_device_ids,
.attach = acpi_pci_root_add,
.detach = acpi_pci_root_remove,
.hotplug = {
.enabled = true,
.scan_dependent = acpi_pci_root_scan_dependent,
},
};
Thus we add pci_root_handler as a handler for "pci_root" devices, which means that later in
acpi_bus_scan we will invoke pci_root_handler once we encounter a "pci_root" device:
acpi_bus_attach(device);
then:
acpi_scan_attach_handler(device);
then:
acpi_pci_root_add.
In acpi_pci_root_add, we get the root bridge's downstream bus range
by checking _CRS:
/* Check _CRS first, then _BBN. If no _BBN, default to zero. */
root->secondary.flags = IORESOURCE_BUS;
status = try_get_root_bridge_busnr(handle, &root->secondary);
So once the PCI root is found by matching PNP0A03, we evaluate the _CRS of this
device; the bus resource in that _CRS is the downstream bus range of the root PCI bridge (PNP0A08 here).
The bus range is stored in root->secondary, and later the mmcfg region is compared with it:
then:
pci_acpi_scan_root
then:
acpi_pci_root_create
then:
acpi_pci_root_ops.init_info
then:
static int pci_acpi_root_init_info(struct acpi_pci_root_info *ci)
{
return setup_mcfg_map(ci);
}
Thus in setup_mcfg_map we try to insert the root bridge's bus range into the MMCFG list:
static int setup_mcfg_map(struct acpi_pci_root_info *ci)
{
int result, seg;
struct pci_root_info *info;
struct acpi_pci_root *root = ci->root;
struct device *dev = &ci->bridge->dev;
info = container_of(ci, struct pci_root_info, common);
info->start_bus = (u8)root->secondary.start;
info->end_bus = (u8)root->secondary.end;
info->mcfg_added = false;
seg = info->sd.domain;
/* return success if MMCFG is not in use */
if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
return 0;
if (!(pci_probe & PCI_PROBE_MMCONF))
return check_segment(seg, dev, "MMCONFIG is disabled,");
result = pci_mmconfig_insert(dev, seg, info->start_bus, info->end_bus,
root->mcfg_addr);
if (result == 0) {
/* enable MMCFG if it hasn't been enabled yet */
if (raw_pci_ext_ops == NULL)
raw_pci_ext_ops = &pci_mmcfg;
info->mcfg_added = true;
} else if (result != -EEXIST)
return check_segment(seg, dev,
"fail to add MMCONFIG information,");
return 0;
}
/* Add MMCFG information for host bridges */
int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr)
{
cfg = pci_mmconfig_lookup(seg, start);
if (cfg) {
if (cfg->end_bus < end)
dev_info(dev, FW_INFO
"MMCONFIG for "
"domain %04x [bus %02x-%02x] "
"only partially covers this bridge\n",
cfg->segment, cfg->start_bus, cfg->end_bus);
mutex_unlock(&pci_mmcfg_lock);
return -EEXIST;
}
}
OK, so this checks whether the [start_bus, end_bus] range described by _CRS for the root PCI bridge is
already included in the mmcfg list. Only a bus range (start, end) satisfying
cfg->start_bus <= start && cfg->end_bus >= end is considered fully covered; ours is not, hence the warning:
[ 0.243396] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-9b] only partially covers this bridge
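To make the check concrete, here is a minimal userspace sketch of the lookup-then-compare logic. The types and names are invented for illustration; it only mirrors the excerpt above, where the lookup keys on the bridge's first bus and the end bus is compared afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for one entry of the mmcfg list. */
struct demo_mmcfg_region {
	uint16_t segment;
	uint8_t start_bus;
	uint8_t end_bus;
};

enum demo_coverage { DEMO_NOT_FOUND, DEMO_PARTIAL, DEMO_FULL };

static inline enum demo_coverage
demo_mmcfg_coverage(const struct demo_mmcfg_region *list, size_t n,
		    uint16_t seg, uint8_t start, uint8_t end)
{
	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		const struct demo_mmcfg_region *cfg = &list[i];

		/* The lookup only asks whether 'start' falls in the region. */
		if (cfg->segment != seg ||
		    cfg->start_bus > start || cfg->end_bus < start)
			continue;
		/* Found: is the bridge's last bus covered as well? */
		return cfg->end_bus < end ? DEMO_PARTIAL : DEMO_FULL;
	}
	return DEMO_NOT_FOUND;
}
```

With a region of [bus 00-9b] and a bridge range of 00-ff (the ff is an assumption; the _CRS range itself is not shown in these notes), the result is DEMO_PARTIAL, the case that prints the warning.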
PNP0A08 is a PCI root bridge, whose bus range comes from the _CRS of the device matched via PNP0A03.
OK, so this smells like a firmware issue.
OK, back to the original bug report.
To figure it out, we can skip setting the memory aperture during boot, then after boot-up set the aperture to another window.
But before that, I also tried clearing the IO aperture after system boot-up; it appears to have no effect:
# setpci -s 0000:00:1c.0 IO_BASE
20
# setpci -s 0000:00:1c.0 IO_LIMIT
20
# setpci -s 0000:00:1c.0 IO_BASE_UPPER16
0000
# setpci -s 0000:00:1c.0 IO_LIMIT_UPPER16
0000
3.
# setpci -s 0000:00:1c.0 IO_BASE.B=f0
# setpci -s 0000:00:1c.0 IO_LIMIT.B=0
# setpci -s 0000:00:1c.0 IO_BASE_UPPER16.W=0
# setpci -s 0000:00:1c.0 IO_LIMIT_UPPER16.W=0
4.
# setpci -s 0000:00:1c.0 IO_BASE
f0
# setpci -s 0000:00:1c.0 IO_LIMIT
00
# setpci -s 0000:00:1c.0 IO_BASE_UPPER16
0000
# setpci -s 0000:00:1c.0 IO_LIMIT_UPPER16
0000
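For reference, the register values above can be decoded into an I/O window. This is a userspace sketch with made-up names, assuming the standard PCI-to-PCI bridge layout: bits 7:4 of IO_BASE/IO_LIMIT hold address bits 15:12, the low 12 bits of the limit read as 0xfff, and a base above the limit means the window is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_io_window {
	uint32_t start;
	uint32_t end;		/* inclusive */
	bool enabled;		/* false when start > end */
};

static inline struct demo_io_window
demo_decode_io_window(uint8_t io_base, uint8_t io_limit,
		      uint16_t base_upper16, uint16_t limit_upper16)
{
	struct demo_io_window w;

	/* Bits 3:0 of both registers are type bits, not address bits.
	 * The upper16 registers only matter for 32-bit I/O decoding;
	 * they are all zero in the session above. */
	w.start = ((uint32_t)base_upper16 << 16) |
		  ((uint32_t)(io_base & 0xf0) << 8);
	w.end = ((uint32_t)limit_upper16 << 16) |
		((uint32_t)(io_limit & 0xf0) << 8) | 0xfff;
	w.enabled = w.start <= w.end;
	return w;
}
```

Decoded this way, the initial 20/20 is the window 0x2000-0x2fff, and f0/00 gives a base of 0xf000 above a limit of 0x0fff, i.e. a closed window.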
OK, then Bjorn suggested leaving the bridge alone without declaring any apertures, so we did:
1.
setpci -s 0000:00:1c.0 MEMORY_BASE
setpci -s 0000:00:1c.0 MEMORY_LIMIT
setpci -s 0000:00:1c.0 PREF_MEMORY_BASE
setpci -s 0000:00:1c.0 PREF_MEMORY_LIMIT
2.
setpci -s 0000:00:1c.0 MEMORY_BASE.W=f000
setpci -s 0000:00:1c.0 MEMORY_LIMIT.W=f020
setpci -s 0000:00:1c.0 PREF_MEMORY_BASE.W=f020
setpci -s 0000:00:1c.0 PREF_MEMORY_LIMIT.w=f040
3.
setpci -s 0000:00:1c.0 MEMORY_BASE
setpci -s 0000:00:1c.0 MEMORY_LIMIT
setpci -s 0000:00:1c.0 PREF_MEMORY_BASE
setpci -s 0000:00:1c.0 PREF_MEMORY_LIMIT
So no one should be using
[f0000000-f01fffff] and [f0200000-f03fffff pref]
because neither e820 nor iomem declares this region:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000057fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000058000-0x0000000000058fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000059000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000bffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000078d00fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078d01000-0x0000000078d48fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000078d49000-0x0000000078d5cfff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078d5d000-0x0000000078d8efff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x0000000078d8f000-0x0000000078e39fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078e3a000-0x0000000078e8efff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000078e8f000-0x0000000078ecbfff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078ecc000-0x0000000078efefff] type 20
[ 0.000000] BIOS-e820: [mem 0x0000000078eff000-0x0000000078f87fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078f88000-0x0000000078fdefff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000078fdf000-0x0000000078ffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000079000000-0x000000007f9fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000e00f8000-0x00000000e00f8fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ffd70000-0x00000000ffd9ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000047f5fffff] usable
00000000-00000fff : reserved
00001000-00057fff : System RAM
00058000-00058fff : reserved
00059000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
000a0000-000bffff : reserved
000c0000-000c3fff : PCI Bus 0000:00
000c4000-000c7fff : PCI Bus 0000:00
000c8000-000cbfff : PCI Bus 0000:00
000cc000-000cffff : PCI Bus 0000:00
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000dffff : PCI Bus 0000:00
000e0000-000e3fff : PCI Bus 0000:00
000e4000-000e7fff : PCI Bus 0000:00
000e8000-000ebfff : PCI Bus 0000:00
000ec000-000effff : PCI Bus 0000:00
000f0000-000fffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-78d00fff : System RAM
01a00000-0226647c : Kernel code
0226647d-02bb843f : Kernel data
02da4000-02fe9fff : Kernel bss
78d01000-78d48fff : ACPI Non-volatile Storage
78d49000-78d5cfff : System RAM
78d5d000-78d8efff : ACPI Tables
78d8f000-78e39fff : System RAM
78e3a000-78e8efff : reserved
78e8f000-78ecbfff : System RAM
78ecc000-78efefff : reserved
78eff000-78f87fff : System RAM
78f88000-78fdefff : reserved
78fdf000-78ffffff : System RAM
79000000-7f9fffff : reserved
7fa00000-feafffff : PCI Bus 0000:00
7fa00000-7fbfffff : PCI Bus 0000:02
7fc00000-7fdfffff : PCI Bus 0000:02
80000000-8fffffff : PCI Bus 0000:04
80000000-8fffffff : 0000:04:00.0
80000000-8fffffff : S2 MEM
90000000-9fffffff : 0000:00:02.0
90000000-91436fff : BOOTFB
a0000000-a03fffff : 0000:00:02.0
a0400000-a08fffff : PCI Bus 0000:03
a0400000-a07fffff : 0000:03:00.0
a0800000-a0807fff : 0000:03:00.0
a0900000-a0afffff : PCI Bus 0000:04
a0900000-a09fffff : 0000:04:00.0
a0900000-a09fffff : ISP IO
a0a00000-a0a0ffff : 0000:04:00.0
a0a00000-a0a0ffff : S2 IO
a0b00000-a0bfffff : PCI Bus 0000:01
a0b00000-a0b01fff : 0000:01:00.0
a0b00000-a0b01fff : ahci
a0c00000-a0c0ffff : 0000:00:14.0
a0c00000-a0c0ffff : xhci-hcd
a0c10000-a0c13fff : 0000:00:03.0
a0c10000-a0c13fff : ICH HD audio
a0c14000-a0c17fff : 0000:00:1b.0
a0c14000-a0c17fff : ICH HD audio
a0c18000-a0c18fff : 0000:00:1f.6
a0c19000-a0c190ff : 0000:00:1f.3
a0c19100-a0c1910f : 0000:00:16.0
a0d00000-acdfffff : PCI Bus 0000:05
a0d00000-a8dfffff : PCI Bus 0000:06
a0d00000-a0dfffff : PCI Bus 0000:07
a0d00000-a0d3ffff : 0000:07:00.0
a0d00000-a0d3ffff : thunderbolt
a0d40000-a0d40fff : 0000:07:00.0
a0e00000-a4dfffff : PCI Bus 0000:08
a0e00000-a12fffff : PCI Bus 0000:09
a0e00000-a12fffff : PCI Bus 0000:0a
a0e00000-a0e0ffff : 0000:0a:00.0
a4e00000-a8dfffff : PCI Bus 0000:3a
ace00000-b8efffff : PCI Bus 0000:05
ace00000-b4efffff : PCI Bus 0000:06
ace00000-b0efffff : PCI Bus 0000:08
ace00000-acefffff : PCI Bus 0000:09
ace00000-acefffff : PCI Bus 0000:0a
ace00000-ace0ffff : 0000:0a:00.0
ace00000-ace0ffff : tg3
ace10000-ace1ffff : 0000:0a:00.0
ace10000-ace1ffff : tg3
b0f00000-b4efffff : PCI Bus 0000:3a
e00f8000-e00f8fff : reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed03fff : pnp 00:00
fed00000-fed003ff : HPET 0
fed10000-fed17fff : pnp 00:04
fed18000-fed18fff : pnp 00:04
fed19000-fed19fff : pnp 00:04
fed1c000-fed1ffff : reserved
fed1c000-fed1ffff : pnp 00:04
fed1f410-fed1f414 : iTCO_wdt.0.auto
fed20000-fed3ffff : pnp 00:04
fed40000-fed44fff : PCI Bus 0000:00
fed45000-fed8ffff : pnp 00:04
fed90000-fed93fff : pnp 00:04
fed90000-fed90fff : dmar0
fed91000-fed91fff : dmar1
fee00000-feefffff : pnp 00:04
fee00000-fee00fff : Local APIC
fef00000-fef0ffff : APP0001:00
ff000000-ffffffff : INT0800:00
ffd70000-ffd9ffff : reserved
100000000-47f5fffff : System RAM
47f600000-47fffffff : RAM buffer
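The "nobody declares it" claim can also be checked mechanically against the e820 map. Below is a minimal sketch of an inclusive-range overlap test; the names are hypothetical, and the caller is expected to feed it ranges copied from the dumps above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One address range with an inclusive end, as printed by e820/iomem. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

static inline bool demo_ranges_overlap(struct demo_range a, struct demo_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}

/* True if 'r' overlaps any entry of 'map'. */
static inline bool demo_overlaps_any(const struct demo_range *map, size_t n,
				     struct demo_range r)
{
	for (size_t i = 0; i < n; i++) {
		if (demo_ranges_overlap(map[i], r))
			return true;
	}
	return false;
}
```

Fed with the e820 entries at and above 0x79000000 from the log, f0000000-f03fffff overlaps none of them, whereas the MMCONFIG window e0000000-e9bfffff does overlap the reserved entry e00f8000-e00f8fff.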
And the following is my response to Bjorn:
I've grepped for CRS in the table; there are mainly two kinds of CRS provided:
The 1st is dynamically allocated; since the problematic mem aperture is not
inside any of the e820 regions, this is not the cause.
The 2nd is the Memory32Fixed resource template, and I do not see any conflict with
this mem aperture.
I think firmware has really given some incompatible info to Linux, and
as you suggested, I've replied in the Bugzilla asking to try setting the aperture to
windows other than the problematic address with the setpci command; let's see
what happens then.
The SystemIO error message above is printed if an ACPI Operation Region conflicts with
a native driver, but since 1804 is very far from efa0, I don't know whether it has
any impact here.
Another important piece of info: on a similar MacBook Pro 12 on which
poweroff works (the problematic one is MacBook Pro 11), we still see the above MMCFG conflict with
e820 and the host bridge mem window, as well as the SystemIO warnings. The difference is
that the MacBook Pro 12 does not have this PCI bridge (8c10). So I suspect the problem is
highly related to this PCI device.
Related bugzilla on MacBook Pro 12 at:
https://bugzilla.kernel.org/show_bug.cgi?id=101681
dmesg at:
https://bugzilla.kernel.org/attachment.cgi?id=185781
Previously I enabled the grub debug feature; it shows that when halt is invoked, grub tries
to halt by writing S5 to the ACPI sleep register at I/O port 0x1804. In Linux
we implement poweroff by the same method, i.e., outw the _S5 value to 0x1804.
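As a rough illustration of what "writing S5" means, here is a sketch that only computes the 16-bit value; it does not touch any port. The names are made up, the bit positions (SLP_TYP in bits 12:10, SLP_EN in bit 13) are the ACPI PM1 control layout, and the SLP_TYP number itself comes from the platform's _S5 package, so any concrete value used with it is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PM1_SLP_TYP_SHIFT 10
#define DEMO_PM1_SLP_TYP_MASK  (0x7u << DEMO_PM1_SLP_TYP_SHIFT)
#define DEMO_PM1_SLP_EN        (1u << 13)

/* Build the PM1 control value that requests sleep type 'slp_typ',
 * preserving the other bits of the current register content. */
static inline uint16_t demo_pm1_sleep_value(uint16_t current, uint8_t slp_typ)
{
	uint32_t v = current;

	assert(slp_typ <= 7);
	v &= ~(DEMO_PM1_SLP_TYP_MASK | DEMO_PM1_SLP_EN);
	v |= (uint32_t)slp_typ << DEMO_PM1_SLP_TYP_SHIFT;
	v |= DEMO_PM1_SLP_EN;
	return (uint16_t)v;
}
```

For example, with a current value of 0x0001 and a hypothetical SLP_TYP of 7, the value to write would be 0x3c01.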
Yes, maybe this is related to something like SMM.
Mac OS is not using this device. Hmm, I did not quite understand how to add
a device below it; do we need to hack the hardware?
Since this issue is likely caused by a suspicious device using that address, or
maybe that PCI bridge device is actually broken, and since it only happens on MacBook Pro 11,
I'm still thinking a DMI + quirk would be the simplest workaround to make it work
for now.