Add a writeup about how PCI host bridges should be described in ACPI using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/00-INDEX      |   2 +
 Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/PCI/acpi-info.txt

diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 147231f..0780280 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -1,5 +1,7 @@
 00-INDEX
 	- this file
+acpi-info.txt
+	- info on how PCI host bridges are represented in ACPI
 MSI-HOWTO.txt
 	- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
 PCIEBUS-HOWTO.txt
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
new file mode 100644
index 0000000..ccbcfda
--- /dev/null
+++ b/Documentation/PCI/acpi-info.txt
@@ -0,0 +1,136 @@
+	    ACPI considerations for PCI host bridges
+
+The basic requirement is that the ACPI namespace should describe
+*everything* that consumes address space unless there's another
+standard way for the OS to find it [1, 2]. For example, windows that
+are forwarded to PCI by a PCI host bridge should be described via ACPI
+devices, since the OS can't locate the host bridge by itself. PCI
+devices *below* the host bridge do not need to be described via ACPI,
+because the resources they consume are inside the host bridge windows,
+and the OS can discover them via the standard PCI enumeration
+mechanism (using config accesses to read and size the BARs).
+
+This ACPI resource description is done via _CRS methods of devices in
+the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
+the OS can read _CRS and figure out what resource is being consumed
+even if it doesn't have a driver for the device [3]. That's important
+because it means an old OS can work correctly even on a system with
+new devices unknown to the OS. The new devices won't do anything, but
+the OS can at least make sure no resources conflict with them.
+
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
+
+If the OS is expected to manage an ACPI device, that device will have
+a specific _HID/_CID that tells the OS what driver to bind to it, and
+the _CRS tells the OS and the driver where the device's registers are.
+
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
+programming model for them other than "don't use these resources for
+anything else." So any address space that is (1) not claimed by some
+other ACPI device and (2) should not be assigned by the OS to
+something else, should be claimed by a PNP0C02 _CRS method.
+
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. In principle, this would
+be all the windows they forward down to the PCI bus, as well as the
+bridge registers themselves. The bridge registers include things like
+secondary/subordinate bus registers that determine the bus range below
+the bridge, window registers that describe the apertures, etc. These
+are all device-specific, non-architected things, so the only way a
+PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
+contain the device-specific details. These bridge registers also
+include ECAM space, since it is consumed by the bridge.
+
+ACPI defined a Producer/Consumer bit that was intended to distinguish
+the bridge apertures from the bridge registers [4, 5]. However,
+BIOSes didn't use that bit correctly, and the result is that OSes have
+to assume that everything in a PCI host bridge _CRS is a window. That
+leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
+device itself.
+
+The workaround is to describe the bridge registers (including ECAM
+space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
+the bridge register space is device-specific anyway, so the generic
+PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
+ECAM, pci_root.c learns about the space from either MCFG or the _CBA
+method.
+
+Note that the PCIe spec actually does require ECAM unless there's a
+standard firmware interface for config access, e.g., the ia64 SAL
+interface [7]. One reason is that we want a generic host bridge
+driver (pci_root.c), and a generic driver requires a generic way to
+access config space.
+
+
+[1] ACPI 6.0, sec 6.1:
+    For any device that is on a non-enumerable type of bus (for
+    example, an ISA bus), OSPM enumerates the devices' identifier(s)
+    and the ACPI system firmware must supply an _HID object ... for
+    each device to enable OSPM to do that.
+
+[2] ACPI 6.0, sec 3.7:
+    The OS enumerates motherboard devices simply by reading through
+    the ACPI Namespace looking for devices with hardware IDs.
+
+    Each device enumerated by ACPI includes ACPI-defined objects in
+    the ACPI Namespace that report the hardware resources the device
+    could occupy [_PRS], an object that reports the resources that are
+    currently used by the device [_CRS], and objects for configuring
+    those resources [_SRS]. The information is used by the Plug and
+    Play OS (OSPM) to configure the devices.
+
+[3] ACPI 6.0, sec 6.2:
+    OSPM uses device configuration objects to configure hardware
+    resources for devices enumerated via ACPI. Device configuration
+    objects provide information about current and possible resource
+    requirements, the relationship between shared resources, and
+    methods for configuring hardware resources.
+
+    When OSPM enumerates a device, it calls _PRS to determine the
+    resource requirements of the device. It may also call _CRS to
+    find the current resource settings for the device. Using this
+    information, the Plug and Play system determines what resources
+    the device should consume and sets those resources by calling the
+    device’s _SRS control method.
+
+    In ACPI, devices can consume resources (for example, legacy
+    keyboards), provide resources (for example, a proprietary PCI
+    bridge), or do both. Unless otherwise specified, resources for a
+    device are assumed to be taken from the nearest matching resource
+    above the device in the device hierarchy.
+
+[4] ACPI 6.0, sec 6.4.3.5.4:
+    Extended Address Space Descriptor
+    General Flags: Bit [0] Consumer/Producer:
+        1–This device consumes this resource
+        0–This device produces and consumes this resource
+
+[5] ACPI 6.0, sec 19.6.43:
+    ResourceUsage specifies whether the Memory range is consumed by
+    this device (ResourceConsumer) or passed on to child devices
+    (ResourceProducer). If nothing is specified, then
+    ResourceConsumer is assumed.
+
+[6] PCI Firmware 3.0, sec 4.1.2:
+    If the operating system does not natively comprehend reserving the
+    MMCFG region, the MMCFG region must be reserved by firmware. The
+    address range reported in the MCFG table or by _CBA method (see
+    Section 4.1.3) must be reserved by declaring a motherboard
+    resource. For most systems, the motherboard resource would appear
+    at the root of the ACPI namespace (under _SB) in a node with a
+    _HID of EISAID (PNP0C02), and the resources in this case should
+    not be claimed in the root PCI bus’s _CRS. The resources can
+    optionally be returned in Int15 E820 or EFIGetMemoryMap as
+    reserved memory but must always be reported through ACPI as a
+    motherboard resource.
+
+[7] PCI Express 3.0, sec 7.2.2:
+    For systems that are PC-compatible, or that do not implement a
+    processor-architecture-specific firmware interface standard that
+    allows access to the Configuration Space, the ECAM is required as
+    defined in this section.
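
A side note on the ECAM space that the document above says must be reserved: its size follows from the bus range it covers, because PCIe ECAM gives each function 4 KiB of config space, with 8 functions per device and 32 devices per bus, i.e. 1 MiB per bus. The following is a minimal C sketch of that arithmetic only. The helper names ecam_window_size() and ecam_offset() are hypothetical (they are not kernel APIs), and offsets are assumed to be relative to the address of bus 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical helpers, not kernel code.
 * ECAM layout: 4 KiB per function, 8 functions per device,
 * 32 devices per bus, so 1 MiB of address space per bus.
 */
#define ECAM_BUS_SHIFT 20
#define ECAM_DEV_SHIFT 15
#define ECAM_FN_SHIFT  12

/* Bytes of ECAM space needed to cover buses [bus_start, bus_end]. */
static uint64_t ecam_window_size(uint8_t bus_start, uint8_t bus_end)
{
	assert(bus_end >= bus_start);
	return ((uint64_t)(bus_end - bus_start) + 1) << ECAM_BUS_SHIFT;
}

/* Offset of a config register, relative to the address of bus 0. */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
			    uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 4096);
	return ((uint64_t)bus << ECAM_BUS_SHIFT) |
	       ((uint64_t)dev << ECAM_DEV_SHIFT) |
	       ((uint64_t)fn << ECAM_FN_SHIFT) | reg;
}
```

Under this arithmetic, a window covering buses 0-255 is 0x10000000 bytes (256 MiB) and one covering buses 0-31 is 0x2000000 bytes (32 MiB). This is the amount of address space a PNP0C02 _CRS would have to claim for such a window.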
Hi Bjorn
Many thanks for putting this together, it really helps!
One thing below.
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas
Sent: 17 November 2016 18:00
To: linux-pci@vger.kernel.org
Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
Subject: [PATCH] PCI: Add information about describing PCI in ACPI
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
Right, so if my understanding is correct, you are saying that resources described in the MCFG table should also be declared in PNP0C02 devices so that the PNP driver can reserve them.
On the other hand, the PCI root bridge driver should not reserve such resources.
If that is right, I think we have a problem here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
As you can see, pci_ecam_create() will conflict with the PNP driver, since it will also try to reserve the resources from the MCFG table.
Maybe we need to rework pci_ecam_create()?
Thanks
Gab
On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
Right, so if my understanding is correct, you are saying that resources described in the MCFG table should also be declared in PNP0C02 devices so that the PNP driver can reserve them.
Yes.
On the other hand, the PCI root bridge driver should not reserve such resources.
If that is right, I think we have a problem here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
As you can see, pci_ecam_create() will conflict with the PNP driver, since it will also try to reserve the resources from the MCFG table.
Maybe we need to rework pci_ecam_create()?
I think it's OK as it is.
The pnp/system.c driver does try to reserve PNP0C02 resources, and it marks them as "not busy". That way they appear in /proc/iomem and won't be allocated for anything else, but they can still be requested by drivers, e.g., pci/ecam.c, which will mark them "busy".
This is analogous to what the PCI core does in pci_claim_resource(). This is really a function of the ACPI/PNP *core*, which should reserve all _CRS resources for all devices (not just PNP0C02 devices). But it's done by pnp/system.c, and only for PNP0C02, because there's a bunch of historical baggage there.
You'll also notice that in this case, things are out of order: logically the pnp/system.c reservation should happen first, but in fact the pci/ecam.c request happens *before* the pnp/system.c one. That means the pnp/system.c one might fail and complain "[mem ...] could not be reserved".
Bjorn
Hi Bjorn
-----Original Message-----
From: Bjorn Helgaas [mailto:helgaas@kernel.org]
Sent: 18 November 2016 17:54
To: Gabriele Paoloni
Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
Correct me if I am wrong...
So currently we are relying on the fact that pci_ecam_create() is called before the PNP driver. If the PNP driver came first, pci_ecam_create() would fail here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
I am not sure, but that seems to me a rather weak condition to rely on. What about removing the error condition in pci_ecam_create() and just logging a dev_info()?
Thanks
Gab
On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
Correct me if I am wrong...
So currently we are relying on the fact that pci_ecam_create() is called before the PNP driver. If the PNP driver came first, pci_ecam_create() would fail here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
I am not sure, but that seems to me a rather weak condition to rely on. What about removing the error condition in pci_ecam_create() and just logging a dev_info()?
Huh. I'm confused. I *thought* it would be safe to reverse the order, which would effectively be this:
  system_pnp_probe
    reserve_resources_of_dev
      reserve_range
        request_mem_region([mem 0xb0000000-0xb1ffffff])
  ...
  pci_ecam_create
    request_resource_conflict([mem 0xb0000000-0xb1ffffff])
but I experimented with the patch below on qemu, and it failed as you predicted:
  ** res test **
  requested [mem 0xa0000000-0xafffffff]
  can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM PNP [mem 0xa0000000-0xafffffff]
I expected the request_resource_conflict() to succeed since it's completely contained in the "ECAM PNP" region. But I guess I don't understand kernel/resource.c well enough.
I'm not sure we need to fix anything yet, since we currently do the ecam.c request before the system.c one, and any change there would be a long ways off. If/when that *does* change, I think the correct fix would be to change ecam.c so its request succeeds (by changing the way it does the request, fixing kernel/resource.c, or whatever) rather than to reduce the log level and ignore the failure.
Bjorn
diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
index adb62aa..5a35638 100644
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -7,6 +7,8 @@
    in the right sequence from here. */
 static __init int pci_arch_init(void)
 {
+	struct resource *res, *conflict;
+	static struct resource cfg;
 #ifdef CONFIG_PCI_DIRECT
 	int type = 0;
 
@@ -39,6 +41,26 @@ static __init int pci_arch_init(void)
 
 	dmi_check_skip_isa_align();
 
+	printk("\n** res test **\n");
+
+	res = request_mem_region(0xa0000000, 0x10000000, "ECAM PNP");
+	printk("requested %pR\n", res);
+	if (!res)
+		return 0;
+	res->flags &= ~IORESOURCE_BUSY;
+
+	cfg.start = 0xa0000000;
+	cfg.end = 0xafffffff;
+	cfg.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	cfg.name = "PCI ECAM";
+
+	conflict = request_resource_conflict(&iomem_resource, &cfg);
+	if (conflict)
+		printk("can't claim ECAM area %pR: conflict with %s %pR\n",
+			&cfg, conflict->name, conflict);
+
+	printk("\n");
+
 	return 0;
 }
 arch_initcall(pci_arch_init);
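
One possible reading of the result above can be sketched as a small user-space model. This is not the kernel code: the struct and the model_* function names are hypothetical. It rests on the assumption (to be checked against kernel/resource.c) that a plain resource request inserts the new resource as a direct child of the given root and treats any overlapping sibling as a conflict, busy or not, whereas a region-style request retries inside a conflicting resource that is not busy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_BUSY 0x1u	/* stand-in for IORESOURCE_BUSY */

/* Hypothetical, simplified resource node (inclusive address range). */
struct res {
	uint64_t start, end;
	unsigned int flags;
	const char *name;
	struct res *parent, *sibling, *child;
};

/*
 * Insert 'new' as a direct child of 'root'.  Returns NULL on success or
 * the resource that blocks the insertion.  It never descends into
 * children, so any overlapping child is a conflict, busy or not.
 */
static struct res *model_request_resource(struct res *root, struct res *new)
{
	struct res **p = &root->child;

	if (new->end < new->start ||
	    new->start < root->start || new->end > root->end)
		return root;

	for (;;) {
		struct res *tmp = *p;

		if (!tmp || tmp->start > new->end) {
			new->sibling = tmp;
			*p = new;
			new->parent = root;
			return NULL;
		}
		p = &tmp->sibling;
		if (tmp->end < new->start)
			continue;
		return tmp;
	}
}

/*
 * Region-style request: when the conflict is a non-busy resource, retry
 * inside it.  Returns 0 on success, -1 if something else is in the way.
 */
static int model_request_region(struct res *parent, struct res *new)
{
	for (;;) {
		struct res *conflict = model_request_resource(parent, new);

		if (!conflict)
			return 0;
		if (conflict != parent && !(conflict->flags & RES_BUSY)) {
			parent = conflict;
			continue;
		}
		return -1;
	}
}
```

In this model, requesting a busy "PCI ECAM" range directly under the root fails once a non-busy "ECAM PNP" range of the same size is already there, which matches the log above. The region-style request for the same range succeeds and ends up nested under "ECAM PNP". Whether ecam.c should request its range that way is a separate question that the model does not settle.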
Hi Bjorn
-----Original Message-----
From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas
Sent: 21 November 2016 16:47
To: Gabriele Paoloni
Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org
Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
I expected the request_resource_conflict() to succeed since it's completely contained in the "ECAM PNP" region. But I guess I don't understand kernel/resource.c well enough.
I think it fails because the PNP driver has effectively already populated the iomem_resource tree, so pci_ecam_create() finds that it cannot add the cfg resource to the same hierarchy: the range is already there.
I'm not sure we need to fix anything yet, since we currently do the ecam.c request before the system.c one, and any change there would be a long ways off. If/when that *does* change, I think the correct fix would be to change ecam.c so its request succeeds (by changing the way it does the request, fixing kernel/resource.c, or whatever) rather than to reduce the log level and ignore the failure.
Well, I didn't just want to make the error disappear. If all the resources should be reserved by the PNP driver, then ideally we could remove request_resource_conflict() from pci_ecam_create(), but that would break systems with an already-shipped BIOS that rely on the pci_ecam_create() reservation rather than the PNP reservation.
Just removing the error condition and converting dev_err() into dev_info() seems to me a way to accommodate already-shipped BIOS images and to flag a reservation that has already been made by somebody else, without compromising the functionality of the PCI root bridge driver. (So far, the only reason I can see for the error condition is to catch a buggy MCFG with overlapping addresses; if that is the case, maybe we need a different diagnostic check to make sure the MCFG table is all right.)
BTW if you think that so far we can keep this as it is I am ok.
Many Thanks
Gab
Bjorn
diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
index adb62aa..5a35638 100644
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -7,6 +7,8 @@ in the right sequence from here. */
 static __init int pci_arch_init(void)
 {
+	struct resource *res, *conflict;
+	static struct resource cfg;
 #ifdef CONFIG_PCI_DIRECT
 	int type = 0;
@@ -39,6 +41,26 @@ static __init int pci_arch_init(void)
 	dmi_check_skip_isa_align();
+	printk("\n** res test **\n");
+	res = request_mem_region(0xa0000000, 0x10000000, "ECAM PNP");
+	printk("requested %pR\n", res);
+	if (!res)
+		return 0;
+	res->flags &= ~IORESOURCE_BUSY;
+	cfg.start = 0xa0000000;
+	cfg.end = 0xafffffff;
+	cfg.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	cfg.name = "PCI ECAM";
+	conflict = request_resource_conflict(&iomem_resource, &cfg);
+	if (conflict)
+		printk("can't claim ECAM area %pR: conflict with %s %pR\n",
+			&cfg, conflict->name, conflict);
+	printk("\n");
 	return 0;
 }
 arch_initcall(pci_arch_init);
On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
Hi Bjorn
-----Original Message----- From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas Sent: 21 November 2016 16:47 To: Gabriele Paoloni Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
Hi Bjorn
-----Original Message----- From: Bjorn Helgaas [mailto:helgaas@kernel.org] Sent: 18 November 2016 17:54 To: Gabriele Paoloni Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
-----Original Message----- From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas Sent: 17 November 2016 18:00
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
Right so if my understanding is correct you are saying that resources described in the MCFG table should also be declared in PNP0C02 devices so that the PNP driver can reserve these resources.
Yes.
On the other side the PCI Root bridge driver should not reserve such resources.
Well if my understanding is correct I think we have a problem here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
As you can see pci_ecam_create() will conflict with the pnp driver as it will try to reserve the resources from the MCFG table...
Maybe we need to rework pci_ecam_create() ?
I think it's OK as it is.
The pnp/system.c driver does try to reserve PNP0C02 resources, and it marks them as "not busy". That way they appear in /proc/iomem and won't be allocated for anything else, but they can still be requested by drivers, e.g., pci/ecam.c, which will mark them "busy".
This is analogous to what the PCI core does in pci_claim_resource().
This is really a function of the ACPI/PNP *core*, which should reserve all _CRS resources for all devices (not just PNP0C02 devices). But it's done by pnp/system.c, and only for PNP0C02, because there's a bunch of historical baggage there.
You'll also notice that in this case, things are out of order: logically the pnp/system.c reservation should happen first, but in fact the pci/ecam.c request happens *before* the pnp/system.c one. That means the pnp/system.c one might fail and complain "[mem ...] could not be reserved".
Correct me if I am wrong...
So currently we are relying on the fact that pci_ecam_create() is called before the pnp driver. If the pnp driver came first we would end up in pci_ecam_create() failing here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
I am not sure but it seems to me like a bit weak condition to rely on... what about removing the error condition in pci_ecam_create() and logging just a dev_info()?
Huh. I'm confused. I *thought* it would be safe to reverse the order, which would effectively be this:
  system_pnp_probe
    reserve_resources_of_dev
      reserve_range
        request_mem_region([mem 0xb0000000-0xb1ffffff])
  ...
  pci_ecam_create
    request_resource_conflict([mem 0xb0000000-0xb1ffffff])
but I experimented with the patch below on qemu, and it failed as you predicted:
  ** res test **
  requested [mem 0xa0000000-0xafffffff]
  can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM PNP [mem 0xa0000000-0xafffffff]
I expected the request_resource_conflict() to succeed since it's completely contained in the "ECAM PNP" region. But I guess I don't understand kernel/resource.c well enough.
I think it fails because effectively the PNP driver is populating the iomem_resource resource tree and therefore pci_ecam_create() finds that it cannot add the cfg resource to the same hierarchy as it is already there...
Right. I'm just surprised because the PNP reservation is marked "not busy", and a driver (e.g., ECAM) should still be able to request the resource.
I'm not sure we need to fix anything yet, since we currently do the ecam.c request before the system.c one, and any change there would be a long ways off. If/when that *does* change, I think the correct fix would be to change ecam.c so its request succeeds (by changing the way it does the request, fixing kernel/resource.c, or whatever) rather than to reduce the log level and ignore the failure.
Well in my mind I didn't want just to make the error disappear... If all the resources should be reserved by the PNP driver then ideally we could take away request_resource_conflict() from pci_ecam_create(), but this would make buggy some systems with an already shipped BIOS that relied on pci_ecam_create() reservation rather than PNP reservation.
I don't want to remove the request from ecam.c. Ideally, there should be TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or whatever it is, and a second from ecam.c. The first is the generic one saying "this region is consumed by a piece of hardware, so don't put anything else here." The second is the driver-specific one saying "PCI ECAM owns this region, nobody else can use it."
This is the same way we handle PCI BAR resources. Here are two examples from my laptop. The first (00:08.0) only has one line: it has a BAR that consumes address space, but I don't have a driver for it loaded. The second (00:16.0) does have a driver loaded, so it has a second line showing that the driver owns the space:
  f124a000-f124afff : 0000:00:08.0    # from PCI core
  f124d000-f124dfff : 0000:00:16.0    # from PCI core
    f124d000-f124dfff : mei_me        # from mei_me driver
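The two-level listing above is just a depth-first walk of a resource tree, with each child indented under its parent. A minimal userspace sketch of that rendering, assuming a simplified `struct node` and a hypothetical `render()` helper (neither is a kernel API, and the fixed 8-digit width is an assumption that matches the examples here):

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for a kernel resource; not the real struct resource. */
struct node {
	unsigned long start, end;          /* inclusive address range */
	const char *name;                  /* owner shown in the listing */
	const struct node *child, *sibling;
};

/*
 * Write "start-end : name" lines for n and its siblings into buf,
 * indenting each nesting level by two spaces, /proc/iomem style.
 * Returns the number of bytes written, stopping early if buf fills up.
 */
static size_t render(const struct node *n, int depth, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf && len > 0);
	for (; n; n = n->sibling) {
		int w = snprintf(buf + used, len - used, "%*s%08lx-%08lx : %s\n",
				 depth * 2, "", n->start, n->end, n->name);

		if (w < 0 || (size_t)w >= len - used)
			return used;       /* out of space: stop here */
		used += (size_t)w;
		if (n->child)
			used += render(n->child, depth + 1, buf + used, len - used);
	}
	return used;
}
```

With a tree holding 0000:00:08.0 and 0000:00:16.0 as siblings and mei_me as a child of the latter, the rendered text has the same shape as the laptop listing (without the trailing comments).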
Just removing the error condition and converting dev_err() into dev_info() seems to me like accommodating already shipped BIOS images and flagging a reservation that is already done by somebody else without compromising the functionality of the PCI Root bridge driver (so far the only reason why I can see the error condition there is to catch a buggy MCFG with overlapping addresses; so if this is the case maybe we need to have a different diagnostic check to make sure that the MCFG table is alright)
Ideally I think we should end up with this:
  a0000000-afffffff : pnp 00:01
    a0000000-afffffff : PCI ECAM
Realistically right now we'll probably end up with only the "PCI ECAM" line in /proc/iomem and a warning from system.c about not being able to reserve the space.
If we ever change things to do the generic PNP reservation first, then we should fix things so ecam.c can claim the space without an error.
Hi Bjorn
-----Original Message----- From: Bjorn Helgaas [mailto:helgaas@kernel.org] Sent: 21 November 2016 20:10 To: Gabriele Paoloni Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linaro-acpi@lists.linaro.org Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
Hi Bjorn
Correct me if I am wrong...
So currently we are relying on the fact that pci_ecam_create() is called before the pnp driver. If the pnp driver came first we would end up in pci_ecam_create() failing here: http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
I am not sure but it seems to me like a bit weak condition to rely on... what about removing the error condition in pci_ecam_create() and logging just a dev_info()?
Huh. I'm confused. I *thought* it would be safe to reverse the order, which would effectively be this:
  system_pnp_probe
    reserve_resources_of_dev
      reserve_range
        request_mem_region([mem 0xb0000000-0xb1ffffff])
  ...
  pci_ecam_create
    request_resource_conflict([mem 0xb0000000-0xb1ffffff])
but I experimented with the patch below on qemu, and it failed as you predicted:
  ** res test **
  requested [mem 0xa0000000-0xafffffff]
  can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM PNP [mem 0xa0000000-0xafffffff]
I expected the request_resource_conflict() to succeed since it's completely contained in the "ECAM PNP" region. But I guess I don't understand kernel/resource.c well enough.
I think it fails because effectively the PNP driver is populating the iomem_resource resource tree and therefore pci_ecam_create() finds that it cannot add the cfg resource to the same hierarchy as it is already there...
Right. I'm just surprised because the PNP reservation is marked "not busy", and a driver (e.g., ECAM) should still be able to request the resource.
Yes, unfortunately pci_ecam_create() is not as flexible about the conflict as pci_request_regions(): http://lxr.free-electrons.com/source/kernel/resource.c#L1155 If the conflicting resource is not busy, pci_request_regions() will create a child resource under the conflicting sibling and mark it as busy...
or at least this is my understanding...
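The difference between the two request styles can be modelled in userspace. This is a minimal sketch, assuming a simplified `struct res` and two hypothetical helpers that only imitate the behaviour discussed in this thread; it is not the code in kernel/resource.c. The first helper inserts strictly at the given level and reports any overlap. The second retries underneath a conflicting entry when that entry is not marked busy.

```c
#include <assert.h>
#include <stddef.h>

#define RES_BUSY 0x1u   /* stand-in for IORESOURCE_BUSY in this sketch */

/* Simplified stand-in for the kernel's struct resource. */
struct res {
	unsigned long start, end;            /* inclusive range */
	unsigned int flags;
	const char *name;
	struct res *parent, *sibling, *child;
};

/*
 * Insert 'new' directly below 'root'. Returns NULL on success, or the
 * entry it collides with ('root' itself if 'new' does not fit in 'root').
 */
static struct res *request_resource_conflict_sketch(struct res *root,
						    struct res *new)
{
	struct res **p, *tmp;

	assert(root && new);
	if (new->end < new->start ||
	    new->start < root->start || new->end > root->end)
		return root;

	p = &root->child;
	for (;;) {
		tmp = *p;
		if (!tmp || tmp->start > new->end) {
			/* Free slot found: link 'new' in address order. */
			new->sibling = tmp;
			*p = new;
			new->parent = root;
			return NULL;
		}
		p = &tmp->sibling;
		if (tmp->end < new->start)
			continue;            /* entirely below 'new' */
		return tmp;                  /* overlap */
	}
}

/*
 * Like the above, but on a conflict with a non-busy entry, descend and
 * try again inside it. Returns 'new' on success, NULL on failure.
 */
static struct res *request_region_sketch(struct res *parent, struct res *new)
{
	for (;;) {
		struct res *conflict =
			request_resource_conflict_sketch(parent, new);

		if (!conflict)
			return new;
		if (conflict != parent && !(conflict->flags & RES_BUSY)) {
			parent = conflict;   /* nest under the reservation */
			continue;
		}
		return NULL;
	}
}
```

In this model, a non-busy "pnp 00:01" entry covering [mem 0xa0000000-0xafffffff] makes the first helper report a conflict for an identical "PCI ECAM" range, while the second helper places "PCI ECAM" as a busy child of the PNP entry, which is the two-line /proc/iomem layout described earlier in the thread.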
I'm not sure we need to fix anything yet, since we currently do the ecam.c request before the system.c one, and any change there would be a long ways off. If/when that *does* change, I think the correct fix would be to change ecam.c so its request succeeds (by changing the way it does the request, fixing kernel/resource.c, or whatever) rather than to reduce the log level and ignore the failure.
Well in my mind I didn't want just to make the error disappear... If all the resources should be reserved by the PNP driver then ideally we could take away request_resource_conflict() from pci_ecam_create(), but this would make buggy some systems with an already shipped BIOS that relied on pci_ecam_create() reservation rather than PNP reservation.
I don't want to remove the request from ecam.c. Ideally, there should be TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or whatever it is, and a second from ecam.c. The first is the generic one saying "this region is consumed by a piece of hardware, so don't put anything else here." The second is the driver-specific one saying "PCI ECAM owns this region, nobody else can use it."
This is the same way we handle PCI BAR resources. Here are two examples from my laptop. The first (00:08.0) only has one line: it has a BAR that consumes address space, but I don't have a driver for it loaded. The second (00:16.0) does have a driver loaded, so it has a second line showing that the driver owns the space:
  f124a000-f124afff : 0000:00:08.0    # from PCI core
  f124d000-f124dfff : 0000:00:16.0    # from PCI core
    f124d000-f124dfff : mei_me        # from mei_me driver
Just removing the error condition and converting dev_err() into dev_info() seems to me like accommodating already shipped BIOS images and flagging a reservation that is already done by somebody else without compromising the functionality of the PCI Root bridge driver (so far the only reason why I can see the error condition there is to catch a buggy MCFG with overlapping addresses; so if this is the case maybe we need to have a different diagnostic check to make sure that the MCFG table is alright)
Ideally I think we should end up with this:
  a0000000-afffffff : pnp 00:01
    a0000000-afffffff : PCI ECAM
I think that for PCIe device drivers it works ok because it is guaranteed that their own pci_request_regions() is called always after pci_claim_resource() of the bridge that is on top of them... I.e. pci_claim_resource() reserves the resources as not busy and pci_request_regions() will create a child busy resource
Realistically right now we'll probably end up with only the "PCI ECAM" line in /proc/iomem and a warning from system.c about not being able to reserve the space.
If we ever change things to do the generic PNP reservation first, then we should fix things so ecam.c can claim the space without an error.
Maybe the patch below could be a sort of solution...effectively pci_ecam should succeed in reserving a busy resource under the conflict resource in case of PNP driver allocating a non BUSY resource first...
---
 drivers/pci/ecam.c                  | 16 +++++-----------
 drivers/pci/host/pci-thunder-ecam.c |  2 +-
 include/linux/pci-ecam.h            |  2 +-
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 43ed08d..999b6ef 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -66,16 +66,10 @@ struct pci_config_window *pci_ecam_create(struct device *dev,
 	}
 	bsz = 1 << ops->bus_shift;
 
-	cfg->res.start = cfgres->start;
-	cfg->res.end = cfgres->end;
-	cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-	cfg->res.name = "PCI ECAM";
-
-	conflict = request_resource_conflict(&iomem_resource, &cfg->res);
-	if (conflict) {
+	cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM");
+	if (!cfg->res) {
 		err = -EBUSY;
-		dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n",
-			&cfg->res, conflict->name, conflict);
+		dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res);
 		goto err_exit;
 	}
 
@@ -126,8 +120,8 @@ void pci_ecam_free(struct pci_config_window *cfg)
 		if (cfg->win)
 			iounmap(cfg->win);
 	}
-	if (cfg->res.parent)
-		release_resource(&cfg->res);
+	if (cfg->res->parent)
+		release_region(cfg->res->start, resource_size(cfg->res));
 	kfree(cfg);
 }
 
diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c
index d50a3dc..2e48d9d 100644
--- a/drivers/pci/host/pci-thunder-ecam.c
+++ b/drivers/pci/host/pci-thunder-ecam.c
@@ -117,7 +117,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn,
 	 * the config space access window. Since we are working with
 	 * the high-order 32 bits, shift everything down by 32 bits.
 	 */
-	node_bits = (cfg->res.start >> 32) & (1 << 12);
+	node_bits = (cfg->res->start >> 32) & (1 << 12);
 
 	v |= node_bits;
 	set_val(v, where, size, val);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 7adad20..f30a4ea 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -36,7 +36,7 @@ struct pci_ecam_ops {
  * use ECAM.
  */
 struct pci_config_window {
-	struct resource res;
+	struct resource *res;
 	struct resource busr;
 	void *priv;
 	struct pci_ecam_ops *ops;
On Thu, Nov 17, 2016 at 6:59 PM, Bjorn Helgaas bhelgaas@google.com wrote:
Add a writeup about how PCI host bridges should be described in ACPI using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
Looks great overall, but I have a few comments (below).
 Documentation/PCI/00-INDEX      |   2 +
 Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/PCI/acpi-info.txt

diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 147231f..0780280 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -1,5 +1,7 @@
 00-INDEX
 	- this file
+acpi-info.txt
+	- info on how PCI host bridges are represented in ACPI
 MSI-HOWTO.txt
 	- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
 PCIEBUS-HOWTO.txt
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
new file mode 100644
index 0000000..ccbcfda
--- /dev/null
+++ b/Documentation/PCI/acpi-info.txt
@@ -0,0 +1,136 @@
+	ACPI considerations for PCI host bridges
+
+The basic requirement is that the ACPI namespace should describe
+*everything* that consumes address space unless there's another
+standard way for the OS to find it [1, 2]. For example, windows that
+are forwarded to PCI by a PCI host bridge should be described via ACPI
+devices, since the OS can't locate the host bridge by itself. PCI
+devices *below* the host bridge do not need to be described via ACPI,
+because the resources they consume are inside the host bridge windows,
+and the OS can discover them via the standard PCI enumeration
+mechanism (using config accesses to read and size the BARs).
+
+This ACPI resource description is done via _CRS methods of devices in
To be painfully precise, those need not be methods. They may be static objects too.
+the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
+the OS can read _CRS and figure out what resource is being consumed
+even if it doesn't have a driver for the device [3]. That's important
+because it means an old OS can work correctly even on a system with
+new devices unknown to the OS. The new devices won't do anything, but
+the OS can at least make sure no resources conflict with them.
+
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
+
+If the OS is expected to manage an ACPI device, that device will have
I'm not very keen on using the term "ACPI device" in documentation as it is not particularly well defined. As a rule, I prefer to talk about "non-discoverable devices described via ACPI" or similar.
Accordingly, I'd change the above line to something like "If the OS is expected to manage a non-discoverable device described via ACPI, that device will have".
+a specific _HID/_CID that tells the OS what driver to bind to it, and
+the _CRS tells the OS and the driver where the device's registers are.
+
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
+programming model for them other than "don't use these resources for
+anything else." So any address space that is (1) not claimed by some
+other ACPI device and (2) should not be assigned by the OS to
Here I'd say "any address space that is (1) not claimed by _CRS under any other device object in the ACPI namespace and (2) ...".
+something else, should be claimed by a PNP0C02 _CRS method.
Thanks, Rafael
On Sat, Nov 19, 2016 at 12:02:24AM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 17, 2016 at 6:59 PM, Bjorn Helgaas bhelgaas@google.com wrote:
Add a writeup about how PCI host bridges should be described in ACPI using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
Looks great overall, but I have a few comments (below).
Thanks a lot for taking a look, Rafael! I applied all your suggestions.
Bjorn
On 17 November 2016 at 17:59, Bjorn Helgaas bhelgaas@google.com wrote:
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. In principle, this would
+be all the windows they forward down to the PCI bus, as well as the
+bridge registers themselves. The bridge registers include things like
+secondary/subordinate bus registers that determine the bus range below
+the bridge, window registers that describe the apertures, etc. These
+are all device-specific, non-architected things, so the only way a
+PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
+contain the device-specific details. These bridge registers also
+include ECAM space, since it is consumed by the bridge.
+
+ACPI defined a Producer/Consumer bit that was intended to distinguish
+the bridge apertures from the bridge registers [4, 5]. However,
+BIOSes didn't use that bit correctly, and the result is that OSes have
+to assume that everything in a PCI host bridge _CRS is a window. That
+leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
+device itself.
Is that universally true? Or is it still possible to do the right thing here on new ACPI architectures such as arm64?
+The workaround is to describe the bridge registers (including ECAM
+space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
+the bridge register space is device-specific anyway, so the generic
+PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
+ECAM, pci_root.c learns about the space from either MCFG or the _CBA
+method.
+
+Note that the PCIe spec actually does require ECAM unless there's a
+standard firmware interface for config access, e.g., the ia64 SAL
+interface [7]. One reason is that we want a generic host bridge
+driver (pci_root.c), and a generic driver requires a generic way to
+access config space.
+
+[1] ACPI 6.0, sec 6.1:
+    For any device that is on a non-enumerable type of bus (for
+    example, an ISA bus), OSPM enumerates the devices' identifier(s)
+    and the ACPI system firmware must supply an _HID object ... for
+    each device to enable OSPM to do that.
+
+[2] ACPI 6.0, sec 3.7:
+    The OS enumerates motherboard devices simply by reading through
+    the ACPI Namespace looking for devices with hardware IDs.
+
+    Each device enumerated by ACPI includes ACPI-defined objects in
+    the ACPI Namespace that report the hardware resources the device
+    could occupy [_PRS], an object that reports the resources that are
+    currently used by the device [_CRS], and objects for configuring
+    those resources [_SRS]. The information is used by the Plug and
+    Play OS (OSPM) to configure the devices.
+
+[3] ACPI 6.0, sec 6.2:
+    OSPM uses device configuration objects to configure hardware
+    resources for devices enumerated via ACPI. Device configuration
+    objects provide information about current and possible resource
+    requirements, the relationship between shared resources, and
+    methods for configuring hardware resources.
+
+    When OSPM enumerates a device, it calls _PRS to determine the
+    resource requirements of the device. It may also call _CRS to
+    find the current resource settings for the device. Using this
+    information, the Plug and Play system determines what resources
+    the device should consume and sets those resources by calling the
+    device’s _SRS control method.
+
+    In ACPI, devices can consume resources (for example, legacy
+    keyboards), provide resources (for example, a proprietary PCI
+    bridge), or do both. Unless otherwise specified, resources for a
+    device are assumed to be taken from the nearest matching resource
+    above the device in the device hierarchy.
+
+[4] ACPI 6.0, sec 6.4.3.5.4:
+    Extended Address Space Descriptor
+    General Flags: Bit [0] Consumer/Producer:
+	1–This device consumes this resource
+	0–This device produces and consumes this resource
+
+[5] ACPI 6.0, sec 19.6.43:
+    ResourceUsage specifies whether the Memory range is consumed by
+    this device (ResourceConsumer) or passed on to child devices
+    (ResourceProducer). If nothing is specified, then
+    ResourceConsumer is assumed.
+
+[6] PCI Firmware 3.0, sec 4.1.2:
+    If the operating system does not natively comprehend reserving the
+    MMCFG region, the MMCFG region must be reserved by firmware. The
+    address range reported in the MCFG table or by _CBA method (see
+    Section 4.1.3) must be reserved by declaring a motherboard
+    resource. For most systems, the motherboard resource would appear
+    at the root of the ACPI namespace (under _SB) in a node with a
+    _HID of EISAID (PNP0C02), and the resources in this case should
+    not be claimed in the root PCI bus’s _CRS. The resources can
+    optionally be returned in Int15 E820 or EFIGetMemoryMap as
+    reserved memory but must always be reported through ACPI as a
+    motherboard resource.
+
+[7] PCI Express 3.0, sec 7.2.2:
+    For systems that are PC-compatible, or that do not implement a
+    processor-architecture-specific firmware interface standard that
+    allows access to the Configuration Space, the ECAM is required as
+    defined in this section.
On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
On 17 November 2016 at 17:59, Bjorn Helgaas bhelgaas@google.com wrote:
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. In principle, this would
+be all the windows they forward down to the PCI bus, as well as the
+bridge registers themselves. The bridge registers include things like
+secondary/subordinate bus registers that determine the bus range below
+the bridge, window registers that describe the apertures, etc. These
+are all device-specific, non-architected things, so the only way a
+PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
+contain the device-specific details. These bridge registers also
+include ECAM space, since it is consumed by the bridge.
+
+ACPI defined a Producer/Consumer bit that was intended to distinguish
+the bridge apertures from the bridge registers [4, 5]. However,
+BIOSes didn't use that bit correctly, and the result is that OSes have
+to assume that everything in a PCI host bridge _CRS is a window. That
+leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
+device itself.
Is that universally true? Or is it still possible to do the right thing here on new ACPI architectures such as arm64?
That's a very good question. I had thought that the ACPI spec had given up on Consumer/Producer completely, but I was wrong. In the 6.0 spec, the Consumer/Producer bit is still documented in the Extended Address Space Descriptor (sec 6.4.3.5.4). It is documented as "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
Linux looks at the producer_consumer bit in acpi_decode_space(), which I think is used for all these descriptors (QWord, DWord, Word, and Extended). This doesn't quite follow the spec -- we probably should ignore it except for Extended. In any event, acpi_decode_space() sets IORESOURCE_WINDOW for Producer descriptors, but we don't test IORESOURCE_WINDOW in the PCI host bridge code.
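To make the two readings concrete, here is a minimal standalone sketch of the decode step. It is not the kernel's acpi_decode_space(): the enum, the function name, and the `strict` switch are hypothetical, and the flag values are copied from include/linux/ioport.h for illustration only. With `strict` false, the bit is honored for every descriptor type, which is the behavior described above. With `strict` true, it is honored only for Extended descriptors, and the other types are assumed to be consumers (an assumption based on the ResourceConsumer default in [5], not settled behavior).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of values from include/linux/ioport.h. */
#define IORESOURCE_MEM    0x00000200u
#define IORESOURCE_WINDOW 0x00200000u

/* Hypothetical descriptor kinds; these are not ACPICA identifiers. */
enum addr_desc { DESC_WORD, DESC_DWORD, DESC_QWORD, DESC_EXTENDED };

/* General Flags bit 0: 1 = consumer, 0 = producer (ACPI 6.0, 6.4.3.5.4). */
#define GENERAL_FLAG_CONSUMER 0x01u

/*
 * Model of turning a descriptor's General Flags into resource flags.
 * strict == false: honor the bit for all descriptor types.
 * strict == true:  honor it only for Extended descriptors; the other
 *                  types never get IORESOURCE_WINDOW (assumption).
 */
static uint32_t decode_space_flags(enum addr_desc type, uint8_t general_flags,
				   bool strict)
{
	uint32_t flags = IORESOURCE_MEM;
	bool producer = (general_flags & GENERAL_FLAG_CONSUMER) == 0;

	if (strict && type != DESC_EXTENDED)
		return flags;	/* bit not consulted, no window flag */
	if (producer)
		flags |= IORESOURCE_WINDOW;
	return flags;
}
```

Under the strict reading, a QWord descriptor with bit 0 clear no longer becomes a window, which is why all bridge windows would have to move to Extended descriptors.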
x86 and ia64 supply their own pci_acpi_root_prepare_resources() functions that call acpi_pci_probe_root_resources(), which parses _CRS and looks at producer_consumer. Then they do a little arch-specific stuff on the result.
On arm64 we use acpi_pci_probe_root_resources() directly, with no arch-specific stuff.
On all three arches, we ignore the Consumer/Producer bit, so all the resources are treated as Producers, i.e., as bridge windows.
I think we *could* implement an arm64 version of pci_acpi_root_prepare_resources() that would pay attention to the Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec compliant, we would have to use Extended descriptors for all bridge windows, even if they would fit in a DWord or QWord.
Should we do that? I dunno. I'd like to hear your opinion(s).
It *would* be nice to have bridge registers in the bridge _CRS. That would eliminate the need for looking up the HISI0081/PNP0C02 devices to find the bridge registers. Avoiding that lookup is only a temporary advantage -- the next round of bridges are supposed to fully implement ECAM, and then we won't need to know where the registers are.
Apart from the lookup, there's still some advantage in describing the registers in the PNP0A03 device instead of an unrelated PNP0C02 device, because it makes /proc/iomem more accurate and potentially makes host bridge hotplug cleaner. We would have to enhance the host bridge driver to do the reservations currently done by pnp/system.c.
There's some value in doing it the same way as on x86, even though that way is somewhat broken.
Whatever we decide, I think it's very important to get it figured out ASAP because it affects the ECAM quirks that we're trying to merge in v4.10.
+The workaround is to describe the bridge registers (including ECAM +space) in PNP0C02 catch-all devices [6]. With the exception of ECAM, +the bridge register space is device-specific anyway, so the generic +PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For +ECAM, pci_root.c learns about the space from either MCFG or the _CBA +method.
+Note that the PCIe spec actually does require ECAM unless there's a +standard firmware interface for config access, e.g., the ia64 SAL +interface [7]. One reason is that we want a generic host bridge +driver (pci_root.c), and a generic driver requires a generic way to +access config space.
+[1] ACPI 6.0, sec 6.1:
- For any device that is on a non-enumerable type of bus (for
- example, an ISA bus), OSPM enumerates the devices' identifier(s)
- and the ACPI system firmware must supply an _HID object ... for
- each device to enable OSPM to do that.
+[2] ACPI 6.0, sec 3.7:
- The OS enumerates motherboard devices simply by reading through
- the ACPI Namespace looking for devices with hardware IDs.
- Each device enumerated by ACPI includes ACPI-defined objects in
- the ACPI Namespace that report the hardware resources the device
- could occupy [_PRS], an object that reports the resources that are
- currently used by the device [_CRS], and objects for configuring
- those resources [_SRS]. The information is used by the Plug and
- Play OS (OSPM) to configure the devices.
+[3] ACPI 6.0, sec 6.2:
- OSPM uses device configuration objects to configure hardware
- resources for devices enumerated via ACPI. Device configuration
- objects provide information about current and possible resource
- requirements, the relationship between shared resources, and
- methods for configuring hardware resources.
- When OSPM enumerates a device, it calls _PRS to determine the
- resource requirements of the device. It may also call _CRS to
- find the current resource settings for the device. Using this
- information, the Plug and Play system determines what resources
- the device should consume and sets those resources by calling the
- device’s _SRS control method.
- In ACPI, devices can consume resources (for example, legacy
- keyboards), provide resources (for example, a proprietary PCI
- bridge), or do both. Unless otherwise specified, resources for a
- device are assumed to be taken from the nearest matching resource
- above the device in the device hierarchy.
+[4] ACPI 6.0, sec 6.4.3.5.4:
- Extended Address Space Descriptor
- General Flags: Bit [0] Consumer/Producer:
1–This device consumes this resource
0–This device produces and consumes this resource
+[5] ACPI 6.0, sec 19.6.43:
- ResourceUsage specifies whether the Memory range is consumed by
- this device (ResourceConsumer) or passed on to child devices
- (ResourceProducer). If nothing is specified, then
- ResourceConsumer is assumed.
+[6] PCI Firmware 3.0, sec 4.1.2:
- If the operating system does not natively comprehend reserving the
- MMCFG region, the MMCFG region must be reserved by firmware. The
- address range reported in the MCFG table or by _CBA method (see
- Section 4.1.3) must be reserved by declaring a motherboard
- resource. For most systems, the motherboard resource would appear
- at the root of the ACPI namespace (under _SB) in a node with a
- _HID of EISAID (PNP0C02), and the resources in this case should
- not be claimed in the root PCI bus’s _CRS. The resources can
- optionally be returned in Int15 E820 or EFIGetMemoryMap as
- reserved memory but must always be reported through ACPI as a
- motherboard resource.
+[7] PCI Express 3.0, sec 7.2.2:
- For systems that are PC-compatible, or that do not implement a
- processor-architecture-specific firmware interface standard that
- allows access to the Configuration Space, the ECAM is required as
- defined in this section.
On 23 November 2016 at 01:06, Bjorn Helgaas helgaas@kernel.org wrote:
Should we do that? I dunno. I'd like to hear your opinion(s).
Yes, I think we should. If the spec allows for a way for a PNP0A03 device to describe all of its resources unambiguously, we should not be relying on workarounds that were designed for another architecture in another decade (for, presumably, another OS)
Just for my understanding, we will need to use extended descriptors for all consumed *and* produced regions, even though dword/qword are implicitly produced-only, due to the fact that the bit is ignored?
Whatever we decide, I think it's very important to get it figured out ASAP because it affects the ECAM quirks that we're trying to merge in v4.10.
I agree. What exactly is the impact for the quirks mechanism as proposed?
On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
Yes, I think we should. If the spec allows for a way for a PNP0A03 device to describe all of its resources unambiguously, we should not be relying on workarounds that were designed for another architecture in another decade (for, presumably, another OS)
That was the idea I floated at LPC16. We can override the acpi_pci_root_ops prepare_resources() function pointer with a function that checks IORESOURCE_WINDOW and filters resources accordingly (and specific quirk "drivers" may know how to interpret resources that aren't IORESOURCE_WINDOW, i.e., they can use them to describe the PCI ECAM config space quirk region in their _CRS).
In a way that makes sense anyway: given that we are starting from a clean slate on ARM64, treating resources that are not IORESOURCE_WINDOW as host bridge windows is just something we are inheriting from x86; it is not really ACPI spec compliant (is it?).
Just for my understanding, we will need to use extended descriptors for all consumed *and* produced regions, even though dword/qword are implicitly produced-only, due to the fact that the bit is ignored?
That's something that has to be clarified within the ASWG, i.e., why the consumer bit is ignored for *some* descriptors and not for others.
As things stand, unfortunately, the answer seems to be yes (I do not know why).
I agree. What exactly is the impact for the quirks mechanism as proposed?
The impact is that we could just use the PNP0A03 _CRS to report the PCI ECAM config space quirk region through a consumer resource, keeping in mind what I say above (actually I think that's what was done on APM firmware initially, for the record).
Lorenzo
On Wed, Nov 23, 2016 at 4:30 AM, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
The impact is that we could just use the PNP0A03 _CRS to report the PCI ECAM config space quirk region through a consumer resource, keeping in mind what I say above (actually I think that's what was done on APM firmware initially, for the record).
Just to clarify: APM firmware initially had a _CRS region to declare the controller register region. We didn't know that we needed to declare the reserved space for ECAM until Bjorn pointed it out recently (with the usage of PNP0C02).
I really like this idea of declaring ECAM space and any additional spaces required for the ECAM quirk inside the PNP0A03 _CRS. For firmware that has already shipped, the quirk will need to add additional resources (for ECAM and other needed regions) into the root bus. If we decide to go with this, do we still have time to make additional adjustments to the current ECAM quirk and the foundation patches before v4.10-rc1?
Regards, Duc Dang.
On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
Just for my understanding, we will need to use extended descriptors for all consumed *and* produced regions, even though dword/qword are implicitly produced-only, due to the fact that the bit is ignored?
From an ACPI spec point of view, I would say QWord/DWord/Word descriptors are implicitly *consumer*-only because ResourceConsumer is the default and they don't have a bit to indicate otherwise.
The current code assumes all PNP0A03 resources are producers. If we implement an arm64 pci_acpi_root_prepare_resources() that pays attention to the Consumer/Producer bit, we would have to:
- Reserve all producer regions in the iomem/ioport trees. This is already done via pci_acpi_root_add_resources(), but we might need a new check to handle consumers differently.
- Reserve all consumer regions. This corresponds to what pnp/system.c does for PNP0C02 devices. This is similar to the producer regions, but I think the consumer ones should be marked IORESOURCE_BUSY.
- Use every producer (IORESOURCE_WINDOW) as a host bridge window.
I think it's a bug that acpi_decode_space() looks at producer_consumer for QWord/DWord/Word descriptors, but I think QWord/DWord/Word descriptors for consumed regions should be safe, as long as they don't set the Consumer/Producer bit.
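The three steps listed above can be sketched in standalone C. This is only a toy model under stated assumptions: `struct res`, `struct bridge_state`, and `claim_crs()` are hypothetical names, the fixed-size arrays stand in for the iomem/ioport trees, and the flag values are copied from include/linux/ioport.h for illustration. It is not a proposed pci_acpi_root_prepare_resources() implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative copies of values from include/linux/ioport.h. */
#define IORESOURCE_MEM    0x00000200u
#define IORESOURCE_WINDOW 0x00200000u
#define IORESOURCE_BUSY   0x80000000u

#define MAX_RES 16

/* Hypothetical stand-in for struct resource. */
struct res {
	uint64_t start, end;
	uint32_t flags;
};

/* Toy stand-in for the resource trees and the bridge's window list. */
struct bridge_state {
	struct res reserved[MAX_RES];
	size_t n_reserved;
	struct res windows[MAX_RES];
	size_t n_windows;
};

/* Walk a decoded _CRS list; returns 0 on success, -1 if the toy tables fill. */
static int claim_crs(struct bridge_state *b, const struct res *crs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct res r = crs[i];

		if (b->n_reserved == MAX_RES)
			return -1;
		if (r.flags & IORESOURCE_WINDOW)
			b->windows[b->n_windows++] = r;	/* producer: bridge window */
		else
			r.flags |= IORESOURCE_BUSY;	/* consumer: bridge registers */
		b->reserved[b->n_reserved++] = r;	/* both kinds get reserved */
	}
	return 0;
}
```

In this model a consumer entry, such as a register or ECAM region, ends up reserved and busy but never appears in the window list, which is the role pnp/system.c plays for PNP0C02 devices today.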
On Wed, Nov 23, 2016 at 09:06:33AM -0600, Bjorn Helgaas wrote:
On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
On 23 November 2016 at 01:06, Bjorn Helgaas helgaas@kernel.org wrote:
On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
On 17 November 2016 at 17:59, Bjorn Helgaas bhelgaas@google.com wrote:
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should +describe all the address space they consume. In principle, this would +be all the windows they forward down to the PCI bus, as well as the +bridge registers themselves. The bridge registers include things like +secondary/subordinate bus registers that determine the bus range below +the bridge, window registers that describe the apertures, etc. These +are all device-specific, non-architected things, so the only way a +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which +contain the device-specific details. These bridge registers also +include ECAM space, since it is consumed by the bridge.
+ACPI defined a Producer/Consumer bit that was intended to distinguish +the bridge apertures from the bridge registers [4, 5]. However, +BIOSes didn't use that bit correctly, and the result is that OSes have +to assume that everything in a PCI host bridge _CRS is a window. That +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08 +device itself.
Is that universally true? Or is it still possible to do the right thing here on new ACPI architectures such as arm64?
That's a very good question. I had thought that the ACPI spec had given up on Consumer/Producer completely, but I was wrong. In the 6.0 spec, the Consumer/Producer bit is still documented in the Extended Address Space Descriptor (sec 6.4.3.5.4). It is documented as "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
Linux looks at the producer_consumer bit in acpi_decode_space(), which I think is used for all these descriptors (QWord, DWord, Word, and Extended). This doesn't quite follow the spec -- we probably should ignore it except for Extended. In any event, acpi_decode_space() sets IORESOURCE_WINDOW for Producer descriptors, but we don't test IORESOURCE_WINDOW in the PCI host bridge code.
x86 and ia64 supply their own pci_acpi_root_prepare_resources() functions that call acpi_pci_probe_root_resources(), which parses _CRS and looks at producer_consumer. Then they do a little arch-specific stuff on the result.
On arm64 we use acpi_pci_probe_root_resources() directly, with no arch-specific stuff.
On all three arches, we ignore the Consumer/Producer bit, so all the resources are treated as Producers, e.g., as bridge windows.
I think we *could* implement an arm64 version of pci_acpi_root_prepare_resources() that would pay attention to the Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec compliant, we would have to use Extended descriptors for all bridge windows, even if they would fit in a DWord or QWord.
Should we do that? I dunno. I'd like to hear your opinion(s).
Yes, I think we should. If the spec allows for a way for a PNP0A03 device to describe all of its resources unambiguously, we should not be relying on workarounds that were designed for another architecture in another decade (for, presumably, another OS)
Just for my understanding, we will need to use extended descriptors for all consumed *and* produced regions, even though dword/qword are implicitly produced-only, due to the fact that the bit is ignored?
From an ACPI spec point of view, I would say QWord/DWord/Word descriptors are implicitly *consumer*-only because ResourceConsumer is the default and they don't have a bit to indicate otherwise.
The current code assumes all PNP0A03 resources are producers. If we implement an arm64 pci_acpi_root_prepare_resources() that pays attention to the Consumer/Producer bit, we would have to:
  - Reserve all producer regions in the iomem/ioport trees. This is already done via pci_acpi_root_add_resources(), but we might need a new check to handle consumers differently.
  - Reserve all consumer regions. This corresponds to what pnp/system.c does for PNP0C02 devices. This is similar to the producer regions, but I think the consumer ones should be marked IORESOURCE_BUSY.
  - Use every producer (IORESOURCE_WINDOW) as a host bridge window.
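The list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the kernel code: struct sk_res, the SK_RES_* flags (stand-ins for IORESOURCE_WINDOW and IORESOURCE_BUSY, with made-up bit values) and sk_prepare_resources() are all hypothetical names, and the real iomem/ioport reservation is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for IORESOURCE_WINDOW / IORESOURCE_BUSY;
 * the bit values are illustrative only. */
#define SK_RES_WINDOW 0x1u	/* producer: forwarded down to the PCI bus */
#define SK_RES_BUSY   0x2u	/* consumer: bridge registers, not reassignable */

/* One decoded _CRS entry (hypothetical, much simpler than struct resource). */
struct sk_res {
	uint64_t start;
	uint64_t end;
	bool producer;		/* decoded Consumer/Producer bit: true = producer */
	unsigned int flags;
};

/*
 * Classify every decoded _CRS entry: producers become host bridge
 * windows, consumers are marked busy so nothing else is assigned there.
 * Returns the number of host bridge windows found.
 */
static size_t sk_prepare_resources(struct sk_res *res, size_t n)
{
	size_t windows = 0;

	assert(res != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (res[i].producer) {
			res[i].flags |= SK_RES_WINDOW;
			windows++;
		} else {
			res[i].flags |= SK_RES_BUSY;
		}
	}
	return windows;
}
```

With one producer and one consumer entry, the function would report a single window and leave the consumer entry marked busy only.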
I think it's a bug that acpi_decode_space() looks at producer_consumer for QWord/DWord/Word descriptors, but I think QWord/DWord/Word descriptors for consumed regions should be safe, as long as they don't set the Consumer/Producer bit.
I'm going to post a couple very lightly-tested patches that should make us ignore the Consumer/Producer bit for QWord/DWord/Word. I'd appreciate any discussion about whether that's the right approach.
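To make the idea concrete, here is a minimal sketch of "ignore the bit for QWord/DWord/Word". It is not the posted patches and not the ACPICA API: enum sk_desc_type, sk_is_producer() and the default_producer parameter are hypothetical, and the default is left to the caller because the right fallback is exactly what is under discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical descriptor kinds; these are not the ACPICA type names. */
enum sk_desc_type {
	SK_DESC_WORD,
	SK_DESC_DWORD,
	SK_DESC_QWORD,
	SK_DESC_EXTENDED,
};

/*
 * Decide whether a _CRS entry is treated as a producer (bridge window).
 * Only Extended descriptors are trusted to carry a meaningful
 * Consumer/Producer bit; for Word/DWord/QWord the bit is ignored and
 * the caller-supplied default applies instead.
 */
static bool sk_is_producer(enum sk_desc_type type, bool producer_bit,
			   bool default_producer)
{
	assert(type <= SK_DESC_EXTENDED);

	if (type == SK_DESC_EXTENDED)
		return producer_bit;
	return default_producer;
}
```

Passing default_producer = true for a PNP0A03 device would correspond to the current "everything is a window" behavior for the older descriptor types.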
Bjorn
Hi, Bjorn
Thanks for the documentation. It really helps!
However I have a question below.
From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas Subject: [PATCH] PCI: Add information about describing PCI in ACPI
Add a writeup about how PCI host bridges should be described in ACPI using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
---
 Documentation/PCI/00-INDEX      |   2 +
 Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/PCI/acpi-info.txt

diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 147231f..0780280 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -1,5 +1,7 @@
 00-INDEX
 	- this file
+acpi-info.txt
+	- info on how PCI host bridges are represented in ACPI
 MSI-HOWTO.txt
 	- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
 PCIEBUS-HOWTO.txt
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
new file mode 100644
index 0000000..ccbcfda
--- /dev/null
+++ b/Documentation/PCI/acpi-info.txt
@@ -0,0 +1,136 @@
+	ACPI considerations for PCI host bridges
+
+The basic requirement is that the ACPI namespace should describe
+*everything* that consumes address space unless there's another
+standard way for the OS to find it [1, 2]. For example, windows that
+are forwarded to PCI by a PCI host bridge should be described via ACPI
+devices, since the OS can't locate the host bridge by itself. PCI
+devices *below* the host bridge do not need to be described via ACPI,
+because the resources they consume are inside the host bridge windows,
+and the OS can discover them via the standard PCI enumeration
+mechanism (using config accesses to read and size the BARs).
+
+This ACPI resource description is done via _CRS methods of devices in
+the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
+the OS can read _CRS and figure out what resource is being consumed
+even if it doesn't have a driver for the device [3]. That's important
+because it means an old OS can work correctly even on a system with
+new devices unknown to the OS. The new devices won't do anything, but
+the OS can at least make sure no resources conflict with them.
+
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
The document doesn't cover the details of _CBA anywhere. Only one line below mentions _CBA, and only as an example.
+If the OS is expected to manage an ACPI device, that device will have
+a specific _HID/_CID that tells the OS what driver to bind to it, and
+the _CRS tells the OS and the driver where the device's registers are.
+
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
+programming model for them other than "don't use these resources for
+anything else." So any address space that is (1) not claimed by some
+other ACPI device and (2) should not be assigned by the OS to
+something else, should be claimed by a PNP0C02 _CRS method.
+
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. In principle, this would
+be all the windows they forward down to the PCI bus, as well as the
+bridge registers themselves. The bridge registers include things like
+secondary/subordinate bus registers that determine the bus range below
+the bridge, window registers that describe the apertures, etc. These
+are all device-specific, non-architected things, so the only way a
+PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
+contain the device-specific details. These bridge registers also
+include ECAM space, since it is consumed by the bridge.
+
+ACPI defined a Producer/Consumer bit that was intended to distinguish
+the bridge apertures from the bridge registers [4, 5]. However,
+BIOSes didn't use that bit correctly, and the result is that OSes have
+to assume that everything in a PCI host bridge _CRS is a window. That
+leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
+device itself.
+
+The workaround is to describe the bridge registers (including ECAM
+space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
+the bridge register space is device-specific anyway, so the generic
+PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
+ECAM, pci_root.c learns about the space from either MCFG or the _CBA
+method.
Should the relationship of MCFG and _CBA be covered in this document?
Thanks and best regards Lv
+Note that the PCIe spec actually does require ECAM unless there's a
+standard firmware interface for config access, e.g., the ia64 SAL
+interface [7]. One reason is that we want a generic host bridge
+driver (pci_root.c), and a generic driver requires a generic way to
+access config space.
+[1] ACPI 6.0, sec 6.1:
+    For any device that is on a non-enumerable type of bus (for
+    example, an ISA bus), OSPM enumerates the devices' identifier(s)
+    and the ACPI system firmware must supply an _HID object ... for
+    each device to enable OSPM to do that.
+
+[2] ACPI 6.0, sec 3.7:
+    The OS enumerates motherboard devices simply by reading through
+    the ACPI Namespace looking for devices with hardware IDs.
+
+    Each device enumerated by ACPI includes ACPI-defined objects in
+    the ACPI Namespace that report the hardware resources the device
+    could occupy [_PRS], an object that reports the resources that are
+    currently used by the device [_CRS], and objects for configuring
+    those resources [_SRS]. The information is used by the Plug and
+    Play OS (OSPM) to configure the devices.
+
+[3] ACPI 6.0, sec 6.2:
+    OSPM uses device configuration objects to configure hardware
+    resources for devices enumerated via ACPI. Device configuration
+    objects provide information about current and possible resource
+    requirements, the relationship between shared resources, and
+    methods for configuring hardware resources.
+
+    When OSPM enumerates a device, it calls _PRS to determine the
+    resource requirements of the device. It may also call _CRS to
+    find the current resource settings for the device. Using this
+    information, the Plug and Play system determines what resources
+    the device should consume and sets those resources by calling the
+    device’s _SRS control method.
+
+    In ACPI, devices can consume resources (for example, legacy
+    keyboards), provide resources (for example, a proprietary PCI
+    bridge), or do both. Unless otherwise specified, resources for a
+    device are assumed to be taken from the nearest matching resource
+    above the device in the device hierarchy.
+
+[4] ACPI 6.0, sec 6.4.3.5.4:
+    Extended Address Space Descriptor
+    General Flags: Bit [0] Consumer/Producer:
+    1–This device consumes this resource
+    0–This device produces and consumes this resource
+
+[5] ACPI 6.0, sec 19.6.43:
+    ResourceUsage specifies whether the Memory range is consumed by
+    this device (ResourceConsumer) or passed on to child devices
+    (ResourceProducer). If nothing is specified, then
+    ResourceConsumer is assumed.
+
+[6] PCI Firmware 3.0, sec 4.1.2:
+    If the operating system does not natively comprehend reserving the
+    MMCFG region, the MMCFG region must be reserved by firmware. The
+    address range reported in the MCFG table or by _CBA method (see
+    Section 4.1.3) must be reserved by declaring a motherboard
+    resource. For most systems, the motherboard resource would appear
+    at the root of the ACPI namespace (under _SB) in a node with a
+    _HID of EISAID (PNP0C02), and the resources in this case should
+    not be claimed in the root PCI bus’s _CRS. The resources can
+    optionally be returned in Int15 E820 or EFIGetMemoryMap as
+    reserved memory but must always be reported through ACPI as a
+    motherboard resource.
+
+[7] PCI Express 3.0, sec 7.2.2:
+    For systems that are PC-compatible, or that do not implement a
+    processor-architecture-specific firmware interface standard that
+    allows access to the Configuration Space, the ECAM is required as
+    defined in this section.
On Wed, Nov 23, 2016 at 03:23:35AM +0000, Zheng, Lv wrote:
The document doesn't cover the details of _CBA anywhere. Only one line below mentions _CBA, and only as an example.
Yes, that's a good point. I'll add some more details about MCFG and _CBA.
Bjorn