Here's another stab at this writeup. I'd appreciate any comments!
Changes from v1 to v2:
- Consumer/Producer is defined for Extended Address Space descriptors; should be ignored for QWord/DWord/Word Address Space descriptors
- New arches may use Extended Address Space descriptors in PNP0A03 for bridge registers, including ECAM (if the arch adds support for this)
- Add more details about MCFG and _CBA (Lv's suggestion)
- Incorporate Rafael's suggestions
---
Bjorn Helgaas (1): PCI: Add information about describing PCI in ACPI
 Documentation/PCI/00-INDEX      |   2
 Documentation/PCI/acpi-info.txt | 180 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)
 create mode 100644 Documentation/PCI/acpi-info.txt
Add a writeup about how PCI host bridges should be described in ACPI using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
---
 Documentation/PCI/00-INDEX      |   2
 Documentation/PCI/acpi-info.txt | 180 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)
 create mode 100644 Documentation/PCI/acpi-info.txt
diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 147231f..0780280 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -1,5 +1,7 @@
 00-INDEX
 	- this file
+acpi-info.txt
+	- info on how PCI host bridges are represented in ACPI
 MSI-HOWTO.txt
 	- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
 PCIEBUS-HOWTO.txt
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
new file mode 100644
index 0000000..06b877f
--- /dev/null
+++ b/Documentation/PCI/acpi-info.txt
@@ -0,0 +1,180 @@
+ ACPI considerations for PCI host bridges
+
+The basic requirement is that the ACPI namespace should describe
+*everything* that consumes address space unless there's another standard
+way for the OS to find it [1, 2]. For example, windows that are forwarded
+to PCI by a PCI host bridge should be described via ACPI devices, since the
+OS can't locate the host bridge by itself. PCI devices *below* the host
+bridge do not need to be described via ACPI, because the resources they
+consume are inside the host bridge windows, and the OS can discover them
+via the standard PCI enumeration mechanism (using config accesses to read
+and size the BARs).
+
+This ACPI resource description is done via _CRS objects of devices in the
+ACPI namespace [2]. The _CRS is like a generalized PCI BAR: the OS can
+read _CRS and figure out what resource is being consumed even if it doesn't
+have a driver for the device [3]. That's important because it means an old
+OS can work correctly even on a system with new devices unknown to the OS.
+The new devices won't do anything, but the OS can at least make sure no
+resources conflict with them.
+
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS needs to
+know early in boot, before it can parse the ACPI namespace. If a new table
+is defined, an old OS needs to operate correctly even though it ignores the
+table. _CRS allows that because it is generic and understood by the old
+OS; a static table does not.
+
+If the OS is expected to manage a non-discoverable device described via
+ACPI, that device will have a specific _HID/_CID that tells the OS what
+driver to bind to it, and the _CRS tells the OS and the driver where the
+device's registers are.
+
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. This includes all the windows
+they forward down to the PCI bus, as well as bridge registers that are not
+forwarded to PCI. The bridge registers include things like secondary/
+subordinate bus registers that determine the bus range below the bridge,
+window registers that describe the apertures, etc. These are all
+device-specific, non-architected things, so the only way a PNP0A03/PNP0A08
+driver can manage them is via _PRS/_CRS/_SRS, which contain the
+device-specific details. The bridge registers also include ECAM space,
+since it is consumed by the bridge.
+
+ACPI defines a Consumer/Producer bit to distinguish the bridge registers
+("Consumer") from the bridge apertures ("Producer") [4, 5], but early
+BIOSes didn't use that bit correctly. The result is that the current ACPI
+spec defines Consumer/Producer only for the relatively new Extended Address
+Space descriptors; the bit should be ignored in the older QWord/DWord/Word
+Address Space descriptors. Consequently, OSes have to assume all
+QWord/DWord/Word descriptors are windows.
+
+Prior to the addition of Extended Address Space descriptors, the failure of
+Consumer/Producer meant there was no way to describe bridge registers in
+the PNP0A03/PNP0A08 device itself. The workaround was to describe the
+bridge registers (including ECAM space) in PNP0C02 catch-all devices [6].
+With the exception of ECAM, the bridge register space is device-specific +anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to +know about it. + +New architectures should be able to use "Consumer" Extended Address Space +descriptors in the PNP0A03 device for bridge registers, including ECAM, +although a strict interpretation of [6] might prohibit this. Old x86 and +ia64 kernels assume all address space descriptors, including "Consumer" +Extended Address Space ones, are windows, so it would not be safe to +describe bridge registers this way on those architectures. + +PNP0C02 "motherboard" devices are basically a catch-all. There's no +programming model for them other than "don't use these resources for +anything else." So a PNP0C02 _CRS should claim any address space that is +(1) not claimed by _CRS under any other device object in the ACPI namespace +and (2) should not be assigned by the OS to something else. + +The PCIe spec requires the Enhanced Configuration Access Method (ECAM) +unless there's a standard firmware interface for config access, e.g., the +ia64 SAL interface [7]. A host bridge consumes ECAM memory address space +and converts memory accesses into PCI configuration accesses. The spec +defines the ECAM address space layout and functionality; only the base of +the address space is device-specific. An ACPI OS learns the base address +from either the static MCFG table or a _CBA method in the PNP0A03 device. + +The MCFG table must describe the ECAM space of non-hot pluggable host +bridges [8]. Since MCFG is a static table and can't be updated by hotplug, +a _CBA method in the PNP0A03 device describes the ECAM space of a +hot-pluggable host bridge [9]. Note that for both MCFG and _CBA, the base +address always corresponds to bus 0, even if the bus range below the bridge +(which is reported via _CRS) doesn't start at 0. 
+ + +[1] ACPI 6.0, sec 6.1: + For any device that is on a non-enumerable type of bus (for example, an + ISA bus), OSPM enumerates the devices' identifier(s) and the ACPI + system firmware must supply an _HID object ... for each device to + enable OSPM to do that. + +[2] ACPI 6.0, sec 3.7: + The OS enumerates motherboard devices simply by reading through the + ACPI Namespace looking for devices with hardware IDs. + + Each device enumerated by ACPI includes ACPI-defined objects in the + ACPI Namespace that report the hardware resources the device could + occupy [_PRS], an object that reports the resources that are currently + used by the device [_CRS], and objects for configuring those resources + [_SRS]. The information is used by the Plug and Play OS (OSPM) to + configure the devices. + +[3] ACPI 6.0, sec 6.2: + OSPM uses device configuration objects to configure hardware resources + for devices enumerated via ACPI. Device configuration objects provide + information about current and possible resource requirements, the + relationship between shared resources, and methods for configuring + hardware resources. + + When OSPM enumerates a device, it calls _PRS to determine the resource + requirements of the device. It may also call _CRS to find the current + resource settings for the device. Using this information, the Plug and + Play system determines what resources the device should consume and + sets those resources by calling the device’s _SRS control method. + + In ACPI, devices can consume resources (for example, legacy keyboards), + provide resources (for example, a proprietary PCI bridge), or do both. + Unless otherwise specified, resources for a device are assumed to be + taken from the nearest matching resource above the device in the device + hierarchy. 
+ +[4] ACPI 6.0, sec 6.4.3.5.1, 2, 3, 4: + QWord/DWord/Word Address Space Descriptor (.1, .2, .3) + General Flags: Bit [0] Ignored + + Extended Address Space Descriptor (.4) + General Flags: Bit [0] Consumer/Producer: + 1–This device consumes this resource + 0–This device produces and consumes this resource + +[5] ACPI 6.0, sec 19.6.43: + ResourceUsage specifies whether the Memory range is consumed by + this device (ResourceConsumer) or passed on to child devices + (ResourceProducer). If nothing is specified, then + ResourceConsumer is assumed. + +[6] PCI Firmware 3.0, sec 4.1.2: + If the operating system does not natively comprehend reserving the + MMCFG region, the MMCFG region must be reserved by firmware. The + address range reported in the MCFG table or by _CBA method (see Section + 4.1.3) must be reserved by declaring a motherboard resource. For most + systems, the motherboard resource would appear at the root of the ACPI + namespace (under _SB) in a node with a _HID of EISAID (PNP0C02), and + the resources in this case should not be claimed in the root PCI bus’s + _CRS. The resources can optionally be returned in Int15 E820 or + EFIGetMemoryMap as reserved memory but must always be reported through + ACPI as a motherboard resource. + +[7] PCI Express 3.0, sec 7.2.2: + For systems that are PC-compatible, or that do not implement a + processor-architecture-specific firmware interface standard that allows + access to the Configuration Space, the ECAM is required as defined in + this section. + +[8] PCI Firmware 3.0, sec 4.1.2: + The MCFG table is an ACPI table that is used to communicate the base + addresses corresponding to the non-hot removable PCI Segment Groups + range within a PCI Segment Group available to the operating system at + boot. This is required for the PC-compatible systems. + + The MCFG table is only used to communicate the base addresses + corresponding to the PCI Segment Groups available to the system at + boot. 
+ +[9] PCI Firmware 3.0, sec 4.1.3: + The _CBA (Memory mapped Configuration Base Address) control method is + an optional ACPI object that returns the 64-bit memory mapped + configuration base address for the hot plug capable host bridge. The + base address returned by _CBA is processor-relative address. The _CBA + control method evaluates to an Integer. + + This control method appears under a host bridge object. When the _CBA + method appears under an active host bridge object, the operating system + evaluates this structure to identify the memory mapped configuration + base address corresponding to the PCI Segment Group for the bus number + range specified in _CRS method. An ACPI name space object that contains + the _CBA method must also contain a corresponding _SEG method.
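The MCFG/_CBA paragraph in the patch says the ECAM base always corresponds to bus 0, even when the bus range in the bridge's _CRS starts higher. The following is a minimal C sketch of what that means for address arithmetic, assuming the standard ECAM layout (1 MB per bus, 32 KB per device, 4 KB per function). The names ecam_addr, ecam_bus_span and struct ecam_span are hypothetical, not taken from any kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range of ECAM space (hypothetical helper type). */
struct ecam_span {
	uint64_t first;
	uint64_t last;
};

/*
 * Address of config register 'reg' of bus:dev.fn, given an ECAM base
 * taken from MCFG or _CBA. That base corresponds to bus 0, so the full
 * bus number is added even if the bridge's bus range starts above 0.
 */
uint64_t ecam_addr(uint64_t ecam_base, unsigned bus, unsigned dev,
		   unsigned fn, unsigned reg)
{
	assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
	return ecam_base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
	       ((uint64_t)fn << 12) + reg;
}

/*
 * ECAM space actually used by a bridge whose _CRS reports the bus range
 * [bus_start, bus_end]: 1 MB per bus, beginning bus_start MB above the base.
 */
struct ecam_span ecam_bus_span(uint64_t ecam_base, unsigned bus_start,
			       unsigned bus_end)
{
	struct ecam_span s;

	assert(bus_start <= bus_end && bus_end < 256);
	s.first = ecam_base + ((uint64_t)bus_start << 20);
	s.last = ecam_base + ((uint64_t)(bus_end + 1) << 20) - 1;
	return s;
}
```

For a hypothetical bridge with base 0x80000000 and bus range 0x40-0x7f, ecam_bus_span() yields 0x84000000-0x87ffffff; config space for bus 0x40 does not start at the base itself.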
On 11/29/2016 04:39 PM, Bjorn Helgaas wrote:
> +New architectures should be able to use "Consumer" Extended Address Space
> +descriptors in the PNP0A03 device for bridge registers, including ECAM,
> +although a strict interpretation of [6] might prohibit this. Old x86 and
> +ia64 kernels assume all address space descriptors, including "Consumer"
> +Extended Address Space ones, are windows, so it would not be safe to
> +describe bridge registers this way on those architectures.
<snip>
> +[6] PCI Firmware 3.0, sec 4.1.2:
<snip>
Thanks for the revised writeup, Bjorn. It's great. I'm trying to get the above clarified explicitly in terms of the spec, and in terms of what other Operating Systems would like to see as general preference.
To your point about second generation ARM (server) systems: we're actually on generation 3+ now and finally getting to the point where people are listening. A great many times over the past few years, people have had to be sat on until they did what was needed. Fortunately, we are going to finally have upstream kernels (and distros based upon them) that boot out of the box on compliant hardware and will be able to point people at the usual "upstream first" messaging we've been pushing.
I had originally fallen for the SoC koolaid that PCIe was not essential, and was convinced fairly early that this was nonsense. But it has taken a few years for everyone else to get onto that bandwagon. First you give them exactly what they know and love (a 1-2 socket Xeon class machine with lots of PCIe lanes), then you go and fix the design to give them what they actually need (which logically enumerates as PCIe but isn't) ;)
Jon.
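To make the Consumer/Producer paragraph Jon quotes concrete, here is a hedged C sketch of how an OS might classify the address space descriptors in a PNP0A03/PNP0A08 _CRS. The tag bytes (0x87 DWord, 0x88 Word, 0x8A QWord, 0x8B Extended) and General Flags at byte 4 follow the ACPI large resource descriptor layout; the macro names, classify_addr_desc(), and the honor_consumer policy flag are hypothetical and only model the difference between a new architecture and old x86/ia64 kernels. This is not the Linux implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte 0 of ACPI large resource descriptors for address spaces. */
#define ADDR_DESC_DWORD		0x87
#define ADDR_DESC_WORD		0x88
#define ADDR_DESC_QWORD		0x8A
#define ADDR_DESC_EXTENDED	0x8B

#define GENERAL_FLAGS_OFFSET	4	/* byte 4: General Flags */
#define FLAG_CONSUMER		0x01	/* bit 0: Consumer/Producer */

enum res_kind { RES_UNKNOWN, RES_WINDOW, RES_BRIDGE_REGS };

/*
 * Classify one address space descriptor from a host bridge _CRS.
 * 'honor_consumer' is a policy knob: old x86/ia64 kernels behave as if
 * it were false (everything is a window); a new arch might set it.
 */
enum res_kind classify_addr_desc(const uint8_t *desc, size_t len,
				 bool honor_consumer)
{
	assert(desc != NULL);
	if (len <= GENERAL_FLAGS_OFFSET)
		return RES_UNKNOWN;

	switch (desc[0]) {
	case ADDR_DESC_WORD:
	case ADDR_DESC_DWORD:
	case ADDR_DESC_QWORD:
		/* Bit 0 is "Ignored" here, so these must be treated as windows. */
		return RES_WINDOW;
	case ADDR_DESC_EXTENDED:
		/* Bit 0: 1 = consumer (bridge registers), 0 = producer (window). */
		if (honor_consumer && (desc[GENERAL_FLAGS_OFFSET] & FLAG_CONSUMER))
			return RES_BRIDGE_REGS;
		return RES_WINDOW;
	default:
		return RES_UNKNOWN;
	}
}
```

Note that a QWord descriptor with bit 0 set is still a window; only the Extended descriptor can carry a meaningful "Consumer" flag, and even then only an OS known to honor it should act on it.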
On Tue, Dec 13, 2016 at 04:09:39AM -0500, Jon Masters wrote:
> On 11/29/2016 04:39 PM, Bjorn Helgaas wrote:
>> +New architectures should be able to use "Consumer" Extended Address Space
>> +descriptors in the PNP0A03 device for bridge registers, including ECAM,
>> +although a strict interpretation of [6] might prohibit this. Old x86 and
>> +ia64 kernels assume all address space descriptors, including "Consumer"
>> +Extended Address Space ones, are windows, so it would not be safe to
>> +describe bridge registers this way on those architectures.
> <snip>
>> +[6] PCI Firmware 3.0, sec 4.1.2:
> <snip>
> Thanks for the revised writeup, Bjorn. It's great. I'm trying to get the above clarified explicitly in terms of the spec, and in terms of what other Operating Systems would like to see as general preference.
Any feedback on this? I'd like to post a revised version soon for v4.11.
Bjorn
On Tue, Nov 29, 2016 at 1:39 PM, Bjorn Helgaas bhelgaas@google.com wrote:
> Here's another stab at this writeup. I'd appreciate any comments!
Bjorn, this email was marked as spam, because:
It has a from address in google.com but has failed google.com's required tests for authentication
in particular, it looks like you used a non-google smtp server (kernel.org) to send the email, so there is no DKIM hash (or perhaps google just uses some other non-standard marker for "this actually came from google"). So gmail marks it as spam because dmarc fails:
dmarc=fail (p=REJECT dis=NONE) header.from=google.com
Just to let you know. If you use your google.com email, you do need to go through the google smtp server.
This may or may not be new - I didn't go and look at old messages of yours, but it is possible that google.com enabled dmarc/dkim recently.
Linus
On Tue, Nov 29, 2016 at 03:39:11PM -0800, Linus Torvalds wrote:
> Bjorn, this email was marked as spam, because:
> It has a from address in google.com but has failed google.com's required tests for authentication
> in particular, it looks like you used a non-google smtp server (kernel.org) to send the email, so there is no DKIM hash (or perhaps google just uses some other non-standard marker for "this actually came from google"). So gmail marks it as spam because dmarc fails:
> dmarc=fail (p=REJECT dis=NONE) header.from=google.com
> Just to let you know. If you use your google.com email, you do need to go through the google smtp server.
> This may or may not be new - I didn't go and look at old messages of yours, but it is possible that google.com enabled dmarc/dkim recently.
Argh, thanks for letting me know. Looks like I've had this broken for a long time, but I didn't notice. I think I have it fixed so git will record the author as bhelgaas@google.com, but git/stgit will send email from helgaas@kernel.org via the kernel.org smtp server.
On Tue, Nov 29, 2016 at 03:39:48PM -0600, Bjorn Helgaas wrote:
> Here's another stab at this writeup. I'd appreciate any comments!
> Changes from v1 to v2:
> - Consumer/Producer is defined for Extended Address Space descriptors; should be ignored for QWord/DWord/Word Address Space descriptors
> - New arches may use Extended Address Space descriptors in PNP0A03 for bridge registers, including ECAM (if the arch adds support for this)
> - Add more details about MCFG and _CBA (Lv's suggestion)
> - Incorporate Rafael's suggestions
> Bjorn Helgaas (1): PCI: Add information about describing PCI in ACPI
>  Documentation/PCI/00-INDEX      |   2
>  Documentation/PCI/acpi-info.txt | 180 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 182 insertions(+)
>  create mode 100644 Documentation/PCI/acpi-info.txt
It's very late in the cycle, but I'm considering trying to squeeze this into v4.9 on the grounds that:
- It's only a documentation change and can't break anything, and
- Distributing it more widely may help the arm64 firmware ecosystem
But I don't want to disseminate misleading or incorrect information, so if it needs clarification or wordsmithing, or even just maturation, I'll wait until v4.10.
The Consumer/Producer stuff, in particular, doesn't seem 100% settled yet. Your thoughts, and especially your improvements, are welcome!
Bjorn
On Thursday, December 01, 2016 04:36:04 PM Bjorn Helgaas wrote:
> It's very late in the cycle, but I'm considering trying to squeeze this into v4.9 on the grounds that:
> - It's only a documentation change and can't break anything, and
> - Distributing it more widely may help the arm64 firmware ecosystem
> But I don't want to disseminate misleading or incorrect information, so if it needs clarification or wordsmithing, or even just maturation, I'll wait until v4.10.
> The Consumer/Producer stuff, in particular, doesn't seem 100% settled yet. Your thoughts, and especially your improvements, are welcome!
Well, what's the drawback if it doesn't go into 4.9?
Thanks, Rafael
On Thu, Dec 01, 2016 at 11:37:39PM +0100, Rafael J. Wysocki wrote:
> Well, what's the drawback if it doesn't go into 4.9?
Only that it's not as easily accessible. ARM64 ACPI firmware is brand new. Neither the firmware nor the kernel developers, nor even the hardware designers, have the benefit of all the x86/ia64 history, so I wrote this to try to come to a common understanding of what Linux expects.
The first generation of ARM64 hardware is already in the field, and it has teething problems in hardware, firmware, and kernel. For example, the current MCFG quirk situation: the ECAM hardware doesn't work quite per spec, the ACPI firmware doesn't describe the address space completely, and we don't really have consensus on how the firmware should communicate register space to the kernel.
We're hoping the second generation can fix some of these problems, and I think this is the time to try to influence that.
Bjorn
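Bjorn's point that first-generation firmware "doesn't describe the address space completely" can be made concrete: per [6], the ECAM range must be reserved as a motherboard (PNP0C02) resource. The sketch below, in C, checks whether an inclusive ECAM span is fully covered by a set of reserved ranges, which are assumed to have already been collected from PNP0C02 _CRS objects (possibly unsorted, overlapping, or split across several descriptors). struct range and range_covered() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, e.g. one reserved PNP0C02 _CRS entry. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if every byte of [lo, hi] lies in at least one of the n
 * ranges in 'res'. Walks forward from lo, jumping to the end of whichever
 * range contains the current address; a gap means incomplete coverage.
 */
bool range_covered(uint64_t lo, uint64_t hi, const struct range *res,
		   size_t n)
{
	uint64_t cur = lo;

	assert(res != NULL || n == 0);
	while (cur <= hi) {
		bool advanced = false;

		for (size_t i = 0; i < n; i++) {
			if (res[i].start <= cur && cur <= res[i].end) {
				if (res[i].end >= hi)
					return true;
				/* end < hi here, so end + 1 cannot overflow. */
				cur = res[i].end + 1;
				advanced = true;
				break;
			}
		}
		if (!advanced)
			return false;	/* 'cur' is not reserved by anything */
	}
	return true;
}
```

The quadratic scan is deliberate: _CRS lists are short, and avoiding a sort keeps the sketch simple and tolerant of overlapping entries.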
On Fri, Dec 2, 2016 at 12:27 AM, Bjorn Helgaas helgaas@kernel.org wrote:
> Only that it's not as easily accessible. ARM64 ACPI firmware is brand new. Neither the firmware nor the kernel developers, nor even the hardware designers, have the benefit of all the x86/ia64 history, so I wrote this to try to come to a common understanding of what Linux expects.
> The first generation of ARM64 hardware is already in the field, and it has teething problems in hardware, firmware, and kernel. For example, the current MCFG quirk situation: the ECAM hardware doesn't work quite per spec, the ACPI firmware doesn't describe the address space completely, and we don't really have consensus on how the firmware should communicate register space to the kernel.
> We're hoping the second generation can fix some of these problems, and I think this is the time to try to influence that.
Well, I would be super-careful if I were you, then. :-)
I'm not sure if squeezing it into 4.9.0 buys you anything here. If you get it into 4.10-rc, you can request -stable to pick it up (at least in principle) and then it will show up in 4.9.y at one point which should suffice I suppose?
Thanks, Rafael
On Fri, Dec 02, 2016 at 01:28:50AM +0100, Rafael J. Wysocki wrote:
> Well, I would be super-careful if I were you, then. :-)
> I'm not sure if squeezing it into 4.9.0 buys you anything here. If you get it into 4.10-rc, you can request -stable to pick it up (at least in principle) and then it will show up in 4.9.y at one point which should suffice I suppose?
You're right, there's no real hurry, and the rate of change is a good indication that we need to let things settle out for a while.