Back in September 2014, a meeting was held at Linaro Connect where we discussed what issues remained before the arm64 ACPI core patches could be merged into the kernel, creating the TODO list below. I should have published this list sooner; I got focused on trying to resolve some of the issues instead.
We have made some progress on all of these items. But, I want to make sure we haven't missed something. Since this list was compiled by only the people in the room at Connect, it is probable we have. I, for one, do not yet claim omniscience.
So, I want to ask the ARM and ACPI communities:
-- Is this list correct?
-- Is this list complete?
Below is what we currently know about; very brief notes on status are included. The TL;DR versions of the TODO list and the current status can be found at:
https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/CoreUpstreamNotes
and I'll do my best to keep that up to date.
Thanks. Any and all feedback is greatly appreciated.
TODO List for ACPI on arm64:
============================

1. Define how AArch64 OS identifies itself to firmware.
   * Problem:
     * _OSI method is demonstrably unreliable. On x86 Linux claims to be Windows
     * Proposal to use _OSC method as replacement is complicated and creates an explosion of combinations
   * Solution:
     * Draft and propose OS identification rules to ABST and ASWG for inclusion in ACPI spec.
     * Draft and propose recommended practice for current ACPI 5.1 spec platforms.
   * Status: Little progress, still under investigation
2. Linux must choose DT booting by default when offered both ACPI and DT on arm64
   * DONE
3. Linux UEFI/ACPI testing tools must be made available
   * Problem:
     * Hardware/Firmware vendors do not have tools to test Linux compatibility.
     * Common problems go undetected if not tested for.
   * Solution:
     * Port FWTS tool and LuvOS distribution to AArch64
     * Make LuvOS images readily available
     * Require hardware vendors to actively test against old and new kernels.
   * Status: LuvOS and FWTS ported to arm64 and running; patches being mainlined; additional test cases being written.
4. Set clear expectations for those providing ACPI for use with Linux
   * Problem:
     * Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
     * Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
   * Solution: document the expectations, and iterate as needed. Enforce when we must.
   * Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
5. Demonstrate the ACPI core patches work
   * Problem: how can we be sure the patches work?
   * Solution: verify the patches on arm64 server platforms
   * Status:
     * ACPI core works on at least the Foundation model, Juno, APM Mustang, and AMD Seattle
     * FWTS results will be published as soon as possible
6. How does the kernel handle _DSD usage?
   * Problem:
     * _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
     * How do we ensure DT and _DSD bindings remain consistent with each other?
   * Solution: public documentation for all bindings, and a process for defining them
   * Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
7. Why is ACPI required?
   * Problem:
     * arm64 maintainers still haven't been convinced that ACPI is necessary.
     * Why do hardware and OS vendors say ACPI is required?
   * Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
8. Platform support patches need review
   * Problem: the core AArch64 patches have been reviewed and are in good shape, but there is not yet a good example of server platform support patches that use them.
   * Solution: Post *good* patches for multiple ACPI platforms
   * Status: first version for AMD Seattle has been posted to the public linaro-acpi mailing list for initial review (thanks, Arnd); refined versions to be posted to broader lists after a few iterations for basic cleanup
On Monday 15 December 2014 19:18:16 Al Stone wrote:
Below is what we currently know about; very brief notes on status are included. The TL;DR versions of the TODO list and the current status can be found at:
https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/CoreUpstreamNotes
and I'll do my best to keep that up to date.
Thanks. Any and all feedback is greatly appreciated.
Thanks for keeping the list up to date!
TODO List for ACPI on arm64:
- Define how AArch64 OS identifies itself to firmware.
- Problem:
- _OSI method is demonstrably unreliable. On x86 Linux claims to be Windows
- Proposal to use _OSC method as replacement is complicated and creates an explosion of combinations
- Solution:
- Draft and propose OS identification rules to ABST and ASWG for inclusion in ACPI spec.
- Draft and propose recommended practice for current ACPI 5.1 spec platforms.
- Status: Little progress, still under investigation
I must have missed the problem with _OSC, it sounded like it was good enough, but I have no clue about the details.
Linux must choose DT booting by default when offered both ACPI and DT on arm64
- DONE
Linux UEFI/ACPI testing tools must be made available
- Problem:
- Hardware/Firmware vendors do not have tools to test Linux compatibility.
- Common problems go undetected if not tested for.
- Solution:
- Port FWTS tool and LuvOS distribution to AArch64
- Make LuvOS images readily available
- Require hardware vendors to actively test against old and new kernels.
- Status: LuvOS and FWTS ported to arm64 and running; patches being mainlined; additional test cases being written.
Ah, nice!
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
- Demonstrate the ACPI core patches work
- Problem: how can we be sure the patches work?
- Solution: verify the patches on arm64 server platforms
- Status:
- ACPI core works on at least the Foundation model, Juno, APM Mustang, and AMD Seattle
- FWTS results will be published as soon as possible
I think the problem is to a lesser degree the verification of the patches, but to have a patch set that demonstrates /how/ everything can work, and what the possible limitations are. I would not worry about any bugs that might keep the system from working properly, as long as you can show that there is a plan to make that work.
Out of the four platforms you list, I think we have concluded that three of them are not appropriate for use with ACPI, but in order to do that, we needed to review the patches and pinpoint specific issues so we could avoid just exchanging different opinions on the matter of it "works" or not.
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
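[For reference, a minimal sketch of what PRP0001 enumeration implies on the driver side: the ACPI core is expected to fall back to the driver's of_match_table and match it against a "compatible" string carried in _DSD, so one platform driver can serve both firmware types. The device name and compatible string below are invented for illustration; this is not code from the patches under discussion.]

/*
 * Hypothetical driver matched either from a DT node or from an ACPI
 * PRP0001 device whose _DSD carries compatible = "vendor,foo-ctrl".
 */
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static const struct of_device_id foo_of_match[] = {
	{ .compatible = "vendor,foo-ctrl" },	/* same string as the DT binding */
	{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, foo_of_match);

static int foo_probe(struct platform_device *pdev)
{
	/* probe path is shared between DT and ACPI/PRP0001 enumeration */
	return 0;
}

static struct platform_driver foo_driver = {
	.probe	= foo_probe,
	.driver	= {
		.name		= "foo-ctrl",
		.of_match_table	= foo_of_match,
	},
};
module_platform_driver(foo_driver);
MODULE_LICENSE("GPL");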
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
- Platform support patches need review
- Problem: the core AArch64 patches have been reviewed and are in good shape, but there is not yet a good example of server platform support patches that use them.
- Solution: Post *good* patches for multiple ACPI platforms
- Status: first version for AMD Seattle has been posted to the public linaro-acpi mailing list for initial review (thanks, Arnd), refined versions to be posted to broader lists after a few iterations for basic cleanup
Arnd
On Tue, Dec 16, 2014 at 11:27:48AM +0000, Arnd Bergmann wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
TODO List for ACPI on arm64:
- Define how AArch64 OS identifies itself to firmware.
- Problem:
- _OSI method is demonstrably unreliable. On x86 Linux claims to be Windows
- Proposal to use _OSC method as replacement is complicated and creates an explosion of combinations
- Solution:
- Draft and propose OS identification rules to ABST and ASWG for inclusion in ACPI spec.
- Draft and propose recommended practice for current ACPI 5.1 spec platforms.
- Status: Little progress, still under investigation
I must have missed the problem with _OSC, it sounded like it was good enough, but I have no clue about the details.
Neither do I. It's also not entirely clear whether per-device _OSCs are arbitrary or whether vendors need to go through some kind of approval process (as with the new _DSD process).
- Linux must choose DT booting by default when offered both ACPI and DT on arm64
- DONE
I'm fine with this, but just a clarification note here. We can't have acpi=off in the absence of DT because the system may not be able to report the error (there is no PC-style standard with a fixed 8250 port to fall back to). That's unless we print the error from the EFI stub before exiting boot services (e.g. "No Device Tree and no acpi=on; cannot boot").
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
Another example is SMP booting. The ACPI 5.1 spec mentions the parking protocol but I can't find a reference to the latest document. In the meantime, we stick to PSCI.
From a recent more private discussion I learnt that hardware or firmware people are not keen on reading a Linux arm-acpi.txt document, probably thinking it is too Linux specific. The proposal is to (longer term) move parts of this document (which are not Linux specific) into SBBR or maybe a new document for run-time requirements. Linux will still keep a statement about compliance with such documents and other Linux-specific aspects.
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
It's an "industry standard", you shouldn't question any further ;).
On 12/16/2014 08:27 AM, Catalin Marinas wrote:
On Tue, Dec 16, 2014 at 11:27:48AM +0000, Arnd Bergmann wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
TODO List for ACPI on arm64:
[snip..]
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
[snip...]
Another example is SMP booting. The ACPI 5.1 spec mentions the parking protocol but I can't find a reference to the latest document. In the meantime, we stick to PSCI.
Hrm. A bug in the spec.
Every external document mentioned in the ACPI spec is supposed to have a link that will eventually get you to the source document. All links in the spec should point here: http://www.uefi.org/acpi, which in turn has links to the authoritative original documents. However, it looks like the parking protocol document pointed to (the "Multiprocessor Startup" link) may not be the most recent version. The reference in the spec to the protocol (Table 5-61, Section 5.2.12.14) also appears to be useless (it points to http://infocenter.arm.com/help/index.jsp which doesn't have the document either). I've filed a change request with ASWG to fix this.
That being said, the early systems still don't provide PSCI. They will at some point in the future, but not now. Regardless, I think it's reasonable for us to say that if you want ACPI support, PSCI must be used for secondary CPU startup. People can hack something up to get the parking protocol to work on development branches if they want, but I personally see no need to get that into the kernel -- and it needs to be said explicitly in arm-acpi.txt.
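[To make the expectation concrete, here is a minimal sketch of the check such a "ACPI implies PSCI" policy boils down to, keying off the ARM Boot Architecture Flags that ACPI 5.1 added to the FADT. The helper name is made up and this is not the actual arm64 patch code.]

#include <linux/acpi.h>
#include <linux/printk.h>

/*
 * ACPI 5.1 gives the FADT an "ARM Boot Architecture Flags" field with a
 * PSCI_COMPLIANT bit (plus PSCI_USE_HVC to pick the conduit). Requiring
 * PSCI for ACPI systems amounts to refusing SMP bring-up when the bit
 * is clear. Sketch only.
 */
static bool acpi_psci_mandated(void)
{
	if (acpi_gbl_FADT.arm_boot_flags & ACPI_FADT_PSCI_COMPLIANT)
		return true;

	pr_err("FADT does not advertise PSCI; secondary CPUs will not be booted\n");
	return false;
}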
On Wed, Dec 17, 2014 at 12:03:06AM +0000, Al Stone wrote:
On 12/16/2014 08:27 AM, Catalin Marinas wrote:
On Tue, Dec 16, 2014 at 11:27:48AM +0000, Arnd Bergmann wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
TODO List for ACPI on arm64:
[snip..]
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
[snip...]
Another example is SMP booting. The ACPI 5.1 spec mentions the parking protocol but I can't find a reference to the latest document. In the meantime, we stick to PSCI.
Hrm. A bug in the spec.
Every external document mentioned in the ACPI spec is supposed to have a link that will eventually get you to the source document. All links in the spec should point here: http://www.uefi.org/acpi, which in turn has links to the authoritative original documents. However, it looks like the parking protocol document pointed to (the "Multiprocessor Startup" link) may not be the most recent version. The reference in the spec to the protocol (Table 5-61, Section 5.2.12.14) also appears to be useless (it points to http://infocenter.arm.com/help/index.jsp which doesn't have the document either). I've filed a change request with ASWG to fix this.
Thanks. I followed the same links but couldn't find any newer version.
That being said, the early systems still don't provide PSCI. They will at some point in the future, but not now. Regardless, I think it's reasonable for us to say that if you want ACPI support, PSCI must be used for secondary CPU startup. People can hack something up to get the parking protocol to work on development branches if they want, but I personally see no need to get that into the kernel -- and it needs to be said explicitly in arm-acpi.txt.
I'm fine with this. But note that it pretty much rules out the APM boards (I think we ruled them out a few times already) which don't have an EL3 to host PSCI firmware. EL2 is not suitable as it is likely that we want this level for KVM or Xen rather than platform firmware.
On 12/17/14, 4:25 AM, Catalin Marinas wrote:
On Wed, Dec 17, 2014 at 12:03:06AM +0000, Al Stone wrote:
That being said, the early systems still don't provide PSCI.
Clarification: some early systems are capable of PSCI but the firmware was in progress (and thus they implemented the Parking Protocol initially but will do full PSCI) - for example AMD Seattle - while one other implementation is unable to provide PSCI and will only do Parking Protocol. Both are trivially supportable through bits in the ACPI MADT.
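[For reference, the "bits in the ACPI MADT" are the Parking Protocol version and parked (mailbox) address fields carried by each GICC entry. A minimal sketch of telling the two boot methods apart per CPU, using the ACPICA structure as I understand it; this is illustrative, not the in-kernel parsing code.]

#include <linux/acpi.h>

/*
 * Each MADT GICC entry carries a parking protocol version and a parked
 * (mailbox) address. A version of 0 means the CPU is expected to come up
 * via PSCI; a non-zero version means it is spinning on the mailbox at
 * parked_address.
 */
static bool gicc_uses_parking_protocol(const struct acpi_madt_generic_interrupt *gicc)
{
	return gicc->parking_version != 0 && gicc->parked_address != 0;
}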
They will at some point in the future, but not now. Regardless, I think it's reasonable for us to say that if you want ACPI support, PSCI must be used for secondary CPU startup. People can hack something up to get the parking protocol to work on development branches if they want, but I personally see no need to get that into the kernel -- and it needs to be said explicitly in arm-acpi.txt.
That could be ok for upstream - it's up to you guys - but note that there will be some early hardware that doesn't do PSCI. I've sat on *everyone* over the past couple of years behind the scenes to ensure that all future designs will do PSCI, but this takes time to realize.
I'm fine with this. But note that it pretty much rules out the APM boards (I think we ruled them out a few times already) which don't have an EL3 to host PSCI firmware. EL2 is not suitable as it is likely that we want this level for KVM or Xen rather than platform firmware.
The gen1 APM boards work great with UEFI and ACPI using the Parking Protocol. It's trivial to support, and it's fairly constrained since everyone is headed toward PSCI. Personally I consider it unfair to punish the first guy out of the gate for something that was standardized after they made their design. My recommendation would be to get the relevant document updated in the public repository. Patches that implement the Parking Protocol already exist and work well as an alternative to PSCI (which is always preferred if available).
Jon.
On Thu, Dec 18, 2014 at 04:57:05AM +0000, Jon Masters wrote:
On 12/17/14, 4:25 AM, Catalin Marinas wrote:
On Wed, Dec 17, 2014 at 12:03:06AM +0000, Al Stone wrote:
They will at some point in the future, but not now. Regardless, I think it's reasonable for us to say that if you want ACPI support, PSCI must be used for secondary CPU startup. People can hack something up to get the parking protocol to work on development branches if they want, but I personally see no need to get that into the kernel -- and it needs to be said explicitly in arm-acpi.txt.
[...]
I'm fine with this. But note that it pretty much rules out the APM boards (I think we ruled them out a few times already) which don't have an EL3 to host PSCI firmware. EL2 is not suitable as it is likely that we want this level for KVM or Xen rather than platform firmware.
The gen1 APM boards work great with UEFI and ACPI using the Parking Protocol. It's trivial to support, and it's fairly constrained since everyone is headed toward PSCI. Personally I consider it unfair to punish the first guy out of the gate for something that was standardized after they made their design.
Most of the complaints about the APM ACPI support were around drivers and clocks rather than the parking protocol. On the latter, we complained when Hanjun posted core patches that did not match the documentation, and the (old) parking protocol spec was not suitable for AArch64 either.
My recommendation would be to get the relevant document updated in the public repository.
This should be stronger than just a "recommendation".
Patches that implement Parking Protocol are already in existence/working well as an alternative to PSCI (which is always preferred if available).
Let's hope they match the new spec ;).
What the parking protocol doesn't cover (last time I looked) is a way to return CPUs to firmware for hotplug or CPUidle. On non-PSCI platforms this would need to be done with SoC-specific code implementing (part of) the cpu_operations back-end. Technically, I think there are solutions to standardise such CPU return to firmware but it's debatable whether it's worth the effort given the long term PSCI recommendation.
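[To illustrate what "SoC-specific code implementing (part of) the cpu_operations back-end" would have to provide, here is a hypothetical ops structure, not the kernel's actual struct cpu_operations, covering the hooks that hotplug and CPUidle need and that the parking protocol does not define.]

/*
 * Hypothetical per-platform back-end: the parking protocol describes how
 * to boot a CPU, but not how to hand it back to firmware, which is what
 * the cpu_die/cpu_kill-style hooks would need on a non-PSCI system.
 */
struct parked_cpu_ops {
	int  (*cpu_boot)(unsigned int cpu);	/* release the CPU from its mailbox */
	int  (*cpu_disable)(unsigned int cpu);	/* may this CPU be taken offline? */
	void (*cpu_die)(unsigned int cpu);	/* return the CPU to firmware */
	int  (*cpu_kill)(unsigned int cpu);	/* confirm it actually left the kernel */
};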
On 17 December 2014 00:03, Al Stone wrote:
On 12/16/2014 08:27 AM, Catalin Marinas wrote:
On Tue, Dec 16, 2014 at 11:27:48AM +0000, Arnd Bergmann wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
TODO List for ACPI on arm64:
[snip..]
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
[snip...]
Another example is SMP booting. The ACPI 5.1 spec mentions the parking protocol but I can't find a reference to the latest document. In the meantime, we stick to PSCI.
Hrm. A bug in the spec.
Every external document mentioned in the ACPI spec is supposed to have a link that will eventually get you to the source document. All links in the spec should point here: http://www.uefi.org/acpi, which in turn has links to the authoritative original documents. However, it looks like the parking protocol document pointed to (the "Multiprocessor Startup" link) may not be the most recent version. The reference in the spec to the protocol (Table 5-61, Section 5.2.12.14) also appears to be useless (it points to http://infocenter.arm.com/help/index.jsp which doesn't have the document either). I've filed a change request with ASWG to fix this.
I also raised both of these a while back; I expect the next errata release to correct this.
That being said, the early systems still don't provide PSCI. They will at some point in the future, but not now. Regardless, I think it's reasonable for us to say that if you want ACPI support, PSCI must be used for secondary CPU startup. People can hack something up to get the parking protocol to work on development branches if they want, but I personally see no need to get that into the kernel -- and it needs to be said explicitly in arm-acpi.txt.
-- ciao, al
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
[...]
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
I have not seen good guidelines on the usage of _DSD in that respect, and I'm worried we'll end up with clock controllers half-owned by AML and half-owned by the kernel's clock framework, and this separation varying from board to board (and FW revision to FW revision). I think that needs to be clarified at the ACPI spec level in addition to anything we have in the kernel documentation.
I'm still of the opinion that conflating _DSD and DT is a bad idea.
Mark.
On 12/16/2014 08:48 AM, Mark Rutland wrote:
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
[...]
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot. If they need to change, say as the result of going into a deeper sleep state or something, that's what the ACPI power management objects are for; Linux would execute one of the ACPI methods already defined by the spec to control transition to the desired state, and that method would have within it the ability to change whatever clocks or regulators it deems necessary. The kernel should not have to track these things.
If someone is describing all those relationships in _DSD, I agree that is not the kind of ARM platform I think we want to deal with. This is touched on, iirc, in arm-acpi.txt, but apparently too briefly.
I have not seen good guidelines on the usage of _DSD in that respect, and I'm worried we'll end up with clock controllers half-owned by AML and half-owned by the kernel's clock framework, and this separation varying from board to board (and FW revision to FW revision). I think that needs to be clarified at the ACPI spec level in addition to anything we have in the kernel documentation.
Hrm. The spec (section 6.2.5) basically says that there exists a thing called _DSD and that when invoked, it returns a UUID followed by a data structure. A separate document (on http://www.uefi.org/acpi) for each UUID defines the data structure it corresponds to. The only one defined so far is for device properties:
http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-...
I think you are suggesting good guidelines for both, yes? The usage of new UUIDs and data structures, along with the usage of device properties for the UUID we have, right? I've been trying to write such a thing but all I've accomplished so far is throwing away a couple of drafts that got ugly. I'll keep at it, though.
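[As a concrete example of the device-properties UUID in use, here is a sketch of how a driver would consume such a key/value pair through the kernel's firmware-agnostic property accessors (the unified device-property API being introduced around this time). The property name is invented for illustration and would need a documented binding.]

#include <linux/device.h>
#include <linux/property.h>

/*
 * The same call works whether "vendor,queue-depth" comes from a DT node
 * or from a _DSD package tagged with the device-properties UUID.
 * Property name is illustrative only.
 */
static int foo_read_queue_depth(struct device *dev, u32 *depth)
{
	int ret;

	ret = device_property_read_u32(dev, "vendor,queue-depth", depth);
	if (ret)
		dev_warn(dev, "missing vendor,queue-depth, using driver default\n");

	return ret;
}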
I'm still of the opinion that conflating _DSD and DT is a bad idea.
Could you explain your usage of "conflating" here? I think I understand what you mean, but I'd rather be sure.
On 17 December 2014 at 00:37, Al Stone al.stone@linaro.org wrote:
I have not seen good guidelines on the usage of _DSD in that respect, and I'm worried we'll end up with clock controllers half-owned by AML and half-owned by the kernel's clock framework, and this separation varying from board to board (and FW revision to FW revision). I think that needs to be clarified at the ACPI spec level in addition to anything we have in the kernel documentation.
Hrm. The spec (section 6.2.5) basically says that there exists a thing called _DSD and that when invoked, it returns a UUID followed by a data structure. A separate document (on http://www.uefi.org/acpi) for each UUID defines the data structure it corresponds to. The only one defined so far is for device properties:
http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-...
I think you are suggesting good guidelines for both, yes? The usage of new UUIDs and data structures, along with the usage of device properties for the UUID we have, right? I've been trying to write such a thing but all I've accomplished so far is throwing away a couple of drafts that got ugly. I'll keep at it, though.
Of course, the way the spec is written also gives us an option: if the OEMs, the kernel guys, and MS all agree on a different format for AArch64, then all that is needed is a new UUID and we are no longer tied to trying to insert DT into ACPI. I personally think this is the way to go; I do not like blindly copying DT into ACPI. I suspect the information that an ACPI platform "needs" is probably significantly simplified compared to some of the DT bindings.
Graeme
On Wed, Dec 17, 2014 at 12:37:22AM +0000, Al Stone wrote:
On 12/16/2014 08:48 AM, Mark Rutland wrote:
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
[...]
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot. If they need to change, say as the result of going into a deeper sleep state or something, that's what the ACPI power management objects are for; Linux would execute one of the ACPI methods already defined by the spec to control transition to the desired state, and that method would have within it the ability to change whatever clocks or regulators it deems necessary. The kernel should not have to track these things.
If someone is describing all those relationships in _DSD, I agree that is not the kind of ARM platform I think we want to deal with. This is touched on, iirc, in arm-acpi.txt, but apparently too briefly.
I think we're personally in agreement on the matter, but I'm not sure that's true of all involved parties, nor that they are necessarily aware of these issues.
Some parties only seem to be considering what's documented at the ACPI spec level, rather than what's documented in Linux. At that level, these issues are not touched upon.
I have not seen good guidelines on the usage of _DSD in that respect, and I'm worried we'll end up with clock controllers half-owned by AML and half-owned by the kernel's clock framework, and this separation varying from board to board (and FW revision to FW revision). I think that needs to be clarified at the ACPI spec level in addition to anything we have in the kernel documentation.
Hrm. The spec (section 6.2.5) basically says that there exists a thing called _DSD and that when invoked, it returns a UUID followed by a data structure. A separate document (on http://www.uefi.org/acpi) for each UUID defines the data structure it corresponds to. The only one defined so far is for device properties:
http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-...
I think you are suggesting good guidelines for both, yes? The usage of new UUIDs and data structures, along with the usage of device properties for the UUID we have, right?
Yes, along with how _DSD interacts with the ACPI device model.
In the document linked from the document you linked, I spot that GPIOs are described with something along the lines of the DT GPIO bindings (though using ACPI's own references rather than phandles) and for reasons I've described previously and below w.r.t. conflation of DT and _DSD, I'm not a fan.
I'm not keen on all the linux-specific DT properties being blindly copied into _DSD without consideration of their semantics in the context of the rest of the device description, nor whether they will be usable by anything other than Linux.
I note that the example in the document also inconsistently uses "linux,retain-state-suspended" and "retain-state-suspended", which does not fill me with confidence.
I've been trying to write such a thing but all I've accomplished so far is throwing away a couple of drafts that got ugly. I'll keep at it, though.
I'm still of the opinion that conflating _DSD and DT is a bad idea.
Could you explain your usage of "conflating" here? I think I understand what you mean, but I'd rather be sure.
I believe that the idea that any DT binding should be a _DSD binding with common low-level accessors is flawed, as I've mentioned a few times in the past.
I think the two should be treated separately, and that commonalities should be handled at a higher level. While this does mean more code, I believe it's more manageable and results in a far smaller chance of accidentally exposing items as describable in _DSD.
We made mistakes in the way we bound DT to the driver model and exposed certain items unintentionally, and binding _DSD to DT makes the problem worse because now we have three not-quite-related sources of information masquerading as the same thing, with code authors likely only considering a single one of these.
Thanks, Mark.
First, thanks Al for organizing this list and conversation. AMD is committed to helping with this effort. We welcome feedback on our document and table entries. Also, I see the following as the areas where we can help the most: 3, 4, 5, 7, 8. As for item 7 (Why is ACPI required?), it is clear for us: the customers we are talking to are requiring it from us.
Thanks, Sherry
On Wednesday, December 17, 2014 at 10:02 AM, Mark Rutland wrote:
On Wed, Dec 17, 2014 at 12:37:22AM +0000, Al Stone wrote:
On 12/16/2014 08:48 AM, Mark Rutland wrote:
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
[...]
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot. If they need to change, say as the result of going into a deeper sleep state or something, that's what the ACPI power management objects are for; Linux would execute one of the ACPI methods already defined by the spec to control transition to the desired state, and that method would have within it the ability to change whatever clocks or regulators it deems necessary. The kernel should not have to track these things.
If someone is describing all those relationships in _DSD, I agree that is not the kind of ARM platform I think we want to deal with. This is touched on, iirc, in arm-acpi.txt, but apparently too briefly.
I think we're personally in agreement on the matter, but I'm not sure that's true of all involved parties, nor that they are necessarily aware of these issues.
Some parties only seem to be considering what's documented at the ACPI spec level, rather than what's documented in Linux. At that level, these issues are not touched upon.
I have not seen good guidelines on the usage of _DSD in that respect, and I'm worried we'll end up with clock controllers half-owned by AML and half-owned by the kernel's clock framework, and this separation varying from board to board (and FW revision to FW revision). I think that needs to be clarified at the ACPI spec level in addition to anything we have in the kernel documentation.
Hrm. The spec (section 6.2.5) basically says that there exists a thing called _DSD and that when invoked, it returns a UUID followed by a data structure. A separate document (on http://www.uefi.org/acpi) for each UUID defines the data structure it corresponds to. The only one defined so far is for device properties:
http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-toplevel.htm
I think you are suggesting good guidelines for both, yes? The usage of new UUIDs and data structures, along with the usage of device properties for the UUID we have, right?
Yes, along with how _DSD interacts with the ACPI device model.
In the document linked from the document you linked, I spot that GPIOs are described with something along the lines of the DT GPIO bindings (though using ACPI's own references rather than phandles) and for reasons I've described previously and below w.r.t. conflation of DT and _DSD, I'm not a fan.
I'm not keen on all the linux-specific DT properties being blindly copied into _DSD without consideration of their semantics in the context of the rest of the device description, nor whether they will be usable by anything other than Linux.
I note that the example in the document also inconsistently uses "linux,retain-state-suspended" and "retain-state-suspended", which does not fill me with confidence.
I've been trying to write such a thing but all I've accomplished so far is throwing away a couple of drafts that got ugly. I'll keep at it, though.
I'm still of the opinion that conflating _DSD and DT is a bad idea.
Could you explain your usage of "conflating" here? I think I understand what you mean, but I'd rather be sure.
I believe that the idea that any DT binding should be a _DSD binding with common low-level accessors is flawed, as I've mentioned a few times in the past.
I think the two should be treated separately, and that commonalities should be handled at a higher level. While this does mean more code, I believe it's more manageable and results in a far smaller chance of accidentally exposing items as describable in _DSD.
We made mistakes in the way we bound DT to the driver model and exposed certain items unintentionally, and binding _DSD to DT makes the problem worse because now we have three not-quite-related sources of information masquerading as the same thing, with code authors likely only considering a single one of these.
Thanks, Mark.
On Wed, Dec 17, 2014 at 12:37:22AM +0000, Al Stone wrote:
[...]
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot. If they need to change, say as the result of going into a deeper sleep state or something, that's what the ACPI power management objects are for; Linux would execute one of the ACPI methods already defined by the spec to control transition to the desired state, and that method would have within it the ability to change whatever clocks or regulators it deems necessary. The kernel should not have to track these things.
If someone is describing all those relationships in _DSD, I agree that is not the kind of ARM platform I think we want to deal with. This is touched on, iirc, in arm-acpi.txt, but apparently too briefly.
I would rather rule out using _DSD to describe clocks, regulators and their relationships in ACPI, period. There are established ACPI methods to control power for devices and processors, and if we adopt ACPI those must be used, as you say (I am not saying they are perfect, given that the x86 guys moved away from them when they could, e.g. CPU idle).
They can certainly be enhanced, as it is happening for idle states description, for instance.
I understand one reason for ACPI adoption is its established way of handling power management for processors and devices. If, for any reason, people who want ACPI on ARM require _DSD to describe clock and regulator relationships, then they have to explain to us why they require ACPI in the first place, given that at the moment those clock and regulator descriptions are non-existent in the ACPI world.
Lorenzo
On 12/16/14, 7:37 PM, Al Stone wrote:
On 12/16/2014 08:48 AM, Mark Rutland wrote:
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot.
Correct. There should never be a situation in which any clocks are explicitly exposed in the DSDT. Clock state should be set as a side effect of calling ACPI methods to transition device state. _DSD is intended to implement simple additions to the core spec, such as metadata describing the MAC address, interface type, and PHY on a network device. But it should never be used to expose clock nets.
I've spoken with nearly everyone building a 64-bit ARM server using ACPI and in nearly every case also reviewed their tables. Nobody is going to be so foolish on day one. The trick to some of the ongoing discussion/planning is to ensure that guidance prevents mistakes later.
Jon.
On 12/18/14, 12:04 AM, Jon Masters wrote:
On 12/16/14, 7:37 PM, Al Stone wrote:
On 12/16/2014 08:48 AM, Mark Rutland wrote:
I am rather concerned about the relationship between items described with _DSD and ACPI's existing device model. Describing the relationship between devices and their input clocks, regulators, and so on defeats much of the benefit ACPI is marketed as providing w.r.t. abstraction of the underlying platform (and as Arnd mentioned above, that's not the kind of platform we want to support with ACPI).
My belief is that all those things should be set up into a known good state by UEFI on initial boot.
Correct. There should never be a situation in which any clocks are explicitly exposed in the DSDT. Clock state should be set as a side effect of calling ACPI methods to transition device state. _DSD is intended to implement simple additions to the core spec, such as metadata describing the MAC address, interface type, and PHY on a network device. But it should never be used to expose clock nets.
I've spoken with nearly everyone building a 64-bit ARM server using ACPI and in nearly every case also reviewed their tables. Nobody is going to be so foolish on day one. The trick to some of the ongoing discussion/planning is to ensure that guidance prevents mistakes later.
Btw a little clarification. RH of course has commercial relations with many of those building first/second/third generation 64-bit ARM server designs. In that capacity we have reviewed nearly every SoC design as well as the firmware platform (including ACPI tables). We have a vested interest in ensuring that nobody builds anything that is crazy, and we're not the only OS vendor that is working to ensure such sanity.
Jon.
On 12/16/2014 04:27 AM, Arnd Bergmann wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
Below is what we currently know about; very brief notes on status are included. The TL;DR versions of the TODO list and the current status can be found at:
https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/CoreUpstreamNotes
[snip...]
TODO List for ACPI on arm64:
- Define how AArch64 OS identifies itself to firmware.
- Problem:
- _OSI method is demonstrably unreliable. On x86 Linux claims to be Windows
- Proposal to use _OSC method as replacement is complicated and creates an explosion of combinations
- Solution:
- Draft and propose OS identification rules to ABST and ASWG for inclusion in ACPI spec.
- Draft and propose recommended practice for current ACPI 5.1 spec platforms.
- Status: Little progress, still under investigation
I must have missed the problem with _OSC, it sounded like it was good enough, but I have no clue about the details.
The _OSC object has two flavors: a global version (aka _SB._OSC) and a per-device version. In a way, these are similar to _DSD. In both of these cases, when the object is invoked, it returns a data packet that starts with a UUID, followed by some data blob. The UUID is associated with a specific documented data format that tells the end user of the data packet what the structure of the data is and how it is to be used.
So, for _OSC, we currently have two data formats -- i.e., two UUIDs -- defined. One format is a series of bit fields letting the OS and the firmware communicate what global ACPI features are available and who controls them; this is the global _OSC. The second format describes PCI host bridges and is only to be used for specific devices (and has been around for some time).
The first question is: is the global _OSC sufficient? What it has today is currently necessary, but does it have everything ARM needs? I think the answer is yes, but it needs some more thinking yet.
The second question is: how do we keep _OSC UUIDs from proliferating? Or should we? If we do, how do we keep them from getting out of hand and keep them usable across OSs? Again, this is much like _DSD and we need to think that through.
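[For concreteness, a rough sketch of how a global _OSC negotiation looks from the kernel side using the existing acpi_run_osc() helper: the UUID selects the capability format, and the capability DWORDs carry the query/support bits. The UUID shown is the platform-wide one from the spec; everything else is illustrative rather than actual arm64 code.]

#include <linux/acpi.h>
#include <linux/slab.h>

/*
 * Negotiate platform-wide capabilities via \_SB._OSC. The UUID names the
 * capability format; capbuf carries the query flag and the support bits.
 */
static acpi_status example_run_global_osc(acpi_handle sb_handle)
{
	u32 capbuf[2] = { 0, 0 };	/* DWORD 1: query flag, DWORD 2: support bits */
	struct acpi_osc_context context = {
		.uuid_str = "0811b06e-4a27-44f9-8d60-3cbbc22e7b48",
		.rev = 1,
		.cap.length = sizeof(capbuf),
		.cap.pointer = capbuf,
	};
	acpi_status status;

	status = acpi_run_osc(sb_handle, &context);
	if (ACPI_SUCCESS(status))
		kfree(context.ret.pointer);	/* caller frees the returned buffer */

	return status;
}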
- Linux must choose DT booting by default when offered both ACPI and DT on arm64
- DONE
I've asked Hanjun to re-examine the patch we have. If the setup_arch() code for arm64 goes something like this:
    If acpi=force (or =on?)
        If ACPI tables exist
            use ACPI
        Else if a DT exists
            use DT, but issue a warning that ACPI tables are missing
        Else
            panic, 'cause we got nothin'
        Endif
    Else
        If a DT exists
            use DT
        Else if ACPI tables exist
            use ACPI, but issue a warning that the DT is missing
        Else
            panic, 'cause we got nothin'
        Endif
    Endif
...is that better? Or worse? Or just far too kind to people trying to boot a kernel :)?
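[Rendering the same decision tree as C, roughly as it might sit in setup_arch(). The acpi_force flag and the dt_available()/acpi_tables_available()/use_dt()/use_acpi() helpers are hypothetical stand-ins; only the ordering and the warnings matter here.]

#include <linux/init.h>
#include <linux/kernel.h>

/* Hypothetical stand-ins; only the selection order below matters. */
static bool acpi_force;					/* set by an acpi=force/on parameter */
static bool dt_available(void)		{ return false; }
static bool acpi_tables_available(void)	{ return false; }
static void use_dt(void)		{ }
static void use_acpi(void)		{ }

static void __init select_hw_description(void)
{
	if (acpi_force) {
		if (acpi_tables_available())
			use_acpi();
		else if (dt_available()) {
			pr_warn("ACPI requested but no tables found, falling back to DT\n");
			use_dt();
		} else
			panic("No ACPI tables and no Device Tree");
	} else {
		if (dt_available())
			use_dt();	/* DT wins by default when both are present */
		else if (acpi_tables_available()) {
			pr_warn("No Device Tree found, booting via ACPI\n");
			use_acpi();
		} else
			panic("No Device Tree and no ACPI tables");
	}
}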
- Set clear expectations for those providing ACPI for use with Linux
- Problem:
- Hardware/Firmware vendors can and will create ACPI tables that cannot be used by Linux without some guidance
- Kernel developers cannot determine whether the kernel or firmware is broken without knowing what the firmware should do
- Solution: document the expectations, and iterate as needed. Enforce when we must.
- Status: initial kernel text available; AMD has offered to make their guidance document generic; firmware summit planned for deeper discussions.
After seeing the AMD patches, I would add to this point that we need to find a way to come up with shared bindings for common hardware such as the ARM pl011/pl022/pl061/pl330 IP blocks or the designware i2c/spi/usb3/ahci blocks.
Agreed. Discussions with ARM (started by AMD) are underway to settle this out.
What I remember from this item on your list is actually a different problem: We need to define more clearly what kinds of machines we would expect ACPI support for and which machines we would not.
Right. I need to think through some text to add that sets out a sort of vision, if you will, of such machines.
Fine-grained clock support is one such example: if anybody needs to expose that through a clock driver in the kernel, we need to be very clear that we will not support that kind of machine through ACPI, so we don't get developers building BIOS images that will never be supported. Other examples would be non-compliant PCI hosts or machines that are not covered by SBSA.
Yup. More text needed, I reckon. I know we've talked about such things in the past, but I need to make sure it is written down somewhere proper.
- Demonstrate the ACPI core patches work
- Problem: how can we be sure the patches work?
- Solution: verify the patches on arm64 server platforms
- Status:
- ACPI core works on at least the Foundation model, Juno, APM Mustang, and AMD Seattle
- FWTS results will be published as soon as possible
I think the problem is to a lesser degree the verification of the patches, and more about having a patch set that demonstrates /how/ everything can work, and what the possible limitations are. I would not worry about any bugs that might keep the system from working properly, as long as you can show that there is a plan to make that work.
Indeed. Perhaps the item above needs to be folded together with the platform item at the end, where that's exactly the idea -- work out the _how_ for the platform.
Out of the four platforms you list, I think we have concluded that three of them are not appropriate for use with ACPI, but in order to do that, we needed to review the patches and pinpoint specific issues so we could avoid just exchanging different opinions on whether it "works" or not.
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
Right. _DSD is a whole long discussion by itself, I think. There are those who do not want to use it at all, those who want to just replicate everything in DT in _DSD, and a whole range in between.
There are really two issues, though, and I want to be clear on that:
1) _DSD when used with the device properties UUID -- this is where we have to figure out how consistent we stay with DT
2) _DSD and other UUIDs that define different data structures (see the notes above on _OSC)
How and when to use both needs to be clearly stated.
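For the device-properties flavour in particular, it is worth noting that the kernel's unified device property calls read the same key/value pairs whether they come from a DT node or from a _DSD package. A minimal sketch, assuming that API and with hypothetical property names:

#include <linux/device.h>
#include <linux/property.h>

/*
 * Reads the same properties whether the device was described via a DT
 * node or via ACPI _DSD using the device-properties UUID.
 */
static int foo_read_config(struct device *dev, u32 *num_lanes, const char **mode)
{
	int ret;

	ret = device_property_read_u32(dev, "vendor,num-lanes", num_lanes);
	if (ret)
		return ret;

	return device_property_read_string(dev, "vendor,mode", mode);
}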
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I don't think that it is that hard, but it is complicated. Part of it is that it just seems self-evident to those that use it (true of both DT and ACPI, I suspect), and it can be hard to explain something that seems obvious to you. I think the other part is the business aspects of any sort of public statements for companies that have not had to be very public about anything in the past. But, that's just my USD$0.02 worth. I'm not trying to justify how things have gone so far, but figure out how to make them better. So, I just keep whittling away at this for now, making whatever progress I can.
On Tue, Dec 16, 2014 at 10:55:50PM +0000, Al Stone wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- Linux must choose DT booting by default when offered both ACPI and DT on arm64
- DONE
I've asked Hanjun to re-examine the patch we have. If the setup_arch() code for arm64 goes something like this:
If acpi=force (or =on?)
acpi=force is already defined, we could use that. But I don't have a problem with introducing =on either.
If ACPI tables exist
   use ACPI
Else if a DT exists
   use DT, but issue a warning that ACPI tables are missing
Else
   panic, 'cause we got nothin'
While this panic is still needed here to stop the kernel from booting, the problem with it is that it doesn't go anywhere; you could not retrieve any console information. Anyway, it's not worse than what we currently have, but in the presence of EFI_STUB, maybe we could also report such an error before exiting boot services (like "no DT nor ACPI found").
Endif
Else
   If a DT exists
      use DT
   Else if ACPI tables exist
      use ACPI, but issue a warning that the DT is missing
Why? Maybe this could be temporary but longer term we should treat ACPI and DT as equal citizens, with DT having priority if both are present. I don't think we should require that ACPI tables are always accompanied by DT counterparts.
Else panic, 'cause we got nothin'
Same here with the panic.
Endif
Endif
...is that better? Or worse? Or just far too kind to people trying to boot a kernel :)?
Maybe the kernel should prompt the user: "Are you really sure you want to use ACPI?" ;)
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- How does the kernel handle _DSD usage?
- Problem:
- _DSD defines key-value properties in the DT style. How do we ensure _DSD bindings are well defined?
- How do we ensure DT and _DSD bindings remain consistent with each other?
- Solution: public documentation for all bindings, and a process for defining them
- Status: proposal to require patch authors to point at public binding documentation; kernel Documentation/devicetree/bindings remains the default if no other location exists; UEFI forum has set up a binding repository.
I think we also need to make a decision here on whether we want to use PRP0001 devices on ARM64 servers, and to what degree. I would prefer if we could either make them required for any devices that already have a DT binding and that are not part of the official ACPI spec, or we decide to not use them at all and make any PRP0001 usage a testcase failure.
Hmmm... having rules specifically for Aarch64 doesn't make a whole lot of sense. Whatever rules we choose for PRP0001 should apply equally regardless of architecture.
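To make the PRP0001 trade-off concrete: the idea is that the firmware gives the device a "compatible" string in _DSD and the kernel matches it against the driver's ordinary DT match table, so a single table can serve both boot methods. A sketch under that assumption (the device and binding names are made up):

#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/platform_device.h>

static const struct of_device_id foo_of_match[] = {
	{ .compatible = "vendor,foo-ctrl" },	/* hypothetical binding */
	{ }
};
MODULE_DEVICE_TABLE(of, foo_of_match);

static int foo_probe(struct platform_device *pdev)
{
	/* Works the same whether instantiated from DT or from PRP0001. */
	return 0;
}

static struct platform_driver foo_driver = {
	.probe = foo_probe,
	.driver = {
		.name = "foo-ctrl",
		.of_match_table = foo_of_match,
	},
};
module_platform_driver(foo_driver);

MODULE_LICENSE("GPL");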
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
g.
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
http://www.secretlab.ca/archives/151
Why ACPI on ARM?
----------------
Why are we doing ACPI on ARM? That question has been asked many times, but we haven't yet had a good summary of the most important reasons for wanting ACPI on ARM. This article is an attempt to state the rationale clearly.
During an email conversation late last year, Catalin Marinas asked for a summary of exactly why we want ACPI on ARM, and Dong Wei replied with the following list:
- Support multiple OSes, including Linux and Windows
- Support device configurations
- Support dynamic device configurations (hot add/removal)
- Support hardware abstraction through control methods
- Support power management
- Support thermal management
- Support RAS interfaces
The above list is certainly true in that all of them need to be supported. However, that list doesn't give the rationale for choosing ACPI. We already have DT mechanisms for doing most of the above, and can certainly create new bindings for anything that is missing. So, if it isn't an issue of functionality, then how does ACPI differ from DT and why is ACPI a better fit for general purpose ARM servers?
The difference is in the support model. To explain what I mean, I'm first going to expand on each of the items above and discuss the similarities and differences between ACPI and DT. Then, with that as the groundwork, I'll discuss how ACPI is a better fit for the general purpose hardware support model.
Device Configurations
---------------------
2. Support device configurations
3. Support dynamic device configurations (hot add/removal)
From day one, DT was about device configurations. There isn't any significant difference between ACPI & DT here. In fact, the majority of ACPI tables are completely analogous to DT descriptions. With the exception of the DSDT and SSDT tables, most ACPI tables are merely flat data used to describe hardware.
DT platforms have also supported dynamic configuration and hotplug for years. There isn't a lot here that differentiates between ACPI and DT. The biggest difference is that dynamic changes to the ACPI namespace can be triggered by ACPI methods, whereas for DT changes are received as messages from firmware and have been very much platform specific (e.g. IBM pSeries does this)
Power Management Model
----------------------
4. Support hardware abstraction through control methods
5. Support power management
6. Support thermal management
Power, thermal, and clock management can all be dealt with as a group. ACPI defines a power management model (OSPM) that both the platform and the OS conform to. The OS implements the OSPM state machine, but the platform can provide state change behaviour in the form of bytecode methods. Methods can access hardware directly or hand off PM operations to a coprocessor. The OS really doesn't have to care about the details as long as the platform obeys the rules of the OSPM model.
With DT, the kernel has device drivers for each and every component in the platform, and configures them using DT data. DT itself doesn't have a PM model. Rather the PM model is an implementation detail of the kernel. Device drivers use DT data to decide how to handle PM state changes. We have clock, pinctrl, and regulator frameworks in the kernel for working out runtime PM. However, this only works when all the drivers and support code have been merged into the kernel. When the kernel's PM model doesn't work for new hardware, then we change the model. This works very well for mobile/embedded because the vendor controls the kernel. We can change things when we need to, but we also struggle with getting board support mainlined.
This difference has a big impact when it comes to OS support. Engineers from hardware vendors, Microsoft, and most vocally Red Hat have all told me bluntly that rebuilding the kernel doesn't work for enterprise OS support. Their model is based around a fixed OS release that ideally boots out-of-the-box. It may still need additional device drivers for specific peripherals/features, but from a system view, the OS works. When additional drivers are provided separately, those drivers fit within the existing OSPM model for power management. This is where ACPI has a technical advantage over DT. The ACPI OSPM model and its bytecode give the HW vendors a level of abstraction under their control, not the kernel's. When the hardware behaves differently from what the OS expects, the vendor is able to change the behaviour without changing the HW or patching the OS.
At this point you'd be right to point out that it is harder to get the whole system working correctly when behaviour is split between the kernel and the platform. The OS must trust that the platform doesn't violate the OSPM model. All manner of bad things happen if it does. That is exactly why the DT model doesn't encode behaviour: It is easier to make changes and fix bugs when everything is within the same code base. We don't need a platform/kernel split when we can modify the kernel.
However, the enterprise folks don't have that luxury. The platform/kernel split isn't a design choice. It is a characteristic of the market. Hardware and OS vendors each have their own product timetables, and they don't line up. The timeline for getting patches into the kernel and flowing through into OS releases puts OS support far downstream from the actual release of hardware. Hardware vendors simply cannot wait for OS support to come online to be able to release their products. They need to be able to work with available releases, and make their hardware behave in the way the OS expects. The advantage of ACPI OSPM is that it defines behaviour and limits what the hardware is allowed to do without involving the kernel.
What remains is sorting out how we make sure everything works. How do we make sure there is enough cross platform testing to ensure new hardware doesn't ship broken and that new OS releases don't break on old hardware? Those are the reasons why a UEFI/ACPI firmware summit is being organized, it's why the UEFI forum holds plugfests 3 times a year, and it is why we're working on FWTS and LuvOS.
Reliability, Availability & Serviceability (RAS)
------------------------------------------------
7. Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Multiplatform support
---------------------
1. Support multiple OSes, including Linux and Windows
I'm tackling this item last because I think it is the most contentious for those of us in the Linux world. I wanted to get the other issues out of the way before addressing it.
The separation between hardware vendors and OS vendors in the server market is new for ARM. For the first time ARM hardware and OS release cycles are completely decoupled from each other, and neither are expected to have specific knowledge of the other (ie. the hardware vendor doesn't control the choice of OS). ARM and their partners want to create an ecosystem of independent OSes and hardware platforms that don't explicitly require the former to be ported to the latter.
Now, one could argue that Linux is driving the potential market for ARM servers, and therefore Linux is the only thing that matters, but hardware vendors don't see it that way. For hardware vendors it is in their best interest to support as wide a choice of OSes as possible in order to catch the widest potential customer base. Even if the majority choose Linux, some will choose BSD, some will choose Windows, and some will choose something else. Whether or not we think this is foolish is beside the point; it isn't something we have influence over.
During early ARM server planning meetings between ARM, its partners and other industry representatives (myself included) we discussed this exact point. Before us were two options, DT and ACPI. As one of the Linux people in the room, I advised that ACPI's closed governance model was a show stopper for Linux and that DT is the working interface. Microsoft on the other hand made it abundantly clear that ACPI was the only interface that they would support. For their part, the hardware vendors stated the platform abstraction behaviour of ACPI is a hard requirement for their support model and that they would not close the door on either Linux or Windows.
However, the one thing that all of us could agree on was that supporting multiple interfaces doesn't help anyone: It would require twice as much effort on defining bindings (once for Linux-DT and once for Windows-ACPI) and it would require firmware to describe everything twice. Eventually we reached the compromise to use ACPI, but on the condition of opening the governance process to give Linux engineers equal influence over the specification. The fact that we now have a much better seat at the ACPI table, for both ARM and x86, is a direct result of these early ARM server negotiations. We are no longer second class citizens in the ACPI world and are actually driving much of the recent development.
I know that this line of thought is more about market forces rather than a hard technical argument between ACPI and DT, but it is an equally significant one. Agreeing on a single way of doing things is important. The ARM server ecosystem is better for the agreement to use the same interface for all operating systems. This is what is meant by standards compliant. The standard is a codification of the mutually agreed interface. It provides confidence that all vendors are using the same rules for interoperability.
Summary
-------
To summarize, here is the short form rationale for ACPI on ARM:
- ACPI's bytecode allows the platform to encode behaviour. DT explicitly does not support this. For hardware vendors, being able to encode behaviour is an important tool for supporting operating system releases on new hardware.
- ACPI's OSPM defines a power management model that constrains what the platform is allowed to do into a specific model, while still having flexibility in hardware design.
- For enterprise use-cases, ACPI has established bindings, such as for RAS, which are used in production. DT does not. Yes, we can define those bindings, but doing so means ARM and x86 will use completely different code paths in both firmware and the kernel.
- Choosing a single interface for platform/OS abstraction is important. It is not reasonable to require vendors to implement both DT and ACPI if they want to support multiple operating systems. Agreeing on a single interface instead of being fragmented into per-OS interfaces makes for better interoperability overall.
- The ACPI governance process works well and we're at the same table as HW vendors and other OS vendors. In fact, there is no longer any reason to feel that ACPI is a Windows thing or that we are playing second fiddle to Microsoft. The move of ACPI governance into the UEFI forum has significantly opened up the processes, and currently, a large portion of the changes being made to ACPI is being driven by Linux.
At the beginning of this article I made the statement that the difference is in the support model. For servers, responsibility for hardware behaviour cannot be purely the domain of the kernel, but rather is split between the platform and the kernel. ACPI frees the OS from needing to understand all the minute details of the hardware so that the OS doesn't need to be ported to each and every device individually. It allows the hardware vendors to take responsibility for PM behaviour without depending on an OS release cycle that is not under their control.
ACPI is also important because hardware and OS vendors have already worked out how to use it to support the general purpose ecosystem. The infrastructure is in place, the bindings are in place, and the process is in place. DT does exactly what we need it to when working with vertically integrated devices, but we don't have good processes for supporting what the server vendors need. We could potentially get there with DT, but doing so doesn't buy us anything. ACPI already does what the hardware vendors need, Microsoft won't collaborate with us on DT, and the hardware vendors would still need to provide two completely separate firmware interfaces: one for Linux and one for Windows.
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
http://www.secretlab.ca/archives/151
Why ACPI on ARM?
Why are we doing ACPI on ARM? That question has been asked many times, but we haven't yet had a good summary of the most important reasons for wanting ACPI on ARM. This article is an attempt to state the rationale clearly.
Thanks for writing this up, much appreciated. I'd like to comment on some of the points here, which seems easier than commenting on the blog post.
Device Configurations
- Support device configurations
- Support dynamic device configurations (hot add/removal)
...
DT platforms have also supported dynamic configuration and hotplug for years. There isn't a lot here that differentiates between ACPI and DT. The biggest difference is that dynamic changes to the ACPI namespace can be triggered by ACPI methods, whereas for DT changes are received as messages from firmware and have been very much platform specific (e.g. IBM pSeries does this)
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
- CPU: I don't think a lot of people outside mainframes consider CPUs to be runtime-serviceable parts, so for practical purposes this would be for power-management purposes triggered by the OS, and we have PSCI for managing the CPUs here. In case of virtual machines, we will actually need hotplugging CPUs into the guest, but this can be done through the existing hypervisor based interfaces for KVM and Xen.
- memory: quite similar, I don't have runtime memory replacement on my radar for normal servers yet, and in virtual machines, we'd use the existing balloon drivers. Memory power management (per-bank self-refresh or powerdown) would be a good use-case but the Linux patches we had for this 5 years ago were never merged and I don't think anybody is working on them any more.
- standard AHCI/OHCI/EHCI/XHCI/PCIe-port/...: these all have register level support for hotplugging and don't need SoC-specific driver support or ACPI, as can easily be verified by hotplugging devices on x86 machines with ACPI turned off.
- nonstandard SATA/USB/PCI-X/PCI-e/...: These are common on embedded ARM SoCs and could to a certain extent be handled using AML, but for good reasons are not allowed by SBSA.
- anything else?
Power Management Model
- Support hardware abstraction through control methods
- Support power management
- Support thermal management
Power, thermal, and clock management can all be dealt with as a group. ACPI defines a power management model (OSPM) that both the platform and the OS conform to. The OS implements the OSPM state machine, but the platform can provide state change behaviour in the form of bytecode methods. Methods can access hardware directly or hand off PM operations to a coprocessor. The OS really doesn't have to care about the details as long as the platform obeys the rules of the OSPM model.
With DT, the kernel has device drivers for each and every component in the platform, and configures them using DT data. DT itself doesn't have a PM model. Rather the PM model is an implementation detail of the kernel. Device drivers use DT data to decide how to handle PM state changes. We have clock, pinctrl, and regulator frameworks in the kernel for working out runtime PM. However, this only works when all the drivers and support code have been merged into the kernel. When the kernel's PM model doesn't work for new hardware, then we change the model. This works very well for mobile/embedded because the vendor controls the kernel. We can change things when we need to, but we also struggle with getting board support mainlined.
I can definitely see this point, but I can also see two important downsides to the ACPI model that need to be considered for an individual implementor:
* As a high-level abstraction, there are limits to how fine-grained the power management can be done, or is implemented in a particular BIOS. The thinner the abstraction, the better the power savings can get when implemented right.
* From the experience with x86, Linux tends to prefer using drivers for hardware registers over the AML based drivers when both are implemented, because of efficiency and correctness.
We should probably discuss at some point how to get the best of both. I really don't like the idea of putting the low-level details that we tend to have DT into ACPI, but there are two things we can do: For systems that have a high-level abstraction for their PM in hardware (e.g. talking to an embedded controller that does the actual work), the ACPI description should contain enough information to implement a kernel-level driver for it as we have on Intel machines. For more traditional SoCs that do everything themselves, I would recommend to always have a working DT for those people wanting to get the most of their hardware. This will also enable any other SoC features that cannot be represented in ACPI.
What remains is sorting out how we make sure everything works. How do we make sure there is enough cross platform testing to ensure new hardware doesn't ship broken and that new OS releases don't break on old hardware? Those are the reasons why a UEFI/ACPI firmware summit is being organized, it's why the UEFI forum holds plugfests 3 times a year, and it is why we're working on FWTS and LuvOS.
Right.
Reliability, Availability & Serviceability (RAS)
- Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Unfortunately, RAS can mean a lot of things to different people. Is there some high-level description of what the ACPI idea of RAS is? On systems I've worked on in the past, this was generally done out of band (e.g. in an IPMI BMC) because you can't really trust the running OS when you report errors that may impact data consistency of that OS.
Multiplatform support
- Support multiple OSes, including Linux and Windows
I'm tackling this item last because I think it is the most contentious for those of us in the Linux world. I wanted to get the other issues out of the way before addressing it.
I know that this line of thought is more about market forces rather than a hard technical argument between ACPI and DT, but it is an equally significant one. Agreeing on a single way of doing things is important. The ARM server ecosystem is better for the agreement to use the same interface for all operating systems. This is what is meant by standards compliant. The standard is a codification of the mutually agreed interface. It provides confidence that all vendors are using the same rules for interoperability.
I do think that this is in fact the most important argument in favor of doing ACPI on Linux, because a number of companies are betting on Windows (or some in-house OS that uses ACPI) support. At the same time, I don't think talking of a single 'ARM server ecosystem' that needs to agree on one interface is helpful here. Each server company has their own business plan and their own constraints. I absolutely think that getting as many companies as possible to agree on SBSA and UEFI is helpful here because it reduces the differences between the platforms as seen by a distro. For companies that want to support Windows, it's obvious they want to have ACPI on their machines, for others the factors you mention above can be enough to justify the move to ACPI even without Windows support. Then there are other companies for which the tradeoffs are different, and I see no reason for forcing it on them. Finally there are and will likely always be chips that are not built around SBSA and someone will use the chips in creative ways to build servers from them, so we already don't have a homogeneous ecosystem.
Arnd
On Mon, Jan 12, 2015 at 10:21 AM, Arnd Bergmann arnd@arndb.de wrote:
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
http://www.secretlab.ca/archives/151
Why ACPI on ARM?
Why are we doing ACPI on ARM? That question has been asked many times, but we haven't yet had a good summary of the most important reasons for wanting ACPI on ARM. This article is an attempt to state the rationale clearly.
Thanks for writing this up, much appreciated. I'd like to comment on some of the points here, which seems easier than commenting on the blog post.
Thanks for reading through it. Replies below...
Device Configurations
- Support device configurations
- Support dynamic device configurations (hot add/removal)
...
DT platforms have also supported dynamic configuration and hotplug for years. There isn't a lot here that differentiates between ACPI and DT. The biggest difference is that dynamic changes to the ACPI namespace can be triggered by ACPI methods, whereas for DT changes are received as messages from firmware and have been very much platform specific (e.g. IBM pSeries does this)
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
[...]
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT) there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system level components (ie. non-PCI) that may not always be accessible.
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it in this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
Power Management Model
- Support hardware abstraction through control methods
- Support power management
- Support thermal management
Power, thermal, and clock management can all be dealt with as a group. ACPI defines a power management model (OSPM) that both the platform and the OS conform to. The OS implements the OSPM state machine, but the platform can provide state change behaviour in the form of bytecode methods. Methods can access hardware directly or hand off PM operations to a coprocessor. The OS really doesn't have to care about the details as long as the platform obeys the rules of the OSPM model.
With DT, the kernel has device drivers for each and every component in the platform, and configures them using DT data. DT itself doesn't have a PM model. Rather the PM model is an implementation detail of the kernel. Device drivers use DT data to decide how to handle PM state changes. We have clock, pinctrl, and regulator frameworks in the kernel for working out runtime PM. However, this only works when all the drivers and support code have been merged into the kernel. When the kernel's PM model doesn't work for new hardware, then we change the model. This works very well for mobile/embedded because the vendor controls the kernel. We can change things when we need to, but we also struggle with getting board support mainlined.
I can definitely see this point, but I can also see two important downsides to the ACPI model that need to be considered for an individual implementor:
- As a high-level abstraction, there are limits to how fine-grained the power management can be done, or is implemented in a particular BIOS. The thinner the abstraction, the better the power savings can get when implemented right.
Agreed. That is the tradeoff. OSPM defines a power model, and the machine must restrict any PM behaviour to fit within that power model. This is important for interoperability, but it also leaves performance on the table. ACPI at least gives us the option to pick that performance back up by adding better power management to the drivers, without sacrificing the interoperability provided by OSPM.
In other words, OSPM gets us going, but we can add specific optimizations when required.
Also important: Vendors can choose to not implement any PM into their ACPI tables at all. In this case the machine would be left running at full tilt. It will be compatible with everything, but it won't be optimized. Then they have the option of loading a PM driver at runtime to optimize the system with the caveat that the PM driver must not be required for the machine to be operational. In this case, as far as the OS is concerned, it is still applying the OSPM state machine, but the OSPM behaviour never changes the state of the hardware.
- From the experience with x86, Linux tends to prefer using drivers for hardware registers over the AML based drivers when both are implemented, because of efficiency and correctness.
We should probably discuss at some point how to get the best of both. I really don't like the idea of putting the low-level details that we tend to have DT into ACPI, but there are two things we can do: For systems that have a high-level abstraction for their PM in hardware (e.g. talking to an embedded controller that does the actual work), the ACPI description should contain enough information to implement a kernel-level driver for it as we have on Intel machines. For more traditional SoCs that do everything themselves, I would recommend to always have a working DT for those people wanting to get the most of their hardware. This will also enable any other SoC features that cannot be represented in ACPI.
The nice thing about ACPI is that we always have the option of ignoring it when the driver knows better since it is always executed under the control of the kernel interpreter. There is no ACPI going off and doing something behind the kernel's back. To start with we have the OSPM state model and devices can use additional ACPI methods as needed, but as an optimization, the driver can do those operations directly if the driver author has enough knowledge about the device.
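A sketch of that "driver knows better" pattern as it might look in practice; the method name, register offset, and surrounding helper are invented for illustration:

#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/io.h>

/*
 * Prefer direct register access when the driver has the knowledge;
 * otherwise fall back to whatever method the platform's AML provides.
 */
static int foo_set_power(struct device *dev, void __iomem *regs, bool on)
{
	acpi_handle handle = ACPI_HANDLE(dev);

	if (regs) {
		writel(on ? 0x1 : 0x0, regs + 0x24);	/* hypothetical power register */
		return 0;
	}

	if (handle && acpi_has_method(handle, "PWRC")) {	/* hypothetical method */
		acpi_status st = acpi_execute_simple_method(handle, "PWRC", on);

		return ACPI_SUCCESS(st) ? 0 : -EIO;
	}

	return -ENODEV;
}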
Reliability, Availability & Serviceability (RAS)
- Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Unfortunately, RAS can mean a lot of things to different people. Is there some high-level description of what the APCI idea of RAS is? On systems I've worked on in the past, this was generally done out of band (e.g. in an IPMI BMC) because you can't really trust the running OS when you report errors that may impact data consistency of that OS.
RAS is also something where every company already has something that they are using on their x86 machines. Those interfaces are being ported over to the ARM platforms and will be equivalent to what they already do for x86. So, for example, an ARM server from DELL will use mostly the same RAS interfaces as an x86 server from DELL.
Multiplatform support
- Support multiple OSes, including Linux and Windows
I'm tackling this item last because I think it is the most contentious for those of us in the Linux world. I wanted to get the other issues out of the way before addressing it.
I know that this line of thought is more about market forces rather than a hard technical argument between ACPI and DT, but it is an equally significant one. Agreeing on a single way of doing things is important. The ARM server ecosystem is better for the agreement to use the same interface for all operating systems. This is what is meant by standards compliant. The standard is a codification of the mutually agreed interface. It provides confidence that all vendors are using the same rules for interoperability.
I do think that this is in fact the most important argument in favor of doing ACPI on Linux, because a number of companies are betting on Windows (or some in-house OS that uses ACPI) support. At the same time, I don't think talking of a single 'ARM server ecosystem' that needs to agree on one interface is helpful here. Each server company has their own business plan and their own constraints. I absolutely think that getting as many companies as possible to agree on SBSA and UEFI is helpful here because it reduces the the differences between the platforms as seen by a distro. For companies that want to support Windows, it's obvious they want to have ACPI on their machines, for others the factors you mention above can be enough to justify the move to ACPI even without Windows support. Then there are other companies for which the tradeoffs are different, and I see no reason for forcing it on them. Finally there are and will likely always be chips that are not built around SBSA and someone will use the chips in creative ways to build servers from them, so we already don't have a homogeneous ecosystem.
Allow me to clarify my position here. This entire document is about why ACPI was chosen for the ARM SBBR specification. The SBBR and the SBSA are important because they document the agreements and compromises made by vendors and industry representatives to get interoperability. It is a tool for vendors to say that they are aiming for compatibility with a particular hardware/software ecosystem.
*Nobody* is forced to implement these specifications. Any company is free to ignore them and go their own way. The tradeoff in doing so is that they are on their own for support. Non-compliant hardware vendors have to convince OS vendors to support them, and similarly, non-compliant OS vendors need to convince hardware vendors of the same. Red Hat has stated very clearly that they won't support any hardware that isn't SBSA/SBBR compliant. So has Microsoft. Canonical on the other hand has said they will support whatever if there is a business case. This certainly is a business decision and each company needs to make its own choices.
As far as we (Linux maintainers) are concerned, we've also been really clear that DT is not a second class citizen to ACPI. Mainline cannot and should not force certain classes of machines to use ACPI and other classes of machines to use DT. As long as the code is well written and conforms to our rules for what ACPI or DT code is allowed to do, then we should be happy to take the patches.
g.
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
On Mon, Jan 12, 2015 at 10:21 AM, Arnd Bergmann arnd@arndb.de wrote:
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
[...]
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT) there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system level component (ie. non-PCI) that may not always be accessible.
Just to be sure I get this right: do you mean runtime or boot-time (re-)configuration for those?
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
I can definitely see the need for an architected interface for dynamic reconfiguration in cases like this, and I think the ACPI model actually does this better than the IBM Power hypervisor model, I just didn't see the need on servers as opposed to something like a laptop docking station to give a more obvious example I know from x86.
- From the experience with x86, Linux tends to prefer using drivers for hardware registers over the AML based drivers when both are implemented, because of efficiency and correctness.
We should probably discuss at some point how to get the best of both. I really don't like the idea of putting the low-level details that we tend to have DT into ACPI, but there are two things we can do: For systems that have a high-level abstraction for their PM in hardware (e.g. talking to an embedded controller that does the actual work), the ACPI description should contain enough information to implement a kernel-level driver for it as we have on Intel machines. For more traditional SoCs that do everything themselves, I would recommend to always have a working DT for those people wanting to get the most of their hardware. This will also enable any other SoC features that cannot be represented in ACPI.
The nice thing about ACPI is that we always have the option of ignoring it when the driver knows better since it is always executed under the control of the kernel interpreter. There is no ACPI going off and doing something behind the kernel's back. To start with we have the OSPM state model and devices can use additional ACPI methods as needed, but as an optimization, the driver can do those operations directly if the driver author has enough knowledge about the device.
Ok, makes sense.
Reliability, Availability & Serviceability (RAS)
- Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Unfortunately, RAS can mean a lot of things to different people. Is there some high-level description of what the APCI idea of RAS is? On systems I've worked on in the past, this was generally done out of band (e.g. in an IPMI BMC) because you can't really trust the running OS when you report errors that may impact data consistency of that OS.
RAS is also something where every company already has something that they are using on their x86 machines. Those interfaces are being ported over to the ARM platforms and will be equivalent to what they already do for x86. So, for example, an ARM server from DELL will use mostly the same RAS interfaces as an x86 server from DELL.
Right, I'm still curious about what those are, in case we have to add DT bindings for them as well.
I do think that this is in fact the most important argument in favor of doing ACPI on Linux, because a number of companies are betting on Windows (or some in-house OS that uses ACPI) support. At the same time, I don't think talking of a single 'ARM server ecosystem' that needs to agree on one interface is helpful here. Each server company has their own business plan and their own constraints. I absolutely think that getting as many companies as possible to agree on SBSA and UEFI is helpful here because it reduces the the differences between the platforms as seen by a distro. For companies that want to support Windows, it's obvious they want to have ACPI on their machines, for others the factors you mention above can be enough to justify the move to ACPI even without Windows support. Then there are other companies for which the tradeoffs are different, and I see no reason for forcing it on them. Finally there are and will likely always be chips that are not built around SBSA and someone will use the chips in creative ways to build servers from them, so we already don't have a homogeneous ecosystem.
Allow me to clarify my position here. This entire document is about why ACPI was chosen for the ARM SBBR specification.
I thought it was about why we should merge ACPI support into the kernel, which seems to me like a different thing.
As far as we (Linux maintainers) are concerned, we've also been really clear that DT is not a second class citizen to ACPI. Mainline cannot and should not force certain classes of machines to use ACPI and other classes of machines to use DT. As long as the code is well written and conforms to our rules for what ACPI or DT code is allowed to do, then we should be happy to take the patches.
What we are still missing though is a recommendation for a boot protocol. The UEFI bits in SBBR are generally useful for having compatibility across machines that we support in the kernel regardless of the device description, and we also need to have guidelines along the lines of "if you do ACPI, then do it like this" that are in SBBR. However, the way that these two are coupled into "you have to use ACPI and UEFI this way to build a compliant server" really does make the document much less useful for Linux.
Arnd
On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
On Mon, Jan 12, 2015 at 10:21 AM, Arnd Bergmann arnd@arndb.de wrote:
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
[...]
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT) there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system level component (ie. non-PCI) that may not always be accessible.
Just to be sure I get this right: do you mean runtime or boot-time (re-)configuration for those?
Both are important.
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
I can definitely see the need for an architected interface for dynamic reconfiguration in cases like this, and I think the ACPI model actually does this better than the IBM Power hypervisor model, I just didn't see the need on servers as opposed to something like a laptop docking station to give a more obvious example I know from x86.
- From the experience with x86, Linux tends to prefer using drivers for hardware registers over the AML based drivers when both are implemented, because of efficiency and correctness.
We should probably discuss at some point how to get the best of both. I really don't like the idea of putting the low-level details that we tend to have DT into ACPI, but there are two things we can do: For systems that have a high-level abstraction for their PM in hardware (e.g. talking to an embedded controller that does the actual work), the ACPI description should contain enough information to implement a kernel-level driver for it as we have on Intel machines. For more traditional SoCs that do everything themselves, I would recommend to always have a working DT for those people wanting to get the most of their hardware. This will also enable any other SoC features that cannot be represented in ACPI.
The nice thing about ACPI is that we always have the option of ignoring it when the driver knows better since it is always executed under the control of the kernel interpreter. There is no ACPI going off and doing something behind the kernel's back. To start with we have the OSPM state model and devices can use additional ACPI methods as needed, but as an optimization, the driver can do those operations directly if the driver author has enough knowledge about the device.
Ok, makes sense.
Reliability, Availability & Serviceability (RAS)
- Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Unfortunately, RAS can mean a lot of things to different people. Is there some high-level description of what the APCI idea of RAS is? On systems I've worked on in the past, this was generally done out of band (e.g. in an IPMI BMC) because you can't really trust the running OS when you report errors that may impact data consistency of that OS.
RAS is also something where every company already has something that they are using on their x86 machines. Those interfaces are being ported over to the ARM platforms and will be equivalent to what they already do for x86. So, for example, an ARM server from DELL will use mostly the same RAS interfaces as an x86 server from DELL.
Right, I'm still curious about what those are, in case we have to add DT bindings for them as well.
Certainly.
I do think that this is in fact the most important argument in favor of doing ACPI on Linux, because a number of companies are betting on Windows (or some in-house OS that uses ACPI) support. At the same time, I don't think talking of a single 'ARM server ecosystem' that needs to agree on one interface is helpful here. Each server company has their own business plan and their own constraints. I absolutely think that getting as many companies as possible to agree on SBSA and UEFI is helpful here because it reduces the the differences between the platforms as seen by a distro. For companies that want to support Windows, it's obvious they want to have ACPI on their machines, for others the factors you mention above can be enough to justify the move to ACPI even without Windows support. Then there are other companies for which the tradeoffs are different, and I see no reason for forcing it on them. Finally there are and will likely always be chips that are not built around SBSA and someone will use the chips in creative ways to build servers from them, so we already don't have a homogeneous ecosystem.
Allow me to clarify my position here. This entire document is about why ACPI was chosen for the ARM SBBR specification.
I thought it was about why we should merge ACPI support into the kernel, which seems to me like a different thing.
Nope! I'm not trying to make that argument here. This document is primarily to document the rationale for choosing ACPI in the ARM server SBBR document (from a Linux developer's perspective, granted).
I'll make arguments about actually merging the patches in a different email. :-)
As far as we (Linux maintainers) are concerned, we've also been really clear that DT is not a second class citizen to ACPI. Mainline cannot and should not force certain classes of machines to use ACPI and other classes of machines to use DT. As long as the code is well written and conforms to our rules for what ACPI or DT code is allowed to do, then we should be happy to take the patches.
What we are still missing though is a recommendation for a boot protocol. The UEFI bits in SBBR are generally useful for having compatibility across machines that we support in the kernel regardless of the device description, and we also need to have guidelines along the lines of "if you do ACPI, then do it like this" that are in SBBR. However, the way that these two are coupled into "you have to use ACPI and UEFI this way to build a compliant server" really does make the document much less useful for Linux.
I don't follow your argument. Exactly what problem do you have with "You have to use ACPI and UEFI" to be compliant with the SBBR document? Vendors absolutely have the choice to ignore those documents, but doing so means they are explicitly rejecting the platform that ARM has defined for server machines, and by extension, explicitly rejecting the ecosystem and interoperability that goes with it.
On the UEFI front, I don't see a problem. Linux mainline has the UEFI stub on ARM, and that is the boot protocol.
For UEFI providing DT, the interface is set. We defined it, and it works, but that is not part of the ARM server ecosystem as defined by ARM. Why would the SBBR cover it?
My perspective is that Linux should support the SBSA+SBBR ecosystem, but we don't need to be exclusive about it. We'll happily support UEFI+DT platforms as well as UEFI+ACPI.
g.
On 01/13/2015 10:22 AM, Grant Likely wrote:
On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
On Mon, Jan 12, 2015 at 10:21 AM, Arnd Bergmann arnd@arndb.de wrote:
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
[...]
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT) there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system level component (ie. non-PCI) that may not always be accessible.
Just to be sure I get this right: do you mean runtime or boot-time (re-)configuration for those?
Both are important.
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
I can definitely see the need for an architected interface for dynamic reconfiguration in cases like this, and I think the ACPI model actually does this better than the IBM Power hypervisor model, I just didn't see the need on servers as opposed to something like a laptop docking station to give a more obvious example I know from x86.
I know of at least one server product (non-ARM) that uses the hot-plugging of CPUs and memory as a key feature, using the ACPI OSPM model. Essentially, the customer buys a system with a number of slots and pays for filling one or more of them up front. As the need for capacity increases, CPUs and/or RAM gets enabled; i.e., you have spare capacity that you buy as you need it. If you use up all the CPUs and RAM you have, you buy more cards, fill the additional slots, and turn on what you need. This is very akin to the virtual machine model, but done with real hardware instead.
Whether or not this product is still being sold, I do not know. I have not worked for that company for eight years, and they were just coming out as I left. Regardless, this sort of hot-plug does make sense in the server world, and has been used in shipping products.
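For reference, the OS-visible half of this kind of CPU hot-plug is already exposed by Linux through sysfs, independent of whether ACPI or DT described the CPUs. A minimal userspace sketch, assuming the standard /sys/devices/system/cpu/cpuN/online attribute, root privileges, and CPU 2 chosen arbitrarily:

/*
 * Sketch: take a CPU offline and bring it back through the standard
 * sysfs attribute.  Assumes root privileges; CPU 2 is an arbitrary
 * choice, and cpu0 often has no "online" file at all.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>

static int set_cpu_online(int cpu, int online)
{
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "w");
        if (!f)
                return -errno;
        fprintf(f, "%d\n", online);
        return fclose(f) ? -errno : 0;
}

int main(void)
{
        int ret;

        ret = set_cpu_online(2, 0);             /* take CPU 2 offline */
        if (ret)
                fprintf(stderr, "offline cpu2: %s\n", strerror(-ret));

        ret = set_cpu_online(2, 1);             /* and bring it back */
        if (ret)
                fprintf(stderr, "online cpu2: %s\n", strerror(-ret));
        return 0;
}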
[snip....]
Reliability, Availability & Serviceability (RAS)
- Support RAS interfaces
This isn't a question of whether or not DT can support RAS. Of course it can. Rather it is a matter of RAS bindings already existing for ACPI, including a usage model. We've barely begun to explore this on DT. This item doesn't make ACPI technically superior to DT, but it certainly makes it more mature.
Unfortunately, RAS can mean a lot of things to different people. Is there some high-level description of what the ACPI idea of RAS is? On systems I've worked on in the past, this was generally done out of band (e.g. in an IPMI BMC) because you can't really trust the running OS when you report errors that may impact data consistency of that OS.
RAS is also an area where every company already has something they are using on their x86 machines. Those interfaces are being ported over to the ARM platforms and will be equivalent to what they already do for x86. So, for example, an ARM server from DELL will use mostly the same RAS interfaces as an x86 server from DELL.
Right, I'm still curious about what those are, in case we have to add DT bindings for them as well.
Certainly.
In ACPI terms, the features used are called APEI (ACPI Platform Error Interfaces) and are defined in Section 18 of the specification. The tables describe what the possible error sources are, where details about the error are stored, and what to do when the errors occur. A lot of the "RAS tools" out there that report and/or analyze error data rely on this information being reported in the form given by the spec.
I only put "RAS tools" in quotes because it is indeed a very loosely defined term -- I've had everything from webmin to SNMP to ganglia, nagios and Tivoli described to me as a RAS tool. In all of those cases, however, the basic idea was to capture errors as they occur, and try to manage them properly. That is, replace disks that seem to be heading down hill, or look for faults in RAM, or dropped packets on LANs -- anything that could help me avoid a catastrophic failure by doing some preventive maintenance up front.
And indeed a BMC is often used for handling errors in servers, or to report errors out to something like nagios or ganglia. It could also just be a log in a bit of NVRAM, too, with a little daemon that reports back somewhere. But, this is why APEI is used: it tries to provide a well defined interface between those reporting the error (firmware, hardware, OS, ...) and those that need to act on the error (the BMC, the OS, or even other bits of firmware).
Does that help satisfy the curiosity a bit?
BTW, there are also some nice tools from ACPICA that, if enabled, allow one to simulate the occurrence of an error and test out the response. What you can do is define the error source and what response you want the OSPM to take (HEST, or Hardware Error Source Table), then use the EINJ (Error Injection) table to describe how to simulate the error having occurred. You then tell ACPICA to "run" the EINJ and test how the system actually responds. You can do this with many EINJ tables, too, so you can experiment with or debug APEI tables as you develop them.
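On a running Linux system, the kernel exposes a comparable error-injection path through the APEI EINJ debugfs interface when the firmware provides an EINJ table. A rough userspace sketch follows; the debugfs mount point, the 0x8 (memory correctable) error type, and the target address/mask are assumptions for illustration only:

/*
 * Sketch: ask firmware to simulate a memory error via the APEI EINJ
 * debugfs interface.  Assumes debugfs is mounted at /sys/kernel/debug,
 * the platform firmware provides an EINJ table, and root privileges.
 */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(val, f);
        return fclose(f);
}

int main(void)
{
        const char *dir = "/sys/kernel/debug/apei/einj";
        char path[128];

        snprintf(path, sizeof(path), "%s/error_type", dir);
        if (write_str(path, "0x8")) {           /* memory correctable error */
                perror("error_type");
                return 1;
        }
        snprintf(path, sizeof(path), "%s/param1", dir);
        write_str(path, "0x100000000");         /* illustrative physical address */
        snprintf(path, sizeof(path), "%s/param2", dir);
        write_str(path, "0xfffffffffffff000");  /* illustrative address mask */

        snprintf(path, sizeof(path), "%s/error_inject", dir);
        if (write_str(path, "1")) {             /* fire the injection */
                perror("error_inject");
                return 1;
        }
        puts("EINJ injection requested; check dmesg for the APEI/GHES report");
        return 0;
}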
On 2015年01月14日 08:26, Al Stone wrote:
On 01/13/2015 10:22 AM, Grant Likely wrote:
On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
On Mon, Jan 12, 2015 at 10:21 AM, Arnd Bergmann arnd@arndb.de wrote:
On Saturday 10 January 2015 14:44:02 Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
This seems like a great fit for AML indeed, but I wonder what exactly we want to hotplug here, since everything I can think of wouldn't need AML support for the specific use case of SBSA compliant servers:
[...]
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT), there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say, for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system-level components (i.e., non-PCI) that may not always be accessible.
Just to be sure I get this right: do you mean runtime or boot-time (re-)configuration for those?
Both are important.
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform-specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it in this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
I can definitely see the need for an architected interface for dynamic reconfiguration in cases like this, and I think the ACPI model actually does this better than the IBM Power hypervisor model. I just didn't see the need on servers, as opposed to something like a laptop docking station, to give a more obvious example I know from x86.
I know of at least one server product (non-ARM) that uses the hot-plugging of CPUs and memory as a key feature, using the ACPI OSPM model. Essentially, the customer buys a system with a number of slots and pays for filling one or more of them up front. As the need for capacity increases, CPUs and/or RAM gets enabled; i.e., you have spare capacity that you buy as you need it. If you use up all the CPUs and RAM you have, you buy more cards, fill the additional slots, and turn on what you need. This is very akin to the virtual machine model, but done with real hardware instead.
There is another important use case for RAS: systems running mission-critical workloads, such as a bank's billing system, need such high reliability that the machine can't be stopped.
So when an error happens in hardware such as a CPU or a memory DIMM on machines like that, we need to be able to replace the failed part at run-time.
Whether or not this product is still being sold, I do not know. I have not worked for that company for eight years, and they were just coming out as I left. Regardless, this sort of hot-plug does make sense in the server world, and has been used in shipping products.
I think it still will be; Linux developers have put a lot of effort into enabling memory hotplug and compute node hotplug in the kernel [1], and the code is already merged into mainline.
[1]: http://events.linuxfoundation.org/sites/events/files/lcjp13_chen.pdf
Thanks Hanjun
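As a small illustration of the interface Hanjun refers to, Linux represents hot-pluggable memory as memory blocks under sysfs. A minimal read-only sketch that lists each block's state, assuming the standard /sys/devices/system/memory layout documented for Linux memory hotplug:

/*
 * Sketch: list the memory blocks that Linux memory hotplug exposes,
 * with their current state.  "state" and "removable" are part of the
 * documented /sys/devices/system/memory interface.
 */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

static void read_attr(const char *block, const char *attr, char *buf, int len)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/devices/system/memory/%s/%s", block, attr);
        buf[0] = '\0';
        f = fopen(path, "r");
        if (!f)
                return;
        if (fgets(buf, len, f))
                buf[strcspn(buf, "\n")] = '\0';
        fclose(f);
}

int main(void)
{
        DIR *d = opendir("/sys/devices/system/memory");
        struct dirent *de;

        if (!d) {
                perror("memory sysfs");
                return 1;
        }
        while ((de = readdir(d))) {
                char state[32], removable[8];

                if (strncmp(de->d_name, "memory", 6) != 0)
                        continue;
                read_attr(de->d_name, "state", state, sizeof(state));
                read_attr(de->d_name, "removable", removable, sizeof(removable));
                printf("%-10s state=%-8s removable=%s\n", de->d_name, state, removable);
        }
        closedir(d);
        return 0;
}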
On Thursday 15 January 2015 12:07:45 Hanjun Guo wrote:
On 2015年01月14日 08:26, Al Stone wrote:
On 01/13/2015 10:22 AM, Grant Likely wrote:
On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
I've trimmed the specific examples here because I think that misses the point. The point is that regardless of interface (either ACPI or DT), there are always going to be cases where the data needs to change at runtime. Not all platforms will need to change the CPU data, but some will (say, for a machine that detects a failed CPU and removes it). Some PCI add-in boards will carry along with them additional data that needs to be inserted into the ACPI namespace or DT. Some platforms will have system-level components (i.e., non-PCI) that may not always be accessible.
Just to be sure I get this right: do you mean runtime or boot-time (re-)configuration for those?
Both are important.
But only one of them is relevant to the debate about what ACPI offers over DT. By mixing the two, it's no longer clear which of your examples are the ones that matter for runtime hotplugging.
ACPI has an interface baked in already for tying data changes to events. DT currently needs platform-specific support (which we can improve on). I'm not even trying to argue for ACPI over DT in this section, but I included it in this document because it is one of the reasons often given for choosing ACPI and I felt it required a more nuanced discussion.
I can definitely see the need for an architected interface for dynamic reconfiguration in cases like this, and I think the ACPI model actually does this better than the IBM Power hypervisor model. I just didn't see the need on servers, as opposed to something like a laptop docking station, to give a more obvious example I know from x86.
I know of at least one server product (non-ARM) that uses the hot-plugging of CPUs and memory as a key feature, using the ACPI OSPM model. Essentially, the customer buys a system with a number of slots and pays for filling one or more of them up front. As the need for capacity increases, CPUs and/or RAM gets enabled; i.e., you have spare capacity that you buy as you need it. If you use up all the CPUs and RAM you have, you buy more cards, fill the additional slots, and turn on what you need. This is very akin to the virtual machine model, but done with real hardware instead.
Yes, this is a good example; it is normally called Capacity on Demand (CoD), and it is a feature typically found in enterprise servers but not in commodity x86 machines. It would be helpful to hear from someone who actually plans to do this on ARM, but I get the idea.
There is another important use case for RAS: systems running mission-critical workloads, such as a bank's billing system, need such high reliability that the machine can't be stopped.
So when an error happens in hardware such as a CPU or a memory DIMM on machines like that, we need to be able to replace the failed part at run-time.
Whether or not this product is still being sold, I do not know. I have not worked for that company for eight years, and they were just coming out as I left. Regardless, this sort of hot-plug does make sense in the server world, and has been used in shipping products.
I think it still will be; Linux developers have put a lot of effort into enabling memory hotplug and compute node hotplug in the kernel [1], and the code is already merged into mainline.
The case of memory hot-remove is interesting as well, but it has some very significant limitations regarding system integrity after uncorrectable memory errors, as well as non-movable pages. The cases I know either only support hot-add for CoD (see above), or they support hot-replace for mirrored memory only, but that does not require any interaction with the OS.
Thanks for the examples!
Arnd
On Tuesday 13 January 2015 17:26:33 Al Stone wrote:
On 01/13/2015 10:22 AM, Grant Likely wrote:
On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 12 January 2015 12:00:31 Grant Likely wrote:
RAS is also an area where every company already has something they are using on their x86 machines. Those interfaces are being ported over to the ARM platforms and will be equivalent to what they already do for x86. So, for example, an ARM server from DELL will use mostly the same RAS interfaces as an x86 server from DELL.
Right, I'm still curious about what those are, in case we have to add DT bindings for them as well.
Certainly.
In ACPI terms, the features used are called APEI (ACPI Platform Error Interfaces) and are defined in Section 18 of the specification. The tables describe what the possible error sources are, where details about the error are stored, and what to do when the errors occur. A lot of the "RAS tools" out there that report and/or analyze error data rely on this information being reported in the form given by the spec.
I only put "RAS tools" in quotes because it is indeed a very loosely defined term -- I've had everything from webmin to SNMP to ganglia, nagios and Tivoli described to me as a RAS tool. In all of those cases, however, the basic idea was to capture errors as they occur, and try to manage them properly. That is, replace disks that seem to be heading down hill, or look for faults in RAM, or dropped packets on LANs -- anything that could help me avoid a catastrophic failure by doing some preventive maintenance up front.
And indeed a BMC is often used for handling errors in servers, or to report errors out to something like nagios or ganglia. It could also just be a log in a bit of NVRAM, too, with a little daemon that reports back somewhere. But, this is why APEI is used: it tries to provide a well defined interface between those reporting the error (firmware, hardware, OS, ...) and those that need to act on the error (the BMC, the OS, or even other bits of firmware).
Does that help satisfy the curiosity a bit?
Yes, it's much clearer now, thanks!
Arnd
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
I believe making the wrong technical choice "because Microsoft says so" is the wrong thing to do.
Yes, ACPI gives more flexibility to hardware vendors. Imagine replacing block device drivers with interpreted bytecode coming from ROM. That is obviously bad, right? Why is it good for power management?
It is not.
Besides being harder to debug, there are more disadvantages:
* Size, speed and complexity disadvantage of a bytecode interpreter in the kernel.
* Many more drivers. Imagine a GPIO switch controlling rfkill (for example). In the device tree case, that's a few lines in the .dts specifying which GPIO the switch is on.
In the ACPI case, each hardware vendor initially implements the rfkill switch in AML, differently. After a few years, each vendor implements a (different) kernel<->AML interface for querying rfkill state and toggling it in software. A few years after that, we implement kernel drivers for those AML interfaces, to properly integrate them in the kernel.
* Incompatibility. ARM servers will now be very different from other ARM systems.
Now, are there some arguments for ACPI? Yes -- it allows hw vendors to hack half-working drivers without touching kernel sources. (Half-working: such drivers are not properly integrated in all the various subsystems.) Grant claims that power management is somehow special, and that the requirement for real drivers is somehow OK for normal drivers (block, video) but not for power management. Now, getting a driver merged into the kernel does not take that long -- less than half a year if you know what you are doing. Plus, for power management, you can really just initialize the hardware in the bootloader (into a working but not optimal state). But basic drivers are likely to be merged fast, and then you'll just have to supply DT tables.
Avoid ACPI. It only makes things more complex and harder to debug.
Pavel
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
Please reread the blog post. Microsoft is a factor, but it is not the primary driver by any means.
g.
On Mon 2015-01-12 14:41:50, Grant Likely wrote:
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
Please reread the blog post. Microsoft is a factor, but it is not the primary driver by any means.
Ok, so what is the primary reason? As far as I could tell it is "Microsoft wants ACPI" and "hardware people want Microsoft" and "fragmentation is bad so we do ACPI" (1) (and maybe "someone at RedHat says they want ACPI" -- but RedHat people should really speak for themselves.)
You snipped quite a lot of reasons why ACPI is inferior that were below this line in the email.
Pavel
(1) ignoring the fact that it causes fragmentation between servers and phones.
On Monday 12 January 2015 20:39:05 Pavel Machek wrote:
Ok, so what is the primary reason? As far as I could tell it is "Microsoft wants ACPI" and "hardware people want Microsoft" and "fragmentation is bad so we do ACPI" (1) (and maybe "someone at RedHat says they want ACPI" -- but RedHat people should really speak for themselves.)
I can only find the first two in Grant's document, not the third one, and I don't think it's on the table any more. The argument that was in there was that a given platform that wants to support both Linux and Windows can use ACPI, and Linux should work with that; but that is different from "all servers must use ACPI", which would be unrealistic.
Arnd
On Mon, Jan 12, 2015 at 7:39 PM, Pavel Machek pavel@ucw.cz wrote:
On Mon 2015-01-12 14:41:50, Grant Likely wrote:
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
7. Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
Please reread the blog post. Microsoft is a factor, but it is not the primary driver by any means.
Ok, so what is the primary reason? As far as I could tell it is "Microsoft wants ACPI" and "hardware people want Microsoft" and "fragmentation is bad so we do ACPI" (1) (and maybe "someone at RedHat says they want ACPI" -- but RedHat people should really speak for themselves.)
The primary driver is abstraction. It is a hard requirement of the hardware vendors. They have to have the ability to adapt their products at a software level to support existing Linux distributions and other operating system releases. This is exactly what they do now in the x86 market, and they are not going to enter the ARM server market without this ability.
Even if DT had been chosen, the condition would have been to add an abstraction model to DT, and then DT would have ended up looking like an immature ACPI.
The secondary driver is consistency. When hardware vendors and OS vendors produce independent products for the same ecosystem that must be compatible at the binary level, then it is really important that everyone in that ecosystem uses the same interfaces. At this level it doesn't matter if it is DT or ACPI, just as long as everyone uses the same thing.
[Of course, vendors have the option of completely rejecting the server specifications as published by ARM, with the understanding that they will probably need to either a) ship both the HW and OS themselves, or b) create a separate and competing ecosystem.]
If the reason was merely as you say, "because Microsoft says so", then my blog post would have been much shorter. I would have had no qualms about saying so bluntly if that was actually the case. Instead, this is the key paragraph to pay attention to:
However, the enterprise folks don't have that luxury. The platform/kernel split isn't a design choice. It is a characteristic of the market. Hardware and OS vendors each have their own product timetables, and they don't line up. The timeline for getting patches into the kernel and flowing through into OS releases puts OS support far downstream from the actual release of hardware. Hardware vendors simply cannot wait for OS support to come online to be able to release their products. They need to be able to work with available releases, and make their hardware behave in the way the OS expects. The advantage of ACPI OSPM is that it defines behaviour and limits what the hardware is allowed to do without involving the kernel.
All of the above applies regardless of whether or not vendors only care about Linux support. ACPI is strongly desired regardless of whether or not Microsoft is in the picture. Microsoft's support merely adds weight to a choice that hardware vendors already prefer.
You snipped quite a lot of reasons why ACPI is inferior that were below this line in the email.
Yes, I did. My primary point is that ACPI was chosen because the hardware OEMs require some level of abstraction. If you don't agree that the abstraction is important, then we fundamentally don't agree on what the market looks like. In that case there is absolutely no value in debating the details, because each of us is working from a different set of requirements.
Pavel
(1) ignoring the fact that it causes fragmentation between servers and phones.
That's a red herring. ARM servers are not an extension of the ARM phone market. The software ecosystem is completely different (the phone vendor builds and ships the OS instead of being provided by one of many independent OS vendors). ARM servers are an extension of the x86 server market, and they will be judged in terms of how they compare to a similar x86 machine. It is in the HW vendors best interest to make using an ARM server as similar to their existing x86 products as possible.
When confronted with the choice of similarity with ARM phones or with x86 servers, the vendors will choose to follow x86's lead, and they will be right to do so.
On 01/12/2015 12:39 PM, Pavel Machek wrote:
On Mon 2015-01-12 14:41:50, Grant Likely wrote:
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
7. Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
Please reread the blog post. Microsoft is a factor, but it is not the primary driver by any means.
Ok, so what is the primary reason? As far as I could tell it is "Microsoft wants ACPI" and "hardware people want Microsoft" and "fragmentation is bad so we do ACPI" (1) (and maybe "someone at RedHat says they want ACPI" -- but RedHat people should really speak for themselves.)
I have to say I found this statement fascinating.
I have been seconded to Linaro from Red Hat for over two years now, working on getting ACPI running, first as a prototype on an ARMv7 box, then on ARMv8. I have been working with Grant since very early on when some of us first started talking about ARM servers in the enterprise market, and what sorts of standards, if any, would be needed to build an ecosystem.
This is the first time in at least two years that I have had someone ask for Red Hat to speak up about ACPI on ARM servers; it's usually quite the opposite, as in "will you Red Hat folks please shut up about this already?" :).
For all the reasons Grant has already mentioned, my Customers need to have ACPI on ARM servers for them to be successful in their business. I view my job as providing what my Customers need to be successful. So, here I am. I want ACPI on ARMv8 for my Customers.
You snipped quite a lot of reasons why ACPI is inferior that were below this line in the email.
Pavel
(1) ignoring the fact that it causes fragmentation between servers and phones.
I see this very differently. This is a "fact" only when viewed from the perspective of having two different technologies that can do very similar things.
In my opinion, the issue is that these are two very, very different markets; technologies are only relevant as the tools to be used to be successful in those markets.
Just on a surface level, phones are expected to be completely replaced every 18 months or less -- new hardware, new version of the OS, new everything. That's the driving force in the market.
A server does not change that quickly; it is probable that the hardware will change, but it is unlikely to change at that speed. It can take 18 months just for some of the certification testing needed for new hardware or software. Further, everything from the kernel on up is expected to be stable for a long time -- as long as 25 years, in some cases I have worked on. "New" can be a bad word in this environment.
Best I can tell, I need different tool sets to do well in each of these environments -- one that allows me to move quickly for phones, and one that allows me to carefully control change for servers. I personally don't see that as fragmentation, but as using the right tool for the job. If I'm building a phone, I want the speed and flexibility of DT. If I'm building a server, I want the long term stability of ACPI.
On 1/13/2015 7:21 PM, Al Stone wrote:
On 01/12/2015 12:39 PM, Pavel Machek wrote:
On Mon 2015-01-12 14:41:50, Grant Likely wrote:
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
On Wed, Dec 17, 2014 at 10:26 PM, Grant Likely grant.likely@linaro.org wrote:
On Tue, Dec 16, 2014 at 11:27 AM, Arnd Bergmann arnd@arndb.de wrote:
On Monday 15 December 2014 19:18:16 Al Stone wrote:
7. Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
I was particularly hoping to see better progress on this item. It really shouldn't be that hard to explain why someone wants this feature.
I've written something up as a reply on the firmware summit thread. I'm going to rework it to be a standalone document and post it publicly. I hope that will resolve this issue.
I've posted an article on my blog, but I'm reposting it here because the mailing list is more conducive to discussion...
Unfortunately, I saw the blog post before the mailing list post, so here's my reply in blog format.
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
Please reread the blog post. Microsoft is a factor, but it is not the primary driver by any means.
Ok, so what is the primary reason? As far as I could tell it is "Microsoft wants ACPI" and "hardware people want Microsoft" and "fragmentation is bad so we do ACPI" (1) (and maybe "someone at RedHat says they want ACPI" -- but RedHat people should really speak for themselves.)
I have to say I found this statement fascinating.
I have been seconded to Linaro from Red Hat for over two years now, working on getting ACPI running, first as a prototype on an ARMv7 box, then on ARMv8. I have been working with Grant since very early on when some of us first started talking about ARM servers in the enterprise market, and what sorts of standards, if any, would be needed to build an ecosystem.
This is the first time in at least two years that I have had someone ask for Red Hat to speak up about ACPI on ARM servers; it's usually quite the opposite, as in "will you Red Hat folks please shut up about this already?" :).
For all the reasons Grant has already mentioned, my Customers need to have ACPI on ARM servers for them to be successful in their business. I view my job as providing what my Customers need to be successful. So, here I am. I want ACPI on ARMv8 for my Customers.
I want that too, even for platforms that might not ever run Windows.
-- ljk
You snipped quite a lot of reasons why ACPI is inferior that were below this line in the email.
Pavel
(1) ignoring the fact that it causes fragmentation between servers and phones.
I see this very differently. This is a "fact" only when viewed from the perspective of having two different technologies that can do very similar things.
In my opinion, the issue is that these are two very, very different markets; technologies are only relevant as the tools to be used to be successful in those markets.
Just on a surface level, phones are expected to be completely replaced every 18 months or less -- new hardware, new version of the OS, new everything. That's the driving force in the market.
A server does not change that quickly; it is probable that the hardware will change, but it is unlikely to change at that speed. It can take 18 months just for some of the certification testing needed for new hardware or software. Further, everything from the kernel on up is expected to be stable for a long time -- as long as 25 years, in some cases I have worked on. "New" can be a bad word in this environment.
Best I can tell, I need different tool sets to do well in each of these environments -- one that allows me to move quickly for phones, and one that allows me to carefully control change for servers. I personally don't see that as fragmentation, but as using the right tool for the job. If I'm building a phone, I want the speed and flexibility of DT. If I'm building a server, I want the long term stability of ACPI.
Hi Pavel,
For the sake of argument, I'll respond to your points below, even though we fundamentally disagree on what is required for a general purpose server...
On Mon, Jan 12, 2015 at 2:23 PM, Pavel Machek pavel@ucw.cz wrote:
On Sat 2015-01-10 14:44:02, Grant Likely wrote:
Grant Likely published an article about ACPI and ARM at http://www.secretlab.ca/archives/151. He acknowledges that systems with ACPI are harder to debug, but argues that (basically) we have to use ACPI because Microsoft says so.
I believe making the wrong technical choice "because Microsoft says so" is the wrong thing to do.
Yes, ACPI gives more flexibility to hardware vendors. Imagine replacing block device drivers with interpreted bytecode coming from ROM. That is obviously bad, right? Why is it good for power management?
It is not.
Trying to equate a block driver with the things that are done by ACPI is a pretty big stretch. It doesn't even come close to the same level of complexity. ACPI is merely the glue between the OS and the behaviour of the HW & FW. (e.g. On platform A, the rfkill switch is a GPIO, but on platform B it is an i2c transaction to a microcontroller). There are lots of these little details on modern hardware, and most of them are pretty trivial aside from the fact that support requires either a kernel change (for every supported OS) or something like ACPI.
You just need to look at the x86 market to see that huge drivers in ACPI generally don't happen. In fact, aside from the high-profile bad examples (we all like to watch a good scandal), the x86 market works really well.
Besides being harder to debug, there are more disadvantages:
- Size, speed and complexity disadvantage of a bytecode interpreter in the kernel.
The bytecode interpreter has been in the kernel for years. It works, it isn't on the hot path for computing power, and it meets the requirement for abstraction between the kernel and the platform.
- Many more drivers. Imagine a GPIO switch controlling rfkill (for example). In the device tree case, that's a few lines in the .dts specifying which GPIO the switch is on.
In the ACPI case, each hardware vendor initially implements the rfkill switch in AML, differently. After a few years, each vendor implements a (different) kernel<->AML interface for querying rfkill state and toggling it in software. A few years after that, we implement kernel drivers for those AML interfaces, to properly integrate them in the kernel.
I don't know what you're trying to argue here. If an AML interface works, we never bother with writing direct kernel support for it. Most of the time it works. When it doesn't, we write a quirk in the driver and move on.
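As an aside, the kernel side of the GPIO-backed-switch example can these days be written once against the GPIO descriptor API, which resolves the line from either a DT property or an ACPI _CRS GpioIo/_DSD description. The probe sketch below is hypothetical; the "rfkill" connection ID and the driver name are made up for illustration:

/*
 * Hypothetical probe sketch: one driver path for a GPIO-backed switch,
 * whether the platform described the line via DT ("rfkill-gpios") or
 * via ACPI (_CRS GpioIo and/or a _DSD "rfkill-gpios" property).
 */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/gpio/consumer.h>
#include <linux/err.h>

static int demo_switch_probe(struct platform_device *pdev)
{
        struct gpio_desc *gpiod;

        /* The GPIO core resolves this from either firmware description. */
        gpiod = devm_gpiod_get(&pdev->dev, "rfkill", GPIOD_IN);
        if (IS_ERR(gpiod))
                return PTR_ERR(gpiod);

        dev_info(&pdev->dev, "switch is currently %s\n",
                 gpiod_get_value_cansleep(gpiod) ? "blocked" : "unblocked");
        return 0;
}

static struct platform_driver demo_switch_driver = {
        .probe  = demo_switch_probe,
        .driver = {
                .name = "demo-rfkill-switch",
        },
};
module_platform_driver(demo_switch_driver);
MODULE_LICENSE("GPL");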
- Incompatibility. ARM servers will now be very different from other ARM systems.
If anything, ARM servers are an extension of the existing x86 server market, not any of the existing ARM markets. They don't look like mobile, and they don't look like embedded. The fact that the OS vendors and the HW vendors are independent companies completely changes how this hardware gets supported. The x86 market has figured this out and that is a big reason why it is able to scale to the size it has. ACPI is a direct result of what that kind of market needs.
Now, are there some arguments for ACPI? Yes -- it allows hw vendors to hack half-working drivers without touching kernel sources. (Half-working: such drivers are not properly integrated in all the various subsystems).
That's an unsubstantiated claim. ACPI defines a model for the fiddly bits around the edges to be implemented by the platform, and performed under /kernel/ control, without requiring the kernel to explicitly have code for each and every possibility. It is a very good abstraction from that point of view.
Grant claims that power management is somehow special, and that the requirement for real drivers is somehow OK for normal drivers (block, video) but not for power management. Now, getting a driver merged into the kernel does not take that long -- less than half a year if you know what you are doing. Plus, for power management, you can really just initialize the hardware in the bootloader (into a working but not optimal state). But basic drivers are likely to be merged fast, and then you'll just have to supply DT tables.
The reality doesn't play out with the scenario you're describing. We have two major mass markets in the Linux world; Mobile and general purpose computing. For mobile, we have yet to have a device on the market that is well supported at launch by mainline. Vendors ship their own kernel because they only support a single OS, and we haven't figured out how to support their hardware at launch time (there is plenty of blame to go around on this issue; regardless of who is to blame, it is still a problem). The current mobile model doesn't even remotely address the needs of server vendors and traditional Linux distributions.
The general purpose market (or in this case, the server subset) is the only place where there is a large ecosystem of independent hardware and OS vendors. For all the complaints about technical problems, the existing x86 architecture using ACPI works well, and it has spawned an awful lot of very interesting hardware.
...
Personally, I think ACPI is the wrong thing to be getting upset over. Even supposing ACPI were rejected, it doesn't say anything about firmware that is completely out of visibility of the kernel. Both ARM and x86 CPUs have secure modes that are the domain of firmware, not the kernel, and have basically free rein on the machine.
ACPI on the other hand can be inspected. We know when the interpreter is running, because the kernel controls it. We can extract and decompile the ACPI tables. It isn't quite the hidden black box that the rhetoric against ACPI claims it to be.
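Concretely, Linux exports those tables under /sys/firmware/acpi/tables, so inspection needs nothing more exotic than copying a file out and running a disassembler over it. A small sketch, assuming root access and the standard sysfs path:

/*
 * Sketch: copy the firmware-provided DSDT out of sysfs so it can be
 * disassembled (e.g. with iasl -d dsdt.aml) and inspected.
 */
#include <stdio.h>

int main(void)
{
        FILE *in = fopen("/sys/firmware/acpi/tables/DSDT", "rb");
        FILE *out = fopen("dsdt.aml", "wb");
        char buf[4096];
        size_t n;

        if (!in || !out) {
                perror("open");
                return 1;
        }
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
                fwrite(buf, 1, n, out);
        fclose(in);
        fclose(out);
        return 0;
}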
If you really want to challenge the industry, then push for vendors to open source and upstream their firmware. Heck, all the necessary building blocks are already open sourced in the Tianocore and ARM Trusted Firmware projects. This would actually improve the visibility and auditing of the platform behaviour. The really invisible stuff is there, not in ACPI, and believe me, if ACPI (or similar) isn't available then the vendors will be stuffing even more into firmware than they are now.
g.
Hi!
Back in September 2014, a meeting was held at Linaro Connect where we discussed what issues remained before the arm64 ACPI core patches could be merged into the kernel, creating the TODO list below. I should have published this list sooner; I got focused on trying to resolve some of the issues instead.
We have made some progress on all of these items. But, I want to make sure we haven't missed something. Since this list was compiled by only the people in the room at Connect, it is probable we have. I, for one, do not yet claim omniscience.
So, I want to ask the ARM and ACPI communities:
-- Is this list correct?
-- Is this list complete?
I'm not sure this is how kernel development works. Expect new issues to be raised as patches are being reviewed.
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
This should be #1 really, right? I still hope the answer is "ACPI not needed".
Pavel
On Mon, Jan 05, 2015 at 08:52:27PM +0000, Pavel Machek wrote:
Back in September 2014, a meeting was held at Linaro Connect where we discussed what issues remained before the arm64 ACPI core patches could be merged into the kernel, creating the TODO list below. I should have published this list sooner; I got focused on trying to resolve some of the issues instead.
We have made some progress on all of these items. But, I want to make sure we haven't missed something. Since this list was compiled by only the people in the room at Connect, it is probable we have. I, for one, do not yet claim omniscience.
So, I want to ask the ARM and ACPI communities:
-- Is this list correct?
-- Is this list complete?
I'm not sure this is how kernel development works. Expect new issues to be raised as patches are being reviewed.
The problem with ACPI is not just the core patches but the entire hardware/firmware/kernel ecosystem and their interaction, which cannot always be captured in Linux patches (e.g. firmware test suite, LuvOS, _DSD properties review process, hardware standardisation).
The above is a to-do list that we raised last year. Of course, new issues will appear; these are just the items where we can say (for the time being) that the situation will be in an acceptable state once they are implemented.
- Why is ACPI required?
- Problem:
- arm64 maintainers still haven't been convinced that ACPI is necessary.
- Why do hardware and OS vendors say ACPI is required?
- Status: Al & Grant collecting statements from OEMs to be posted publicly early in the new year; firmware summit for broader discussion planned.
This should be #1 really, right? I still hope the answer is "ACPI not needed".
I lost this hope some time ago ;).
From my perspective, the "why" above still needs to be publicly stated; there have been several private discussions on the pros and cons, and some of them cannot be held in the open. My main concern is no longer "why" but "how" to do it properly (and if we can't, we should rather not do it at all).