 
On Thu, 12 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 10, 2013 at 08:07:17PM +0100, Nicolas Pitre wrote:
On Tue, 10 Sep 2013, Catalin Marinas wrote:
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such a task. PSCI as an API and the Generic Firmware implementation are addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
It's not hard to understand: the generic firmware is *not* public yet. You will be able to review it and discuss your concerns at the right time.
And how will reviewing it alleviate my concerns? At least if you or ARM have no answers then please say so rather than continuously ignoring the issues.
I'm asking if there is a _plan_ to produce _recommendations_ for best practices about firmware update deployment. Unless you don't recognize the need for them? That doesn't have to wait until a piece of code is published before answers are given, no?
I *do* appreciate the complexities of MCPM but that doesn't make it write once only (it's not even software only, part of the state coordination may be handled in hardware). But the experience gained from developing it is definitely not lost.
It is not "write once" for sure. The gained experience shows that this code is going to be an _evolving_ target that cannot be cast into static firmware just like a done job.
We clearly have a different understanding of the ARM security model and I've already gone to great lengths explaining it. I won't go over those arguments again, you seem to have ignored them so far.
I didn't ignore them. They simply failed to address my point. Repeating them won't make them any more relevant.
Let me summarize the situation one more time:
- The market is asking for security sensitive code to be executed in perfect isolation from the standard OS. Hence TrustZone / Secure World.
- The _design_ of the current Secure World architecture implies that the code running there has no choice but to concern itself with non security related operations as well simply to _preserve_ its secure attributes. In other words, the fact that secure code has to implement e.g. power management mechanisms and cache management operations is a consequence of the architecture design and not a secure service that the market asked for.
- For secure code to be truly secure, the code itself has to be unalterable and inaccessible, especially if it carries encryption keys, etc. What we've seen so far is that the secure code is getting burned directly into the SoC for those reasons.
- And Secure World is there to stay.
Do we agree so far?
What I'm claiming is that adding more and more complexity to non-alterable firmware code is bad bad bad. You may repeat over and over that the ARM security model requires that complexity in the secure firmware. I never denied that fact either. But I will continue to assert that this is still the unfortunate _consequence_ of a bad architecture model and not something that should be promoted as a design feature.
The decision for adopting the ARM generic firmware or other firmware lies entirely with the ARMv8 SoC vendors and they should know better what their security needs are (or will be). It's not the role of the Linux community to mandate certain firmware. However, we *do* have the right to mandate certain standards like booting protocols, DT, ACPI etc. for code aimed to be included in mainline.
Standards are _*NOT*_ the problem. Please drop this argument.
Linux interaction with the firmware is another area which badly needs standardisation, whether it is secure firmware or not, simply because that's the first code a CPU executes when out of reset. Such standardisation is even more important in the presence of secure firmware (and given that AArch64 is new, companies will have to write new firmware and there is little legacy to carry).
What I'm saying is that _complexity_ in the firmware is _*THE*_ problem. Whether it is old or new, AArch64 or AArch32, PSCI or whatnot.
The _increasing_ interaction between any firmware/bootloader and Linux _is_ a serious problem. This is a problem because it _will_ have bugs. It _will_ have version and implementation skews. It will require _more_ coordination between different software parts beyond any standard interfaces. And that is _costly_ and a total nightmare to manage. And even more so when those bugs are in the (possibly unalterable) firmware.
You can bury your head in the sand as you wish and conveniently downplay those facts. I personally do care greatly and I do have sympathy for vendors who might wish to pursue a different path than this single-solution, everything-in-firmware approach.
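To make the version skew point concrete, here is a minimal sketch of the kind of check the kernel ends up carrying for every firmware interface it depends on. It assumes a firmware that reports its interface version as a 32-bit value with the major number in bits [31:16] and the minor number in bits [15:0], which is the encoding PSCI 0.2 and later use for PSCI_VERSION. The function names and the compatibility policy (same major, minor at least the required one) are hypothetical, chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Major version lives in bits [31:16] of the reported value. */
static inline uint32_t fw_version_major(uint32_t v)
{
    return v >> 16;
}

/* Minor version lives in bits [15:0] of the reported value. */
static inline uint32_t fw_version_minor(uint32_t v)
{
    return v & 0xffffu;
}

/*
 * Hypothetical policy: the kernel only trusts firmware with the same
 * major version and a minor version at least as new as the one it
 * was written against. Anything else is treated as skew.
 */
bool fw_version_compatible(uint32_t reported, uint32_t req_major,
                           uint32_t min_minor)
{
    return fw_version_major(reported) == req_major &&
           fw_version_minor(reported) >= min_minor;
}
```

A check like this only detects skew in the advertised interface. It says nothing about implementation bugs hiding behind a version number that looks fine.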
The first interaction with firmware (EL3 or not, boot loader) is the booting protocol (primary and secondary CPUs). This is defined by Documentation/arm64/booting.txt and will also cover PSCI. It can be extended to other protocols in the future as long as they follow simple rules:
- Existing protocols are not feasible/sufficient (with good arguments, EFI_STUB for example)
"Good" is pretty subjective.
"I don't want complex firmware in my SoC" could be a "good" reason according to certain points of view.
- There are no SoC or CPU implementation specific quirks required before the FDT is parsed and the (secondary) CPUs enabled the MMU. IOW:
- caches and TLBs clean&invalidated
- full CPU/cluster coherency enabled
- errata workaround bits set
Items 1 and 2 are normally easy. Item 3 is often known only _after_ product deployment. What do you do if this is the responsibility of the secure firmware? How do you manage re-certification of the secure code? How do you provide secure code updates to products in the field? What if the EL3 firmware is not alterable? Are there recommendations in ARM's plans to address this?
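To illustrate why item 3 is the awkward one, here is a minimal sketch of a table-driven errata lookup keyed on the CPU's MIDR value. The MIDR field layout (implementer in bits [31:24], variant in [23:20], part number in [15:4], revision in [3:0]) is architectural. The erratum names and workaround bits are hypothetical placeholders, not real errata. The point is that when such a table lives in updatable code, a newly discovered erratum is one more table entry; when it lives in unalterable firmware, it is not.

```c
#include <stddef.h>
#include <stdint.h>

#define MIDR_IMPLEMENTER(m) (((m) >> 24) & 0xffu)
#define MIDR_VARIANT(m)     (((m) >> 20) & 0xfu)
#define MIDR_PARTNUM(m)     (((m) >> 4) & 0xfffu)
#define MIDR_REVISION(m)    ((m) & 0xfu)

struct erratum {
    const char *name;         /* hypothetical identifier */
    uint8_t implementer;      /* 0x41 is ARM Ltd. */
    uint16_t partnum;         /* 0xd07 is Cortex-A57 */
    uint8_t max_variant;      /* last affected rN */
    uint8_t max_revision;     /* last affected pM */
    uint64_t workaround_bits; /* hypothetical control bits to set */
};

/* Made-up entries; a newly found erratum becomes one more line here. */
static const struct erratum errata_table[] = {
    { "HYPOTHETICAL-0001", 0x41, 0xd07, 0, 2, 1u << 0 },
    { "HYPOTHETICAL-0002", 0x41, 0xd07, 1, 0, 1u << 4 },
};

/* Return the OR of workaround bits for all errata affecting this CPU. */
uint64_t errata_workaround_bits(uint32_t midr)
{
    uint64_t bits = 0;
    unsigned rev = (MIDR_VARIANT(midr) << 4) | MIDR_REVISION(midr);

    for (size_t i = 0; i < sizeof(errata_table) / sizeof(errata_table[0]); i++) {
        const struct erratum *e = &errata_table[i];
        unsigned last = ((unsigned)e->max_variant << 4) | e->max_revision;

        if (MIDR_IMPLEMENTER(midr) == e->implementer &&
            MIDR_PARTNUM(midr) == e->partnum && rev <= last)
            bits |= e->workaround_bits;
    }
    return bits;
}
```

In this sketch a Cortex-A57 r0p1 (MIDR 0x410fd071) matches both entries, while an r1p1 part matches neither.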
Power management is not covered by the above document, though there is a relation between secondary CPU booting and hotplug. To be clear, as the arm64 kernel _gatekeeper_ I set the basic rules for ARMv8 SoC power management code aimed for mainline (I'll capture them in a soc.txt document):
- If EL3 is present, standard EL3 firmware interface required
Fair enough for booting.
- New EL3 interface can be accepted if the existing interfaces are not feasible (with good arguments, it is properly documented and widely accepted)
How can an interface be widely accepted if it is not accepted first?
- CPUs coming out of power down or idle need to have all the SoC or implementation specific quirks enabled:
What if they're not all known up front the day the SoC goes into production? Because in practice, much more often than we would like, those quirks end up being errata workarounds discovered once products are deployed.
- caches and TLBs clean & invalidated
- coherency enabled
This looks like a simple enumeration, doesn't it?
What if the above implies the equivalent of MCPM, whose complexities you said you do appreciate? What if its hardware specific implementation (backend in MCPM parlance) is non trivial to implement optimally and requires updating?
- errata workaround bits set
Yada.
ARM provides PSCI as such a standard API in the presence of EL3 but I'm _open_ to other _well-thought_ firmware API proposals that can gain wider acceptance.
Thank you.
One such API might simply be a small EL3 stub whose only purpose in life is to proxy system control accesses that EL1 cannot do otherwise, especially if there is no need for a secure OS on the system. This is likely to free some vendors from the risk of not getting their firmware right for all cases.
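As a rough illustration of how small such a stub could be, here is a sketch of a proxy handler simulated in plain C. Everything in it is an assumption made for the example: the register IDs, the per-register writable masks, the return codes, and the in-memory array standing in for secure-only control registers. A real stub would sit behind an SMC entry point and use SoC-specific register accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register IDs; a real stub would use SoC-specific ones. */
enum stub_reg {
    STUB_REG_COHERENCY_CTRL = 0,
    STUB_REG_POWER_CTRL = 1,
    STUB_REG_COUNT
};

enum stub_op { STUB_OP_READ = 0, STUB_OP_WRITE = 1 };

#define STUB_OK     0
#define STUB_DENIED (-1)

/* In-memory stand-in for control registers only the stub can reach. */
static uint64_t secure_regs[STUB_REG_COUNT];

/* Per-register mask of the bits the non-secure caller may change. */
static const uint64_t writable_mask[STUB_REG_COUNT] = {
    [STUB_REG_COHERENCY_CTRL] = 0x1,
    [STUB_REG_POWER_CTRL] = 0x3,
};

/*
 * Proxy one access on behalf of the caller. Unknown registers, unknown
 * operations, and writes touching bits outside the mask are refused.
 * On success *out holds the resulting register value.
 */
int stub_proxy_access(unsigned op, unsigned reg, uint64_t value, uint64_t *out)
{
    assert(out != NULL);

    if (reg >= STUB_REG_COUNT)
        return STUB_DENIED;

    switch (op) {
    case STUB_OP_READ:
        *out = secure_regs[reg];
        return STUB_OK;
    case STUB_OP_WRITE:
        if (value & ~writable_mask[reg])
            return STUB_DENIED;
        secure_regs[reg] = (secure_regs[reg] & ~writable_mask[reg]) | value;
        *out = secure_regs[reg];
        return STUB_OK;
    default:
        return STUB_DENIED;
    }
}
```

The stub holds no policy beyond the allowlist. All sequencing and state coordination stays in the kernel, where it can be fixed and updated like any other driver.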
Now let's be clear on what my position is: I do agree on the value of a standard booting interface for the kernel or even bootloaders, etc. This really helps in having a common distro procedure for different hardware, etc.
But once the kernel is booted, it does require hardware specific drivers to work properly in all cases. No one is ever going to accept abstracting ethernet hardware into firmware (virtual machines notwithstanding). Specialized disk arrays will also require custom drivers -- I really doubt AHCI will cut it for them all. Furthermore, improvements in kernel subsystems often imply modifications to those drivers (e.g. when NAPI was introduced), etc.
Therefore... there is no reason why _conceptually_ the same principle could not be applied to power management. If specific _drivers_ are needed to support this or that platform then this shouldn't be a problem, just like it is not a problem for ethernet interfaces. Once the kernel is booted via the standard firmware interface, then the kernel should be provided with the right modules to drive the rest of the system in the best possible way. The only reason why this wouldn't work on ARM is because of the security model.
Of course it is a good idea to have DT or ACPI. Those standards are very useful for the factoring of integration differences on otherwise common hardware blocks. They are _informative_ and they allow the kernel to bypass them when they turn out to be insufficient. Firmware calls do not have that flexibility.
Hint: Linaro is a good forum for wider SoC vendor and Linux community discussions, I would expect concrete proposals rather than complaints.
They might come sooner than you'd expect.
(BTW, my impression from the last Connect was that LEG is adopting PSCI for the ACPI work)
PSCI has its place, there's no doubt about that. It can't be a one-size-fits-all solution though.
Note that the above rules don't have anything to do with MCPM. That's a SoC power driver implementation detail (and I already suggested turning it into a library if needed to avoid duplication).
I do agree. However these are again implementation details. I'm more concerned about the larger picture.
But the above firmware API rules still apply and if PSCI is present you have the advantage of generic support in Linux.
Easier said than done. Linux code is cheaper to write and maintain than firmware code, _even_ if it has to stay out of mainline because you'd be opposing it.
(as a side note, generalising your TC2 MMC experience to _any_ firmware is unprofessional IMHO. You keep repeating it and to me it starts sounding like FUD)
I'm a pragmatist. I don't believe in magic and wishful thinking.
Ignoring reported firmware bugs for months may have its share of unprofessionalism too. But instead of going down the route of name calling, I prefer to believe that you have good reasons at ARM to explain this situation such as resource shortage and/or higher priorities. And from experience, having worked at several different companies, I can tell you that resource shortage and priority shifts do happen everywhere.
Hence my assertion that complex firmware is not cost effective. If a simpler (aka cheaper) solution exists, you must expect vendors to embrace it, whether or not you like it.
Nicolas