On Thu, Sep 12, 2013 at 10:54:25PM +0100, Nicolas Pitre wrote:
On Thu, 12 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 10, 2013 at 08:07:17PM +0100, Nicolas Pitre wrote:
On Tue, 10 Sep 2013, Catalin Marinas wrote:
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such a task. PSCI as an API and the Generic Firmware implementation are addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
It's not hard to understand: the generic firmware is *not* public yet. You will be able to review it and discuss your concerns at the right time.
And how will reviewing it alleviate my concerns? At least if you or ARM have no answers then please say so rather than continuously ignoring the issues.
You can have answers at the upcoming Connect, if you want a timeline.
We clearly have a different understanding of the ARM security model and I've already gone to great lengths explaining it. I won't go over those arguments again, you seem to have ignored them so far.
I didn't ignore them. They simply failed to address my point. Repeating them won't make them any more relevant.
Let me summarize the situation one more time:
- The market is asking for security sensitive code to be executed in perfect isolation from the standard OS. Hence TrustZone / Secure World.
- The _design_ of the current Secure World architecture implies that the code running there has no choice but to concern itself with non security related operations as well simply to _preserve_ its secure attributes. In other words, the fact that secure code has to implement e.g. power management mechanisms and cache management operations is a consequence of the architecture design and not a secure service that the market asked for.
- For secure code to be truly secure, the code itself has to be unalterable and inaccessible, especially if it carries encryption keys, etc. What we've seen so far is that the secure code is getting burned directly into the SoC for those reasons.
- And Secure World is there to stay.
Do we agree so far?
Mostly yes. Regarding what the market asked for, another way to do world separation is to use a separate processor. But the market found it cheaper/easier to use the same CPU with some restrictions (power management mechanism on one side, policy on the other). Anyway, nothing in the architecture prevents loading parts of the secure firmware (e.g. signed images), only that people aren't used to doing it now. A few years ago you weren't even able to update your mobile phone software.
- There are no SoC or CPU implementation specific quirks required before the FDT is parsed and the (secondary) CPUs have enabled the MMU. IOW:
- caches and TLBs clean&invalidated
- full CPU/cluster coherency enabled
- errata workaround bits set
Items 1 and 2 are normally easy. Item 3 is often known _after_ product deployment. What do you do if this is the responsibility of the secure firmware? How do you manage re-certification of the secure code? How do you provide secure code updates to products in the field? What if the L3 firmware is not alterable? Are there recommendations in ARM's plans to address this?
We need to be careful here not to confuse secure OS (usually secure EL1) and the secure firmware (usually EL3). Certification (usually EAL4) limits itself to secure OS code design, test, review. I don't think we've ever had certified firmware. So modifying firmware does not mean re-certifying the secure OS. It also doesn't mean that the full system is certified.
Some (not all) errata have the lucky workaround of simply setting an implementation-defined bit. We've had lots of problems on ARMv7 with such bits which (for security reasons) are not exposed to the non-secure world. You can either get the SoC vendors to unblock those in firmware or get them to provide firmware updates (not necessarily secure firmware). Every time such an issue appears on the list, we say that should have been done in firmware (or a boot loader in a SoC-specific way). You can't do a SoC specific SMC before you know the SoC and that's only after parsing the DT (unless we go back to machine numbers). To make things worse, many workarounds have to be enabled before enabling the MMU.
As for what ARM recommends, single OS image is not really a concern of the hardware/architecture people. Such recommendations really should come from the OS communities. So do we allow random #ifdef's throughout the kernel or tell SoC vendors to handle them in firmware or boot-loader?
Please don't try to make this an ARM responsibility only, the Linux community should come up with guidelines for the SoC and firmware people.
- New EL3 interface can be accepted if the existing interfaces are not feasible (with good arguments, it is properly documented and widely accepted)
How can an interface be widely accepted if it is not accepted first?
Public reviews involving non-secure OS, secure OS (if applicable), firmware and SoC people. That's the process we went through with PSCI (including many public sessions at Linaro Connect).
- CPUs coming out of power down or idle need to have all the SoC or implementation specific quirks enabled:
What if they're not all known up front the day the SoC goes into production? Because in practice those quirks end up, much more often than we would like, being errata workarounds once products are deployed.
We can still take quirks as errata workarounds. But quirks should not be a default non-errata mode. And they should not break single Image.
- caches and TLBs clean & invalidated
- coherency enabled
This looks like a simple enumeration, doesn't it?
What should it look like? The reason is that there are many implementation-specific ways of flushing the caches and enabling coherency. Hotplug and CPU boot are not necessarily different code paths in firmware and Linux (especially if we take kexec into account). So we can extend the cold boot reasoning to hotplug or idle.
What if the above implies the equivalent of MCPM, whose complexities you said you do appreciate? What if its hardware specific implementation (backend in MCPM parlance) is non trivial to implement optimally and requires updating?
You could defer things like CCI enabling (not specific to a single CPU) until SoC-specific code can be run (IOW you know what SoC it is) but only if (1) it doesn't break the security assumptions (if any) and (2) you have a (standard) way to call back into firmware.
ARM provides PSCI as such a standard API in the presence of EL3 but I'm _open_ to other _well-thought_ firmware API proposals that can gain wider acceptance.
Thank you.
One such API might simply be a small L3 stub whose only purpose in life is to proxy system control accesses that L1 cannot do otherwise, especially if there is no need for a secure OS on the system. This is likely to free some vendors of the risks from not getting their firmware right for all cases.
Minor correction on terminology to avoid confusion - ELx for exception levels, Lx for cache levels.
Such an API should also state what the security aims are, what restrictions, if any, are imposed on the secure OS (e.g. running UP and tied to a single CPU).
Hint: Linaro is a good forum for wider SoC vendor and Linux community discussions, I would expect concrete proposals rather than complaints.
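To make the proxy-stub idea above a little more concrete, here is a minimal sketch in C of what the dispatch side of such a stub might look like. Everything in it is an assumption made for illustration: the call numbers, the status codes, the register identifiers and the `proxy_call()` entry point are hypothetical and are not taken from PSCI or any existing firmware. The registers are modelled as plain variables so the logic can be exercised on a host. The point it tries to show is that the stub only needs an allow-list of registers plus a per-register mask of the bits the non-secure caller may change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call numbers and status codes (illustration only). */
enum { PROXY_FN_READ = 0, PROXY_FN_WRITE = 1 };
enum { PROXY_OK = 0, PROXY_DENIED = -1, PROXY_BAD_FN = -2 };

/* One control register the stub agrees to proxy. */
struct proxy_reg {
    uint32_t id;          /* hypothetical register identifier */
    uint64_t write_mask;  /* bits the non-secure caller may modify */
    uint64_t value;       /* stands in for the real hardware register */
};

/* Allow-list: anything not listed here is refused. */
static struct proxy_reg proxy_regs[] = {
    { 0x10u, 0x40u, 0x0u },  /* one writable bit, e.g. a coherency enable */
    { 0x11u, 0x00u, 0x5u },  /* readable, but no bit may be changed */
};

static struct proxy_reg *proxy_find(uint32_t id)
{
    for (size_t i = 0; i < sizeof(proxy_regs) / sizeof(proxy_regs[0]); i++)
        if (proxy_regs[i].id == id)
            return &proxy_regs[i];
    return NULL;
}

/* Entry point the (hypothetical) SMC handler would call. */
int proxy_call(uint32_t fn, uint32_t reg_id, uint64_t *val)
{
    struct proxy_reg *reg = proxy_find(reg_id);

    if (val == NULL || reg == NULL)
        return PROXY_DENIED;

    switch (fn) {
    case PROXY_FN_READ:
        *val = reg->value;
        return PROXY_OK;
    case PROXY_FN_WRITE:
        /* Refuse any attempt to touch bits outside the allowed mask. */
        if (*val & ~reg->write_mask)
            return PROXY_DENIED;
        reg->value = (reg->value & ~reg->write_mask) | *val;
        return PROXY_OK;
    default:
        return PROXY_BAD_FN;
    }
}
```

A caller would write 0x40 to register 0x10 to set the single permitted bit; a write of 0x41 to the same register, or any access to an unlisted register, is refused. Which registers and bits belong on the allow-list is exactly the "security aims" question raised above, and the sketch does not try to answer it.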
They might come sooner than you'd expect.
Looking forward to it. Depending on how soon, Connect or the ARM kernel mini-summit are good opportunities for wider discussions. Actually for the ARM mini-summit, a topic on single Image and firmware requirements would be really good to create high-level guidelines.
BTW, I hope to get arm64 hotplug and suspend merged for 3.13. The patches target PSCI and you have seen/acked some of them. We'll do some slight reworking to generalise the CPU enable-method DT parsing (spin-table and PSCI), also used for hotplug and suspend. Please feel free to comment when they get on the list.
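For readers who have not seen the patches, the generalised enable-method handling could be pictured roughly as below. This is only a sketch under stated assumptions, not the code that will be posted: `struct cpu_method_ops`, `cpu_method_lookup()` and the stub callbacks are hypothetical names, and the sketch assumes that spin-table provides boot only while PSCI also covers hotplug and suspend. The only parts taken from the discussion are the two enable-method strings, "spin-table" and "psci".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-method operations selected from the DT enable-method. */
struct cpu_method_ops {
    const char *name;                         /* enable-method string */
    int (*cpu_boot)(unsigned int cpu);
    int (*cpu_die)(unsigned int cpu);         /* NULL: no hotplug support */
    int (*cpu_suspend)(unsigned long state);  /* NULL: no suspend support */
};

/* Stub callbacks standing in for the real back-ends. */
static int stub_boot(unsigned int cpu) { (void)cpu; return 0; }
static int stub_die(unsigned int cpu) { (void)cpu; return 0; }
static int stub_suspend(unsigned long state) { (void)state; return 0; }

static const struct cpu_method_ops cpu_methods[] = {
    { "spin-table", stub_boot, NULL, NULL },
    { "psci", stub_boot, stub_die, stub_suspend },
};

/* Map an enable-method string to its operations, or NULL if unknown. */
const struct cpu_method_ops *cpu_method_lookup(const char *enable_method)
{
    if (enable_method == NULL)
        return NULL;

    for (size_t i = 0; i < sizeof(cpu_methods) / sizeof(cpu_methods[0]); i++)
        if (strcmp(cpu_methods[i].name, enable_method) == 0)
            return &cpu_methods[i];
    return NULL;
}
```

With a table like this, boot, hotplug and suspend all go through the same lookup, and a method that lacks a callback simply reports the feature as unsupported.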