On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
On Thu, 5 Sep 2013, Catalin Marinas wrote:
I respect your opinion, but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware; let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these change (because in time vendors may have other security needs), add the new calls to the driver and hope there is a way to detect which firmware is present, or just don't upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically, if you can disable coherency from the non-secure OS (e.g. at the CCI, or just by powering down a CPU without the secure OS having a chance to flush its caches), the only workaround for the secure OS is to run UP (which is probably the case now) or to flush its caches on every return to the non-secure world.
3. Very similar to 2, with a PSCI firmware interface but without the requirement to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) while the PSCI firmware does the necessary cache flushing and coherency disabling based on the specified affinity level (it doesn't question that level because it does not track the "last man"). Slightly better security model than 2 since it can flush the secure OS caches, but I'm not entirely sure PSCI can avoid a state machine, or whether this has other security implications.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI already does it. If PSCI gets it wrong (broken coherency, deadlocks), MCPM cannot really fix it. Also, are there additional races from having two separate state machines for the same thing (I can't think of any right now)?
5. Full PSCI with a light wrapper (smp_operations) allowing the SoC to hook in additional code (only where something isn't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). The other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which also means it is not the platform ops driving the final PSCI call, unlike the MCPM framework). This approach does not enforce a security model; it's up to the SoC vendor to allow or prevent non-secure access to the power controller, CCI etc. But it still mandates a common kernel entry/exit path via PSCI.
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons; that's what this thread is mainly about.
Any other options?
My goal is 6, but 5 could be a more practical/flexible approach.
Both those options (5 and 6) imply the state machine is in the firmware. And that's where complexity lies. So that wouldn't be my choice at all.
Indeed.
Option 4 is rather useless. In fact the MCPM backend for PSCI I showed you doesn't exercise the MCPM state machine. And the firmware still implements a state machine.
Correct. That's mainly to show that a PSCI back-end to MCPM doesn't help much. It adds some unification but we already have a common smp_ops API. The MCPM re-entry point is handled by the PSCI API.
The differences between 2, 3 and 4 are a bit fuzzy to me.
4 we just discussed above. 2 and 3 are basically the same, with 3 being an instantiation of a standard firmware API, but it still has the same issues as 1 in terms of security.
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so. If the non-secure OS is allowed to disable coherency (at the CCI level, or simply by shutting down a CPU in a cluster) *without* the secure OS being informed, the trust model is broken (or has to include the non-secure OS). In a more paranoid world, this part must be moved into the secure firmware, and there is no way to do that without a similar "last man" state machine. It is probably hard to mount an actual attack, but random data corruption in the secure OS is not something that can be ignored.
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
Too much flexibility also has drawbacks, and we have the past ARM SoC experience to show it: code duplication, a difficult single zImage, people asking for machine quirks in __v7_setup (though we have managed to prevent those so far). A unified approach (like a standard firmware interface) should be the default, and we can relax it later if there are good reasons. This unification matters more for the server distro space, as the mobile space tends not to contribute that much back into the kernel.
And this is not my call to make either. System vendors will choose their own poison. Between risk-inducing complexity in secure firmware and non-standard, low-level, machine-specific SMC calls, I don't think there is much to rejoice about.
As I said above, this complexity in the firmware is required to *increase* security. Of course, any complexity has its own risks, but unless you include the non-secure OS in your trusted environment there is no way around it (well, not with TrustZone at least, and not efficiently; you could always have a separate independent processor doing the security-related work).