On 7 Sep 2013, at 21:31, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Sat, 7 Sep 2013, Catalin Marinas wrote:
On 6 Sep 2013, at 20:52, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Fri, 6 Sep 2013, Catalin Marinas wrote:
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so.
Well, just halt the whole system in that case. Or raise a fault if you want to be nice.
I don't think you got my point. How can the secure OS force the whole system to halt when the non-secure OS controls what gets halted? The non-secure OS decides which CPUs to halt and when to disable cluster coherency. Normally this is for good reasons like power management, but a malicious non-secure OS may also use these controls to cause data corruption on the secure side, which opens various avenues for attack.
What I meant is:
- Secure OS traps on any attempt from the non secure OS to disable
coherency or halt CPUs while it is active.
Good. So we agree that the non-secure OS cannot freely disable coherency or halt CPUs without the secure OS being informed first.
The architecture does not allow such trapping to EL3; trapping is normally a hypervisor (EL2) feature. The firmware can indeed (subject to the SoC implementation) block access to certain peripherals, in which case the non-secure OS (at EL1) most likely takes a synchronous external abort.
- Non-secure OS wants to do some power management so it tells secure OS
to pack its things and remove its hands from the hardware controls.
OK, so let's assume the non-secure OS does an SMC #PREPARE_* so that the firmware enables non-secure access to such hardware after packing its things.
- While non-secure OS has control over the hardware knobs, secure OS
refuses to operate.
- Non-secure OS tells secure OS to come back. Secure OS reinstates its
watch guard on the hardware control knobs.
- If non-secure OS attempts to touch the hardware knobs without telling
secure OS to get away first, secure OS takes offence and either hangs the system or signals a fault.
What you are missing here is that secure OS "packing its things" is a lot more complex than simply "refusing to operate". Let's consider some scenarios:
First scenario, the non-secure OS tells the secure one to pack all its things, no matter whether it's CPU suspend or power-down:
1. Non-secure OS issues SMC #SECURE_PACK_ALL.
2. Secure OS needs to issue IPIs to all the CPUs that may be running secure code.
3. Non-secure OS performs CPU or cluster power down.
This is either inefficient (the secure OS waking up all the CPUs that may be in suspend) or simply impossible if some CPUs were in power-down mode rather than suspend. In addition, if one CPU is merely in suspend, you still want the secure OS to keep working on the other CPUs. We can just dismiss the "pack all things" scenario.
Second scenario, per-CPU SMC #PREPARE_CPU_DOWN (or SUSPEND):
1. Non-secure OS issues SMC #PREPARE_CPU_DOWN.
2. Secure OS disables the MMU (coherency) and flushes its caches on that CPU. It then enables non-secure access to the power controller.
3. Non-secure OS performs CPU power down.
This scenario only works if the secure firmware can control CPU and cluster down independently. Let's assume this is doable, so we move to the next scenario.
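Under those assumptions, the per-CPU handshake can be sketched as a toy Python model. All names here (SecureFirmware, smc_prepare_cpu_down, power_down_allowed) are illustrative, not a real PSCI/SMC API:

```python
class SecureFirmware:
    def __init__(self, num_cpus):
        # Per-CPU flag: has this CPU packed its secure state
        # (caches flushed, coherency/MMU disabled)?
        self.prepared = [False] * num_cpus

    def smc_prepare_cpu_down(self, cpu):
        # Step 2: on this CPU the secure OS flushes its caches,
        # disables the MMU and only then enables non-secure
        # access to the power controller.
        self.prepared[cpu] = True

    def power_down_allowed(self, cpu):
        # Step 3 is only safe after the handshake; an attempt to
        # power the CPU down without it should be refused.
        return self.prepared[cpu]

fw = SecureFirmware(num_cpus=4)
assert not fw.power_down_allowed(0)   # no handshake yet: refuse
fw.smc_prepare_cpu_down(0)
assert fw.power_down_allowed(0)       # CPU0 has packed its things
assert not fw.power_down_allowed(1)   # CPU1 unaffected
```

The per-CPU state is the point: each CPU goes through the handshake independently, so no IPI broadcast is needed.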
Third scenario, per-cluster SMC #PREPARE_CLUSTER_DOWN (or SUSPEND) as a result of a 'last man' detection (in the non-secure OS):
1. Non-secure OS issues SMC #PREPARE_CLUSTER_DOWN.
2. Secure OS disables the MMU on that CPU, flushes L1 and L2 caches, enables non-secure access to power controller.
3. Non-secure OS performs cluster power down.
At point 2 above, the secure OS has 3 options:
2.a) trusts the non-secure OS to have shut down the other CPUs.
2.b) issues IPI to the other CPUs in the cluster to pack things (flush caches, disable MMU).
2.c) refuses to enable non-secure access to power controller.
2.a breaks the security model. 2.b has the same issues as the first scenario (which CPUs do you send the IPI to?). 2.c is the safest but it *requires* a 'last man' state machine in the *secure* firmware (same with 2.b: it would need to track which CPUs in the cluster are still up).
You can do the above exercise again but, instead of enabling non-secure access to the power controller, have the firmware perform the actual power controller action itself. You'll see that the cluster scenario still requires the firmware to track the 'last man' state.
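The 'last man' tracking that option 2.c forces into the firmware can be sketched like this (a toy Python model; the class and method names are illustrative only):

```python
class ClusterFirmware:
    def __init__(self, cpus_in_cluster):
        # CPUs the firmware believes are still up in this cluster.
        self.up = set(cpus_in_cluster)

    def smc_prepare_cpu_down(self, cpu):
        # Per-CPU handshake: this CPU has flushed its caches and
        # disabled coherency, so the firmware marks it down.
        self.up.discard(cpu)

    def smc_prepare_cluster_down(self, cpu):
        # Cluster power-down is only safe if every *other* CPU in
        # the cluster has already gone through the per-CPU
        # handshake; otherwise refuse (option 2.c).
        if self.up - {cpu}:
            return False          # not the last man: refuse
        self.up.discard(cpu)
        return True

fw = ClusterFirmware({0, 1, 2, 3})
assert fw.smc_prepare_cluster_down(0) is False  # CPUs 1-3 still up
fw.smc_prepare_cpu_down(1)
fw.smc_prepare_cpu_down(2)
fw.smc_prepare_cpu_down(3)
assert fw.smc_prepare_cluster_down(0) is True   # genuine last man
```

Note that the firmware never trusts the caller's claim of being last: it checks against its own record of which CPUs have done the handshake.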
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
Wrong. PSCI ops take an affinity parameter indicating whether it is a CPU or a cluster power down/suspend. Of course, you can always ask for cluster level if you are only interested in power saving, and PSCI can then choose what is safe. There is nothing in PSCI that takes the CPU vs cluster decision away from the non-secure OS.
And the MCPM framework is not the place for such CPU vs cluster policy either. This needs to be decided higher up, in the cpuidle subsystem, and in abstract terms like target residency and the time taken to recover from the various low power states. You may go for cluster down directly if you are the 'last man', but you may just as well go for CPU down first even as the 'last man'. This is a decision to be taken by the cpuidle governor and *not* by MCPM. PSCI already allows this via the affinity parameter.
Of course, we could have specified that a PSCI CPU_POWER_DOWN with cluster affinity returns an error if the caller is not the 'last man'. But that would have required a duplicate 'last man' state machine in the non-secure OS.
For the same malicious use-case reasons, the firmware cannot afford to rely on the non-secure OS to prevent "last man" races.
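The affinity idea can be sketched roughly as follows, loosely modelled on PSCI semantics. The names, constants and the silent-downgrade policy here are assumptions for illustration, not the actual specification:

```python
AFFINITY_CPU, AFFINITY_CLUSTER = 0, 1

def power_down(requested_affinity, cpu, cpus_still_up):
    """Return the affinity level the firmware actually enters.

    The non-secure OS *asks* for a CPU- or cluster-level state;
    the firmware honours the request only when it is safe.
    """
    if requested_affinity == AFFINITY_CLUSTER and cpus_still_up - {cpu}:
        # Not the last man: cluster-down would be unsafe, so fall
        # back to a CPU-level power down rather than trusting the
        # caller's view of the cluster.
        return AFFINITY_CPU
    return requested_affinity

assert power_down(AFFINITY_CLUSTER, 0, {0, 1}) == AFFINITY_CPU
assert power_down(AFFINITY_CLUSTER, 0, {0}) == AFFINITY_CLUSTER
assert power_down(AFFINITY_CPU, 0, {0, 1}) == AFFINITY_CPU
```

The design choice shown is the one argued for above: the policy (which level to request) stays in the non-secure OS, while the safety check (whether that level is actually achievable) stays in firmware.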
Again, by the time non-secure OS attempts to determine the last man, it should have told the secure OS to take cover.
It cannot take cover entirely just because a CPU is going into idle. See above.
Shifting the privilege levels down, a better analogy would be a user application (non-root, since root has a special 'trusted' status in Linux) being able to control coherency and CPU/cluster shutdown *without* having to make system calls. Would you feel comfortable with that?
Let me propose a counter-example: having PSCI and the power management in secure firmware is like having the GNOME Power Manager compiled into the kernel. It would work, of course, but the GNOME developers would probably prefer dealing with it in user space instead, without having to update the kernel every time there is a bug in the Power Manager.
I understand your uneasiness with more complex firmware, but I now wonder whether you completely missed the point of PSCI. I'll restate it: PSCI does *not* take the power management policy away from the non-secure, high-level OS. It does what it is *asked* to do, in a safe, secure manner. This safety *requires* a 'last man' state machine in the firmware. You can have another state machine in the non-secure OS if you want or need to but, as I said above, the CPU vs cluster choice should be decided based on cost and residency, and that is the cpuidle governor's job. MCPM or PSCI can only choose the safest state for the security level they run at.
The only workaround is not to trust anything controlled from outside that privilege level, in this case coherency. That pretty much means UP only.
And why is that a problem? Certainly the secure OS shouldn't need to be that CPU hungry, just as the Linux kernel is not supposed to take too many CPU cycles away from user space.
It's not about CPU-intensive tasks. It is about the secure OS being available on all CPUs in an MP system. The secure OS can just be a library with a big lock to serialise MP access, but as long as it needs to run code on more than one CPU it has to rely on cache coherency.
Imagine a secure OS which gets some data from secure storage and provides it to the non-secure OS:
1. Such data is copied from a PIO secure device as a result of an FIQ. If the FIQ happens on CPU0, the secure firmware would dirty the caches on that CPU.
2. A non-secure OS asks for that data on CPU1 via an SMC.
3. The secure OS performs a memcpy from the buffer previously allocated for the FIQ copy to the non-secure buffer.
The above is a normal secure service provided to the non-secure OS, and the memcpy in step 3 requires cache coherency, otherwise CPU1 can read stale data and leak information.
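To make the stale-data point concrete, here is a deliberately simplified Python model of two CPUs with private write-back caches over shared memory. There is no real cache protocol here; the snoop is hardwired to cpu0 purely for illustration:

```python
shared_mem = {"buf": "old"}          # backing memory, not yet updated

class Cpu:
    def __init__(self):
        self.cache = {}              # private write-back cache

    def write(self, key, val):
        self.cache[key] = val        # dirty line, not written back

    def read(self, key, coherent):
        if coherent and key in cpu0.cache:
            return cpu0.cache[key]   # snoop CPU0's dirty line
        # Without coherency, fall back to our own cache or memory.
        return self.cache.get(key, shared_mem[key])

cpu0, cpu1 = Cpu(), Cpu()
cpu0.write("buf", "secret-from-FIQ")  # step 1: FIQ fills the buffer on CPU0

# Step 3 done on CPU1: with coherency the memcpy sees CPU0's data,
# without it CPU1 copies the stale contents of memory instead.
assert cpu1.read("buf", coherent=True) == "secret-from-FIQ"
assert cpu1.read("buf", coherent=False) == "old"
```

The second assert is exactly the failure mode described above: once coherency is gone, CPU1 silently works on old data.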
You can think of other scenarios where a (malicious) non-secure OS shuts down a full cluster but 'misses' one of the CPUs (third scenario above). Same loss of data, same information leak.
Catalin