On Sun, Sep 08, 2013 at 05:16:16PM +0100, Nicolas Pitre wrote:
On Sun, 8 Sep 2013, Catalin Marinas wrote:
On 7 Sep 2013, at 21:31, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Sat, 7 Sep 2013, Catalin Marinas wrote:
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
Wrong. PSCI ops get an affinity parameter specifying whether it's a CPU or a cluster power down/suspend. Of course, you can always ask for cluster if you are only interested in power saving and let PSCI choose what is safe. There isn't anything in PSCI that would take the CPU vs cluster decision away from the non-secure OS.
And the MCPM framework is not the place for such CPU vs cluster policy either. This needs to be decided higher up in the cpuidle subsystem, in abstract terms like target residency and the time taken to recover from the various low power states. You may go for cluster down directly if 'last man', but you may as well go for CPU down first even if 'last man'. This is a decision to be taken by the cpuidle governor and *not* by MCPM. PSCI already allows this via the affinity parameter.
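To make the affinity parameter concrete, here is a sketch (not kernel code) of composing a PSCI CPU_SUSPEND power_state argument. The field layout follows the original PSCI 0.2 format; the exact encoding depends on the spec version and platform, and the state IDs here are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI 0.2 original power_state format: bits[25:24] carry the affinity
 * level (0 = CPU, 1 = cluster, 2 = system), bit[16] the state type
 * (1 = powerdown, 0 = standby), bits[15:0] a platform-specific state
 * ID.  This is how the non-secure OS tells PSCI which affinity level
 * it is asking to power down -- the decision stays with the OS. */
static uint32_t psci_power_state(uint32_t aff_level, bool powerdown,
                                 uint16_t state_id)
{
    return ((aff_level & 0x3) << 24) |
           ((uint32_t)powerdown << 16) |
           state_id;
}
```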
I think this shows a misunderstanding of the role of MCPM on your part.
My understanding is mostly based on what's currently in mainline and the to-be-merged TC2 code. What I may not be aware of is future plans for MCPM, such as the future use of the residency parameter (which I don't think should be handled in MCPM, see more below).
Indeed the cpuidle layer is responsible for deciding what level of power saving should be applied. But that is done on a per CPU basis. It *has* to be done on a per CPU basis because it is too difficult to track what's going on on the other CPUs in every subsystem interested in some form of power management.
I agree.
What MCPM does is to receive this power saving request from cpuidle on individual CPUs including their target residency, etc. It also receives similar requests for CPU hotplug and so on. And then MCPM _arbitrates_ the combination of those requests according to
I also agree that (in the absence of anything else) MCPM needs to arbitrate the combination of such requests.
1) the strictest restrictions in terms of wake-up latency of _all_
   CPUs in the same power domain, and
Wouldn't the strictest restrictions just translate to min(C-state(CPUs-in-cluster)), min(C-state(clusters)), etc.? IOW, simple if/then/or/and rules, because deeper C-states have higher target residencies and wake-up latencies?
2) the state of the other CPUs, which might be in the process of coming
   back from an interrupt or any other event, and
3) the particularities of the hardware platform where this is happening.
It's fine for MCPM to handle this in the absence of any other synchronisation agent (which could be the firmware).
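The min() rule questioned above can be sketched as follows (a hypothetical illustration, not actual MCPM code): since a deeper C-state implies a longer wake-up latency, the strictest restriction across a power domain is simply the minimum requested state.

```c
#include <assert.h>

/* Per-CPU requested C-state index; a deeper state (larger index) has a
 * higher target residency and wake-up latency.  The strictest wake-up
 * constraint across the power domain is therefore the minimum, i.e.
 * the domain may only go as deep as its shallowest request. */
static int domain_allowed_state(const int *cpu_state, int ncpus)
{
    int min = cpu_state[0];

    for (int i = 1; i < ncpus; i++)
        if (cpu_state[i] < min)
            min = cpu_state[i];
    return min;
}
```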
So the concept of "policy" has to be split in two parts: what is _desired_ by the upper layer such as cpuidle as determined by the governor and its view of the system load and utilisation patterns vs implied costs, and the second part which is the _possible_ power saving mode according to the sum of all the constraints presented to MCPM by various requestors.
And that's where I think MCPM (or PSCI) should only be concerned with C-state concepts (and correct arbitration). Pushing actions based on the expected residency down to the MCPM back-end is a bad design decision IMHO.
Taking the TC2 code as an example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get the expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of the cpuidle governor functionality (and its concepts, like target residency) down to the MCPM back-end level. Such a split will have bad consequences in the longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
And because the action of shutting down a CPU or a cluster may take some time (think of cache flushing), those constraints may also change _during_ the operation, and proper measures should be taken to re-evaluate the power management decision dynamically. And that can be achieved only by having simultaneous visibility into both the higher level requirements and the lower level changing hardware states.
I understand the races and how MCPM avoids them. But why not keep the concepts clear: (1) residency and best C-state recommendation in cpuidle (policy), (2) actual C-state hardware setting in MCPM (mechanism).
Point (1) is a cpuidle driver defining C states (for a single CPU, it doesn't need to be concerned with cluster state, just abstract states):
C1: CPU suspend mode
C2: cluster suspend mode
C3: system suspend mode
etc.
Each of these states has a corresponding target_residency and exit_latency. The cpuidle governor makes the best recommendation for each CPU individually. If, for example, it expects a long sleep for a CPU, it can ask for (or recommend) a C2/C3 state directly.
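A minimal sketch of such an abstract state table and the governor-style selection; the state names follow the list above, while the residency and latency numbers are made up for illustration (not TC2 values):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstract per-CPU C-state table. */
struct cstate {
    const char *name;
    unsigned int target_residency_us; /* minimum sleep worth entering */
    unsigned int exit_latency_us;     /* worst-case wake-up cost */
};

static const struct cstate states[] = {
    { "C1", 100,   50 },   /* CPU suspend */
    { "C2", 2000,  500 },  /* cluster suspend */
    { "C3", 20000, 5000 }, /* system suspend */
};

/* Governor-style pick: the deepest state whose target residency fits
 * the predicted idle time and whose exit latency respects the QoS
 * limit; falls back to the shallowest state otherwise. */
static int pick_state(unsigned int predicted_us,
                      unsigned int latency_limit_us)
{
    int best = 0;

    for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++)
        if (states[i].target_residency_us <= predicted_us &&
            states[i].exit_latency_us <= latency_limit_us)
            best = (int)i;
    return best;
}
```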
Point (2) above is about MCPM (or PSCI) having an overall view of the cluster/system that allows it to select the best safe recommended C-state. Simplified pseudo-code:
if (all CPUs in cluster (have a recommended C2 state || are in power-down) &&
    no CPU in cluster is coming up)
        Enable cluster suspend
You can continue the logic for other C-states and add more logic about CPUs coming up to avoid races. But this still amounts to taking the strictest of all the states (where normally Cx is stricter than Cy for x < y) in a race-free manner.
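The pseudo-code above, transcribed into a runnable sketch (the structure and field names are hypothetical, not MCPM's):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU view used by the arbitration sketch. */
struct cpu_view {
    int recommended_state; /* 1 = CPU suspend, 2 = cluster suspend, ... */
    bool powered_down;
    bool coming_up;        /* woken by an interrupt, racing back up */
};

/* Direct transcription of the pseudo-code: cluster suspend is allowed
 * only if every CPU either recommends C2 (or deeper) or is already
 * powered down, and no CPU is in the process of coming up. */
static bool cluster_suspend_allowed(const struct cpu_view *cpu, int ncpus)
{
    for (int i = 0; i < ncpus; i++) {
        if (cpu[i].coming_up)
            return false;
        if (!(cpu[i].recommended_state >= 2 || cpu[i].powered_down))
            return false;
    }
    return true;
}
```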
What I don't get is why you want to make decisions based on expected residency in the MCPM (framework or back-end). Isn't the C-state and the strictness ordering enough?
So I reiterate my assertion that something is wrong in the overall secure OS architecture if it has to be that intimate with power management, to the point of locking it up into firmware in order to remain secure.
One of the ARM security architecture features is secure vs non-secure cache separation. Once the non-secure OS actions can affect the secure caches, the security model is broken. In such case the only way the secure OS can be secure is by not relying on its caches. That's a pretty simple model.