On Tue, May 21, 2013 at 10:08:29PM +0100, Sebastian Capella wrote:
Thanks Liviu!
Some comments below..
Quoting Liviu Dudau (2013-05-21 10:15:42)
... Which side of the interface are you actually thinking of?
Both, I'm really just trying to understand the problem.
I don't think there is any C-state other than simple idle (which translates into a WFI for the core) that *doesn't* take into account power domain latencies and code path lengths to reach that state.
I'm speaking more about additional C-states beyond the lowest independent compute-domain C-state, where we may add states that reduce power further at a higher latency cost. These may change power states for the rest of the SoC or for external power chips/supplies. Those states would effectively enter the lowest PSCI C-state, but then take additional steps in the CPUidle hardware-specific driver.
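[A minimal sketch of that arrangement, purely for illustration; every name below is invented and is not a real PSCI or CPUidle interface. The vendor-specific deep state does its extra SoC/PMIC work in the platform driver and then falls through to the lowest firmware-managed CPU state.]

#include <stdio.h>

/* Stubs standing in for SoC-specific work and for the firmware call. */
static void soc_prepare_external_supplies(void) { printf("notify external PMIC\n"); }
static void soc_quiesce_noncpu_domains(void)    { printf("quiesce non-CPU power domains\n"); }
static int firmware_enter_lowest_cpu_state(void)
{
	printf("enter lowest firmware-managed CPU state\n");
	return 0;
}

/* A vendor C-state deeper than the lowest PSCI-managed one: the extra
 * SoC/PMIC steps live in the platform idle driver, then the core is handed
 * to the firmware exactly as for the lowest PSCI C-state.  On wake-up the
 * driver would undo the extra steps in reverse order. */
static int soc_enter_deep_idle(void)
{
	soc_prepare_external_supplies();
	soc_quiesce_noncpu_domains();
	return firmware_enter_lowest_cpu_state();
}

int main(void)
{
	return soc_enter_deep_idle();
}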
Quoting from the PSCI spec:
"ARM systems generally include a power controller which provides the necessary mechanisms to control processor power. It normally provides interfaces to allow a number of power management functions. These often include support for transitioning processors, clusters or a superset, into low power states, where the processors are either fully switched off, or in quiescent states where they are not executing code. ARM strongly recommends that control of these states, via this power controller, is vested in the secure world. Otherwise, the OSPM could enter a low power mode without informing the Trusted OS. Even if such an arrangement could be made robust, it is unlikely to perform as well. In particular, for states where the core is fully power gated, a longer boot sequence would take place upon wake up as full initialization would be required by the secure world. This would be required as the secure components would effectively be booting from scratch every time. On a system where this power control is vested in the Secure world, these components would have an opportunity to save their state before powering off, allowing a faster resumption on power up. In addition, the secure world might need to manage peripherals as part of a power transition."
If you don't have such a power controller in your system then yes, you will have to drive the hardware from the CPUidle hw driver. But I don't see the need of a separate C-state for that.
I would say that the C-states I have listed further down should cover most of the cases, maybe with the addition of a SYSTEM_SUSPEND state if I understood your concerns correctly.
Going on a tangent a bit:
To me, the C-states are like layers in an onion: each deeper C-state includes the C-states that come earlier in the list. Therefore, you describe a C-state in terms of the minimum total time to spend in that state, and that includes the worst-case transition times (the cost of reaching the state and of coming out of it). A completely made-up example:
CPU_ON          < 2ms
CPU_IDLE        > 2ms
CPU_OFF         > 10ms
CLUSTER_OFF     > 500ms
SYSTEM_SUSPEND  > 5min
SYSTEM_OFF      > 1h
If you do that then the CPUidle driver decision becomes as simple as finding the right state that would not lead to a missed event, and you don't really have to understand the costs of the host OS (if there is any). It should match the expectations of a real time system as well, if the table is correctly fine-tuned (and if one understands that a real time system is about constant-time response, not immediate response).
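[A minimal sketch of that selection rule, using the made-up thresholds from the example above; none of this is a real governor or API. The decision reduces to picking the deepest state whose minimum residency still fits before the next expected event.]

#include <stdio.h>
#include <stdint.h>

struct c_state {
	const char *name;
	uint64_t min_residency_us;	/* worst-case entry + exit + break-even */
};

/* Ordered shallow -> deep, as in the "onion" description above. */
static const struct c_state states[] = {
	{ "CPU_ON",         0 },		/* running, no transition cost */
	{ "CPU_IDLE",       2000 },		/* > 2ms   */
	{ "CPU_OFF",        10000 },		/* > 10ms  */
	{ "CLUSTER_OFF",    500000 },		/* > 500ms */
	{ "SYSTEM_SUSPEND", 300000000ULL },	/* > 5min  */
	{ "SYSTEM_OFF",     3600000000ULL },	/* > 1h    */
};

/* Pick the deepest state whose minimum residency fits before the next event. */
static const struct c_state *select_state(uint64_t us_to_next_event)
{
	const struct c_state *best = &states[0];
	int i;

	for (i = 1; i < (int)(sizeof(states) / sizeof(states[0])); i++) {
		if (us_to_next_event >= states[i].min_residency_us)
			best = &states[i];
	}
	return best;
}

int main(void)
{
	printf("next event in 3ms -> %s\n", select_state(3000)->name);    /* CPU_IDLE    */
	printf("next event in 1s  -> %s\n", select_state(1000000)->name); /* CLUSTER_OFF */
	return 0;
}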
I don't know how to draw the line between the host OS costs and the guest OS costs when using target latencies. On one hand I think that the host OS should add its own costs into what gets passed to the guest and the guest will see a slower than baremetal system in terms of state transitions;
I was thinking of something like this as well. Is there a way to query the state transition cost information through PSCI? Would there be a way to have the layers of hosts/monitors/etc. contribute the cost of their paths to the query results?
Possibly. The PSCI spec doesn't specify any API for querying the C-state costs because the way to do so is still up in the air. We know that the server world would like to carry on using ACPI to describe those states, while device-tree-based systems would probably invent a different way or learn how to integrate with ACPI.
... on the other hand I would like to see the guest OS shielded from this type of information, as there are too many variables behind it (is the host OS also running under some monitor code? are all transitions to the same state happening in constant time, or do they depend on the number of cores involved, their state, etc, etc)
I agree, but don't see how. In our systems, we do very much care about the costs, and have ~real time constraints to manage. I think we need a good understanding of costs for the hw states.
And are those costs constant? Does the time it takes to do a cluster shutdown depend on how many CPUs you have online? Does having the DMA engine on add to the quiescence time? While I don't doubt that you understand the minimum time constraints that the hardware imposes, it is the combination of all the elements in the system that are under software control that gives the final answer, and in most cases that answer is "it depends".
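[For illustration only, with hypothetical names and no real API: one reason such a cost is not constant is that a CPU asking for a cluster state only pays the full cluster cost if it happens to be the last CPU in the cluster to go idle; otherwise it can only reach the core-level state, with a different cost.]

#include <stdio.h>
#include <stdbool.h>

struct cluster {
	int cpus_online;	/* CPUs not hot-unplugged               */
	int cpus_idle;		/* CPUs currently in (or entering) idle */
};

/* Effective state this CPU can reach if it requests CLUSTER_OFF now. */
static const char *effective_state(const struct cluster *cl)
{
	/* The requesting CPU counts itself as idle. */
	bool last_man = (cl->cpus_idle + 1 == cl->cpus_online);

	return last_man ? "CLUSTER_OFF" : "CPU_OFF";
}

int main(void)
{
	struct cluster cl = { .cpus_online = 4, .cpus_idle = 1 };

	/* Two other CPUs still running: only core power-down is reachable. */
	printf("4 online, 1 idle -> %s\n", effective_state(&cl));

	/* Everyone else already idle: this CPU takes the whole cluster down. */
	cl.cpus_idle = 3;
	printf("4 online, 3 idle -> %s\n", effective_state(&cl));
	return 0;
}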
If one uses a simple set of C-states (CPU_ON, CPU_IDLE, CPU_OFF, CLUSTER_OFF, SYSTEM_OFF) then the guest could make requests independent of the host OS latencies _after the relevant translations between time-to-next-event and intended target C-state have been performed_.
I think that if we don't know the real cost of entering a state, we will basically end up choosing the wrong states on many occasions.
True. But that "real" cost is usually an estimate of the worst case, or an average time, right?
CPUidle is already binning the allowable costs into a specific state. If we decide that CPUidle does not know the real cost of the states then the binning will sometimes be wrong, and CPUidle would not be selecting the correct states. I think this could have bad side effects for real time systems.
CPUidle does know the costs. The "reality" of those costs depends on the system you are running (virtualised or not, trusted OS trapping your calls or not). If the costs do not reflect the actual transition time then yes, CPUidle will make the wrong decision and the system won't work as intended. I'm not advocating doing that.
Also, I don't understand your remark regarding real time systems. If the CPUidle costs are wrong the decision will be wrong regardless of the type of system you use. Or are you concerned that being too conservative and lying to the OS about the actual cost for the system to transition to the new state at that moment will introduce unnecessary delays and forgo the real time functionality?
For my purposes and as things are today, I'd likely factor the (probably pre-known and measured) host OS/monitor costs into the CPUidle DT entries and have CPUidle run the show. At the lower layers, it won't matter what is passed through as long as the correct state is chosen.
Understood. I'm advocating the same thing with the only added caveat that the state you choose is not a physical system state in all cases, but a state that makes sense for the OS running at that level. As such, the numbers that will be used by CPUidle will be in the "ballpark" region rather than absolute numbers.
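[A minimal sketch of that approach, with entirely hypothetical numbers and names: the pre-measured host/monitor overhead is simply added to the baremetal figure for each state before the table is handed to CPUidle, so the OS sees slower-than-baremetal but still correctly ordered states.]

#include <stdio.h>
#include <stdint.h>

struct idle_cost {
	const char *name;
	uint64_t hw_latency_us;		/* baremetal worst-case entry + exit */
	uint64_t monitor_overhead_us;	/* measured host/monitor path cost   */
};

static const struct idle_cost table[] = {
	{ "CPU_IDLE",    50,   5 },
	{ "CPU_OFF",     800,  150 },
	{ "CLUSTER_OFF", 4000, 1200 },
};

/* Value that would actually go into the state description used by CPUidle. */
static uint64_t published_latency(const struct idle_cost *c)
{
	return c->hw_latency_us + c->monitor_overhead_us;
}

int main(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		printf("%-12s %6llu us\n", table[i].name,
		       (unsigned long long)published_latency(&table[i]));
	return 0;
}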
Any running OS should only be concerned with getting the time to the next event right (be it real time constrained or not) and finding out which C-state will guarantee availability at that time. If one doesn't know when the next event will come then being conservative should be good enough. There is no way you will have a ~real time system if you transition to cluster off and the real cost of coming out is measured in milliseconds, regardless of how you came to that decision.
Best regards, Liviu
Thanks,
Sebastian