Re: [Eas-dev] Thoughts and Questions For EAS Energy Model

22 Sep 2015

      On Mon, Sep 21, 2015 at 05:31:37PM +0100, Morten Rasmussen wrote:
...
On Mon, Sep 21, 2015 at 06:58:30AM +0100, Leo Yan wrote:
...
On Fri, Sep 18, 2015 at 05:57:48PM +0100, Morten Rasmussen wrote:
[...]
...
...
...
...
Energy_cpu [j]
             = Sum(i=0..MAX_OPP)(Power_Pstate [w](i) * Util_OPP(i))
             + Sum(i=WFI, CPUOff)(Power_Cstate [w](i) * Util_IDL(i))
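
As a rough illustration of the formula above, here is a minimal Python sketch. The power tables and their values are invented for the example, and the requirement that busy and idle fractions add up to 1 over the period is an assumption of this sketch, not something the thread states for the per-cpu case.

```python
# Hypothetical power tables in watts; the numbers are made up for illustration.
P_STATE_POWER_W = {"opp0": 0.10, "opp1": 0.25, "opp2": 0.50}
C_STATE_POWER_W = {"WFI": 0.02, "CPUOff": 0.0}


def cpu_energy(util_opp, util_idle, period_s=1.0):
    """Energy [J] over period_s, following the Energy_cpu formula.

    util_opp  maps each OPP to the fraction of the period spent busy at it.
    util_idle maps each idle state to the fraction of the period spent in it.
    """
    # Assumption of this sketch: the fractions cover the whole period.
    total = sum(util_opp.values()) + sum(util_idle.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("busy and idle fractions must sum to 1")
    busy = sum(P_STATE_POWER_W[s] * u for s, u in util_opp.items())
    idle = sum(C_STATE_POWER_W[s] * u for s, u in util_idle.items())
    return (busy + idle) * period_s
```

For example, a cpu that is busy at "opp1" half of the time, in WFI for 30% and in CPUOff for 20% would be evaluated as cpu_energy({"opp1": 0.5}, {"WFI": 0.3, "CPUOff": 0.2}).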

Thoughts and Questions

Let's summarize EAS's energy model as below:
CPU::capacity_state::power : CPU's power [w] for specific OPP
  Power(OPP)         = Ps [w] + Pd [w]
CPU::idle_state::power : CPU's power [w] for specific idle state
  Power(IDLE_WFI)    = Ps [w]
  Power(IDLE_CPUOff) = 0
CPU's IDLE_WFI means: the CPU is clock gated, so it has static leakage
  but no dynamic power.

Agreed, but if we imagine that we have state between WFI and CPUOff
which powers down a part of the cpu core, but not everything (like
CPUOff), it would consume
  Power(IDLE_CPUalmostOff) = a * Ps [w]
                             -> a = ratio of transistors left powered.
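
A minimal Python sketch of the scaling idea, assuming 'a' is a plain factor between 0 and 1 applied to the static power, so that a = 1 reproduces WFI and a = 0 reproduces CPUOff. The function name and the example value of Ps are hypothetical.

```python
def idle_power(ps_w, a):
    """Static power [W] of an intermediate idle state, Power = a * Ps.

    a = 1.0 reproduces WFI (all leakage remains), a = 0.0 reproduces CPUOff.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be in [0, 1]")
    return a * ps_w
```

With a hypothetical Ps of 0.04 W, idle_power(0.04, 0.5) models a state that leaves half of the leakage in place.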

Totally agree that a CPU may have other extra idle states, and for a
common solution, we should not impose limitations on idle states.
...
F.5 assumes that all transistors are affected, which holds as long as all
transistors in the power domains that we provide separate model data for
(cpu core and cluster) are equally affected by each idle-state.
For one specific power state, whether it's a P-state or a C-state, we
actually need to define it with three factors: voltage domain, power
domain, and clock domain. Once we have defined these factors well for a
state, we can easily just apply F.5/F.6.
So just like the cases of "IDLE_CPUalmostOff" and "IDLE_CPUOFF", there
must be some difference between them; for example, they have different
power domains but may share the same clock domain and voltage domain. So
naturally we can calculate different power results for them.
Agreed, if we capture all power domains in the model applying F.5/F.6
isn't a problem as all transistors in the domain will be affected by
definition. However, it does mean potentially having power domains which
are more fine grained than just one cpu core. It doesn't map well to our
current model representation using the sched_domain hierarchy. Also,
while it makes a lot of sense from a theoretical point of view, I'm not
sure if we should worry about intra-core power domains. I don't see how
we would define them beyond just being some interpolated factor which
would basically be 'a' in the above formula. 'a' should be sufficient
for what we want to do as well.
Understood, it's hard to handle the 'a' issue, especially if we want
to simplify the energy model parameters.
[...]
...
...
...
The WFI power is zero for practical reasons. It is not possible to derive
the per-core WFI power with the energy counters. We can put all cpus
into WFI and measure the cluster energy, which would be the result of
F.13, but we have no way of figuring out how to decompose it into
cluster and cpu energy contributions. We have to account for all the
energy somewhere, so instead of assuming some arbitrary split between
cluster and cpu energy, we assume that it is all cluster energy. Hence,
the WFI power is accounted for in the cluster 'active idle' power.
IOW, it isn't missing, it is just accounted for somewhere else as we
didn't have a way to figure out the true split between cluster and core.
Yes, it's hard to extract power data independently for the cluster level
and the core level. The main reason is that it is hard to get the delta
value for WFI if the SoC doesn't support powering off individual CPUs.
It is actually a generic problem, we can't derive the per-core idle
power for the deepest per-core state. If we had CPUOff and WFI, we could
measure the WFI-CPUOff delta, which would give us a non-zero WFI power.
But we run into the same problem with measuring CPUOff as we currently
have with WFI on TC2.
Also, the WFI-CPUOff delta isn't the true per-core WFI power, it is the
delta on top of the CPUOff power which we can't measure. So the whole
table of per-core idle-state power is offset such that CPUOff = 0 (or
whatever the deepest per-core state is). As with the TC2 case it doesn't
mean that the power is unaccounted for, it is just accounted for
elsewhere (in the cluster power).
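
The offsetting described above can be sketched as follows. This is only an illustration: the function name and the readings are hypothetical, and picking the deepest per-core state as the one with the lowest measured power is an assumption of the sketch.

```python
def split_idle_power(cluster_measured_w, n_cpus):
    """Split cluster-level measurements into per-cpu and cluster parts.

    cluster_measured_w maps a per-cpu idle state to the cluster power [W]
    measured with all n_cpus cpus in that state. The deepest state (assumed
    to be the one with the lowest reading) becomes the zero point of the
    per-cpu table, and its whole reading is attributed to the cluster.
    """
    deepest = min(cluster_measured_w, key=cluster_measured_w.get)
    floor_w = cluster_measured_w[deepest]
    per_cpu_w = {
        state: (power - floor_w) / n_cpus
        for state, power in cluster_measured_w.items()
    }
    return per_cpu_w, floor_w
```

With hypothetical readings of 0.30 W (all cpus in WFI) and 0.10 W (all cpus in CPUOff) on a two-cpu cluster, the per-cpu table gets CPUOff = 0 and WFI = 0.10 W, while 0.10 W stays in the cluster number.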
Yes, exactly.
...
...
Just curious: is it feasible to measure the WFI state on TC2 with the
steps below?

1. First measure the power data when the cluster is powered off;
2. Then power on CPU0 only, and place CPU0 into "WFI":
   Power_Delta0 = cluster level power + one CPU's "WFI";
3. Then power on CPU1 as well, place it into "WFI", and get:
   Power_Delta1 = cluster level power + two CPUs' "WFI";
4. So finally we get "WFI" power = Power_Delta1 - Power_Delta0;
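
The arithmetic behind these steps can be sketched in Python. It assumes hardware where each core can be powered on individually, and all readings used with it are hypothetical numbers, not TC2 measurements.

```python
def wfi_power_from_deltas(p_off_w, p_one_wfi_w, p_two_wfi_w):
    """Derive per-cpu WFI power and cluster power from three readings [W].

    p_off_w      reading with the cluster powered off (baseline)
    p_one_wfi_w  reading with only CPU0 powered on and sitting in WFI
    p_two_wfi_w  reading with CPU0 and CPU1 powered on, both in WFI
    """
    delta0 = p_one_wfi_w - p_off_w  # cluster level power + one cpu in WFI
    delta1 = p_two_wfi_w - p_off_w  # cluster level power + two cpus in WFI
    wfi_w = delta1 - delta0         # cost of the additional cpu in WFI
    cluster_w = delta0 - wfi_w      # what is left for the cluster level
    return wfi_w, cluster_w
```

With made-up readings of 0.0 W, 0.25 W and 0.30 W, this gives roughly 0.05 W per cpu in WFI and 0.20 W for the cluster level.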

The key point is step 2: when one core is powered on, will the other
cores in the same cluster automatically be powered on as well?
Unfortunately yes. We only have one physical power domain which spans
the entire cluster. So you can only power up everything in one go. If
you try tricks like hotplugging cpus out, they are just parked in WFI by
the driver/firmware even though they are removed from an OS perspective.
It is a limitation in the hardware which we can't work around.
OK.
...
...
...
Talking about idle-state representation. The current idle-state tables
are quite confusing. We only have per-cpu states listed in the per-cpu
tables, and per-cluster in the per-cluster tables (+ active idle). This
is why we have WFI for the core tables and 'active idle' (WFI) + CLSOff
for the cluster tables for TC2. I'm planning on changing that so we have
the full list of states in all tables, but with zeros or repeated power
numbers for states that don't affect the associated power domain.
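
One possible shape for such full tables, as a Python sketch. The fill policy (repeat the power of the closest shallower state that the domain does define, or zero if there is none) is an assumption made for illustration, as are the state names and power numbers.

```python
def expand_table(own_states_w, all_states):
    """Expand a domain's idle-state table to the full list of states.

    own_states_w maps the states that affect this power domain to their
    power [W]. all_states lists every state, shallowest first. States that
    do not affect the domain repeat the last shallower value, or 0.0 if no
    shallower state is defined for the domain.
    """
    full = {}
    last_w = 0.0
    for state in all_states:
        last_w = own_states_w.get(state, last_w)
        full[state] = last_w
    return full
```

For a hypothetical state list ["WFI", "CPUOff", "CLSOff"], a cluster table {"WFI": 0.2, "CLSOff": 0.05} expands to the same 0.2 W for CPUOff, since per-cpu power-off does not change the cluster domain in this sketch.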
Here I think we should create a clear principle for the energy model and
apply it. If we go back and review the "WFI" state, its power
domain/voltage domain/clock domain are all at the CPU level, not the
cluster level. So the most reasonable calculation for the 'active idle'
state should be depicted as below:
Energy [j] = Energy_cluster [j]
           + Sum(i=0..MAX_CPU_PER_CLUSTER)(Energy_cpu(i) [j])
Energy_cluster [j] = Sum(i=0..MAX_OPP)(Power_Pstate [w](i) * Util_OPP(i))
  where Sum(i=0..MAX_OPP)(Util_OPP(i)) = 1
Energy_cpu(i) [j] = Power(IDLE_WFI)
So that means that in the 'active idle' state all cpus stay in the "WFI"
state, but the cluster level actually always stays in a P-state, not a
C-state. This is because the cluster level's power domain/clock domain
is always ON for 'active idle'.
But EAS currently considers the cluster level to be in an idle state for
'active idle', right?
Yes, but it isn't easy to generalize based on the TC2 model due to the
limitations of TC2. From a model point of view we want to know which
state the cpu/cluster is in: Running or idling. The C-states represent
the hardware supported idle-states (controlling clock and/or power). An
idle cluster or core may idle in one of these states or sit idle with
everything powered up and clocked. The latter is 'active idle'. A cluster
may be active idle if all the cpus are idling in some per-cpu idle-state
and the cpuidle governor has chosen to leave it powered up (possibly due
to target residency constraints). The same could in theory be the case
for a cpu core. It could be spinning in the idle loop if cpuidle didn't
decide to enter a C-state. On ARM WFI is practically free to enter, so
we always enter a proper hardware idle-state whenever we are idle. Even
if it is only for a single clock cycle. Hence, we would never be active
idling an ARM cpu, so WFI takes the role of active idle in this case. If
WFI had a target_residency that would prevent cpus from entering it and
leave them spinning, we would need an active idle state for the cpus as well.
In the model we treat active idle as an idle state despite the
cpu/cluster being fully operational and running. The reason for this is
that even though we are in some P-state, we aren't actually doing
anything useful and the power consumption is likely to be very different
from when we are busy. In the cluster active idle case, all the cpus are
idling, which means nobody is accessing caches and memory hence the
transistor toggling is very limited (though it might be affected by
snooping traffic if another cluster is busy). If we used the busy
P-state power, we would vastly over-estimate the active idle power for
the cluster in most cases. In the cpu case (if we weren't guaranteed to
enter WFI), we would be spinning in some simple loop that probably
wouldn't exercise the entire cpu core and hopefully use a little less
power (no cache access and expensive instructions).
Since we are technically running when active idling, one could argue
that we should have an active idle power number for each P-state. For
ARM that isn't an issue for per-core idling as we have WFI. For clusters
we may want to consider it.
The short answer is: In active idle the cpu/cluster is in a P-state
doing nothing. We can make WFI the active idle state per-core (cpu) on
ARM as we are guaranteed to enter it when the cpu is idle.
Agreed, but I have two concerns here:
- If we take the cluster's 'active idle' as an idle state, Pd [w] is
  totally ignored for it. That means that whatever frequency the cluster
  level is running at, its dynamic power will be ignored.
  Below is some 'active idle' power data measured on CA7:
  CPUFreq@156MHz:  11mA
  CPUFreq@312MHz:  28mA
  CPUFreq@624MHz:  36mA
  CPUFreq@800MHz:  45mA
  CPUFreq@1100MHz: 56mA
  So in practice, if we use the lowest frequency for the cluster's
  'active idle', there will be some deviation if the cluster is actually
  running at the highest frequency.
- There may be more than one kind of 'active idle' state for a cluster;
  for example, all cores in the cluster entering 'WFI' gives one
  corresponding 'active idle' state, and all cores in the cluster
  entering 'CPUOFF' gives another. Should we also handle these two kinds
  of 'active idle' state as the same one?
  Furthermore, if only one CPU enters 'WFI' and the other CPUs in the
  cluster enter 'CPUOFF', how do we select the 'active idle' state?
If we change to treating the 'active idle' state as a cluster-level
P-state, the above issues are easily dismissed.
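
To put a number on the first concern, here is a small Python sketch using the CA7 currents quoted above. It reads the last entry as 1100 MHz (an assumption), and the helper name is hypothetical.

```python
# 'Active idle' current [mA] per cluster frequency [MHz], from the CA7 data.
# The 1100 key assumes the last entry of the list is meant as 1100 MHz.
ACTIVE_IDLE_MA = {156: 11, 312: 28, 624: 36, 800: 45, 1100: 56}


def underestimate_ratio(freq_mhz, table=ACTIVE_IDLE_MA):
    """How many times larger the active idle current at freq_mhz is than
    the value at the lowest frequency in the table."""
    lowest = min(table)
    return table[freq_mhz] / table[lowest]
```

If a single 'active idle' number taken at 156 MHz is used while the cluster actually sits at the highest frequency, underestimate_ratio(1100) shows the real current to be about five times larger (56 / 11).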
Thanks,
Leo Yan
