On 10/12/2015 06:55 AM, Vincent Guittot wrote:
>> For the RT story, are you thinking to use rq->rt_avg in some way?
>
> Yes that was my goal but the deadline is also accounted in rq->rt_avg
> which is then used in scale_rt_capacity. I was planning to remove the
> deadline class from rq->rt_avg and to use the per cpu deadline
> bandwidth in scale_rt_capacity instead
If the deadline contribution to rq->rt_avg is left in place, can RT and
deadline both be accounted for via that one mechanism for the purposes
of sched-dvfs?
I'm not sure how the accuracy/behavior of the deadline accounting to
rq->rt_avg differs from the forthcoming patchset Juri mentioned, but if
it's not as good, perhaps the mechanisms could be combined so that
rt_avg (and by extension the per-CPU capacity adjustments in CFS)
benefit as well.
Hi all,

Below are some thoughts and questions after reviewing EAS's energy model. My
purpose is to get a clear picture of the energy model from the user's
perspective, so the questions below focus _ONLY_ on the model and do not dig
into the implementation.

This email is rather long, but I think formulas make it easier to get on the
same page. So I list the energy model's formulas first, then try to match
them against TC2's power data and raise some questions.

I look forward to your suggestions and comments.

* Basic Energy and Power Calculation Formulas
From Documentation/scheduler/sched-energy.txt, we know that energy can be
calculated with:

  Energy [J] = Power [W] * Time [s]                                (F.1)

Assume a piece of code with a fixed number of instructions is executed on a
CPU; the execution time depends on the CPU's pipeline and frequency. So F.1
can be converted to F.2:

                                Code [instructions]
  Energy [J] = Power [W] * ------------------------------
                            (Inst Per Cycle) * Frequency

                                Code [instructions]
             = Power [W] * ------------------------------          (F.2)
                                      MIPS(f)

               `-> 'f' is the frequency factor

Because MIPS(f) can be normalized to the CPU's capacity at the corresponding
OPP, we can simply convert F.2 to F.3:

                                Code [instructions]
  Energy [J] = Power [W] * ------------------------------          (F.3)
                                  CPU_Capacity(f)

Breaking down Power [W], we can split it into two parts, static leakage and
dynamic power:

  Power [W] = Ps [W] + Pd [W]                                      (F.4)

Static leakage power can be calculated with the formula below:

  Ps [W] = i * V [V]                                               (F.5)

    `-> 'i' is a coefficient that depends on the silicon process
        V [V] is the voltage at the given OPP

Dynamic power can be calculated with the formula below:

  Pd [W] = b * V [V] * V [V] * frequency                           (F.6)

    `-> 'b' is a coefficient that depends on the silicon process
        V [V] is the voltage at the given OPP

There are two special cases. If the island's clock is gated, then
Pd [W] = 0, so:

  Power [W] = Ps [W]                                               (F.7)

If the island is powered off, then Ps [W] = 0 and Pd [W] = 0, so:

  Power [W] = 0                                                    (F.8)

So energy can be calculated as (from F.3 and F.4):

                                    Code [instructions]
  Energy [J] = (Ps [W] + Pd [W]) * ----------------------          (F.9)
                                      CPU_Capacity(f)

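To make F.5, F.6 and F.9 concrete, here is a minimal C sketch. The struct
opp_model and the helper names are hypothetical and are not taken from the
EAS patches; the coefficients 'i' and 'b' and all units are whatever the
caller chooses, as long as they are consistent.

```c
#include <assert.h>

/* Hypothetical description of one OPP; field names are illustrative only. */
struct opp_model {
	double volt;     /* V: voltage at this OPP */
	double freq;     /* frequency at this OPP */
	double capacity; /* CPU_Capacity(f): instructions per unit of time */
};

/* F.5: static leakage power; 'i' is the process-dependent coefficient. */
static double static_power(double i, const struct opp_model *opp)
{
	return i * opp->volt;
}

/* F.6: dynamic power; 'b' is the process-dependent coefficient. */
static double dynamic_power(double b, const struct opp_model *opp)
{
	return b * opp->volt * opp->volt * opp->freq;
}

/* F.9: energy needed to execute 'instructions' at the given OPP. */
static double energy_for_code(double i, double b,
			      const struct opp_model *opp,
			      double instructions)
{
	assert(opp->capacity > 0.0);
	return (static_power(i, opp) + dynamic_power(b, opp)) *
	       (instructions / opp->capacity);
}
```

Setting b = 0 gives the clock-gated case F.7, and setting both i and b to 0
gives the powered-off case F.8.
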
* Formulas for duty cycle
We separate the states of the logic (cluster or CPU) into two kinds, P-states
and C-states. They have different power data, because once the logic enters
a C-state it is clock gated or powered off. So if we look at a relatively
long stretch of the time axis, we need to calculate the CPU's utilization
percentage (for a CPU that is running all the time, util = 100%). Let's
simplify the ratio between "Code [instructions]" and "CPU_Capacity(f)" to
the utilization, so the energy calculation can be depicted as:

               Code [instructions]
  Util(f) = --------------------------                            (F.10)
                 CPU_Capacity(f)

  Energy [J] = Power_Pstate [W] * Util(f)
             + Power_Cstate [W] * (1 - Util(f))                   (F.11)

                                                                  (F.12)
  Energy [J] = Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i))
             + Sum(i=0..MAX_IDL)(Power_Cstate [W](i) * Util_IDL(i))

  Sum(i=0..MAX_OPP)Util_OPP(i) + Sum(i=0..MAX_IDL)Util_IDL(i) = 1

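F.12 can be sketched in C as a weighted sum over the share of time spent in
each state. The type state_share and the helper names below are assumptions
made for illustration only; they are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* One P-state or C-state: its power and the share of time spent in it. */
struct state_share {
	double power; /* power of this state, in the model's unit */
	double util;  /* fraction of the time window, 0.0..1.0 */
};

/* Sum(power(i) * util(i)) over one array of states. */
static double weighted_power(const struct state_share *s, size_t n)
{
	double sum = 0.0;
	size_t k;

	for (k = 0; k < n; k++)
		sum += s[k].power * s[k].util;
	return sum;
}

/* Sum(util(i)) over one array of states. */
static double util_sum(const struct state_share *s, size_t n)
{
	double sum = 0.0;
	size_t k;

	for (k = 0; k < n; k++)
		sum += s[k].util;
	return sum;
}

/* F.12: P-state contribution plus C-state contribution. */
static double duty_cycle_energy(const struct state_share *p, size_t n_p,
				const struct state_share *c, size_t n_c)
{
	double total = util_sum(p, n_p) + util_sum(c, n_c);

	/* Side condition of F.12: the shares must add up to 1. */
	assert(total > 0.999 && total < 1.001);
	return weighted_power(p, n_p) + weighted_power(c, n_c);
}
```

With a single P-state and a single C-state this reduces to F.11.
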
* Formulas for clusters
                                                                  (F.13)
  Energy [J] = Energy_cluster [J]
             + Sum(i=0..MAX_CPU_PER_CLUSTER)Energy_cpu(i) [J]

                                                                  (F.14)
  Energy_cluster [J]
    = Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i))
    + Sum(i=WFI, ClusterOff)(Power_Cstate [W](i) * Util_IDL(i))

                                                                  (F.15)
  Energy_cpu [J]
    = Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i))
    + Sum(i=WFI, CPUOff)(Power_Cstate [W](i) * Util_IDL(i))

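As a small illustration of F.13, the aggregation over one cluster could look
like the sketch below. The function name and its arguments are hypothetical;
the per-cluster and per-CPU energies are assumed to have been computed
already according to F.14 and F.15.

```c
#include <assert.h>
#include <stddef.h>

/* F.13: cluster-level energy plus the energy of every CPU in the cluster. */
static double cluster_total_energy(double energy_cluster,
				   const double *energy_cpu,
				   size_t nr_cpus)
{
	double total = energy_cluster;
	size_t i;

	assert(energy_cpu != NULL || nr_cpus == 0);
	for (i = 0; i < nr_cpus; i++)
		total += energy_cpu[i];
	return total;
}
```
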
* Thoughts and Questions
- Let me summarize EAS's energy model as below:

  CPU::capacity_state::power : CPU's power [W] for a specific OPP

    Power(OPP) = Ps [W] + Pd [W]

  CPU::idle_state::power : CPU's power [W] for a specific idle state

    Power(IDLE_WFI) = Ps [W]
    Power(IDLE_CPUOff) = 0

  CPU's IDLE_WFI means the CPU is clock gated, so it has static leakage
  but no dynamic power.

  CLUSTER::capacity_state::power : Cluster's power [W] for a specific OPP

    Power(OPP) = Ps [W] + Pd [W]

  CLUSTER::idle_state::power : Cluster's power [W] for a specific idle state

    Power(IDLE_WFI) = Ps [W] + Pd [W]
    Power(IDLE_CLSOff) = 0

  Cluster's IDLE_WFI is quite special: it means all CPUs in the cluster have
  been powered off, but the cluster's logic (L2$, SCU, etc.) is powered on
  and its clock is enabled, so it includes the cluster level's static power
  and dynamic power.

  Do these formulas match the original design?

- TC2's data for cluster's sleep:

  static struct idle_state idle_states_cluster_a7[] = {
          { .power = 25 }, /* WFI */
          { .power = 10 }, /* cluster-sleep-l */
  };

  static struct idle_state idle_states_cluster_a15[] = {
          { .power = 70 }, /* WFI */
          { .power = 25 }, /* cluster-sleep-b */
  };

  For cluster-level sleep, the clock is gated and the domain is powered off,
  so both the dynamic power and the static leakage should be zero, right?

- TC2's data for CPU's idle state:

  static struct idle_state idle_states_core_a7[] = {
          { .power = 0 }, /* WFI */
  };

  static struct idle_state idle_states_core_a15[] = {
          { .power = 0 }, /* WFI */
  };

  A CPU has two idle states, 'WFI' and 'C2'. For the 'WFI' state the power
  should not be zero: 'WFI' means internal clock gating, so according to
  F.7 there should still be static leakage.

  BTW, for TC2 there is no idle state corresponding to 'C2', which is
  weird. Could you confirm it has been deliberately removed?

- TC2's data for P-state:

  static struct capacity_state cap_states_cluster_a7[] = {
          /* Cluster only power */
          { .cap = 150, .power = 2967, }, /* 350 MHz */
          [...]
  };

  static struct capacity_state cap_states_core_a7[] = {
          /* Power per cpu */
          { .cap = 150, .power = 187, }, /* 350 MHz */
          [...]
  };

  From previous experience, the CPU-level power consumption is much higher
  than the cluster-level power consumption. For example, for CA7, if only
  the cluster is powered on (all CPUs in the cluster are powered off), the
  power delta is ~10mA@156MHz; if one CPU is powered on, the power delta is
  about 30mA@156MHz. I also checked the data for CA53, and it shows a
  similar result.

  So this conflicts with TC2's power data, where the cluster-level power is
  quite high (almost 16 times the CPU level: 2967 vs. 187 at 350 MHz). This
  means we can hardly get any benefit from CPU-level low power states,
  because the cluster level contributes most of the power consumption. This
  does not make sense.

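The imbalance can be quantified with a short C sketch that computes the share
of the total power attributed to the cluster-level entry. The function name
is hypothetical, and the CPU count passed in is an assumption of the caller,
not a value taken from the tables above.

```c
#include <assert.h>

/*
 * Share of the total power that comes from the cluster-level entry when
 * nr_running CPUs are busy at the same OPP. Inputs use the same abstract
 * power unit as the capacity_state tables.
 */
static double cluster_power_share(unsigned int cluster_power,
				  unsigned int cpu_power,
				  unsigned int nr_running)
{
	unsigned int total = cluster_power + cpu_power * nr_running;

	assert(total > 0);
	return (double)cluster_power / total;
}
```

With the 350 MHz values quoted above (2967 for the cluster, 187 per CPU),
the cluster accounts for roughly 94% of the total with one CPU running and
still roughly 84% with three CPUs running.
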
- From formula F.4, power is the sum of static leakage and dynamic power;
  IPA also uses static/dynamic power to describe its energy model. But EAS
  takes another approach and provides power data for every OPP and idle
  state. So on one platform we need to provide two kinds of power data.

  IMHO, the static/dynamic split is simpler, because we usually use mW/MHz
  to describe the power efficiency of a specific CPU. mW/MHz cannot describe
  power consumption very accurately when the voltage changes (see formula
  F.6; the voltage is usually raised at higher frequencies). But with
  mW/MHz the calculation is very simple: we just multiply it by the
  frequency to get the dynamic power.

  So we only need to provide the parameters below:

    P-state: static leakage, power efficiency (mW/MHz), capacity (DMIPS/MHz);
    C-state: static leakage, power efficiency (mW/MHz);

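A minimal C sketch of this simplified parameter set might look as follows.
The struct and function names are hypothetical. The second helper is an
assumption derived from F.6: if the mW/MHz figure was measured at a
reference voltage, scaling it by (V / V_ref)^2 accounts for the voltage
change that the plain mW/MHz figure ignores.

```c
#include <assert.h>

/* Hypothetical simplified model; names are illustrative only. */
struct simple_power_model {
	double static_mw;      /* static leakage, in mW */
	double eff_mw_per_mhz; /* power efficiency, in mW/MHz */
};

/* P-state power: static leakage plus mW/MHz multiplied by the frequency. */
static double pstate_power_mw(const struct simple_power_model *m,
			      double freq_mhz)
{
	assert(freq_mhz >= 0.0);
	return m->static_mw + m->eff_mw_per_mhz * freq_mhz;
}

/*
 * Dynamic power with voltage correction, following F.6:
 * Pd = eff(V_ref) * (V / V_ref)^2 * f.
 */
static double dynamic_power_scaled_mw(double eff_mw_per_mhz, double freq_mhz,
				      double volt, double volt_ref)
{
	double ratio;

	assert(volt_ref > 0.0);
	ratio = volt / volt_ref;
	return eff_mw_per_mhz * freq_mhz * ratio * ratio;
}
```

Comparing the two helpers at a higher-voltage OPP shows how large the error
of the uncorrected mW/MHz figure becomes.
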
  What are your thoughts on unifying the energy model?

Thanks,
Leo Yan
Hi eas-dev,
as pre-announced by Morten on that ML, last week we posted on LKML
an RFC introducing SchedTune, a proposal for a central tuning knob.
The current posting addresses DVFS biasing only. There will be a
follow-up posting to deal with task placement via integration with EAS
as the upstream discussions move along.
The full cover letter can be found here:
http://www.kernelhub.org/?msg=819910&p=2
For your convenience the patch set is available here:
git://www.linux-arm.com/linux-power eas/stune/rfcv1
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
Hi eas-dev,
In case you don't follow LKML, we have posted RFCv5 of the EAS patch set
for review and comments. The main addition in this posting is
scheduler-driven DVFS along with a number of smaller fixes and
improvements. The full cover letter can be found here:
https://lkml.org/lkml/2015/7/7/754
For your convenience the patch set is available here:
http://www.linux-arm.org/git?p=linux-power.git
branch: energy_model_rfc_v5
Additional patches introducing a central tuning knob that influences
both task placement and DVFS will be posted later for discussion.
Thanks,
Morten
Hi Alex,
In 4.1-rc1, several patches (see 36ee28e4 onwards) related to cpu
capacity consolidation were merged.
It would be a good idea to refresh the eas-backport tree so that these
patches are cherry-picked directly from mainline into
stable/sched-upstream branch and their equivalent versions in
stable/sched-core are removed.
Regards,
Amit