Hello everyone,
I hope this is the right place to ask, otherwise please just point me in the right direction.
I'm currently testing the EAS patches [v4] on an ODROID-XU3 board, which has an Exynos5422 SoC. However, the corresponding DT is missing the "dynamic-power-coefficient" property that is needed for an appropriate EM.
So I'm trying to compute the dynamic-power-coefficient according to the formula:
Pdyn = dynamic-power-coefficient * V^2 * f
The frequency f is given by the DT. The actual voltage and power are determined by means of the on-chip sensors (which return the values for the specific cluster). When using the voltage given in the DT instead, the differences in the d-p-coefficient are negligible.
So I'm calculating the d-p-coefficient (in mW/MHz/uV^2) from the following SoC sensor readings.
For the little cluster (A7):
frequency(MHz)  Voltage(V)  Power(mW)  Dynamic-power-coefficient
 200            0.9175       49.470    ~2.938*10^-13
 400            0.9165       91.892    ~2.736*10^-13
 600            0.9638      149.454    ~2.682*10^-13
 800            1.0263      223.453    ~2.652*10^-13
1000            1.1000      327.707    ~2.708*10^-13
1200            1.1725      445.899    ~2.703*10^-13
1400            1.2713      627.010    ~2.771*10^-13
For the big cluster (A15):
frequency(MHz)  Voltage(V)  Power(mW)   Dynamic-power-coefficient
 200            0.9162       159.676    ~9.510*10^-13
 500            0.9138       325.480    ~7.797*10^-13
 800            0.9288       511.360    ~7.410*10^-13
1100            1.0063       828.020    ~7.434*10^-13
1400            1.0713      1209.774    ~7.530*10^-13
1700            1.1750      1835.784    ~7.822*10^-13
2000            1.2700      2661.849    ~8.252*10^-13
But those values are way off for the DT, unless I multiply the d-p-coefficient by 10^15 (which is just the unit conversion from mW/MHz/uV^2 to uW/MHz/V^2, the unit the DT binding uses).
Assuming a static component (power usage when idle) needs to be subtracted from the power value, the results change as follows:
For the little cluster (A7):
frequency(MHz)  Voltage(V)  P-dyn(mW)  Dynamic-power-coefficient
 200            0.9175       36.381    ~2.160*10^-13
 400            0.9165       73.438    ~2.186*10^-13
 600            0.9638      122.470    ~2.197*10^-13
 800            1.0263      184.135    ~2.185*10^-13
1000            1.1000      270.425    ~2.234*10^-13
1200            1.1725      367.121    ~2.225*10^-13
1400            1.2713      513.989    ~2.271*10^-13
For the big cluster (A15):
frequency(MHz)  Voltage(V)  P-dyn(mW)   Dynamic-power-coefficient
 200            0.9162        99.834    ~5.945*10^-13
 500            0.9138       244.651    ~5.860*10^-13
 800            0.9288       401.986    ~5.825*10^-13
1100            1.0063       664.565    ~5.966*10^-13
1400            1.0713       985.268    ~6.132*10^-13
1700            1.1750      1494.173    ~6.366*10^-13
2000            1.2700      2086.273    ~6.467*10^-13
While looking for examples in the kernel that calculate the dynamic-power-coefficient, I found this patch [1] by Caesar Wang that introduced this value for the rk3399 big cluster.
Unfortunately, I'm unable to reproduce the same results for the coefficient from the given values. My results also come out with a 10^-13 factor, and even when I scale them by 10^15 they differ from the patch by a certain margin:
> frequency(MHz)  Voltage(V)  Current(mA)  Dynamic-power-coefficient  (My) d-p-coefficient
>  24             0.8          15
>  48             0.8          23          ~417                       ~598
>  96             0.8          40          ~443                       ~520
> 216             0.8          82          ~438                       ~474
> 312             0.8         115          ~430                       ~460
> 408             0.8         150          ~455                       ~459
Is this the right approach, or am I missing something?
Hope someone can help me out.
Thanks in advance,
Oliver Effland
[1] https://patchwork.kernel.org/patch/9861505/
From: "Joel Fernandes (Google)" <joel(a)joelfernandes.org>
Here's a very rough patch just to discuss prevention of decay of a
CPU's/task's util_avg signal in case it's preempted by RT or DL. It's
likely not correct and needs more work, but it solves the issue I see
with my synthetic test.
To reproduce the issue, I wrote a synthetic rt-app test with RT task
preempting a 100% CFS task for 300ms. https://pastebin.com/raw/rXNmRUZY
I have seen in traces that the util_avg decays quickly even before the
RT task sleeps.
The idea is to mark the CFS class from put_prev_task_fair if the task
was running while the put_prev_task() happened and unmark it from
pick_next_task_fair. This approach also keeps the solution of this
within CFS without modification to other classes. What do you think?
Note:
- Just for demo/trial, I took a full int from sched_avg. I am open to
any ideas on a flag we can set, all that's needed is 1 bit.
- The current UTIL_EST design cannot prevent this issue because it
relies on tasks sleeping, and in fact could be affected by this issue
itself in case the task's util_est EWMA hasn't built up.
- Vincent's patches for PELT in DL/RT are still needed for correct OPP
selection and estimation of capacity_of etc. This patch solves a
different issue.
CC: "eas-dev" <eas-dev(a)lists.linaro.org>
CC: vincent.guittot(a)linaro.org
CC: patrick.bellasi(a)arm.com
CC: dietmar.eggemann(a)arm.com
CC: chris.redpath(a)arm.com
CC: juri.lelli(a)redhat.com
CC: tkjos(a)google.com
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
include/linux/sched.h | 1 +
kernel/sched/fair.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca3f3eae8980..46b686192641 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -404,6 +404,7 @@ struct sched_avg {
unsigned long runnable_load_avg;
unsigned long util_avg;
struct util_est util_est;
+ int freeze_updates;
} ____cacheline_aligned;
struct sched_statistics {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09b7eb69802c..521656b05b22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3103,6 +3103,13 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
u64 periods;
+ /*
+ * If an update is supposed to be skipped, then do nothing.
+ * sa->last_update_time should still move forward.
+ */
+ if (sa->freeze_updates)
+ return 0;
+
scale_freq = arch_scale_freq_capacity(cpu);
scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -6963,6 +6970,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
p = task_of(se);
done: __maybe_unused;
+
#ifdef CONFIG_SMP
/*
* Move the next running task to the front of
@@ -6975,6 +6983,8 @@ done: __maybe_unused;
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
+ rq->cfs.avg.freeze_updates = 0;
+ p->se.avg.freeze_updates = 0;
return p;
idle:
@@ -6991,6 +7001,7 @@ done: __maybe_unused;
if (new_tasks > 0)
goto again;
+ rq->cfs.avg.freeze_updates = 0;
return NULL;
}
@@ -7006,6 +7017,18 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
cfs_rq = cfs_rq_of(se);
put_prev_entity(cfs_rq, se);
}
+
+ /*
+ * Task is being put because of preemption either by CFS or by a higher
+ * class. Make sure no util updates happen.
+ */
+ if (!prev->state) { /* Task is put while running */
+ rq->cfs.avg.freeze_updates = 1;
+ prev->se.avg.freeze_updates = 1;
+ } else {
+ rq->cfs.avg.freeze_updates = 0;
+ prev->se.avg.freeze_updates = 0;
+ }
}
/*
--
2.18.0.rc1.242.g61856ae69a-goog
Hi,
Joel was looking at something else in the android-4.14 wakeup path and
noticed that we have a difference in behavior for prefer_idle tasks when
we're using find_best_target, so he asked if we could discuss it here.
Behavior in android-4.9 and earlier
-----------------------------------
When we find an idle CPU for a task which has the prefer_idle attribute,
we immediately return that CPU from the energy-based wakeup CPU
selection. This happens in slightly different places over the EAS and
android versions but it is always true that we take the recommended idle
CPU for tasks in this class.
How that changed in android-4.14
--------------------------------
android-4.14 has two different wakeup paths, selected with a
sched_feature FIND_BEST_TARGET. This defaults to true with the intent of
preserving the previous behavior. Both paths are different, so I'll
describe them below separately.
The two paths however share some common code - the way we integrated EAS
with the regular wakeup code is different in android-4.14.
There were two reasons for doing this.
1. minimize the differences in select_task_rq_fair wrt mainline code
2. make better use of the per-sd overutilization flags
Since we have per-sd overutilised flags, we attempt to perform an EAS
wakeup at the highest non-overutilised sched_domain - meaning that we
can still perform an energy-aware wakeup for small tasks inside a
non-overutilized group of small CPUs while potentially other groups of
CPUs are overutilized.
The decision about attempting to use energy awareness is taken in the
wake_energy function at the top of strf (select_task_rq_fair) - all the
cases where we can't use energy awareness are ruled out, and the decision
about using find_idlest_cpu/EAS for prefer_idle tasks is also made here.
If we are using energy aware wakeups, then we will find the highest
non-overutilised SD to wake in.
In all cases where we do an energy aware wakeup but don't find any
suitable candidate CPUs we will go on to use find_idlest_cpu.
android-4.14 sched_feat(FIND_BEST_TARGET) true wakeup path
----------------------------------------------------------
When FIND_BEST_TARGET sched feature is on (the default), we call
find_best_target to populate the energy_env structure. This takes note
of the prefer_idle flag and the task boost to change which task
placement strategy will be used - the algorithm is the same as in
previous versions of android.
However in android-4.14 (unintentionally) the prefer_idle task placement
is not immediately acted upon - when a prefer_idle task is placed, we
will select the first idle CPU we see *but* this will become the target
CPU and we will still perform an energy diff and select between
prev/target based upon energy requirement.
The open question is - now that we have realized that there is a
different strategy in place, should we change it to be the same as the
old version? I think that we should - it will be a simple change to use
the idle cpu selected immediately without the energy diff.
All we would need is to add something like this:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7383,7 +7383,7 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
}
} else {
int boosted = (schedtune_task_boost(p) > 0);
- int prefer_idle;
+ int prefer_idle, target_cpu;
/*
* give compiler a hint that if sched_features
@@ -7396,10 +7396,18 @@ static int find_energy_efficient_cpu(struct sched_domain *sd,
eenv->max_cpu_count = EAS_CPU_BKP + 1;
/* Find a cpu with sufficient capacity */
- eenv->cpu[EAS_CPU_NXT].cpu_id = find_best_target(p,
+ target_cpu = find_best_target(p,
&eenv->cpu[EAS_CPU_BKP].cpu_id,
boosted, prefer_idle);
+ /* immediately select idle CPUs for prefer-idle tasks */
+ if (prefer_idle && target_cpu >= 0 &&
+ idle_cpu(target_cpu))
+ return target_cpu;
+
+ /* place target into NEXT slot */
+ eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
+
/* take note if no backup was found */
if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
eenv->max_cpu_count = EAS_CPU_BKP;
It could be the case that there are energy advantages to performing an
energy diff for prefer_idle tasks. We've tried a few times to come up
with a way to use lower-energy CPUs for these tasks where possible (for
cases like full-screen video playback or screen-off audio playback)
without harming interactive response, but we haven't managed to come up
with anything that doesn't need per-platform tuning, which isn't really
suitable. The underlying requirement for prefer_idle tasks is that they
should ideally not be preempted, as they are likely to be part of the
critical frame-drawing path.
We should also note that the EAS_PREFER_IDLE sched_feature potentially
also has an effect on this - if we are using find_best_target and we do
not have EAS_PREFER_IDLE set to true (true is the default), then we
don't pass the prefer_idle status of the task to find_best_target -
however if EAS_PREFER_IDLE is not true, we should have avoided entering
find_energy_efficient_cpu in the first place.
To be complete, there is also a further sched_feat for task placement
which controls how we evaluate the energy requirement for a task. If
FBT_STRICT_ORDER is on (the default), we will select the first CPU we
find which saves energy compared to the prev_cpu. This matches previous
versions of android.
We have been experimenting with selecting the most efficient out of the
target/backup CPUs provided by find_best_target, if FBT_STRICT_ORDER is
false this is what the scheduler will do. In testing there isn't really
a clear winner out of these options, so I added the sched feature so
users could play with the options easily on their platforms without code
changes.
android-4.14 sched_feat(FIND_BEST_TARGET) false wakeup path
-----------------------------------------------------------
This uses the brute-force placement strategy from the pre-simplified-EM
mainline EAS patch set, inside the android energy diff algorithm.
In android, we collect all the potential CPU placement options in a data
structure, and then evaluate the energy for all of them at each
sched_domain traversal level. This is done to avoid visiting each domain
& group more than once.
Other than the switch to the new energy calculation method, the energy
required is still calculated for every allowable CPU and then the option
which has the lowest overall consumption is selected.
When it comes to dealing with prefer_idle tasks, there is another
optional behaviour in this mode. If you have sched_feat(EAS_PREFER_IDLE)
set to true (the default), then prefer_idle tasks have no special
treatment in the energy-aware wakeup path. This means they will get the
lowest-energy option with no regard to idleness.
However, if EAS_PREFER_IDLE is false (i.e. don't use EAS for prefer-idle
tasks) then we route these tasks through the slow path wakeup code - the
decision is made inside wake_energy, which propagates through to calling
find_idlest_cpu instead of find_energy_efficient_cpu.
If the find_best_target feature FBT_STRICT_ORDER is true (the default)
then the first CPU which saves energy compared to default will be
selected rather than the most efficient. When combined with the brute
force selection algorithm, this will result in the lowest-numbered cpu
which saves energy over the prev_cpu being selected.
This is not the way that the brute force (mainline-alike) placement is
intended to work, so when using brute force (FIND_BEST_TARGET=false) one
should also turn off strict ordering (FBT_STRICT_ORDER=false) *and* opt
out of using EAS for prefer_idle tasks (EAS_PREFER_IDLE=false).
--Chris