Currently the energy calculation in EAS does not consider RT pressure,
so it is quite possible to select a CPU with high RT pressure for CFS
tasks, which accumulates total utilization on that CPU. As a result, the
other CPUs with low RT pressure lose the chance to run CFS tasks and to
reduce contention between CFS and RT tasks. From a performance point of
view this is not optimal; furthermore it also harms power, because
packing RT and CFS tasks on a single CPU more easily triggers a CPU
frequency increase.

We can measure the summed CPU utilization and calculate its standard
deviation across the CPUs of a cluster to see whether tasks are spread
well within the same cluster for a medium workload. Below is the
comparison for video playback on Hikey960 before and after applying this
patch set (using the schedutil CPUFreq governor):

       Without Patch Set:                    With Patch Set:
CPU    Min(Util)  Mean(Util)  Max(Util)   |  Min(Util)  Mean(Util)  Max(Util)
0      7          67          205         |  8          52          170
1      4          53          227         |  9          47          188
2      4          57          191         |  8          38          192
3      4          35          165         |  16         47          146
s.d.   1.5        13.3        25.9        |  3.9        5.83        20.9

4      0          35          160         |  10         34          129
5      0          24          129         |  0          30          115
6      0          18          123         |  0          18          95
7      0          12          84          |  0          21          73
s.d.   0          9.8         31.2        |  5          7.5         24.4

The standard deviation of the mean CPU utilization has decreased after
applying this patch set (LITTLE cluster: 13.3 vs 5.83, big cluster:
9.8 vs 7.5). The average CPU frequency gives a second view:

                  Without Patch Set:     With Patch Set:
                  Average Frequency   |  Average Frequency
LITTLE Cluster    737MHz              |  646MHz
big Cluster       916MHz              |  922MHz
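
The s.d. rows in the utilization table appear to be sample standard
deviations (n - 1 denominator) taken over the four CPUs of each cluster;
that is an inference from the numbers, not something stated above. A
minimal userspace sketch of that calculation follows. The helper names
sqrt_newton() and sample_stddev() are made up for illustration and are
not part of the patch set.

```c
#include <stddef.h>

/* Square root by Newton's iteration, so the sketch needs no libm at
 * link time. Illustrative helper, not a kernel function. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Sample standard deviation (n - 1 denominator) of n values, n >= 2. */
static double sample_stddev(const double *v, size_t n)
{
	double mean = 0.0, sq = 0.0;

	for (size_t i = 0; i < n; i++)
		mean += v[i];
	mean /= (double)n;

	for (size_t i = 0; i < n; i++)
		sq += (v[i] - mean) * (v[i] - mean);

	return sqrt_newton(sq / (double)(n - 1));
}
```

Feeding it the Mean(Util) values of CPUs 0-3 gives roughly 13.4 without
the patch set and 5.83 with it, in line with the table.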
Leo Yan (4):
sched/fair: Select maximum spare capacity for idle candidate CPUs
sched: Introduce cpu_util_sum()/__cpu_util_sum() functions
sched/fair: Consider RT pressure for find_best_target()
sched/fair: Consider RT/DL pressure for energy calculation
kernel/sched/fair.c | 22 +++++++++++++++++++---
kernel/sched/sched.h | 29 +++++++++++++++++++++++++++++
2 files changed, 48 insertions(+), 3 deletions(-)
--
1.9.1
capacity_spare_wake in the slow path influences choice of idlest groups,
as we search for groups with maximum spare capacity. In scenarios where
RT pressure is high, a suboptimal group can be chosen and hurt
performance of the task being woken up.
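
To make the effect concrete, here is a minimal userspace sketch (not
kernel code) of the old and new spare-capacity calculation. The
struct cpu_snapshot type and the spare_old(), spare_new() and
pick_max_spare() helpers are hypothetical names used only for
illustration. It assumes capacity_of() is already reduced by RT
pressure, which is what this change relies on, and it compares
individual CPUs where the real slow path compares groups.

```c
/* Userspace model only: illustrative stand-ins for the kernel helpers. */
struct cpu_snapshot {
	unsigned long capacity_orig;	/* stands in for capacity_orig_of(cpu) */
	unsigned long capacity;		/* stands in for capacity_of(cpu) */
	unsigned long util_wake;	/* stands in for cpu_util_wake(cpu, p) */
};

/* Old behaviour: spare capacity measured against the original capacity. */
static unsigned long spare_old(const struct cpu_snapshot *c)
{
	return c->capacity_orig - c->util_wake;
}

/* New behaviour: measured against capacity_of() and clamped at zero. */
static unsigned long spare_new(const struct cpu_snapshot *c)
{
	long spare = (long)c->capacity - (long)c->util_wake;

	return spare > 0 ? (unsigned long)spare : 0;
}

/* Return the index of the CPU with the largest spare capacity. */
static int pick_max_spare(const struct cpu_snapshot *cpus, int n,
			  unsigned long (*spare)(const struct cpu_snapshot *))
{
	int best = 0;

	for (int i = 1; i < n; i++)
		if (spare(&cpus[i]) > spare(&cpus[best]))
			best = i;
	return best;
}
```

With one CPU at capacity 600 out of 1024 (heavy RT pressure) and
utilization 300, and another at capacity 1000 and utilization 500, the
old calculation prefers the RT-loaded CPU (724 vs 524), while the new
one prefers the other CPU (300 vs 500).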
Several tests with results are included below to show improvements with
this change.
1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)
------------------------------------------------------------
Here we have RT activity running on big CPU cluster induced with rt-app,
and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
runtime=20ms sleep=80ms.
Hackbench shows a large improvement when the number of tasks is 8
(about 30%) and a smaller one at 32 (about 12%). Note: data is completion
time in seconds (lower is better). The number of loops is 50000 for 8 and
16 tasks, and 20000 for 32 tasks.
+--------+-----+-------+-------------------+---------------------------+
| groups | fds | tasks |   Without Patch   |        With Patch         |
|        |     |       +---------+---------+-----------------+---------+
|        |     |       |  Mean   |  Stdev  |      Mean       |  Stdev  |
+--------+-----+-------+---------+---------+-----------------+---------+
|   1    |  8  |   8   | 1.0534  | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
|   2    |  8  |  16   | 1.6219  | 0.16631 | 1.6391 (-1%)    | 0.24001 |
|   4    |  8  |  32   | 1.2538  | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
+--------+-----+-------+---------+---------+-----------------+---------+
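
The hackbench percentages in the table above are consistent with a
relative change computed against the "Without Patch" mean. A small
sketch of that arithmetic, with made-up helper names, in case it helps
when comparing your own runs:

```c
/* Lower-is-better metric (e.g. completion time in seconds):
 * a positive result means the patched run was faster. */
static double improvement_lower_is_better(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Higher-is-better metric (e.g. iterations per second):
 * a positive result means the patched run did more work. */
static double improvement_higher_is_better(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For the 8-task row this gives (1.0534 - 0.7293) / 1.0534, i.e. just
under 30.8%.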
2) Rohit ran barrier.c test (details below) with following improvements:
------------------------------------------------------------------------
This was Rohit's original use case for a patch he posted at [1]. However,
his recent tests showed that my patch can replace his slow path
changes [1], and there is no need to selectively scan/skip CPUs in
find_idlest_group_cpu in the slow path to get the improvement he sees.

barrier.c (OpenMP code) is used as a micro-benchmark. It does a number of
iterations and a barrier sync at the end of each for loop.

Here barrier.c is running along with ping on CPU 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
barrier.c can be found at:
http://www.spinics.net/lists/kernel/msg2506955.html
Following are the results for the iterations per second with this
micro-benchmark (higher is better), on a 2-socket, 44-core, 88-thread
Intel x86 machine:
+--------+------------------+---------------------------+
|Threads |  Without patch   |        With patch         |
|        |                  |                           |
+--------+--------+---------+-----------------+---------+
|        |  Mean  | Std Dev |      Mean       | Std Dev |
+--------+--------+---------+-----------------+---------+
|1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
|2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
|4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
|8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
|16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
|32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
|64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
+--------+--------+---------+-----------------+---------+
3) ping+hackbench test on bare-metal server (Rohit ran this test)
-----------------------------------------------------------------
Here hackbench is running in threaded mode, along with ping running on
CPU 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
This test is running on a 2-socket, 20-core, 40-thread Intel x86
machine.
Number of loops is 10000 and runtime is in seconds (lower is better).
+--------------+-----------------+--------------------------+
|Task Groups   |  Without patch  |        With patch        |
|              +-------+---------+----------------+---------+
|(Groups of 40)| Mean  | Std Dev |      Mean      | Std Dev |
+--------------+-------+---------+----------------+---------+
|1             | 0.851 | 0.007   | 0.828 (+2.77%) | 0.032   |
|2             | 1.083 | 0.203   | 1.087 (-0.37%) | 0.246   |
|4             | 1.601 | 0.051   | 1.611 (-0.62%) | 0.055   |
|8             | 2.837 | 0.060   | 2.827 (+0.35%) | 0.031   |
|16            | 5.139 | 0.133   | 5.107 (+0.63%) | 0.085   |
|25            | 7.569 | 0.142   | 7.503 (+0.88%) | 0.143   |
+--------------+-------+---------+----------------+---------+
[1] https://patchwork.kernel.org/patch/9991635/
Matt Fleming also ran cyclictest and several different hackbench tests
on his test machines to sanity-check that the patch doesn't harm any
of his use cases.
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Morten Rasmussen <morten.rasmussen(a)arm.com>
Cc: Brendan Jackman <brendan.jackman(a)arm.com>
Tested-by: Rohit Jain <rohit.k.jain(a)oracle.com>
Tested-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Signed-off-by: Joel Fernandes <joelaf(a)google.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56f343b8e749..ba9609407cb9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
 
 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 {
-	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+	return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
 }
 
 /*
--
2.15.0.448.gf294e3d99a-goog
Good day!
I have noticed since release that the EM for the Pixel 2 doesn't cover
every frequency step. There are 22 steps for the small cores and 31 steps
for the big cores, but the EM has 22 tuples for the small cores and only
27 tuples for the big cores.
I have checked, and the Pixel 2 is using all frequency steps for both
small and big cores, so why doesn't the EM account for the last 4 freq
steps of the big cores?
Thanks as always for taking the time to answer my questions.
Kind Regards,
Zachariah Kennedy
Hello EAS developers,
This email is to inform you about the latest EAS integration branch,
which was published last Friday. All the information on where to get the
branch from is available at:
https://developer.arm.com/open-source/energy-aware-scheduling/EAS%20Mainlin…
The integration branch was conceived to keep the latest EAS patches on
track with tip/sched/core. Hence, on top of that the integration branch
puts:
- some new scheduler features, i.e. patches that relate to scheduler but
are not main components of EAS
- EAS-core patches
- debug patches, i.e. trace events, procfs interfaces, etc.
Integration will happen every two weeks. The above website covers the
main additions to each integration and the next work items for the ones
that will follow.
Kind regards,
Michele
Hello eas-dev!
I'm pleased to announce that EAS development is moving to the next
version of the android common kernel, android-4.9.
* EAS development will be done in a new android-4.9-eas-dev branch
* android-4.9-eas-dev will be merged into android-4.9 twice during
the period January - June 2018
* EAS functionality in android-4.4 is frozen
* an android-4.4-eas-test branch is provided to help testing new EAS
features on android-4.4 devices
* assembly of an android common kernel based upon 4.14 is underway
Q&A:
* Why have you moved to android-4.9?
  * Partners developing devices have largely completed their
    android-4.4 derived device kernels and continuous development
    of EAS features is disruptive to tuning efforts
  * Device kernels derived from android-4.9 are in active development
* Will you deliver new EAS patches to android-4.4?
  * The plan is to only do fixes for critical bugs for android-4.4
* How will you be confident your patches are OK when you don't have
  devices running android-4.9 kernels yet?
  * This is the reason that the android-4.4-eas-test branch exists
  * This branch will contain patches which are merged into
    android-4.9-eas-dev and can be used to help test on device
    kernels derived from android-4.4
  * The content will be whatever patches are necessary to be able to
    add patches from android-4.9-eas-dev cleanly, plus the patches
    from android-4.9-eas-dev
  * android-4.4-eas-test will be updated until we have a product
    quality device for testing with android-4.9 derived kernels
* What is the expected patch flow for testing eas-dev patches on
  android-4.4?
  * first cherry-pick the patches from android-4.4-eas-test to the
    device kernel
  * next cherry-pick in-development patches from android-4.9-eas-dev
    gerrit reviews
  * run tests to obtain power and performance numbers from real
    product-quality environments
* How critical are you going to be for patches sent to
  android-4.9-eas-dev?
  * Patches accepted there must be of good code quality and have at
    least one of the four necessary attributes:
    1. Must reduce energy consumption
    2. Must improve performance
    3. Must bring android EAS closer to mainline
    4. Must fix a bug
  * All patches must pass checkpatch.pl
* Given that you intend to merge android-4.9-eas-dev into android-4.9,
  will you freeze it at any time?
  * Yes. The intention is to have a 1 month stabilization ahead of
    each merge (January and June)
  * For the January merge, stabilization will begin December 1st,
    2017.
  * For the June merge, stabilization will begin May 1st, 2018
  * During stabilization, only fixes will be taken
* Will there be merges in-between January and June?
  * We do not plan to do this right now, but in principle it can be
    done
* When will android-4.4-eas-test update after android-4.9-eas-dev
  merges into android-4.9?
  * We intend to add patches to android-4.4-eas-test for review soon
    after merging them
* What happens if there is a bug in the merged branch?
  * A fix will be provided to android-4.9 and android-4.9-eas-dev
  * The fix will be reflected in android-4.4-eas-test
* Can I expect this to happen again any time soon?
  * Yes, there has been a new android common kernel based on a new
    LTS branch each year so far
  * Arm expects that pattern to continue
  * If the pattern holds, in October 2018 the target android kernel
    version for EAS development will be based on Linux 4.14
  * We currently plan to use the same branching structure with the
    version numbers changed
  * Dates are projections based upon previous android releases and are
    subject to change
  * The kernel versions of eas-dev and eas-test branches are driven by
    the availability of suitable development and testing platforms, so
    are also subject to change
* What happens when you move to a 4.14 kernel?
  * After changes are reviewed and merged into android-4.9 from
    android-4.9-eas-test, those changes will be pushed for review
    on the 4.14 android branch
  * Anything merged in android's 4.14 branch which is broken will also
    be patched
Warmest Regards,
Chris Redpath
Open Source Software Power Team @ arm
Here are some patches that are generally minor changes and I am posting them
together. Patches 1/5 and 2/5 are related to skipping cpufreq updates for the
dequeue of the last task before the CPU enters idle. That's just a rebase of
[1] mostly. Patches 3/5 and 4/5 fix some minor things I noticed after the
remote cpufreq update work, and patch 5/5 is just a small cleanup of
find_idlest_group. Let me know your thoughts and thanks. I've based these
patches on peterz's queue.git master branch.
[1] https://patchwork.kernel.org/patch/9936555/
Joel Fernandes (5):
Revert "sched/fair: Drop always true parameter of
update_cfs_rq_load_avg()"
sched/fair: Skip frequency update if CPU about to idle
cpufreq: schedutil: Use idle_calls counter of the remote CPU
sched/fair: Correct obsolete comment about cpufreq_update_util
sched/fair: remove impossible condition from find_idlest_group_cpu
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/sched/fair.c | 44 ++++++++++++++++++++++++++++------------
kernel/sched/sched.h | 1 +
kernel/time/tick-sched.c | 13 ++++++++++++
5 files changed, 47 insertions(+), 14 deletions(-)
--
2.15.0.rc2.357.g7e34df9404-goog
wltests (workload tests)
ARM is pleased to announce a new automated test suite for benchmarking Linux scheduler & EAS improvements on Android workloads.
wltests is built on top of Lisa and Workload Automation (in-development version of WA v3) with the goals of:
* automatically running a range of Android-based tests on a platform, collecting performance and power metrics
* comparing different kernel versions and/or kernel options
* analyzing differences using Lisa-based notebooks
* making it easier to port to a custom platform
It is intended to allow full evaluation of EAS/scheduler changes with real Android workloads (for example PELT vs. WALT comparisons).
The current set of workloads is:
* Jankbench
* Exoplayer for video & audio playback tests
* Youtube (if gapps available on platform)
* PCmark
* Geekbench
* Homescreen (to measure steady state energy consumption)
Install all of Lisa first, following the installation instructions (Lisa now includes an in-development version of WA v3):
https://github.com/ARM-software/lisa/wiki/Installation#required-dependencies
The VM can be used if you have incompatibilities with locally-installed python libraries
Please see README.md in the wltests directory:
https://github.com/ARM-software/lisa/tree/master/tools/wltests
If you have concerns about results being published for in-development hardware, comment out the commercial benchmarks (PCmark & Geekbench) in the agenda:
tools/wltests/agendas/sched-evaluation-full.yaml
Platform - currently only one public platform (Linaro HiKey960):
tools/wltests/platforms/hikey960_android-4.4
(this actually works for 4.4 and 4.9 based kernels)
Adding a new platform is easy: it needs 3 files in the platform directory.
If you have any questions, please let us know!
-- ARM powersoftware team