Hello eas-dev!
I'm pleased to announce that EAS development is moving to the next
version of the android common kernel, android-4.9.
* EAS development will be done in a new android-4.9-eas-dev branch
* android-4.9-eas-dev will be merged into android-4.9 twice during
the period January - June 2018
* EAS functionality in android-4.4 is frozen
* an android-4.4-eas-test branch is provided to help testing new EAS
features on android-4.4 devices
* assembly of an android common kernel based upon 4.14 is underway
Q&A:
* Why have you moved to android-4.9?
* Partners developing devices have largely completed their
android-4.4 derived device kernels, and continued development
of EAS features there is disruptive to tuning efforts
* Device kernels derived from android-4.9 are in active development
* Will you deliver new EAS patches to android-4.4?
* The plan is to only do fixes for critical bugs for android-4.4
* How will you be confident your patches are OK when you don't have
devices running android-4.9 kernels yet?
* This is the reason that the android-4.4-eas-test branch exists
* This branch will contain patches which are merged into
android-4.9-eas-dev and can be used to help test on device
kernels derived from android-4.4
* The content will be whatever patches are necessary to be able to
add patches from android-4.9-eas-dev cleanly, plus the patches
from android-4.9-eas-dev
* android-4.4-eas-test will be updated until we have a product
quality device for testing with android-4.9 derived kernels
* What is the expected patch flow for testing eas-dev patches on
android-4.4?
* first cherry-pick the patches from android-4.4-eas-test to the
device kernel
* next cherry-pick in-development patches from android-4.9-eas-dev
gerrit reviews
* run tests to obtain power and performance numbers from real
product-quality environments
* How critical are you going to be for patches sent to
android-4.9-eas-dev?
* Patches accepted there must be of good code quality and have at
least one of the following four attributes:
1. Reduces energy consumption
2. Improves performance
3. Brings android EAS closer to mainline
4. Fixes a bug
* All patches must pass checkpatch.pl
* Given that you intend to merge android-4.9-eas-dev into android-4.9,
will you freeze it at any time?
* Yes. The intention is to have a one-month stabilization period
ahead of each merge (January and June)
* For the January merge, stabilization will begin December 1st,
2017.
* For the June merge, stabilization will begin May 1st, 2018
* During stabilization, only fixes will be taken
* Will there be merges in-between January and June?
* We do not plan to do this right now, but in principle it can be
done
* When will android-4.4-eas-test update after android-4.9-eas-dev
merges into android-4.9?
* We intend to add patches to android-4.4-eas-test for review soon
after merging them
* What happens if there is a bug in the merged branch?
* A fix will be provided to android-4.9 and android-4.9-eas-dev
* The fix will be reflected in android-4.4-eas-test
* Can I expect this to happen again any time soon?
* Yes, there has been a new android common kernel based on a new
LTS branch each year so far
* Arm expects that pattern to continue
* If the pattern holds, in October 2018 the target android kernel
version for EAS development will be based on Linux 4.14
* We currently plan to use the same branching structure with the
version numbers changed
* Dates are projections based upon previous android releases and are
subject to change
* The kernel versions of eas-dev and eas-test branches are driven by
the availability of suitable development and testing platforms, so
are also subject to change
* What happens when you move to a 4.14 kernel?
* After changes are reviewed and merged into android-4.9 from
android-4.9-eas-dev, those changes will be pushed for review
on the 4.14 android branch
* Anything merged in android's 4.14 branch which is broken will also
be patched
Warmest Regards,
Chris Redpath
Open Source Software Power Team @ arm
Here are some patches that are generally minor changes and I am posting them
together. Patches 1/5 and 2/5 are related to skipping cpufreq updates for the
dequeue of the last task before the CPU enters idle. That's just a rebase of
[1] mostly. Patches 3/5 and 4/5 fix some minor things I noticed after the
remote cpufreq update work, and patch 5/5 is just a small clean-up of
find_idlest_group. Let me know your thoughts and thanks. I've based these
patches on peterz's queue.git master branch.
[1] https://patchwork.kernel.org/patch/9936555/
Joel Fernandes (5):
Revert "sched/fair: Drop always true parameter of
update_cfs_rq_load_avg()"
sched/fair: Skip frequency update if CPU about to idle
cpufreq: schedutil: Use idle_calls counter of the remote CPU
sched/fair: Correct obsolete comment about cpufreq_update_util
sched/fair: remove impossible condition from find_idlest_group_cpu
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/sched/fair.c | 44 ++++++++++++++++++++++++++++------------
kernel/sched/sched.h | 1 +
kernel/time/tick-sched.c | 13 ++++++++++++
5 files changed, 47 insertions(+), 14 deletions(-)
--
2.15.0.rc2.357.g7e34df9404-goog
wltests (workload tests)
ARM is pleased to announce a new automated test suite for benchmarking Linux scheduler & EAS improvements on Android workloads.
wltests is built on top of Lisa and Workload Automation (in-development version of WA v3) with the goal of:
* automatically running a range of Android-based tests on a platform, collecting performance and power metrics
* comparing different kernel versions and/or kernel options
* analyzing differences using Lisa-based notebooks
* making it easier to port to custom platforms
It is intended to allow full evaluation of EAS/scheduler changes with real Android workloads (for example PELT vs. WALT comparisons)
The current set of workloads is:
* Jankbench
* ExoPlayer for video & audio playback tests
* Youtube (if gapps available on platform)
* PCMark
* Geekbench
* Homescreen (to measure steady state energy consumption)
First, install Lisa according to the installation instructions (Lisa now includes an in-development version of WA v3):
https://github.com/ARM-software/lisa/wiki/Installation#required-dependencies
The VM can be used if you have incompatibilities with locally installed Python libraries
Please see README.md in the wltests directory:
https://github.com/ARM-software/lisa/tree/master/tools/wltests
If you have concerns about results being published for in-development hardware, comment out the commercial benchmarks (PCMark & Geekbench) in the agenda:
tools/wltests/agendas/sched-evaluation-full.yaml
Platform - currently only one public platform (Linaro HiKey960):
tools/wltests/platforms/hikey960_android-4.4
(this actually works for 4.4 and 4.9 based kernels)
Adding a new platform is easy: three files in the platform directory
If you have any questions, please let us know!
-- ARM powersoftware team
capacity_spare_wake in the slow path influences choice of idlest groups,
as we search for groups with maximum spare capacity. In scenarios where
RT pressure is high, a suboptimal group can be chosen, hurting the
performance of the task being woken up.
Several tests with results are included below to show improvements with
this change.
1) Hackbench on Pixel 2 Android device (4x4 ARM64 Octa core)
------------------------------------------------------------
Here we have RT activity running on big CPU cluster induced with rt-app,
and running hackbench in parallel. The RT tasks are bound to 4 CPUs on
the big cluster (cpu 4,5,6,7) and have 100ms periodicity with
runtime=20ms sleep=80ms.
Hackbench shows a big improvement (30%) when the number of tasks is 8,
and a good one (11.6%) at 32. Note: data is completion time in seconds
(lower is better). The number of loops for 8 and 16 tasks is 50000, and
for 32 tasks it is 20000.
+--------+-----+-------+-------------------+---------------------------+
| groups | fds | tasks | Without Patch | With Patch |
+--------+-----+-------+---------+---------+-----------------+---------+
| | | | Mean | Stdev | Mean | Stdev |
| | | +-------------------+-----------------+---------+
| 1 | 8 | 8 | 1.0534 | 0.13722 | 0.7293 (+30.7%) | 0.02653 |
| 2 | 8 | 16 | 1.6219 | 0.16631 | 1.6391 (-1%) | 0.24001 |
| 4 | 8 | 32 | 1.2538 | 0.13086 | 1.1080 (+11.6%) | 0.16201 |
+--------+-----+-------+---------+---------+-----------------+---------+
2) Rohit ran barrier.c test (details below) with following improvements:
------------------------------------------------------------------------
This was Rohit's original use case for a patch he posted at [1];
however, his recent tests showed that this patch can replace his
slow-path changes [1], with no need to selectively scan/skip CPUs in
find_idlest_group_cpu in the slow path to get the improvement he sees.
barrier.c (OpenMP code) is used as a micro-benchmark. It does a number
of iterations with a barrier sync at the end of each for loop.
Here barrier.c is run along with ping on CPUs 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
barrier.c can be found at:
http://www.spinics.net/lists/kernel/msg2506955.html
Following are the results for the iterations per second with this
micro-benchmark (higher is better), on a 2-socket, 44-core, 88-thread
Intel x86 machine:
+--------+------------------+---------------------------+
|Threads | Without patch | With patch |
| | | |
+--------+--------+---------+-----------------+---------+
| | Mean | Std Dev | Mean | Std Dev |
+--------+--------+---------+-----------------+---------+
|1 | 539.36 | 60.16 | 572.54 (+6.15%) | 40.95 |
|2 | 481.01 | 19.32 | 530.64 (+10.32%)| 56.16 |
|4 | 474.78 | 22.28 | 479.46 (+0.99%) | 18.89 |
|8 | 450.06 | 24.91 | 447.82 (-0.50%) | 12.36 |
|16 | 436.99 | 22.57 | 441.88 (+1.12%) | 7.39 |
|32 | 388.28 | 55.59 | 429.4 (+10.59%)| 31.14 |
|64 | 314.62 | 6.33 | 311.81 (-0.89%) | 11.99 |
+--------+--------+---------+-----------------+---------+
3) ping+hackbench test on a bare-metal server (Rohit ran this test)
----------------------------------------------------------------
Here hackbench is run in threaded mode along with ping on CPUs 0 and 1
as:
'ping -l 10000 -q -s 10 -f hostX'
This test runs on a 2-socket, 20-core, 40-thread Intel x86 machine.
The number of loops is 10000 and runtime is in seconds (lower is
better).
+--------------+-----------------+--------------------------+
|Task Groups | Without patch | With patch |
| +-------+---------+----------------+---------+
|(Groups of 40)| Mean | Std Dev | Mean | Std Dev |
+--------------+-------+---------+----------------+---------+
|1 | 0.851 | 0.007 | 0.828 (+2.77%)| 0.032 |
|2 | 1.083 | 0.203 | 1.087 (-0.37%)| 0.246 |
|4 | 1.601 | 0.051 | 1.611 (-0.62%)| 0.055 |
|8 | 2.837 | 0.060 | 2.827 (+0.35%)| 0.031 |
|16 | 5.139 | 0.133 | 5.107 (+0.63%)| 0.085 |
|25 | 7.569 | 0.142 | 7.503 (+0.88%)| 0.143 |
+--------------+-------+---------+----------------+---------+
[1] https://patchwork.kernel.org/patch/9991635/
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Morten Rasmussen <morten.rasmussen(a)arm.com>
Cc: Brendan Jackman <brendan.jackman(a)arm.com>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Tested-by: Rohit Jain <rohit.k.jain(a)oracle.com>
Signed-off-by: Joel Fernandes <joelaf(a)google.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 740602ce799f..487e485b3560 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5742,7 +5742,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
{
- return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+ return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
}
/*
--
2.15.0.rc2.357.g7e34df9404-goog
Hi,
I tried an experiment this weekend - basically I have RT threads bound
to big CPUs running a fixed-period load, with hackbench running with
all CPUs allowed. The system is a Pixel 2 ARM big.LITTLE 8-core (4x4).
Basically, I changed capacity_orig_of to capacity_of in
capacity_spare_wake and wake_cap and I see a good performance
improvement. That makes sense because wake_cap would send the task
wake up to the slow-path if RT capacity was eating into the CFS
capacity on prev/current CPU, and capacity_spare_wake would find a
better group with spare-capacity deducted by the RT pressure capacity.
One of the concerns for such a change to wake_cap, that I had, was
that it might affect upstream cases that may still want to do a
select_idle_sibling even if the capacity on the previous/waker's CPU
was not enough after deducting RT pressure. In that case, the wake_cap
change to use capacity_of would cause it to enter the slow-path for
those cases I think.
Could you let me know your thoughts about such a change? I heard that
capacity_of was attempted before and there might be some cases to
consider. Anything from your previous experiences with this change
that you could share? At least for capacity_spare_wake, the
improvements seem to be worthwhile and dramatic in some cases. I also
have some more changes I am thinking of for find_idlest_group, but I
wanted to start a discussion on the spare capacity idea first.
This is related to Rohit's work on RT Capacity awareness, I was
talking to him and we were discussing ideas on the implementation.
thanks,
- Joel
The blocked load and shares of root cfs_rqs are currently only
updated by the CPU owning the rq. That means if a CPU suddenly goes
from being busy to totally idle, its load and shares are not updated.
Schedutil works around this problem by ignoring the util of CPUs
that were last updated more than a tick ago. However the stale
load does impact task placement: elements that look at load and
util (in particular the slow-path of select_task_rq_fair) can
leave the idle CPUs un-used while other CPUs go unnecessarily
overloaded. Furthermore the stale shares can impact CPU time
allotment.
Two complementary solutions are proposed here:
1. When a task wakes up, if necessary an idle CPU is woken as if to
perform a NOHZ idle balance, which is then aborted once the load
of NOHZ idle CPUs has been updated. This solves the problem but
brings with it extra CPU wakeups, which have an energy cost.
2. During newly-idle load balancing, the load of remote nohz-idle
CPUs in the sched_domain is updated. When all of the idle CPUs
were updated in that step, the nohz.next_update field
is pushed further into the future. This field is used to determine
the need for triggering the newly-added NOHZ kick. So if such
newly-idle balances are happening often enough, no additional CPU
wakeups are required to keep all the CPUs' loads updated.
[eas-dev] Patch 2/3 here is to highlight a change I made from
Vincent's original patch, so that it can be reviewed more
easily - if the modification is accepted then I'll squash
it before posting this to LKML proper.
Brendan Jackman (2):
sched/fair: Refactor nohz blocked load updates
sched/fair: Update blocked load from newly idle balance
Vincent Guittot (1):
sched: force update of blocked load of idle cpus
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 106 ++++++++++++++++++++++++++++++++++++++++++++-------
kernel/sched/sched.h | 2 +
3 files changed, 96 insertions(+), 13 deletions(-)
--
2.14.1