Setting a negative boost value impacts both task placement and OPP
selection. For task placement, the scheduler uses boosted_task_util()
to derive a reduced utilization value for a negatively boosted task, so
the task has a better chance of fitting on a low capacity CPU; as a
result, placement is biased toward low capacity CPUs (e.g. the LITTLE
cores on an ARM big.LITTLE system). In the current code the wakeup path
uses this method to avoid migrating tasks with a negative boost value to
a big core, but the load balance path has no such check, so it can still
migrate negatively boosted tasks to a big core.
This patch adds a check for negatively boosted tasks in the load balance
path and avoids migrating such a task to a big CPU if the task still
fits on the low capacity CPU.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
kernel/sched/fair.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77ca4df..c22d256 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6747,17 +6747,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
static
int can_migrate_task(struct task_struct *p, struct lb_env *env)
{
- int tsk_cache_hot;
+ int tsk_cache_hot, boost;
+ unsigned long cpu_rest_util;
lockdep_assert_held(&env->src_rq->lock);
/*
* We do not migrate tasks that are:
- * 1) throttled_lb_pair, or
- * 2) cannot be migrated to this CPU due to cpus_allowed, or
- * 3) running (obviously), or
- * 4) are cache-hot on their current CPU.
+ * 1) task has negative boost value and task fits cpu, or
+ * 2) throttled_lb_pair, or
+ * 3) cannot be migrated to this CPU due to cpus_allowed, or
+ * 4) running (obviously), or
+ * 5) are cache-hot on their current CPU.
*/
+ if (energy_aware() &&
+ capacity_orig_of(env->dst_cpu) > capacity_orig_of(env->src_cpu)) {
+
+ boost = schedtune_task_boost(p);
+ cpu_rest_util = cpu_util(env->src_cpu) - task_util(p);
+ cpu_rest_util = max(0UL, cpu_rest_util);
+
+ if (boost < 0 && __task_fits(p, env->src_cpu, cpu_rest_util))
+ return 0;
+ }
+
if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
return 0;
--
1.9.1
o This patch series evaluates using an rb tree to track task load and
  util on the rq. One concern with this method is: an rb tree has
  O(log(N)) computation complexity, so its maintenance introduces extra
  overhead. To measure this concern, hackbench is used as a stress test:
  hackbench spawns many sender and receiver tasks, generating a large
  number of enqueue and dequeue operations, so it shows whether the rb
  tree maintenance introduces significant overhead (thanks a lot to
  Chris for suggesting this).
Another concern is that the scheduler already provides the LB_MIN
feature: with LB_MIN enabled, the scheduler avoids migrating tasks with
load < 16, which to some extent also leaves only bigger tasks as
migration candidates. So we need to compare power data between this
patch series and simply enabling LB_MIN.
o Testing result:
Tested hackbench on Hikey (8x Cortex-A53 CPUs) with SMP load balance:
time sh -c 'for i in `seq 100`; do /data/hackbench -p -P > /dev/null; done'
real user system
baseline 6m00.57s 1m41.72s 34m38.18s
rb tree 5m55.79s 1m33.68s 34m08.38s
For the hackbench test case we can see that the rb tree kernel even
gives a slightly better result than the baseline kernel.
Tested video playback on Juno for LB_MIN vs rb tree:
LB_MIN Nrg:LITTLE Nrg:Big Nrg:Sum
---------------------------------------------------------
11.3122 8.983429 20.295629
11.337446 8.174061 19.511507
11.256941 8.547895 19.804836
10.994329 9.633028 20.627357
11.483148 8.522364 20.005512
avg. 11.2768128 8.7721554 20.0489682
rb tree Nrg:LITTLE Nrg:Big Nrg:Sum
---------------------------------------------------------
11.384301 8.412714 19.797015
11.673992 8.455219 20.129211
11.586081 8.414606 20.000687
11.423509 8.64781 20.071319
11.43709 8.595252 20.032342
avg. 11.5009946 8.5051202 20.0061148
vs LB_MIN +1.99% -3.04% -0.21%
o Known issues:
  In patch 2, the function detach_tasks() iterates the rb tree for
  tasks; after a task has been detached it calls rb_first() to fetch
  the first node and restarts the iteration from there. It would be
  better to use rb_next(), but switching to rb_next() introduces a
  panic.
  Any suggestion for a better implementation is welcome.
Leo Yan (3):
sched/fair: support to track biggest task on rq
sched/fair: select biggest task for migration
sched: remove unused rq::cfs_tasks
include/linux/sched.h | 1 +
include/linux/sched/sysctl.h | 1 +
kernel/sched/core.c | 2 -
kernel/sched/fair.c | 123 ++++++++++++++++++++++++++++++++++++-------
kernel/sched/sched.h | 5 +-
kernel/sysctl.c | 7 +++
6 files changed, 116 insertions(+), 23 deletions(-)
--
1.9.1
o This patch series includes performance optimizations and some fixes.
  One main purpose is to resolve performance issues for multi-threading;
  this is done by patches 0001, 0003, 0005 and 0006. It also includes
  one main fix for the tipping point, done by patch 0007.
o All these patches have been tested on a Juno R2 board. Especially for
  the performance optimization patches, the testing results are
  consistent and repeatable on the Juno board, which gives us more
  confidence to upstream these patches into the Android common kernel
  and the mainline kernel.
The testing environment is based on the ARM LT git tree:
https://git.linaro.org/landing-teams/working/arm/kernel-release.git
branch: origin/lsk-4.4-armlt-experimental
Test case: Geekbench with workload-automation
Test setting:
echo 0 > /proc/sys/kernel/sched_migration_cost_ns
echo 1 > /proc/sys/kernel/sched_domain/cpu0/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu0/domain1/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu1/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu1/domain1/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu2/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu2/domain1/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu3/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu3/domain1/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu4/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu4/domain1/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu5/domain0/busy_factor
echo 1 > /proc/sys/kernel/sched_domain/cpu5/domain1/busy_factor
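The per-domain busy_factor writes above can be expressed as a loop; this
assumes the same /proc layout (Juno's 6 CPUs with two sched domains
each) and must be run as root:

```shell
# Equivalent loop form of the test settings above (Juno R2, run as root).
echo 0 > /proc/sys/kernel/sched_migration_cost_ns
for cpu in 0 1 2 3 4 5; do
    for dom in 0 1; do
        echo 1 > /proc/sys/kernel/sched_domain/cpu${cpu}/domain${dom}/busy_factor
    done
done
```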
o Test result:
Optimization with Patch 0001:
baseline Patch 0001 Opt.
Geekbench ST: 953.2 966.2 1.36%
Geekbench MT: 2175.8 2280.8 4.83%
Optimization with Patch 0003:
baseline Patch 0001+0003 Opt.
Geekbench ST: 953.2 969.2 1.68%
Geekbench MT: 2175.8 2356.8 8.32%
Optimization with all patches:
baseline All Patch Opt.
Geekbench ST: 953.2 968.6 1.62%
Geekbench MT: 2175.8 2371.2 8.98%
For performance improvement, the three main contributing patches are:
0001: ~4.83%, 0003: ~3.3%, 0005: ~0.7%.
One thing to note: usually sched_migration_cost_ns also has a big impact
on multi-threading performance, but we cannot see a prominent boost on
the Juno board; the main reason is that the Juno board has only 2 big
cores.
o Compared to the RFCv4 version [1], I have dropped all the power
  optimization related patches. Those patches are important for power
  saving, but they contain a lot of hard-coded logic and are not
  general enough, so I'd like to split them out into an individual
  patch set.
[1] https://lists.linaro.org/pipermail/eas-dev/2016-September/000543.html
Leo Yan (7):
sched/fair: kick nohz idle balance for misfit task
sched/fair: replace capacity_of by capacity_orig_of
sched/fair: fall back to traditional wakeup migration when system is
busy
sched/fair: fix build error for schedtune_task_margin
sched/fair: force load balance when busiest group is overloaded
Documentation: use sysfs for EAS performance tunning
sched/fair: consider CPU overutilized only when it is not idle
Documentation/scheduler/sched-energy.txt | 24 ++++++++++++++
kernel/sched/fair.c | 57 +++++++++++++++++++++++++++-----
2 files changed, 72 insertions(+), 9 deletions(-)
--
1.9.1