Hi Leo,
On 26/10/16 18:28, Leo Yan wrote:
o This patch series evaluates whether an rb tree can be used to track task load and util on the rq. One concern with this method is that an rb tree has O(log(N)) computation complexity, so maintaining it introduces extra overhead. To address this concern we stress test with hackbench: hackbench spawns many message sender and receiver tasks, which produces a large number of enqueue and dequeue operations, so it is a good way to find out whether the rb tree maintenance introduces a big workload or not (thanks a lot to Chris for suggesting this).
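For illustration, here is a minimal sketch of what such enqueue-time tracking could look like with the kernel rbtree API. The rq 'load_tree' and task_struct 'load_node' fields and the helper name are hypothetical, made up for this sketch and not taken from the series itself:

/*
 * Hypothetical sketch: keep the rq's tasks in an rb tree ordered by
 * task_h_load(), with the biggest task found at rb_first().
 * 'load_tree' and 'load_node' are invented names for illustration.
 */
static void rq_load_tree_enqueue(struct rq *rq, struct task_struct *p)
{
	struct rb_node **link = &rq->load_tree.rb_node, *parent = NULL;
	unsigned long load = task_h_load(p);

	while (*link) {
		struct task_struct *entry;

		parent = *link;
		entry = rb_entry(parent, struct task_struct, load_node);
		/* Descend left for bigger loads so rb_first() is the biggest. */
		if (load >= task_h_load(entry))
			link = &parent->rb_left;
		else
			link = &parent->rb_right;
	}
	rb_link_node(&p->load_node, parent, link);
	rb_insert_color(&p->load_node, &rq->load_tree);
}

This is the O(log(N)) insertion cost the series is worried about, paid on every enqueue and dequeue.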
Another concern is that the scheduler already provides the LB_MIN feature; with LB_MIN enabled the scheduler avoids migrating tasks with load < 16. To some extent this also helps to select bigger tasks for migration by filtering out tiny ones. So we need to compare power data between this patch series and simply enabling LB_MIN (the existing filter is sketched below).
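For reference, the LB_MIN filter in detach_tasks() looks roughly like this in kernels of this vintage (abridged from kernel/sched/fair.c):

	load = task_h_load(p);

	/* Skip tiny tasks unless lb has already failed repeatedly. */
	if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
		goto next;

	/* Skip a task whose half-load overshoots the remaining imbalance. */
	if ((load / 2) > env->imbalance)
		goto next;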
I have difficulty understanding the whole idea here. Basically, you're still doing classical load balancing (lb), with the aim of driving env->imbalance (which is (runnable) load based) to 0. On a system like Hikey (SMP), any ordering of the tasks (load or util related) can potentially change how many tasks a dst cpu might pull. With a list ordered from large to small load we potentially pull only one task, and it doesn't have to be the first one: the 'task_h_load(p)/2 > env->imbalance' check skips any task whose load exceeds twice env->imbalance, so we pull the first task whose load is smaller than, but possibly close to, that bound. But how can this help to increase performance in a workload-agnostic way? Besides this periodic lb case, what's the benefit of potentially pulling the biggest task instead of an arbitrary one in active lb?
Now if we think about a big.LITTLE system and we are on DIE sd level, an ordered list (large to small) on a LITTLE cpu, and the reverse order on a big one, could potentially make sense. It's not very scalable though, because we're just exploiting the idea of _two_ items with an opposite property (big vs. LITTLE) and a different order for each.
Another remark: where do you see the benefit of using util instead of load? In both cases you do 'task_h_load(p)/2 > env->imbalance', 'env->imbalance -= task_h_load(p)' and 'if (env->imbalance <= 0) break', i.e. a load based policy anyway (see the sketch below).
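To make the point concrete, here is a minimal skeleton of the detach_tasks() policy being referred to (abridged; other filters and bookkeeping omitted, and pick_next_candidate() is a hypothetical stand-in for however the next task is chosen from the list or tree):

	struct task_struct *p;

	/*
	 * Regardless of whether the candidates are ordered by load or
	 * by util, the accounting that decides when to stop pulling is
	 * (runnable) load based.
	 */
	while ((p = pick_next_candidate(env))) {	/* hypothetical helper */
		unsigned long load = task_h_load(p);

		/* Skip a task whose half-load overshoots the imbalance. */
		if ((load / 2) > env->imbalance)
			continue;

		detach_task(p, env);
		env->imbalance -= load;

		if (env->imbalance <= 0)
			break;
	}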
The LB_MIN thing (avoiding considering tiny tasks for pulling) is a related idea, but IMHO aren't we seeing results here which are pretty much inside the noise cloud for a particular system/workload combination?
Let's talk about it further in tomorrow's meeting.
-- Dietmar
o Testing results:
Tested hackbench on Hikey (8x Cortex-A53 CPUs) with SMP load balancing:
time sh -c 'for i in `seq 100`; do /data/hackbench -p -P > /dev/null; done'

            real        user        system
baseline    6m00.57s    1m41.72s    34m38.18s
rb tree     5m55.79s    1m33.68s    34m08.38s
For the hackbench test case we can see that with the rb tree the result is even better than with the baseline kernel.
Tested video playback on Juno for LB_MIN vs rb tree:
LB_MIN      Nrg:LITTLE    Nrg:Big      Nrg:Sum
            11.3122       8.983429     20.295629
            11.337446     8.174061     19.511507
            11.256941     8.547895     19.804836
            10.994329     9.633028     20.627357
            11.483148     8.522364     20.005512
avg.        11.2768128    8.7721554    20.0489682
rb tree     Nrg:LITTLE    Nrg:Big      Nrg:Sum
            11.384301     8.412714     19.797015
            11.673992     8.455219     20.129211
            11.586081     8.414606     20.000687
            11.423509     8.64781      20.071319
            11.43709      8.595252     20.032342
avg.        11.5009946    8.5051202    20.0061148
vs LB_MIN   +1.99%        -3.04%       -0.21%
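(The 'vs LB_MIN' row is the relative change of the rb tree averages over the LB_MIN averages, e.g. for the LITTLE cluster: 11.5009946 / 11.2768128 - 1 ~= +1.99%.)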
o Known issues:
For patch 2, function detach_tasks() iterates the rb tree for tasks; after one task has been detached it calls rb_first() to fetch the first node again and restarts the iteration from there. It would be better to use rb_next(), but changing the code to use rb_next() introduces a panic (see the sketch below).
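If the panic comes from calling rb_next() on a node that has already been erased, the usual pattern is to fetch the successor before detaching. A minimal sketch, reusing the hypothetical load_tree/load_node names from the earlier snippet:

	struct rb_node *node = rb_first(&env->src_rq->load_tree);

	while (node) {
		/*
		 * Fetch the successor before the node can be erased: a
		 * detached node's parent/child pointers are no longer
		 * valid, so rb_next() must not be called on it afterwards.
		 */
		struct rb_node *next = rb_next(node);
		struct task_struct *p = rb_entry(node, struct task_struct,
						 load_node);
		unsigned long load = task_h_load(p);

		if ((load / 2) <= env->imbalance && can_migrate_task(p, env)) {
			detach_task(p, env);	/* removes p's load_node */
			env->imbalance -= load;
			if (env->imbalance <= 0)
				break;
		}
		node = next;
	}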
Any suggestions for a better implementation are welcome.
Leo Yan (3):
  sched/fair: support to track biggest task on rq
  sched/fair: select biggest task for migration
  sched: remove unused rq::cfs_tasks
 include/linux/sched.h        |   1 +
 include/linux/sched/sysctl.h |   1 +
 kernel/sched/core.c          |   2 -
 kernel/sched/fair.c          | 123 ++++++++++++++++++++++++++++++++++++-------
 kernel/sched/sched.h         |   5 +-
 kernel/sysctl.c              |   7 +++
 6 files changed, 116 insertions(+), 23 deletions(-)