On Fri, Oct 28, 2016 at 10:19:41AM +0200, Vincent Guittot wrote:
On 28 October 2016 at 10:13, Leo Yan <leo.yan@linaro.org> wrote:
On Thu, Oct 27, 2016 at 08:37:05PM +0100, Dietmar Eggemann wrote:
Hi Leo,
On 26/10/16 18:28, Leo Yan wrote:
o This patch series evaluates whether an rb tree can be used to track task load and utilization on the rq. One concern with this method is that an rb tree has O(log(N)) computational complexity, so maintaining it introduces extra overhead. To check this, hackbench is used for stress testing: hackbench spawns a large number of message sender and receiver tasks, which generates many enqueue and dequeue operations, so it can tell us whether the rb tree introduces a significant overhead or not (thanks a lot to Chris for suggesting this). A rough sketch of the bookkeeping this implies is shown after the next paragraph.
Another concern is that the scheduler already provides the LB_MIN feature; with LB_MIN enabled, the scheduler avoids migrating tasks with load < 16, which to some extent also helps single out big tasks for migration by filtering out the small ones. So we need to compare power data between this patch series and simply enabling LB_MIN.
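A minimal sketch of the rb-tree bookkeeping described above. This is not taken from the posted patches: the load_tree field in cfs_rq, the load_node field in sched_entity and the helper names are assumptions for illustration only; the rbtree calls themselves are the kernel's real API.

#include <linux/rbtree.h>
#include "sched.h"	/* kernel/sched/sched.h: struct cfs_rq, struct sched_entity */

/*
 * Assumed extra state (hypothetical, not in mainline):
 *   struct rb_root load_tree;   added to struct cfs_rq
 *   struct rb_node load_node;   added to struct sched_entity
 * Tasks are kept ordered by se->avg.load_avg, so each enqueue/dequeue
 * costs O(log N) and the biggest task is simply the rightmost node.
 */
static void big_task_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	struct rb_node **link = &cfs_rq->load_tree.rb_node, *parent = NULL;

	while (*link) {
		struct sched_entity *entry;

		parent = *link;
		entry = rb_entry(parent, struct sched_entity, load_node);
		if (se->avg.load_avg < entry->avg.load_avg)
			link = &parent->rb_left;
		else
			link = &parent->rb_right;
	}

	rb_link_node(&se->load_node, parent, link);
	rb_insert_color(&se->load_node, &cfs_rq->load_tree);
}

static void big_task_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	rb_erase(&se->load_node, &cfs_rq->load_tree);
}

static struct sched_entity *big_task_peek(struct cfs_rq *cfs_rq)
{
	struct rb_node *last = rb_last(&cfs_rq->load_tree);

	return last ? rb_entry(last, struct sched_entity, load_node) : NULL;
}

hackbench stresses exactly these two hooks, since every message wakes a receiver and blocks a sender, i.e. a constant stream of enqueues and dequeues.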
I have difficulties understanding the whole idea here. Basically, you're still doing classical load balancing (lb), with the aim of driving env->imbalance ((runnable) load based) to 0. On a system like Hikey (SMP), any ordering of the tasks (load or util related) can potentially change how many tasks a dst cpu might pull: in the case of an ordered list (large to small load) we potentially pull only one task, and it doesn't have to be the first one, because of 'task_h_load(p)/2 > env->imbalance' in case its load is smaller but close to env->imbalance/2. But how can this help to increase performance in a workload-agnostic way?
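For reference, the loop in question is detach_tasks() in kernel/sched/fair.c. The sketch below is abridged and paraphrased from the v4.4-era code (loop limits, throttling and preemption details are dropped), just to show where the LB_MIN and 'task_h_load(p)/2 > env->imbalance' checks sit:

/* Abridged/paraphrased sketch of detach_tasks(), kernel/sched/fair.c (v4.4 era) */
static int detach_tasks(struct lb_env *env)
{
	struct list_head *tasks = &env->src_rq->cfs_tasks;
	struct task_struct *p;
	unsigned long load;
	int detached = 0;

	while (!list_empty(tasks)) {
		p = list_first_entry(tasks, struct task_struct, se.group_node);

		if (!can_migrate_task(p, env))
			goto next;

		load = task_h_load(p);

		/* LB_MIN: skip tasks with load < 16 (the filter mentioned above) */
		if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
			goto next;

		/* skip a task whose load would overshoot the remaining imbalance */
		if ((load / 2) > env->imbalance)
			goto next;

		detach_task(p, env);
		list_add(&p->se.group_node, &env->tasks);

		detached++;
		env->imbalance -= load;

		/* stop once the imbalance is covered */
		if (env->imbalance <= 0)
			break;
next:
		list_move_tail(&p->se.group_node, tasks);
	}

	return detached;
}

Since the loop walks src_rq->cfs_tasks in list order and stops as soon as env->imbalance is covered, reordering that list (e.g. biggest load first) directly changes which tasks are skipped and how many end up being detached.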
On 4.4 the result is better than with the simple list. Vincent also suggested that I do the comparison on the mainline kernel, and the result shows my rb-tree patches introduce a performance regression:
mainline kernel:
  real    2m23.701s   user    1m4.500s    sys    4m34.604s

mainline kernel + fork regression patch:
  real    2m24.377s   user    1m3.952s    sys    4m39.928s
  real    2m19.100s   user    0m48.776s   sys    3m33.440s

mainline with big task tracking:
  real    2m28.837s   user    1m16.388s   sys    5m26.864s
  real    2m28.501s   user    1m18.104s   sys    5m30.516s
It would be interesting to understand where the huge difference between mainline above and your 1st test with v4.4 comes from. The 1st results on v4.4 were:

              real        user        system
  baseline    6m00.57s    1m41.72s    34m38.18s
  rb tree     5m55.79s    1m33.68s    34m08.38s
Is the difference linked to v4.4 vs mainline? A different version of hackbench? A different rootfs/distro? Something else?
I think two things are quite different between the v4.4 and mainline kernels:
- The mainline kernel does not have CPUFreq enabled, so presumably it always runs at 1.2GHz; the v4.4 kernel has the "interactive" governor enabled;
- v4.4 has the EAS and WALT related code merged, but when I tested I disabled EAS with "echo NO_ENERGY_AWARE > sched_features" and used PELT signals rather than WALT.
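As a side note on the NO_ENERGY_AWARE knob: in the EAS-patched trees of that time the energy-aware path is gated by a scheduler feature bit, roughly as sketched below (quoted from memory of the out-of-tree EAS code, so exact names and file locations may differ). Writing NO_ENERGY_AWARE to /sys/kernel/debug/sched_features clears the bit, which is why those runs fall back to the regular PELT-based path.

/* kernel/sched/features.h in EAS-patched trees (not in mainline at the time) */
SCHED_FEAT(ENERGY_AWARE, true)

/* Helper used to gate the energy-aware wake-up/balance paths */
static inline bool energy_aware(void)
{
	return sched_feat(ENERGY_AWARE);
}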
Thanks,
Leo Yan