On Thu, Oct 13, 2016 at 02:35:57PM +0100, Patrick Bellasi wrote:
[...]
Method 1 gives a noticeable performance improvement on one big.LITTLE system (4x Cortex-A53 + 4x Cortex-A72 cores); the Geekbench results show a score improvement of roughly 5%.
Method 1 was also tested with Geekbench on the ARM Juno R2 board; for the multi-thread case the score improves from 2348 to 2368, i.e. roughly 0.84%.
Am I correct in assuming that different values could potentially give us even better performance, but only the two values you are proposing were tried and tested?
For the 1st test, the root cause was that a task was cache hot on a CPU and could not be selected to migrate to another CPU because of its hotness; decreasing sched_migration_cost_ns directly reduces the hotness period during which a task cannot migrate to another CPU.
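For reference, here is a minimal standalone sketch of the kind of hotness test being described. It is loosely modelled on task_hot()/can_migrate_task() in kernel/sched/fair.c; the names, types and structure are simplified for illustration and are not the kernel code itself:

/*
 * Illustrative sketch only: a simplified version of the cache-hotness test
 * the load balancer applies before migrating a task (see task_hot() /
 * can_migrate_task() in kernel/sched/fair.c for the real implementation).
 */
#include <stdint.h>
#include <stdio.h>

static int task_is_cache_hot(uint64_t now_ns, uint64_t last_exec_start_ns,
			     uint64_t migration_cost_ns)
{
	int64_t delta = (int64_t)(now_ns - last_exec_start_ns);

	/*
	 * A task that ran within the last migration_cost_ns nanoseconds is
	 * treated as cache hot and skipped by the load balancer. With the
	 * sysctl set to 0 this test never holds, so hotness never blocks a
	 * migration.
	 */
	return delta < (int64_t)migration_cost_ns;
}

int main(void)
{
	/* Default 500000ns (500us): a task that ran 100us ago is still "hot". */
	printf("cost=500us, ran 100us ago -> hot=%d\n",
	       task_is_cache_hot(1000000, 900000, 500000));
	/* With sched_migration_cost_ns set to 0, it is never considered hot. */
	printf("cost=0,     ran 100us ago -> hot=%d\n",
	       task_is_cache_hot(1000000, 900000, 0));
	return 0;
}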
Ok, I don't know exactly how this value impacts load balancing but, still, what Leo is proposing is to reduce the value from 50us to 0us... even though the original value should be 500us: do Geekbench tasks need to migrate more often than every 500us?
I think the main benefit of setting it to 0 is that we do not miss any opportunity to migrate tasks when there is an imbalance in the first place, so it gives load balancing more chances to move tasks between CPUs.
This helps performance, especially when we want to spread tasks within the big cluster.
If I'm not wrong, 500us is quite likely lower than sched_min_granularity_ns (2.25 ms on my Nexus 5X).
Sorry for my lack of knowledge of that code, but I would really like to know the real reasons for the speedup we get by completely disregarding the migration costs.
Maybe Geekbench is a heavily CPU-bound task with a small working set?
Geekbench spawns one thread per CPU, and if all threads can run simultaneously on all CPUs it achieves a higher score. So we usually want to spread out all tasks as much as possible for it.
If that's the case, what's the impact of using 0 for sched_min_granularity_ns on tasks which have a bigger working set?
Let me explain the phenomenon for this case.
For EAS, if big tasks are migrated to the big cores and the system is under the tipping point, the task wakeup path is more likely to pack tasks onto one or two CPUs. So if we don't set sched_min_granularity_ns to 0, load balancing may miss the opportunity to spread tasks to idle CPUs, so multiple threads run on the same CPU and introduce scheduling latency.
After setting sched_min_granularity_ns to 0, there are more chances for task migration when there is an imbalance. As a result, tasks can be spread out more quickly when load balancing happens across CPUs.
That being said, I'm not sure that this should be put in Documentation/scheduler/sched-energy.txt.
I agree with Vincent on that.
So what's your suggestion for tracking this? Should we write a dedicated doc file? My purpose is to track these settings easily and avoid losing them; I also want to save others from duplicating the effort when they hit the same issue on their platform.
Or should we still use a wiki page to track these sysctl settings?
Moreover, do we have any measure of the impact on energy consumption for the proposed value?
Method 2 was tested on Juno as well, but it gives only a very minor performance boost.
That seems to support the idea that the values you are proposing are "optimal" only for performance on a specific platform, doesn't it?
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 Documentation/scheduler/sched-energy.txt | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
index dab2f90..c0e62fe 100644
--- a/Documentation/scheduler/sched-energy.txt
+++ b/Documentation/scheduler/sched-energy.txt
@@ -360,3 +360,27 @@ of the cpu from idle/busy power of the shared resources. The cpu can be
 tricked into different per-cpu idle states by disabling the other states. Based
 on various combinations of measurements with specific cpus busy and disabling
 idle-states it is possible to extrapolate the idle-state power.
+
+Performance tuning method
+=========================
+
+The settings below can have a significant impact when tuning for performance:
+
+echo 0 > /proc/sys/kernel/sched_migration_cost_ns
+
+After setting sched_migration_cost_ns to 0, it is easier to spread tasks
+within the big cluster. Otherwise, when the scheduler executes load balancing,
+it calls can_migrate_task() to check whether tasks are cache hot, comparing
+against sched_migration_cost_ns to avoid migrating tasks too frequently. This
+has the side effect of easily packing tasks onto the same CPU and introduces
+latency when spreading tasks across multiple cores, especially since energy
+aware scheduling already tends to pack tasks onto a single CPU.
+
+echo 1 > /proc/sys/kernel/sched_domain/cpuX/domain0/busy_factor
+echo 1 > /proc/sys/kernel/sched_domain/cpuX/domain1/busy_factor
+
+After setting busy_factor to 1, the load balance interval time decreases. So
+if we take min_interval = 8, the permitted load balance interval becomes
+busy_factor * min_interval = 8ms. This shortens task migration latency,
+especially when we want to migrate a running task from a little core to a big
+core by triggering active load balance.
1.9.1
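As a rough illustration of the busy_factor arithmetic in the patch text above: when a CPU is busy, the kernel scales the domain's balance interval by busy_factor (see get_sd_balance_interval() in kernel/sched/fair.c). The standalone sketch below only reproduces that multiplication; the function name, the assumed default of 32 and the omitted ms-to-jiffies/clamping steps are simplifications, not the kernel implementation:

/*
 * Illustrative sketch only: how busy_factor stretches a scheduling domain's
 * load-balance interval when the CPU is busy.
 */
#include <stdio.h>

static unsigned long balance_interval_ms(unsigned long min_interval_ms,
					 unsigned int busy_factor,
					 int cpu_busy)
{
	unsigned long interval = min_interval_ms;

	/* A busy CPU rebalances less often: the interval is scaled up. */
	if (cpu_busy)
		interval *= busy_factor;

	return interval;
}

int main(void)
{
	/*
	 * With min_interval = 8ms: an assumed default busy_factor of 32 gives
	 * a 256ms interval on a busy CPU, while busy_factor = 1 keeps it at
	 * 8ms, which is the "shorten task migration latency" effect the patch
	 * text describes.
	 */
	printf("busy_factor=32 -> %lums\n", balance_interval_ms(8, 32, 1));
	printf("busy_factor=1  -> %lums\n", balance_interval_ms(8, 1, 1));
	return 0;
}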
--
#include <best/regards.h>
Patrick Bellasi