On Thu, Oct 13, 2016 at 02:35:57PM +0100, Patrick Bellasi wrote:
[...]
Method 1 gives a noticeable performance improvement on one big.LITTLE system (4x Cortex-A53 + 4x Cortex-A72 cores); the Geekbench results show a score improvement of roughly 5%.
Method 1 was also tested with Geekbench on the ARM Juno R2 board; for the multi-thread case the score improves from 2348 to 2368, i.e. roughly 0.84%.
Am I correct in assuming that different values could potentially give us even better performance, but only the two values you are proposing were tried and tested?
For the 1st test, the root cause was that a task was cache hot on a CPU and could not be selected to migrate to another CPU because of its hotness; decreasing sched_migration_cost_ns directly reduces the hotness period during which a task cannot migrate to another CPU.
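For reference, here is a minimal standalone sketch of the kind of hotness test being described. It is loosely modelled on task_hot()/can_migrate_task() in kernel/sched/fair.c; the names, types and structure are simplified for illustration and are not the kernel code itself:

/*
 * Illustrative sketch only: a simplified version of the cache-hotness test
 * the load balancer applies before migrating a task (see task_hot() /
 * can_migrate_task() in kernel/sched/fair.c for the real implementation).
 */
#include <stdint.h>
#include <stdio.h>

static int task_is_cache_hot(uint64_t now_ns, uint64_t last_exec_start_ns,
			     uint64_t migration_cost_ns)
{
	int64_t delta = (int64_t)(now_ns - last_exec_start_ns);

	/*
	 * A task that ran within the last migration_cost_ns nanoseconds is
	 * treated as cache hot and skipped by the load balancer. With the
	 * sysctl set to 0 this test never holds, so hotness never blocks a
	 * migration.
	 */
	return delta < (int64_t)migration_cost_ns;
}

int main(void)
{
	/* Default 500000ns (500us): a task that ran 100us ago is still "hot". */
	printf("cost=500us, ran 100us ago -> hot=%d\n",
	       task_is_cache_hot(1000000, 900000, 500000));
	/* With sched_migration_cost_ns set to 0, it is never considered hot. */
	printf("cost=0,     ran 100us ago -> hot=%d\n",
	       task_is_cache_hot(1000000, 900000, 0));
	return 0;
}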
Ok, I don't know exactly how this value impacts load balancing but, still, what Leo is proposing is to reduce the value from 50us to 0us... even though the original value should be 500us: do Geekbench tasks need to migrate more often than every 500us?
I think the main benefit of setting it to 0 is that we do not miss any opportunity to migrate tasks when there is an imbalance in the first place, so it gives load balancing more chances to move tasks between CPUs.
This helps performance, especially when we want to spread tasks within the big cluster.
If I'm not wrong, 500us is quite likely lower than sched_min_granularity_ns (2.25 ms on my Nexus 5X).
Sorry for my lack of knowledge of that code, but I would really like to know the real reasons for the speedup we get by completely disregarding the migration costs.
Maybe Geekbench is a heavily CPU-bound task with a small working set?
Geekbench spawns one thread per CPU, and if all threads can run simultaneously on all CPUs it achieves a higher score. So we usually want to spread out all tasks as much as possible for it.
If that's the case, what's the impact of using 0 for sched_min_granularity_ns on tasks which have a bigger working set?
Let me explain the phenomenon for this case.
For EAS, if big tasks are migrated to the big cores and the system is under the tipping point, the task wakeup path is more likely to pack tasks onto one or two CPUs. So if we don't set sched_min_granularity_ns to 0, load balancing may miss the opportunity to spread tasks to idle CPUs, so multiple threads run on the same CPU and introduce scheduling latency.
After setting sched_min_granularity_ns to 0, there are more chances for task migration when there is an imbalance. As a result, tasks can be spread out more quickly when load balancing happens across CPUs.
That being said, I'm not sure that this should be put in Documentation/scheduler/sched-energy.txt.
I agree with Vincent on that.
So what's your suggestion for tracking this? Should we write a dedicated doc file? My purpose is to track these settings easily and avoid losing them; I also want to save others from duplicating the effort when they hit the same issue on their platform.
Or should we still use a wiki page to track these sysctl settings?
Moreover, do we have any measure of the impact on energy consumption for the proposed value?
Method 2 was tested on Juno as well, but it gives only a very minor performance boost.
That seems to support the idea that the values you are proposing are "optimal" only for performance on a specific platform, doesn't it?
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 Documentation/scheduler/sched-energy.txt | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
index dab2f90..c0e62fe 100644
--- a/Documentation/scheduler/sched-energy.txt
+++ b/Documentation/scheduler/sched-energy.txt
@@ -360,3 +360,27 @@ of the cpu from idle/busy power of the shared resources. The cpu can be
 tricked into different per-cpu idle states by disabling the other states. Based
 on various combinations of measurements with specific cpus busy and disabling
 idle-states it is possible to extrapolate the idle-state power.
+
+Performance tuning method
+=========================
+
+The settings below can have a significant impact when tuning for performance:
+
+echo 0 > /proc/sys/kernel/sched_migration_cost_ns
+
+After setting sched_migration_cost_ns to 0, it is easier to spread tasks
+within the big cluster. Otherwise, when the scheduler executes load balancing,
+it calls can_migrate_task() to check whether tasks are cache hot, comparing
+against sched_migration_cost_ns to avoid migrating tasks too frequently. This
+has the side effect of easily packing tasks onto the same CPU and introduces
+latency when spreading tasks across multiple cores, especially since energy
+aware scheduling already tends to pack tasks onto a single CPU.
+
+echo 1 > /proc/sys/kernel/sched_domain/cpuX/domain0/busy_factor
+echo 1 > /proc/sys/kernel/sched_domain/cpuX/domain1/busy_factor
+
+After setting busy_factor to 1, the load balance interval time decreases. So
+if we take min_interval = 8, the permitted load balance interval becomes
+busy_factor * min_interval = 8ms. This shortens task migration latency,
+especially when we want to migrate a running task from a little core to a big
+core by triggering active load balance.
1.9.1
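As a rough illustration of the busy_factor arithmetic in the patch text above: when a CPU is busy, the kernel scales the domain's balance interval by busy_factor (see get_sd_balance_interval() in kernel/sched/fair.c). The standalone sketch below only reproduces that multiplication; the function name, the assumed default of 32 and the omitted ms-to-jiffies/clamping steps are simplifications, not the kernel implementation:

/*
 * Illustrative sketch only: how busy_factor stretches a scheduling domain's
 * load-balance interval when the CPU is busy.
 */
#include <stdio.h>

static unsigned long balance_interval_ms(unsigned long min_interval_ms,
					 unsigned int busy_factor,
					 int cpu_busy)
{
	unsigned long interval = min_interval_ms;

	/* A busy CPU rebalances less often: the interval is scaled up. */
	if (cpu_busy)
		interval *= busy_factor;

	return interval;
}

int main(void)
{
	/*
	 * With min_interval = 8ms: an assumed default busy_factor of 32 gives
	 * a 256ms interval on a busy CPU, while busy_factor = 1 keeps it at
	 * 8ms, which is the "shorten task migration latency" effect the patch
	 * text describes.
	 */
	printf("busy_factor=32 -> %lums\n", balance_interval_ms(8, 32, 1));
	printf("busy_factor=1  -> %lums\n", balance_interval_ms(8, 1, 1));
	return 0;
}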
--
#include <best/regards.h>
Patrick Bellasi