From: Morten Rasmussen <morten.rasmussen@arm.com>
Hi Paul, Paul, Peter, Suresh, linaro-sched-sig, and LKML,
As a follow-up to my Linux Plumbers Conference talk about my experiments with scheduling on heterogeneous systems, I'm posting a proof-of-concept patch set with my modifications. The intention behind the modifications is to tweak scheduling behaviour so that fast (and power-hungry) cores are used only when necessary, and to improve performance consistency. Without the modifications, it is more or less random where tasks are scheduled, and so is the execution time.
I'm seeing good improvements in performance consistency for web browsing on Android using Bbench (http://www.gem5.org/Bbench) on the ARM big.LITTLE TC2 chip, which has two fast cores (Cortex-A15) and three power-efficient cores (Cortex-A7). The total execution time numbers below are for Android's SurfaceFlinger process, which is key for page rendering performance. The average execution time is lower with the patches enabled and the standard deviation is much smaller. Similar improvements can be seen for the Android.Browser and WebViewCoreThread processes.
Total execution time statistics based on 50 runs.
SurfaceFlinger   SMP kernel [s]   HMP modifications [s]
------------------------------------------------------
Average              14.617              11.012
St. Dev.              4.577               0.902
10% Pctl.             9.343              10.783
90% Pctl.            18.743              11.695
Unfortunately, I cannot share power-efficiency numbers at this stage.
This patch set introduces proof-of-concept scheduler modifications which attempt to improve scheduling decisions on heterogeneous multi-processor systems (HMP) such as ARM big.LITTLE systems. The patch set relies on the entity load-tracking re-work patch set by Paul Turner:
https://lkml.org/lkml/2012/8/23/267
The modifications attempt to migrate tasks between cores with different compute capacity depending on the tracked load and priority. The aim is to only use fast cores for tasks which really need the extra performance and thereby reduce power consumption by running everything else on the slow cores.
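To make the idea concrete, here is a minimal user-space sketch of a load- and priority-based up/down migration decision. It is not code from the patch set: the type and function names, the thresholds, the nice-value cut-off and the use of two separate thresholds for hysteresis are all assumptions chosen for illustration.

```c
/* Illustrative sketch only; all names and values below are hypothetical. */

#define LOAD_SCALE      1024  /* assumed range of the tracked load ratio */
#define UP_THRESHOLD     512  /* hypothetical: above this, move to a fast core */
#define DOWN_THRESHOLD   256  /* hypothetical: below this, move to a slow core */
#define NICE_LIMIT         0  /* hypothetical: tasks with nice > 0 stay slow */

enum core_type { CORE_SLOW, CORE_FAST };

struct task_sketch {
	unsigned int load_ratio;  /* tracked load, 0..LOAD_SCALE */
	int nice;                 /* -20 (highest priority) .. 19 (lowest) */
	enum core_type cur;       /* type of core the task currently runs on */
};

/* Return the type of core the task should run on next. */
static enum core_type pick_core(const struct task_sketch *t)
{
	/* Priority filter: low-priority tasks never use the fast cores. */
	if (t->nice > NICE_LIMIT)
		return CORE_SLOW;

	if (t->cur == CORE_SLOW)
		return t->load_ratio > UP_THRESHOLD ? CORE_FAST : CORE_SLOW;

	/* Already on a fast core: only move down once load drops clearly. */
	return t->load_ratio < DOWN_THRESHOLD ? CORE_SLOW : CORE_FAST;
}
```

With these made-up values, a nice-0 task on a slow core with load_ratio 800 would be moved up, while the same task with nice 10 would stay on the slow cores. The gap between the two thresholds keeps a task from bouncing between core types when its load hovers around a single cut-off.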
The patch set introduces hmp_domains to represent the different types of cores that are available on the given platform. Multiple (>2) hmp_domains are supported but not tested. hmp_domains must be set up by platform code, and the patch set includes patches for ARM platforms using device-tree.
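As a rough illustration of what a per-core-type domain could look like, the user-space sketch below models each domain as a CPU bitmask and looks up the domain a CPU belongs to. The struct layout, the names and the CPU numbering (CPUs 0-1 fast, CPUs 2-4 slow, loosely modelled on TC2) are assumptions, not the data structures used in the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only; not the hmp_domain structure from the patches. */
struct hmp_domain_sketch {
	const char *name;
	uint32_t cpu_mask;  /* bit n set => CPU n belongs to this domain */
};

/*
 * Hypothetical platform setup, ordered fastest first.
 * The CPU numbering is assumed: CPUs 0-1 fast, CPUs 2-4 slow.
 */
static const struct hmp_domain_sketch domains[] = {
	{ "fast", 0x03 },
	{ "slow", 0x1c },
};

/* Return the domain containing 'cpu', or NULL if it is in none. */
static const struct hmp_domain_sketch *domain_of_cpu(unsigned int cpu)
{
	size_t i;

	if (cpu >= 32)
		return NULL;

	for (i = 0; i < sizeof(domains) / sizeof(domains[0]); i++)
		if (domains[i].cpu_mask & (UINT32_C(1) << cpu))
			return &domains[i];

	return NULL;
}
```

Under these assumptions, domain_of_cpu(0) yields the "fast" domain and domain_of_cpu(4) the "slow" one. Keeping the array ordered by speed would let a platform with more than two core types express "next faster" and "next slower" as neighbouring entries.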
The patches intentionally try to avoid modifying the existing code paths as much as possible. The aim is to experiment with HMP scheduling and get the overall policy right before integrating it properly with the existing load-balancer.
Morten
Morten Rasmussen (10):
  sched: entity load-tracking load_avg_ratio
  sched: Task placement for heterogeneous systems based on task load-tracking
  sched: Forced task migration on heterogeneous systems
  sched: Introduce priority-based task migration filter
  ARM: Add HMP scheduling support for ARM architecture
  ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
  ARM: sched: Setup SCHED_HMP domains
  sched: Add ftrace events for entity load-tracking
  sched: Add HMP task migration ftrace event
  sched: SCHED_HMP multi-domain task migration control
 arch/arm/Kconfig                |   46 +++++
 arch/arm/include/asm/topology.h |   32 +++
 arch/arm/kernel/topology.c      |   91 ++++++++
 include/linux/sched.h           |   11 +
 include/trace/events/sched.h    |  153 ++++++++++++++
 kernel/sched/core.c             |    4 +
 kernel/sched/fair.c             |  434 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h            |    9 +
 8 files changed, 779 insertions(+), 1 deletion(-)