The sched_mc feature was originally designed to improve the power consumption of multi-package systems, and several architecture functions are available to tune the topology and the scheduler's parameters when the scheduler rebuilds the sched_domain hierarchy (i.e. when the sched_mc_power_savings level changes). This patch series is an attempt to improve the power consumption of dual and quad Cortex-A9 when sched_mc_power_savings is set to 2. The policy of the following patches is to accept up to 4 threads (configurable) in the run queue of a core before starting to load balance when the CPU runs at a low frequency, but to accept only 1 thread at high frequencies, which is the normal behaviour. The goal is to use only one CPU in light-load situations and both CPUs in heavy-load situations.
Patches [1-3] modify the ARM CPU topology according to the sched_mc_power_savings value and the Cortex ID.
Patch [4] enables the ARCH_POWER feature of the scheduler.
Patch [5] adds the ARCH_POWER function for the ARM platform.
Patches [6-7] modify the cpu_power of the CA-9 according to the sched_mc_power_savings level and the current frequency. The main goal is to increase the capacity of a core when it runs at a low CPU frequency, in order to pull tasks onto this core. Note that this behaviour is not really advisable, but it can be seen as an intermediate step between the use of CPU hotplug (which is not a power-saving feature) and a new load balancer that will take low-load situations on dual core into account.
Patch [8] ensures that cpu0 is used in priority when only one CPU is running.
Patch [9] adds a debugfs interface for test purposes.
Patch [10] ensures that cpu_power is updated periodically.
Patch [11] fixes an issue around the triggering of the idle load balancer (ilb).
TODO list:
- remove the useless start of the ilb when the core has capacity.
- add a method (DT, sysfs, ...) to set the threshold for using 1 or 2 CPUs on a dual CA-9.
- IRQ balancing.
The tests below were run on a u8500 with the linaro-3.1 kernel. They check that there is no obvious loss of performance when sched_mc=2.
sysbench --test=cpu --num-threads=12 --max-time=20 run

Test execution summary:     sched_mc=0  sched_mc=2 cpu hotplug
total number of events:            665         664         336
per-request statistics:
  min:                         92.68ms     70.53ms    618.89ms
  avg:                        361.75ms    361.38ms    725.29ms
  max:                        421.08ms    420.73ms    840.74ms
  approx. 95 percentile:      402.28ms    390.53ms    760.17ms
sysbench --test=threads --thread-locks=9 --num-threads=12 --max-time=20 run

Test execution summary:     sched_mc=0  sched_mc=2 cpu hotplug
total number of events:          10000       10000        3129
per-request statistics:
  min:                          1.62ms      1.70ms     13.16ms
  avg:                         22.23ms     21.87ms     76.77ms
  max:                        153.52ms    133.99ms    173.82ms
  approx. 95 percentile:       54.12ms     52.65ms    136.32ms
sysbench --test=threads --thread-locks=2 --num-threads=3 --max-time=20 run

Test execution summary:     sched_mc=0  sched_mc=2 cpu hotplug
total number of events:          10000       10000       10000
per-request statistics:
  min:                          1.38ms      1.38ms      5.70ms
  avg:                          4.67ms      5.37ms     11.85ms
  max:                         36.84ms     32.42ms     32.58ms
  approx. 95 percentile:       14.34ms     12.89ms     21.30ms
cyclictest -q -t -D 20

Only one CPU is used during this test when sched_mc=2, whereas both CPUs are used when sched_mc=0.

Test execution summary (latencies in us):
                            sched_mc=0  sched_mc=2 cpu hotplug
Avg, Max:                      15, 434    19, 2145    17, 3556
Avg, Max:                      14, 104    19, 1686    17, 3593
Regards, Vincent