Changelog:
---------------------------------------------------------------------------
v1->v2:
* Changed the dynamic threshold calculation, as having global state can be avoided.
v2->v3:
* Split up the patch for find_idlest_cpu and select_idle_sibling code paths.
v3->v4:
* Rebased it to peterz's tree (apologies for wrong tree for v3)
v4->v5:
* Changed the threshold to 768 from 819 for easier shifts
* Changed the find_idlest_cpu code path to be simpler
* Changed the select_idle_core code path to search for idlest+full_capacity core
* Added scaled capacity awareness to wake_affine_idle code path
---------------------------------------------------------------------------
During OLTP workload runs, threads can end up on CPUs with a lot of softIRQ activity, thus delaying progress. For more reliable and faster runs, if the system can spare it, these threads should be scheduled on CPUs with lower IRQ/RT activity.
Currently, the scheduler takes into account the original capacity of CPUs when providing 'hints' for the select_idle_sibling code path to return an idle CPU. However, the rest of the select_idle_* code paths remain capacity-agnostic. Further, these code paths are only aware of the original capacity and not the capacity stolen by IRQ/RT activity.
This patch introduces capacity awareness in the scheduler (CAS), which avoids CPUs whose capacity might be reduced (due to IRQ/RT activity) when trying to schedule threads (on the push side) in the system. This awareness has been added to the fair scheduling class.
It does so using the following algorithm:
1) As in rt_avg, the scaled capacities are already calculated.
2) Any CPU which is running below 80% capacity is considered running low on capacity.
3) During idle CPU search if a CPU is found running low on capacity, it is skipped if better CPUs are available.
4) If none of the CPUs are better in terms of idleness and capacity, then the low-capacity CPU is considered to be the best available CPU.
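The steps above can be sketched in plain userspace C. This is only an illustration, not the code from the patches: struct cpu_state, has_full_capacity() and pick_idle_cpu() are hypothetical names, the threshold is taken as 768/1024 (75%) from the v4->v5 changelog entry (the 80% figure in step 2 would correspond to 819/1024), and the fallback rule of preferring the idle CPU with the most remaining capacity is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY_SHIFT		10	/* capacities are scaled to 1024 */
#define CAPACITY_THRESHOLD	768	/* 768/1024 = 75%, assumed from the changelog */

/* Hypothetical per-CPU snapshot used only for this sketch. */
struct cpu_state {
	unsigned long capacity_orig;	/* capacity with no IRQ/RT pressure */
	unsigned long capacity;		/* capacity left after IRQ/RT activity */
	bool idle;
};

/* True if the CPU still has at least threshold/1024 of its original capacity. */
static bool has_full_capacity(const struct cpu_state *c)
{
	return c->capacity >=
	       ((c->capacity_orig * CAPACITY_THRESHOLD) >> CAPACITY_SHIFT);
}

/*
 * Return the index of the preferred idle CPU, or -1 if no CPU is idle.
 * An idle CPU with full capacity is returned immediately (step 3).
 * If none exists, the idle low-capacity CPU with the most remaining
 * capacity is used as the best available choice (step 4, assumed rule).
 */
static int pick_idle_cpu(const struct cpu_state *cpus, size_t n)
{
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (has_full_capacity(&cpus[i]))
			return (int)i;
		if (fallback < 0 || cpus[i].capacity > cpus[fallback].capacity)
			fallback = (int)i;
	}
	return fallback;
}
```

Using a threshold of 768 lets the comparison be done with a multiply and a shift by 10 instead of a division.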
The performance numbers:
---------------------------------------------------------------------------
CAS shows up to 1.5% improvement on x86 when running a 'SELECT' database workload.
For microbenchmark results, I ran hackbench in process mode while running ping on CPUs 0, 1 and 2 as:
'ping -l 10000 -q -s 10 -f hostX'
The results below should be read as:
* 'Baseline without ping' is how the workload would've behaved if there was no IRQ activity.
* Compare 'Baseline with ping' and 'Baseline without ping' to see the effect of ping
* Compare 'Baseline with ping' and 'CAS with ping' to see the improvement CAS can give over baseline
Following are the runtime(s) with hackbench and ping activity as described above (lower is better), on a 44 core 2 socket x86 machine:
+---------------+------+--------+--------+
|Num.           |CAS   |Baseline|Baseline|
|Tasks          |with  |with    |without |
|(groups of 40) |ping  |ping    |ping    |
+---------------+------+--------+--------+
|               |Mean  |Mean    |Mean    |
+---------------+------+--------+--------+
|1              | 0.55 | 0.59   | 0.53   |
|2              | 0.66 | 0.81   | 0.51   |
|4              | 0.99 | 1.16   | 0.95   |
|8              | 1.92 | 1.93   | 1.88   |
|16             | 3.24 | 3.26   | 3.15   |
|32             | 5.93 | 5.98   | 5.68   |
|64             | 11.55| 11.94  | 10.89  |
+---------------+------+--------+--------+
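To make the comparisons concrete, the relative runtime reduction can be computed from the means in the table. The helper below is a minimal sketch; improvement_pct() is a hypothetical name, not something from the patches. For example, with 2 groups it gives (0.81 - 0.66) / 0.81, roughly 18.5%, while with 8 groups it gives (1.93 - 1.92) / 1.93, roughly 0.5%.

```c
/*
 * Relative runtime reduction of `cas` versus `baseline`, in percent.
 * Positive values mean CAS finished faster than the baseline.
 */
static double improvement_pct(double baseline, double cas)
{
	return (baseline - cas) / baseline * 100.0;
}
```

The same helper can be used with the 'Baseline without ping' column as the reference to estimate how much of the ping-induced slowdown remains.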
Rohit Jain (3):
  sched/fair: Introduce scaled capacity awareness in find_idlest_cpu code path
  sched/fair: Introduce scaled capacity awareness in select_idle_sibling code path
  sched/fair: Introduce scaled capacity awareness in wake_affine_idle code path

 kernel/sched/fair.c | 66 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 13 deletions(-)

--
2.7.4