If a CPU reads the runnable_avg_sum and runnable_avg_period fields of
its buddy CPU while the buddy is updating them, it can get the new value
of one field and the old value of the other. Such an inconsistent pair
can lead to erroneous decisions. We don't want to use a lock to ensure
coherency because of the overhead it would add in this critical path.

The previous attempt can't guarantee the coherency of both fields on
every platform and for every use case, because the result depends on the
toolchain and the platform architecture.

The runnable_avg_period of a runqueue tends toward its max value in less
than 345ms after a CPU is plugged in, which implies that we could use the
max value instead of reading runnable_avg_period after 345ms. During the
starting phase, we must ensure a minimum of coherency between the two
fields. A simple rule is runnable_avg_sum <= runnable_avg_period.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

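The rule runnable_avg_sum <= runnable_avg_period can be illustrated
outside the kernel. The following is only a userspace sketch, not the
kernel code: struct avg_snapshot, buddy_busy() and
buddy_busy_unclamped() are hypothetical names introduced here, and the
values used below are made up to mimic a torn read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields read from the buddy's rq->avg. */
struct avg_snapshot {
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
};

/* Busy test on the raw values, without any coherency rule. */
static bool buddy_busy_unclamped(struct avg_snapshot s, unsigned int nr_running)
{
	return (s.runnable_avg_sum << nr_running) > s.runnable_avg_period;
}

/* Busy test after enforcing sum <= period on local copies. */
static bool buddy_busy(struct avg_snapshot s, unsigned int nr_running)
{
	uint32_t sum = s.runnable_avg_sum;
	uint32_t period = s.runnable_avg_period;

	/* A torn read may return a new sum with an old period: clamp it. */
	if (sum > period)
		sum = period;

	return (sum << nr_running) > period;
}
```

With a torn snapshot of sum = 3000 and period = 2000 and no running
task, buddy_busy_unclamped() reports the buddy as busy whereas
buddy_busy() does not. The clamp does not make the read atomic; it only
bounds the error to a pair that could legitimately have existed.
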
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fc93d96..f1a4c24 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5153,13 +5153,16 @@ static bool numa_allow_migration(struct task_struct *p, int prev_cpu, int new_cp
 static bool is_buddy_busy(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
+	u32 sum = rq->avg.runnable_avg_sum;
+	u32 period = rq->avg.runnable_avg_period;
+
+	sum = min(sum, period);
 
 	/*
 	 * A busy buddy is a CPU with a high load or a small load with a lot of
 	 * running tasks.
 	 */
-	return ((rq->avg.runnable_avg_sum << rq->nr_running) >
-			rq->avg.runnable_avg_period);
+	return ((sum << rq->nr_running) > period);
 }
 
 static bool is_light_task(struct task_struct *p)