I have faced a sequence where the Idle Load Balance was sometimes not triggered for a while on my platform. The scenario is as follows:
CPU 0 and CPU 1 are running tasks and CPU 2 is idle

CPU 1 kicks the Idle Load Balance
CPU 1 selects CPU 2 as the new Idle Load Balancer
CPU 1 sets NOHZ_BALANCE_KICK for CPU 2
CPU 1 sends a reschedule IPI to CPU 2
While CPU 2 wakes up, CPU 0 or CPU 1 migrates a waking task A on CPU 2
CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
Task A quickly goes back to sleep (before a tick occurs on CPU 2)
CPU 2 goes back to idle with NOHZ_BALANCE_KICK set

From then on, whenever CPU 2 is selected as the ILB CPU, no reschedule IPI is sent to it, even though it is idle, because NOHZ_BALANCE_KICK is already set. As a result, no Idle Load Balance is performed.
We must wait for the sched softirq to be raised on CPU 2 by another part of the kernel to clear NOHZ_BALANCE_KICK and come back to a normal situation.
The proposed solution clears NOHZ_BALANCE_KICK in scheduler_ipi() if we can't raise the sched softirq for the Idle Load Balance.
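
For illustration only, the sequence above can be replayed with a small user-space model. This is a sketch, not kernel code: struct cpu_model and the model_* functions are hypothetical names, a plain bool stands in for the NOHZ_BALANCE_KICK bit, and the IPI and softirq are reduced to direct function calls. It assumes, as described above, that the kicker skips the IPI when the flag is already set and that the sched softirq clears the flag once it runs the Idle Load Balance.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one CPU's state (not a kernel struct). */
struct cpu_model {
	bool balance_kick;	/* stands in for NOHZ_BALANCE_KICK */
	bool need_resched;	/* a task was queued while the CPU was waking up */
	int ilb_runs;		/* number of Idle Load Balances actually performed */
};

/*
 * Kicker side: set the flag and report whether an IPI would be sent.
 * If the flag is already set, a kick is assumed to be pending and
 * no IPI is sent.
 */
bool model_kick_ilb(struct cpu_model *cpu)
{
	if (cpu->balance_kick)
		return false;
	cpu->balance_kick = true;
	return true;
}

/*
 * Target side: models the branch changed by the patch.
 * with_fix selects the behaviour with the added else branch.
 */
void model_scheduler_ipi(struct cpu_model *cpu, bool with_fix)
{
	if (cpu->balance_kick && !cpu->need_resched) {
		/* The sched softirq runs the ILB and clears the flag. */
		cpu->ilb_runs++;
		cpu->balance_kick = false;
	} else if (with_fix) {
		/* ILB is discarded: drop the kick so a later one can be sent. */
		cpu->balance_kick = false;
	}
}

/* Replays the scenario on "CPU 2" and returns how many ILBs it ran. */
int model_replay(bool with_fix)
{
	struct cpu_model cpu2 = { false, false, 0 };

	model_kick_ilb(&cpu2);			/* CPU 1 kicks CPU 2 */
	cpu2.need_resched = true;		/* task A is migrated to CPU 2 */
	model_scheduler_ipi(&cpu2, with_fix);	/* ILB is discarded */
	cpu2.need_resched = false;		/* task A sleeps, CPU 2 is idle */

	/* A later kick only reaches CPU 2 if the flag was cleared. */
	if (model_kick_ilb(&cpu2))
		model_scheduler_ipi(&cpu2, with_fix);

	return cpu2.ilb_runs;
}
```

In this model, model_replay(false) returns 0 because the second kick is silently dropped, while model_replay(true) returns 1 because the second kick is delivered and the Idle Load Balance runs.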

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..51fc715 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1420,7 +1420,8 @@ void scheduler_ipi(void)
 	if (unlikely(got_nohz_idle_kick() && !need_resched())) {
 		this_rq()->idle_balance = 1;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
-	}
+	} else
+		clear_bit(NOHZ_BALANCE_KICK, nohz_flags(smp_processor_id()));
 	irq_exit();
 }