Re: [PATCH] sched: fix clear NOHZ_BALANCE_KICK

4 Jun 2013


On 4 June 2013 16:44, Frederic Weisbecker fweisbec@gmail.com wrote:
...
On Tue, Jun 04, 2013 at 01:48:47PM +0200, Vincent Guittot wrote:
...
On 4 June 2013 13:19, Frederic Weisbecker fweisbec@gmail.com wrote:
...
On Tue, Jun 04, 2013 at 01:11:47PM +0200, Vincent Guittot wrote:
...
On 4 June 2013 12:26, Frederic Weisbecker fweisbec@gmail.com wrote:
...
On Tue, Jun 04, 2013 at 11:36:11AM +0200, Peter Zijlstra wrote:
...
The best I can seem to come up with is something like the below; but I think
it's ghastly. Surely we can do something saner with that bit.
Having to clear it at 3 different places is just wrong.
We could clear the flag early in scheduler_ipi() and set some
specific value in rq->idle_balance that tells us we want nohz idle
balancing from the softirq, something like this (untested):
I'm not sure that we can clear it in fewer than 2 places: the cancel
place and the acknowledge place. Otherwise we can face a situation where
idle load balance is triggered 2 consecutive times, because
NOHZ_BALANCE_KICK is cleared before the idle load balance has
run and had a chance to migrate tasks.
I guess it depends on the minimum value of rq->next_balance; it seems
to be large enough to avoid this kind of incident. Although I don't
know the whole logic around rq->next_balance and the ILB trigger well,
so I must defer to you.
In the trace that showed the issue, I can see that both CPU0 and
CPU1 were trying to trigger ILB almost simultaneously, and the
test_and_set of NOHZ_BALANCE_KICK filtered one request. So I would say
that clearing the bit before the end of the idle load balance sequence
can generate such a sequence.
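
To make the filtering argument concrete, here is a minimal userspace
sketch. It is only a sequential model, not kernel code: kick_pending,
ipis_sent, try_kick_ilb() and ack_ilb() are hypothetical names, and a
C11 atomic_flag stands in for the test_and_set on NOHZ_BALANCE_KICK.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the NOHZ_BALANCE_KICK bit of one idle CPU. */
static atomic_flag kick_pending = ATOMIC_FLAG_INIT;

/* Number of kick IPIs the model has "sent" so far. */
static int ipis_sent;

/* Model of a kicker: only the caller that flips the flag sends an IPI. */
static bool try_kick_ilb(void)
{
	if (atomic_flag_test_and_set(&kick_pending))
		return false;	/* a kick is already pending: request filtered */

	ipis_sent++;
	return true;
}

/* Model of the acknowledge step, meant to run once the ILB has finished. */
static void ack_ilb(void)
{
	atomic_flag_clear(&kick_pending);
}
```

Calling try_kick_ilb() twice in a row sends one IPI and filters the
second request. If ack_ilb() runs between the two calls, that is before
the balance has actually been done, the second call is no longer
filtered and a second idle load balance is kicked.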
I see.
...
In the sequence below, I have reduced the clearing of NOHZ_BALANCE_KICK
to 2 places: acknowledge and cancel. I have reused part of Peter's
proposal, which clears the bit if the condition doesn't match, but I
have reordered the tests so that this is done only if all the other
conditions match.

 static inline bool got_nohz_idle_kick(void)
 {
 	int cpu = smp_processor_id();
-	return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+	bool nohz_kick = test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+
+	if (!nohz_kick)
+		return false;
+
+	if (idle_cpu(cpu) && !need_resched())
+		return true;
+
+	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+	return false;
 }
 #else /* CONFIG_NO_HZ_COMMON */
@@ -1393,8 +1401,9 @@ static void sched_ttwu_pending(void)
 void scheduler_ipi(void)
 {
-	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
-			&& !tick_nohz_full_cpu(smp_processor_id()))
+	if (llist_empty(&this_rq()->wake_list)
+			&& !tick_nohz_full_cpu(smp_processor_id())
+			&& !got_nohz_idle_kick())
 		return;
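
The three outcomes of the proposed helper can be checked outside the
kernel with a small model. This is a sketch under assumptions: struct
cpu_model and model_got_nohz_idle_kick() are hypothetical, and plain
booleans replace test_bit(), idle_cpu() and need_resched().

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; the kernel reads the real values through
 * test_bit(), idle_cpu() and need_resched(). */
struct cpu_model {
	bool kick;		/* NOHZ_BALANCE_KICK is set */
	bool idle;		/* idle_cpu(cpu) */
	bool need_resched;	/* need_resched() */
};

/* Same control flow as the proposed helper, applied to the model. */
static bool model_got_nohz_idle_kick(struct cpu_model *c)
{
	if (!c->kick)
		return false;	/* no kick pending, nothing to clear */

	if (c->idle && !c->need_resched)
		return true;	/* kick stays set until it is acknowledged */

	c->kick = false;	/* cancel: this kick cannot be serviced */
	return false;
}
```

With kick set on an idle CPU that has no pending reschedule, the model
returns true and leaves the flag for the acknowledge place. With kick
set on a busy CPU, or on one that needs to reschedule, it returns false
and clears the flag, which is the cancel place.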

But we still need got_nohz_idle_kick() to be the first check, don't we? Otherwise
if we run an "idle -> quick task slice -> idle" sequence we may keep the flag
but lose the notifying IPI in between.
I'm not sure I follow the sequence you are describing above: "idle ->
quick task slice -> idle".
In addition, got_nohz_idle_kick must be the last condition tested (in
my proposal) so that NOHZ_BALANCE_KICK is cleared only if we are sure
that we are going to return without any possibility of triggering the
idle load balance.
Vincent
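
The ordering argument relies on C short-circuit evaluation: in a chain
of &&, a helper with a side effect runs only if every test before it
was true. The sketch below models only the early-return test of
scheduler_ipi(); struct ipi_model and all three functions are
hypothetical names, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model state for one CPU. */
struct ipi_model {
	bool wake_list_empty;	/* llist_empty(&this_rq()->wake_list) */
	bool nohz_full;		/* tick_nohz_full_cpu(cpu) */
	bool kick;		/* NOHZ_BALANCE_KICK is set */
	bool idle_no_resched;	/* idle_cpu(cpu) && !need_resched() */
};

/* Helper with the side effect from the proposal: cancel an unusable kick. */
static bool model_kick(struct ipi_model *m)
{
	if (!m->kick)
		return false;
	if (m->idle_no_resched)
		return true;
	m->kick = false;
	return false;
}

/* Helper tested last: it runs only when the other two tests passed. */
static bool early_return_kick_last(struct ipi_model *m)
{
	return m->wake_list_empty && !m->nohz_full && !model_kick(m);
}

/* Helper tested before nohz_full: it can clear the flag even though
 * the function then goes on without returning early. */
static bool early_return_kick_second(struct ipi_model *m)
{
	return m->wake_list_empty && !model_kick(m) && !m->nohz_full;
}
```

Take a busy nohz_full CPU with an empty wake list and the kick flag
set. Neither variant returns early, but early_return_kick_second()
clears the flag on the way, while early_return_kick_last() never calls
the helper and leaves the flag untouched.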

    
