On 22 April 2013 22:10, Peter Zijlstra <peterz@infradead.org> wrote:
On Mon, 2013-04-22 at 17:38 +0200, Vincent Guittot wrote:
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |   34 ++++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..cde4f7f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -899,6 +899,7 @@ struct sched_domain {
 	unsigned int wake_idx;
 	unsigned int forkexec_idx;
 	unsigned int smt_gain;
+	unsigned long nohz_flags;	/* NOHZ_IDLE flag status */
 	int flags;			/* See SD_* */
 	int level;
There was a 4 byte hole here, I'm assuming you used unsigned long and not utilized the hole because of the whole atomic bitop interface taking unsigned long?
Bit wasteful but ok..
So we're only pulling NOHZ_IDLE out of nohz_flags()? Should we then not also amend the rq_nohz_flag_bits enum? And it seems pointless to me to set bit 2 of our nohz_flags word if all other bits are unused.
Yes, I have been too quick to update the patchset. I'm going to remove NOHZ_IDLE from the rq_nohz_flag_bits enum.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..09e440f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5394,14 +5394,21 @@
 static inline void set_cpu_sd_state_busy(void)
 {
 	struct sched_domain *sd;
 	int cpu = smp_processor_id();

-	if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
-		return;
-	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
+	int first_nohz_idle = 1;

 	rcu_read_lock();
-	for_each_domain(cpu, sd)
+	for_each_domain(cpu, sd) {
+		if (first_nohz_idle) {
+			if (!test_bit(NOHZ_IDLE, &sd->nohz_flags))
+				goto unlock;
+			clear_bit(NOHZ_IDLE, &sd->nohz_flags);
+			first_nohz_idle = 0;
+		}
+
 		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+	}
the mind boggles... what's wrong with something like:
static inline unsigned long *rq_nohz_flags(int cpu)
{
	return &rcu_dereference(cpu_rq(cpu)->sd)->nohz_flags;
}

	if (!test_bit(0, rq_nohz_flags(cpu)))
		return;
	clear_bit(0, rq_nohz_flags(cpu));
AFAICT, if we use different rcu_dereference()s for modifying nohz_flags and for updating nr_busy_cpus, we open a time window for desynchronization, don't we? That's why I'm doing the update of rq_nohz_flags with the same rcu_dereference as nr_busy_cpus.
+unlock:
 	rcu_read_unlock();
 }
@@ -5409,14 +5416,21 @@
 void set_cpu_sd_state_idle(void)
 {
 	struct sched_domain *sd;
 	int cpu = smp_processor_id();

-	if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
-		return;
-	set_bit(NOHZ_IDLE, nohz_flags(cpu));
+	int first_nohz_idle = 1;

 	rcu_read_lock();
-	for_each_domain(cpu, sd)
+	for_each_domain(cpu, sd) {
+		if (first_nohz_idle) {
+			if (test_bit(NOHZ_IDLE, &sd->nohz_flags))
+				goto unlock;
+			set_bit(NOHZ_IDLE, &sd->nohz_flags);
+			first_nohz_idle = 0;
+		}
+
 		atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+	}
+unlock:
Same here... why on earth do it for every sched_domain for that cpu?
The update of rq_nohz_flags is done only once, on the top-level sched_domain, but nr_busy_cpus is updated at each sched_domain level so that we have quick access to the number of busy CPUs when we check whether we need to kick an idle load balance.
rcu_read_unlock();
}
And since it's now only a single bit in a single word, we can easily change it to something like:
	if (*rq_nohz_idle(cpu))
		return;
	xchg(rq_nohz_idle(cpu), 1);	/* set; use 0 for clear */
which drops the unsigned long nonsense..
That's fair. I'm going to use xchg instead of the atomic bitops.
Or am I completely missing something obvious here?