On Wed, Feb 27, 2013 at 09:28:26AM +0100, Vincent Guittot wrote:
>> Ok, I don't like having a per-cpu state in struct sched_domain, but for now I can't find anything better. So my suggestion is that we do this, describe the race well, define the issue in the changelog and code comments, and explain how we are solving it. That way the issue is at least identified and known. Then later, on review or after the patch is upstream, if somebody with good taste comes up with a better idea, we can consider it.
>> What do you think?
> I don't have a better solution than adding this state in the sched_domain if we want to keep exactly the same behavior. It will waste a bit of memory though, because we don't need to update all sched_domain levels (the 1st level is enough).
Or you can try something like the below. The flag and the sched_domain share the same object here, and thus the same RCU lifecycle. There shouldn't be any extra overhead either, since accessing rq->sd_rq->sd is the same as accessing rq->sd_rq at the assembly level: only one pointer to dereference.
Also, rq_idle becomes a value separate from rq->nohz_flags. It's a simple boolean (I'm just making it an int here because bool sizes are a bit opaque; they are supposed to be char-sized, but let's avoid surprises in structures).
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..16c0d55 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -417,7 +417,10 @@ struct rq {
 
 #ifdef CONFIG_SMP
 	struct root_domain *rd;
-	struct sched_domain *sd;
+	struct sched_domain_rq {
+		struct sched_domain sd;
+		int rq_idle;
+	} __rcu *sd_rq;
 
 	unsigned long cpu_power;
 
@@ -505,9 +508,14 @@ DECLARE_PER_CPU(struct rq, runqueues);
 
 #ifdef CONFIG_SMP
 
-#define rcu_dereference_check_sched_domain(p) \
-	rcu_dereference_check((p), \
-			      lockdep_is_held(&sched_domains_mutex))
+#define rcu_dereference_check_sched_domain(p) ({			\
+	struct sched_domain_rq *__sd_rq = rcu_dereference_check((p),	\
+				lockdep_is_held(&sched_domains_mutex));	\
+	struct sched_domain *__sd = NULL;				\
+	if (__sd_rq)							\
+		__sd = &__sd_rq->sd;					\
+	__sd;								\
+})
 
 /*
  * The domain tree (rq->sd) is protected by RCU's quiescent state transition.
@@ -517,7 +525,7 @@ DECLARE_PER_CPU(struct rq, runqueues);
  * preempt-disabled sections.
  */
 #define for_each_domain(cpu, __sd) \
-	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
+	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd_rq); \
 	     __sd; __sd = __sd->parent)
 
 #define for_each_lower_domain(sd) for (; sd; sd = sd->child)
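
To illustrate the intent (untested sketch, not part of the diff above): a caller like set_cpu_sd_state_idle() could then flip rq_idle and walk the domains under the same rcu_read_lock() section, through the same RCU object. The nr_busy_cpus decrement mirrors what the existing nohz code in fair.c does; the rest is just made up for the example:

	void set_cpu_sd_state_idle(void)
	{
		struct sched_domain_rq *sd_rq;
		struct sched_domain *sd;
		int cpu = smp_processor_id();

		rcu_read_lock();
		sd_rq = rcu_dereference(cpu_rq(cpu)->sd_rq);
		/* No domain attached, or this CPU is already accounted as idle */
		if (!sd_rq || sd_rq->rq_idle)
			goto unlock;
		sd_rq->rq_idle = 1;

		/* Walk the same domain tree instance the flag belongs to */
		for (sd = &sd_rq->sd; sd; sd = sd->parent)
			atomic_dec(&sd->groups->sgp->nr_busy_cpus);
	unlock:
		rcu_read_unlock();
	}

Since the flag and the domain tree live and die together, a reader can never see an idle state that belongs to a previous generation of the domains.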