On 18 February 2013 16:40, Frederic Weisbecker fweisbec@gmail.com wrote:
2013/2/18 Vincent Guittot vincent.guittot@linaro.org:
On 18 February 2013 15:38, Frederic Weisbecker fweisbec@gmail.com wrote:
I pasted the original at: http://pastebin.com/DMm5U8J8
We can clear the idle flag only in nohz_kick_needed(), which will not be called if the sched_domain is NULL, so the sequence will be:

= CPU 0 =                                   = CPU 1 =

detach_and_destroy_domain {
  rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
  nr_cpus_busy = 0;
  set_idle(CPU 1);
}
                                            dom = rcu_dereference(cpu1_dom)
                                            //dom == NULL, return

rcu_assign_pointer(cpu1_dom, dom);

                                            dom = rcu_dereference(cpu1_dom)
                                            //dom != NULL,
                                            nohz_kick_needed {
                                              set_idle(CPU 1)
                                              dom = rcu_dereference(cpu1_dom)
                                              //dec nr_cpus_busy,
                                            }

Vincent
Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is already in the middle of nohz_kick_needed().
Yes, nothing prevents the sequence below from occurring:

= CPU 0 =                                   = CPU 1 =

                                            dom = rcu_dereference(cpu1_dom)
                                            //dom != NULL
detach_and_destroy_domain {
  rcu_assign_pointer(cpu1_dom, NULL);
}

dom = new_domain(...) {
  nr_cpus_busy = 0;
  //nr_cpus_busy in the new_dom
  set_idle(CPU 1);
}
                                            nohz_kick_needed {
                                              clear_idle(CPU 1)
                                              dom = rcu_dereference(cpu1_dom)
                                              //cpu1_dom == old_dom
                                              inc nr_cpus_busy,
                                              //nr_cpus_busy in the old_dom
                                            }

rcu_assign_pointer(cpu1_dom, dom);
//cpu1_dom == new_dom

I'm not sure that this can happen in practice, because CPU 1 is in an interrupt handler, but we don't have any mechanism to prevent the sequence.
The NULL sched_domain can be used to detect this situation, and the set_cpu_sd_state_busy function can be modified as below:

 inline void set_cpu_sd_state_busy
 {
 	struct sched_domain *sd;
 	int cpu = smp_processor_id();
+	int clear = 0;

 	if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
 		return;
-	clear_bit(NOHZ_IDLE, nohz_flags(cpu));

 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+		clear = 1;
 	}
 	rcu_read_unlock();
+
+	if (likely(clear))
+		clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }

The NOHZ_IDLE flag will not be cleared if a NULL sched_domain is attached to the CPU. With this implementation, we still don't need to get the sched_domain for testing the NOHZ_IDLE flag, which occurs each time the CPU becomes idle.
Patch 2 becomes useless.
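
To make the intended behaviour easier to check outside the kernel, here is a rough userspace sketch of the same idea: the idle flag is cleared only if at least one domain level was actually walked. Everything in it is an assumption made for illustration. struct domain, struct cpu_state, mark_cpu_busy() and the init helpers are hypothetical names, C11 atomics stand in for the RCU-published pointer and the NOHZ_IDLE bit, and the busy counter is placed directly in the domain instead of sd->groups->sgp. It says nothing about RCU grace periods or the real locking rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct domain {
	atomic_int nr_busy_cpus;	/* models sd->groups->sgp->nr_busy_cpus */
	struct domain *parent;		/* next level visited by the walk */
};

struct cpu_state {
	atomic_bool nohz_idle;		/* models the NOHZ_IDLE bit */
	_Atomic(struct domain *) sd;	/* models the published domain pointer */
};

static void domain_init(struct domain *sd, struct domain *parent)
{
	atomic_init(&sd->nr_busy_cpus, 0);
	sd->parent = parent;
}

static void cpu_state_init(struct cpu_state *cpu, struct domain *sd, bool idle)
{
	atomic_init(&cpu->nohz_idle, idle);
	atomic_init(&cpu->sd, sd);
}

/*
 * Mirrors the proposed logic: count the CPU as busy in every domain level,
 * and clear the idle flag only if at least one level was walked.
 * Returns true when the flag was cleared.
 */
static bool mark_cpu_busy(struct cpu_state *cpu)
{
	bool clear = false;

	if (!atomic_load(&cpu->nohz_idle))
		return false;

	for (struct domain *sd = atomic_load(&cpu->sd); sd; sd = sd->parent) {
		atomic_fetch_add(&sd->nr_busy_cpus, 1);
		clear = true;
	}

	if (clear)
		atomic_store(&cpu->nohz_idle, false);
	return clear;
}

/* Walk-through: NULL domain first, then a freshly attached one. */
static void example(void)
{
	struct domain dom;
	struct cpu_state cpu;

	domain_init(&dom, NULL);
	cpu_state_init(&cpu, NULL, true);

	/* No domain attached: nothing is counted and the flag stays set. */
	assert(!mark_cpu_busy(&cpu));
	assert(atomic_load(&cpu.nohz_idle));

	/* Domain attached: counted exactly once, then the flag is cleared. */
	atomic_store(&cpu.sd, &dom);
	assert(mark_cpu_busy(&cpu));
	assert(!atomic_load(&cpu.nohz_idle));
	assert(atomic_load(&dom.nr_busy_cpus) == 1);
}
```

Calling example() from a small main() exercises both cases. The point of keeping the flag set in the NULL case is that the next call, made once a domain is attached, still does the increment, so the flag and the counter stay in step.
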
Vincent