On Mon, Feb 03, 2014 at 12:21:16PM +0530, Viresh Kumar wrote:
Sorry was away for short vacation.
On 28 January 2014 19:20, Frederic Weisbecker fweisbec@gmail.com wrote:
On Thu, Jan 23, 2014 at 07:50:40PM +0530, Viresh Kumar wrote:
Wait, I got the wrong code here. That's wasn't my initial intention. I actually wanted to write something like this:
wake_up_nohz_cpu(cpu);
if (!tbase_get_deferrable(timer->base) || idle_cpu(cpu))
wake_up_nohz_cpu(cpu);
Will that work?
Something is seriously wrong with me, again wrote rubbish code. Let me phrase what I wanted to write :)
"don't send IPI to a idle CPU for a deferrable timer."
Probably I code it correctly this time atleast.
wake_up_nohz_cpu(cpu);
if (!(tbase_get_deferrable(timer->base) && idle_cpu(cpu)))
wake_up_nohz_cpu(cpu);
Yeah but that's racy if the target is nohz full. We may be seeing it idle whereas it woke up lately and run in userspace tickless for a while.
Well, this is going to wake up the target from its idle state, which is what we want to avoid if the timer is deferrable, right?
Yeah, sorry for doing it for second time :(
I'm certainly not blaming you for being confused, that would be the pot calling the kettle black ;)
The simplest thing we want is:
if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu)) wake_up_nohz_cpu(cpu);
This spares the IPI for the common case where the timer is deferrable and we run in periodic or dynticks-idle mode (which should be 99.99% of the existing workloads).
I wasn't looking at this problem with NO_HZ_FULL in mind. As I thought its only about if the CPU is idle or not. And so the solution I was talking about was:
"don't send IPI to a idle CPU for a deferrable timer."
But I see that still failing with the code you wrote. For normal cases where we don't enable NO_HZ_FULL, we will still end up waking up idle CPUs which is what Lei Wen reported initially.
Not with the small change I proposed above. I'm applying it.
Also if a CPU is marked for NO_HZ_FULL and is not idle currently then we wouldn't send a IPI for a deferrable timer. But we actually need that, so that we can reevaluate the timers order again?
Right.