On Thu, Jun 05, 2014 at 12:28:28PM +0530, Viresh Kumar wrote:
When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide if tick must be restarted to handle timer expirations.
Right but the scope is broader, it concerns all nohz scenarios.
If it doesn't re-evaluate timer wheel and restart tick, it wouldn't be able to call timer's handler on its expiration. It would be delayed until the time tick is restarted again. Currently the max delay can be 1 second as returned by scheduler_tick_max_deferment(), but it can increase in future.
To handle this, currently we are calling wake_up_nohz_cpu() from add_timer_on() but what about timers enqueued/modified with other APIs?
For example, in __mod_timer() we get target cpu (where the timer should get enqueued) by calling get_nohz_timer_target() and it is free to return a NO_HZ_FULL cpu as well. So, we *should* re-evaluate timer wheel there as well, otherwise call to timer's handler might be delayed as explained earlier.
NO_HZ_IDLE is concerned as well. get_nohz_timer_target() can return an idle CPU despite its best effort in this regard. The new target has all the time it wants to become idle until we lock its timer base.
And it matters because NO_HZ_FULL is a corner case while NO_HZ_IDLE adds 1000 pts of reviewer's interest for your patch ;)
In order to fix this issue we can move wake_up_nohz_cpu(cpu) to internal_add_timer() so that it is well handled for any add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
Previous patch has already made changes to store 'cpu-id' in 'struct tvec_base' and we can get cpu number from there to pass on to wake_up_nohz_cpu().
This patch also does a minor modification: s/timer->base/base while calling tbase_get_deferrable() as base == timer->base here.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
kernel/timer.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c index 9e5f4f2..aca5dfe 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer) base->next_timer = timer->expires; } base->all_timers++;
- /*
* Check whether the other CPU is in dynticks mode and needs
* to be triggered to reevaluate the timer wheel.
* We are protected against the other CPU fiddling
* with the timer by holding the timer base lock. This also
* makes sure that a CPU on the way to stop its tick can not
* evaluate the timer wheel.
*
* Spare the IPI for deferrable timers on idle targets though.
* The next busy ticks will take care of it. Except full dynticks
* require special care against races with idle_cpu(), lets deal
* with that later.
*/
- if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu))
wake_up_nohz_cpu(base->cpu);
} #ifdef CONFIG_TIMER_STATS @@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu) timer_set_base(timer, base); debug_activate(timer, timer->expires); internal_add_timer(base, timer);
- /*
* Check whether the other CPU is in dynticks mode and needs
* to be triggered to reevaluate the timer wheel.
* We are protected against the other CPU fiddling
* with the timer by holding the timer base lock. This also
* makes sure that a CPU on the way to stop its tick can not
* evaluate the timer wheel.
*
* Spare the IPI for deferrable timers on idle targets though.
* The next busy ticks will take care of it. Except full dynticks
* require special care against races with idle_cpu(), lets deal
* with that later.
*/
- if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu))
wake_up_nohz_cpu(cpu);
- spin_unlock_irqrestore(&base->lock, flags);
Looks good! Thanks!
} EXPORT_SYMBOL_GPL(add_timer_on); -- 2.0.0.rc2