When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide if tick must be restarted to handle timer expirations.
If it doesn't re-evaluate timer wheel and restart tick, it wouldn't be able to call timer's handler on its expiration. It would be delayed until the time tick is restarted again. Currently the max delay can be 1 second as returned by scheduler_tick_max_deferment(), but it can increase in future.
To handle this, currently we are calling wake_up_nohz_cpu() from add_timer_on() but what about timers enqueued/modified with other APIs?
For example, in __mod_timer() we get target cpu (where the timer should get enqueued) by calling get_nohz_timer_target() and it is free to return a NO_HZ_FULL cpu as well. So, we *should* re-evaluate timer wheel there as well, otherwise call to timer's handler might be delayed as explained earlier.
In order to fix this issue we can move wake_up_nohz_cpu(cpu) to internal_add_timer() so that it is well handled for any add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
Previous patch has already made changes to store 'cpu-id' in 'struct tvec_base' and we can get cpu number from there to pass on to wake_up_nohz_cpu().
This patch also does a minor modification: s/timer->base/base while calling tbase_get_deferrable() as base == timer->base here.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/timer.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c index 9e5f4f2..aca5dfe 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer) base->next_timer = timer->expires; } base->all_timers++; + + /* + * Check whether the other CPU is in dynticks mode and needs + * to be triggered to reevaluate the timer wheel. + * We are protected against the other CPU fiddling + * with the timer by holding the timer base lock. This also + * makes sure that a CPU on the way to stop its tick can not + * evaluate the timer wheel. + * + * Spare the IPI for deferrable timers on idle targets though. + * The next busy ticks will take care of it. Except full dynticks + * require special care against races with idle_cpu(), lets deal + * with that later. + */ + if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu)) + wake_up_nohz_cpu(base->cpu); }
#ifdef CONFIG_TIMER_STATS @@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu) timer_set_base(timer, base); debug_activate(timer, timer->expires); internal_add_timer(base, timer); - /* - * Check whether the other CPU is in dynticks mode and needs - * to be triggered to reevaluate the timer wheel. - * We are protected against the other CPU fiddling - * with the timer by holding the timer base lock. This also - * makes sure that a CPU on the way to stop its tick can not - * evaluate the timer wheel. - * - * Spare the IPI for deferrable timers on idle targets though. - * The next busy ticks will take care of it. Except full dynticks - * require special care against races with idle_cpu(), lets deal - * with that later. - */ - if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu)) - wake_up_nohz_cpu(cpu); - spin_unlock_irqrestore(&base->lock, flags); } EXPORT_SYMBOL_GPL(add_timer_on);