When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide whether the tick must be restarted to handle timer expirations.
If it doesn't re-evaluate the timer wheel and restart the tick, it won't be able to call the timer's handler on expiration; the call is delayed until the tick is restarted. Currently the maximum delay is 1 second, as returned by scheduler_tick_max_deferment(), but it may increase in the future.
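For reference, the 1 second bound comes from scheduler_tick_max_deferment() in kernel/sched/core.c; around v3.15 it looked roughly like this (slightly simplified, details may differ by version):

u64 scheduler_tick_max_deferment(void)
{
	struct rq *rq = this_rq();
	unsigned long next, now = ACCESS_ONCE(jiffies);

	/* Keep at least one tick per second: defer by at most HZ jiffies. */
	next = rq->last_sched_tick + HZ;

	if (time_before_eq(next, now))
		return 0;

	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
}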
To handle this, we currently call wake_up_nohz_cpu() from add_timer_on(), but what about timers enqueued or modified via other APIs?
For example, in __mod_timer() we get the target cpu (where the timer should be enqueued) by calling get_nohz_timer_target(), which is free to return a NO_HZ_FULL cpu as well. So we *should* re-evaluate the timer wheel there too, otherwise the call to the timer's handler might be delayed as explained earlier.
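For context, the enqueue path in __mod_timer() looked roughly like this around v3.15 (heavily abridged); note there is no nohz kick at the end:

static inline int
__mod_timer(struct timer_list *timer, unsigned long expires,
	    bool pending_only, int pinned)
{
	struct tvec_base *base, *new_base;
	unsigned long flags;
	int ret, cpu;

	base = lock_timer_base(timer, &flags);
	ret = detach_if_pending(timer, base, false);

	debug_activate(timer, expires);

	/* May return a NO_HZ_FULL (or soon-to-be-idle) cpu. */
	cpu = get_nohz_timer_target(pinned);
	new_base = per_cpu(tvec_bases, cpu);

	if (base != new_base) {
		/* ... migrate the timer to new_base and relock ... */
		base = new_base;
	}

	timer->expires = expires;
	internal_add_timer(base, timer);
	/* No wake_up_nohz_cpu(cpu) here before this series. */

	spin_unlock_irqrestore(&base->lock, flags);
	return ret;
}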
To fix this, we can move wake_up_nohz_cpu(cpu) into internal_add_timer() so that it is handled uniformly for every add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
This requires internal_add_timer() to get the cpu number from the per-cpu object 'base', as not all callers have a cpu number to pass to internal_add_timer(). For example, __mod_timer() finds the timer's base from the 'timer' pointer, not via per-cpu arithmetic.
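For reference, this is how __mod_timer() gets at the base: lock_timer_base() derives it purely from the timer, so no cpu number is at hand (roughly, from v3.15):

static struct tvec_base *lock_timer_base(struct timer_list *timer,
					 unsigned long *flags)
{
	struct tvec_base *base;

	for (;;) {
		struct tvec_base *prelock_base = timer->base;
		base = tbase_get_base(prelock_base);
		if (likely(base != NULL)) {
			spin_lock_irqsave(&base->lock, *flags);
			if (likely(prelock_base == timer->base))
				return base;
			/* The timer has migrated to another CPU */
			spin_unlock_irqrestore(&base->lock, *flags);
		}
		cpu_relax();
	}
}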
Thus, this patch adds another field, 'cpu', to 'struct tvec_base', which stores the number of the cpu the base belongs to.
The next patch will then move wake_up_nohz_cpu() into internal_add_timer().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/timer.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/timer.c b/kernel/timer.c
index 3bb01a3..9e5f4f2 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -82,6 +82,7 @@ struct tvec_base {
 	unsigned long next_timer;
 	unsigned long active_timers;
 	unsigned long all_timers;
+	int cpu;
 	struct tvec_root tv1;
 	struct tvec tv2;
 	struct tvec tv3;
@@ -1568,6 +1569,7 @@ static int init_timers_cpu(int cpu)
 		}
 		spin_lock_init(&base->lock);
 		tvec_base_done[cpu] = 1;
+		base->cpu = cpu;
 	} else {
 		base = per_cpu(tvec_bases, cpu);
 	}
When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide whether the tick must be restarted to handle timer expirations.

If it doesn't re-evaluate the timer wheel and restart the tick, it won't be able to call the timer's handler on expiration; the call is delayed until the tick is restarted. Currently the maximum delay is 1 second, as returned by scheduler_tick_max_deferment(), but it may increase in the future.

To handle this, we currently call wake_up_nohz_cpu() from add_timer_on(), but what about timers enqueued or modified via other APIs?

For example, in __mod_timer() we get the target cpu (where the timer should be enqueued) by calling get_nohz_timer_target(), which is free to return a NO_HZ_FULL cpu as well. So we *should* re-evaluate the timer wheel there too, otherwise the call to the timer's handler might be delayed as explained earlier.

To fix this, we can move wake_up_nohz_cpu(cpu) into internal_add_timer() so that it is handled uniformly for every add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
The previous patch has already made changes to store the cpu id in 'struct tvec_base', and we can get the cpu number from there to pass on to wake_up_nohz_cpu().
This patch also makes a minor modification: s/timer->base/base in the call to tbase_get_deferrable(), as base == timer->base here.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/timer.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 9e5f4f2..aca5dfe 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 			base->next_timer = timer->expires;
 	}
 	base->all_timers++;
+
+	/*
+	 * Check whether the other CPU is in dynticks mode and needs
+	 * to be triggered to reevaluate the timer wheel.
+	 * We are protected against the other CPU fiddling
+	 * with the timer by holding the timer base lock. This also
+	 * makes sure that a CPU on the way to stop its tick can not
+	 * evaluate the timer wheel.
+	 *
+	 * Spare the IPI for deferrable timers on idle targets though.
+	 * The next busy ticks will take care of it. Except full dynticks
+	 * require special care against races with idle_cpu(), lets deal
+	 * with that later.
+	 */
+	if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu))
+		wake_up_nohz_cpu(base->cpu);
 }

 #ifdef CONFIG_TIMER_STATS
@@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
 	internal_add_timer(base, timer);
-	/*
-	 * Check whether the other CPU is in dynticks mode and needs
-	 * to be triggered to reevaluate the timer wheel.
-	 * We are protected against the other CPU fiddling
-	 * with the timer by holding the timer base lock. This also
-	 * makes sure that a CPU on the way to stop its tick can not
-	 * evaluate the timer wheel.
-	 *
-	 * Spare the IPI for deferrable timers on idle targets though.
-	 * The next busy ticks will take care of it. Except full dynticks
-	 * require special care against races with idle_cpu(), lets deal
-	 * with that later.
-	 */
-	if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu))
-		wake_up_nohz_cpu(cpu);
-
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
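As a usage-level illustration (hypothetical driver code, not part of the patch): after this change both of the following arming paths get the nohz kick from internal_add_timer(), where previously only add_timer_on() did:

#include <linux/jiffies.h>
#include <linux/timer.h>

static struct timer_list my_timer;

static void my_timer_fn(unsigned long data)
{
	/* Runs on whichever cpu's wheel the timer landed on. */
}

static void arm_timers(void)
{
	setup_timer(&my_timer, my_timer_fn, 0);

	/* mod_timer() -> __mod_timer() -> internal_add_timer():
	 * target cpu picked by get_nohz_timer_target(). */
	mod_timer(&my_timer, jiffies + HZ);

	/* add_timer_on() -> internal_add_timer(): explicit cpu. */
	del_timer_sync(&my_timer);
	my_timer.expires = jiffies + HZ;
	add_timer_on(&my_timer, 1);
}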
On Thu, Jun 05, 2014 at 12:28:28PM +0530, Viresh Kumar wrote:
When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide whether the tick must be restarted to handle timer expirations.
Right, but the scope is broader: it concerns all nohz scenarios.
If it doesn't re-evaluate the timer wheel and restart the tick, it won't be able to call the timer's handler on expiration; the call is delayed until the tick is restarted. Currently the maximum delay is 1 second, as returned by scheduler_tick_max_deferment(), but it may increase in the future.

To handle this, we currently call wake_up_nohz_cpu() from add_timer_on(), but what about timers enqueued or modified via other APIs?

For example, in __mod_timer() we get the target cpu (where the timer should be enqueued) by calling get_nohz_timer_target(), which is free to return a NO_HZ_FULL cpu as well. So we *should* re-evaluate the timer wheel there too, otherwise the call to the timer's handler might be delayed as explained earlier.
NO_HZ_IDLE is concerned as well. get_nohz_timer_target() can return an idle CPU despite its best effort in this regard. The new target has all the time it wants to become idle until we lock its timer base.
And it matters because NO_HZ_FULL is a corner case while NO_HZ_IDLE adds 1000 pts of reviewer's interest for your patch ;)
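For reference, get_nohz_timer_target() at the time only made a best effort to pick a busy cpu (roughly, from v3.15); nothing stops the chosen cpu from going idle before its timer base lock is taken:

static int get_nohz_timer_target(int pinned)
{
	int cpu = smp_processor_id();
	int i;
	struct sched_domain *sd;

	/* Pinned timers, disabled migration, or a busy caller: stay put. */
	if (pinned || !get_sysctl_timer_migration() || !idle_cpu(cpu))
		return cpu;

	rcu_read_lock();
	for_each_domain(cpu, sd) {
		for_each_cpu(i, sched_domain_span(sd)) {
			if (!idle_cpu(i)) {
				cpu = i;
				goto unlock;
			}
		}
	}
unlock:
	rcu_read_unlock();
	return cpu;
}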
To fix this, we can move wake_up_nohz_cpu(cpu) into internal_add_timer() so that it is handled uniformly for every add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
The previous patch has already made changes to store the cpu id in 'struct tvec_base', and we can get the cpu number from there to pass on to wake_up_nohz_cpu().

This patch also makes a minor modification: s/timer->base/base in the call to tbase_get_deferrable(), as base == timer->base here.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

 kernel/timer.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 9e5f4f2..aca5dfe 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 			base->next_timer = timer->expires;
 	}
 	base->all_timers++;
+
+	/*
+	 * Check whether the other CPU is in dynticks mode and needs
+	 * to be triggered to reevaluate the timer wheel.
+	 * We are protected against the other CPU fiddling
+	 * with the timer by holding the timer base lock. This also
+	 * makes sure that a CPU on the way to stop its tick can not
+	 * evaluate the timer wheel.
+	 *
+	 * Spare the IPI for deferrable timers on idle targets though.
+	 * The next busy ticks will take care of it. Except full dynticks
+	 * require special care against races with idle_cpu(), lets deal
+	 * with that later.
+	 */
+	if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu))
+		wake_up_nohz_cpu(base->cpu);
 }

 #ifdef CONFIG_TIMER_STATS
@@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
 	internal_add_timer(base, timer);
-	/*
-	 * Check whether the other CPU is in dynticks mode and needs
-	 * to be triggered to reevaluate the timer wheel.
-	 * We are protected against the other CPU fiddling
-	 * with the timer by holding the timer base lock. This also
-	 * makes sure that a CPU on the way to stop its tick can not
-	 * evaluate the timer wheel.
-	 *
-	 * Spare the IPI for deferrable timers on idle targets though.
-	 * The next busy ticks will take care of it. Except full dynticks
-	 * require special care against races with idle_cpu(), lets deal
-	 * with that later.
-	 */
-	if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu))
-		wake_up_nohz_cpu(cpu);
-
 	spin_unlock_irqrestore(&base->lock, flags);

Looks good! Thanks!

 }
 EXPORT_SYMBOL_GPL(add_timer_on);
--
2.0.0.rc2
On 5 June 2014 18:43, Frederic Weisbecker <fweisbec@gmail.com> wrote:
NO_HZ_IDLE is concerned as well. get_nohz_timer_target() can return an idle CPU despite its best effort in this regard.
It can return an idle cpu, but that can only be the current cpu, right?
The new target has all the time it wants to become idle until we lock its timer base.
Yeah..
And it matters because NO_HZ_FULL is a corner case while NO_HZ_IDLE adds 1000 pts of reviewer's interest for your patch ;)
Yeah. I tried rewording the changelog, let's see if it is any better :)
When a timer is enqueued or modified on a NO_HZ_{IDLE|FULL} target (local or remote), the target must re-evaluate its timer wheel to decide whether the clock-event device must be programmed to restart the tick: on an idle cpu in the NO_HZ_IDLE case, or on a tickless cpu in the NO_HZ_FULL case.

If it doesn't re-evaluate the timer wheel, it won't be able to execute timer handlers on expiration; they would be delayed until the CPU wakes up or the tick is restarted again.

To handle this, we currently call wake_up_nohz_cpu() from add_timer_on(), but what about timers enqueued or modified via other APIs?

For example, in __mod_timer() we get the target cpu (where the timer should be enqueued) by calling get_nohz_timer_target(), which is free to return a NO_HZ_FULL cpu as well. Also, the new target has all the time it wants to become idle before we lock its timer base.

So we *should* re-evaluate the timer wheel there as well, otherwise the call to the timer's handler might be delayed.
To fix this, we can move wake_up_nohz_cpu(cpu) into internal_add_timer() so that it is handled uniformly for every add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
The previous patch has already made changes to store the cpu id in 'struct tvec_base', and we can get the cpu number from there to pass on to wake_up_nohz_cpu().

This patch also makes a minor modification: s/timer->base/base in the call to tbase_get_deferrable(), as base == timer->base here.
On Thu, Jun 05, 2014 at 12:28:27PM +0530, Viresh Kumar wrote:
When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote), the target is expected to re-evaluate its timer wheel to decide whether the tick must be restarted to handle timer expirations.

If it doesn't re-evaluate the timer wheel and restart the tick, it won't be able to call the timer's handler on expiration; the call is delayed until the tick is restarted. Currently the maximum delay is 1 second, as returned by scheduler_tick_max_deferment(), but it may increase in the future.

To handle this, we currently call wake_up_nohz_cpu() from add_timer_on(), but what about timers enqueued or modified via other APIs?

For example, in __mod_timer() we get the target cpu (where the timer should be enqueued) by calling get_nohz_timer_target(), which is free to return a NO_HZ_FULL cpu as well. So we *should* re-evaluate the timer wheel there too, otherwise the call to the timer's handler might be delayed as explained earlier.

To fix this, we can move wake_up_nohz_cpu(cpu) into internal_add_timer() so that it is handled uniformly for every add-timer API.

LKML discussion about this: https://lkml.org/lkml/2014/6/4/169

This requires internal_add_timer() to get the cpu number from the per-cpu object 'base', as not all callers have a cpu number to pass to internal_add_timer(). For example, __mod_timer() finds the timer's base from the 'timer' pointer, not via per-cpu arithmetic.

Thus, this patch adds another field, 'cpu', to 'struct tvec_base', which stores the number of the cpu the base belongs to.

The next patch will then move wake_up_nohz_cpu() into internal_add_timer().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Except for the changelog, which should talk about NO_HZ in general, this looks good.
Thanks!
 kernel/timer.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/timer.c b/kernel/timer.c
index 3bb01a3..9e5f4f2 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -82,6 +82,7 @@ struct tvec_base {
 	unsigned long next_timer;
 	unsigned long active_timers;
 	unsigned long all_timers;
+	int cpu;
 	struct tvec_root tv1;
 	struct tvec tv2;
 	struct tvec tv3;
@@ -1568,6 +1569,7 @@ static int init_timers_cpu(int cpu)
 		}
 		spin_lock_init(&base->lock);
 		tvec_base_done[cpu] = 1;
+		base->cpu = cpu;
 	} else {
 		base = per_cpu(tvec_bases, cpu);
 	}
--
2.0.0.rc2