Hi,
In lowres mode, hrtimers are serviced by the tick. But when a hrtimer is enqueued on an idle or full dynticks target, we need to kick it so that it reconsiders the next tick to schedule and correctly handles the hrtimer's expiry time.
Now while this kick is correctly performed for periodic timers, hrtimers have been a bit neglected.
get_nohz_timer_target() doesn't return a remote idle target, but there is a small race window: the elected target has plenty of time to turn dynticks idle between the call to get_nohz_timer_target() and the locking of its base. Hence there is a risk that we enqueue a hrtimer on a dynticks idle destination without kicking it. As a result, the hrtimer might be serviced too late.

Also, a target elected by get_nohz_timer_target() can be in full dynticks mode and thus needs to be kicked as well. And unlike idle dynticks, this concerns both local and remote targets.
This patchset should fix all these issues.
git://git.linaro.org/people/viresh.kumar/linux.git missing-kick
Similar problems with timers were fixed and sent to LKML earlier: https://lkml.org/lkml/2014/6/12/588
(Yeah, I stole logs from there :) )
Viresh Kumar (3):
  hrtimer: Store cpu-number in 'struct hrtimer_cpu_base'
  hrtimer: Kick lowres dynticks targets on hrtimer_start*() calls
  hrtimers: don't check hres_active before calling hrtimer_reprogram()
 include/linux/hrtimer.h |  2 ++
 kernel/hrtimer.c        | 49 +++++++++++++++++++++++++++----------------------
 2 files changed, 29 insertions(+), 22 deletions(-)
In lowres mode, hrtimers are serviced by the tick. But when a hrtimer is enqueued on an idle or full dynticks target, we need to kick it so that it reconsiders the next tick to schedule and correctly handles the hrtimer's expiry time.

Now while this kick is correctly performed for periodic timers, hrtimers have been a bit neglected.
To prepare for fixing this, we need __hrtimer_start_range_ns() to be able to resolve the CPU associated with a hrtimer's 'cpu_base' object, so that the kick can be centralized there.

So let's store it in 'struct hrtimer_cpu_base', making it cheap to resolve the CPU. It is set at CPU online notification time.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/hrtimer.h | 2 ++
 kernel/hrtimer.c        | 1 +
 2 files changed, 3 insertions(+)
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index e7a8d3f..bb4ffff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -165,6 +165,7 @@ enum hrtimer_base_type {
  * struct hrtimer_cpu_base - the per cpu clock bases
  * @lock:		lock protecting the base and associated clock bases
  *			and timers
+ * @cpu:		cpu number
  * @active_bases:	Bitfield to mark bases with active timers
  * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
@@ -179,6 +180,7 @@ enum hrtimer_base_type {
  */
 struct hrtimer_cpu_base {
	raw_spinlock_t			lock;
+	unsigned int			cpu;
	unsigned int			active_bases;
	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e0501fe..dd6fb1d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1679,6 +1679,7 @@ static void init_hrtimers_cpu(int cpu)
		timerqueue_init_head(&cpu_base->clock_base[i].active);
	}

+	cpu_base->cpu = cpu;
	hrtimer_init_hres(cpu_base);
 }
In lowres mode, hrtimers are serviced by the tick. When a hrtimer is enqueued on a target, that CPU must re-evaluate the next tick to service that hrtimer.
Now while we correctly call wake_up_nohz_cpu() for periodic timers, hrtimers aren't supported that well on dynticks targets.
get_nohz_timer_target() doesn't return a remote idle target, but there is a small race window: the elected target has plenty of time to turn dynticks idle between the call to get_nohz_timer_target() and the locking of its base. Hence there is a risk that we enqueue a hrtimer on a dynticks idle destination without kicking it. As a result, the hrtimer might be serviced too late.

Also, a target elected by get_nohz_timer_target() can be in full dynticks mode and thus needs to be kicked as well. And unlike idle dynticks, this concerns both local and remote targets.
To fix this, let's centralize the dynticks kick in __hrtimer_start_range_ns(), so that it is handled for all sorts of hrtimer enqueue APIs. It will be performed by an IPI kick on the target, exactly the way it is currently done for periodic timers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index dd6fb1d..8f0c79c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,

	leftmost = enqueue_hrtimer(timer, new_base);

-	/*
-	 * Only allow reprogramming if the new base is on this CPU.
-	 * (it might still be on another CPU if the timer was pending)
-	 *
-	 * XXX send_remote_softirq() ?
-	 */
-	if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
-		&& hrtimer_enqueue_reprogram(timer, new_base)) {
+	if (!leftmost) {
+		unlock_hrtimer_base(timer, &flags);
+		return 0;
+	}
+
+	if (!hrtimer_is_hres_active(timer)) {
+		/*
+		 * Kicked to reevaluate the timer wheel for both idle and full
+		 * dynticks target.
+		 */
+		wake_up_nohz_cpu(base->cpu_base->cpu);
+	} else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
+			hrtimer_enqueue_reprogram(timer, new_base)) {
+		/*
+		 * Only allow reprogramming if the new base is on this CPU.
+		 * (it might still be on another CPU if the timer was pending)
+		 *
+		 * XXX send_remote_softirq() ?
+		 */
		if (wakeup) {
			/*
			 * We need to drop cpu_base->lock to avoid a
On Tue, Jun 17, 2014 at 02:18:13PM +0530, Viresh Kumar wrote:
In lowres mode, hrtimers are serviced by the tick. When a hrtimer is enqueued on a target, that CPU must re-evaluate the next tick to service that hrtimer.
Reusing the changelogs of periodic timers kind of works here. But it's a bit awkward to put things that way.
The changelog seems to assume that it's obvious that we must kick the target of a lowres hrtimer, whereas we actually never did it.

I mean, the issue here is not some missing pieces to be fixed as in periodic timers; we need to zoom out a bit on the whole subsystem's support for dynticks.
Maybe that could be reworded to something like:
"In lowres mode, hrtimers are serviced by the tick instead of a clock event. Being serviced by the tick implies that we must deal with dynticks mode, pretty much like timer list timers do.
There are two requirements to correctly handle dynticks:
1) At target tick stop time, we must not delay the next tick further than the next hrtimer.

2) At hrtimer enqueue time, if the tick of the target is stopped, we must wake up that CPU so that it sees the new hrtimer and recalculates the next tick.
Point 1 is currently well handled through get_nohz_timer_interrupt() and cmp_next_hrtimer_event().
But the point 2 isn't handled at all.
Fixing this is easy though, as we have the necessary API ready for that. All we need is to call wake_up_nohz_cpu() on a target when a newly enqueued timer requires tick rescheduling, like timer list timers do."
Thanks.
get_nohz_timer_target() doesn't return a remote idle target, but there is a small race window: the elected target has plenty of time to turn dynticks idle between the call to get_nohz_timer_target() and the locking of its base. Hence there is a risk that we enqueue a hrtimer on a dynticks idle destination without kicking it. As a result, the hrtimer might be serviced too late.

Also, a target elected by get_nohz_timer_target() can be in full dynticks mode and thus needs to be kicked as well. And unlike idle dynticks, this concerns both local and remote targets.
To fix this, let's centralize the dynticks kick in __hrtimer_start_range_ns(), so that it is handled for all sorts of hrtimer enqueue APIs. It will be performed by an IPI kick on the target, exactly the way it is currently done for periodic timers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index dd6fb1d..8f0c79c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,

	leftmost = enqueue_hrtimer(timer, new_base);

-	/*
-	 * Only allow reprogramming if the new base is on this CPU.
-	 * (it might still be on another CPU if the timer was pending)
-	 *
-	 * XXX send_remote_softirq() ?
-	 */
-	if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
-		&& hrtimer_enqueue_reprogram(timer, new_base)) {
+	if (!leftmost) {
+		unlock_hrtimer_base(timer, &flags);
+		return 0;
+	}
+
+	if (!hrtimer_is_hres_active(timer)) {
+		/*
+		 * Kicked to reevaluate the timer wheel for both idle and full
+		 * dynticks target.
+		 */
+		wake_up_nohz_cpu(base->cpu_base->cpu);
+	} else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
+			hrtimer_enqueue_reprogram(timer, new_base)) {
+		/*
+		 * Only allow reprogramming if the new base is on this CPU.
+		 * (it might still be on another CPU if the timer was pending)
+		 *
+		 * XXX send_remote_softirq() ?
+		 */
		if (wakeup) {
			/*
			 * We need to drop cpu_base->lock to avoid a
-- 2.0.0.rc2
On 18 June 2014 18:23, Frederic Weisbecker fweisbec@gmail.com wrote:
Reusing the changelogs of periodic timers kind of works here. But it's a bit awkward to put things that way.
The changelog seem to assume that it's obvious that we must kick the target of a lowres hrtimer whereas we actually never did it.
Correct !!
I have updated the logs and pushed them back to the repo I shared with you earlier. I have also updated 1/3 a bit with what you suggested.

Though I have kept some part of the earlier log about how we can get an idle/full dynticks CPU from get_nohz_timer_target(). Might be worth mentioning?
New logs:
hrtimer: Kick lowres dynticks targets on hrtimer_start*() calls
In lowres mode, hrtimers are serviced by the tick instead of a clock event. Being serviced by the tick implies that we must deal with dynticks mode, pretty much like timer list timers do.
There are two requirements to correctly handle dynticks:
1) At target tick stop time, we must not delay the next tick further than the next hrtimer.

2) At hrtimer enqueue time, if the tick of the target is stopped, we must wake up that CPU so that it sees the new hrtimer and recalculates the next tick.
Point 1 is currently well handled through get_nohz_timer_interrupt() and cmp_next_hrtimer_event().
But the point 2 isn't handled at all.
Fixing this is easy though, as we have the necessary API ready for that. All we need is to call wake_up_nohz_cpu() on a target when a newly enqueued timer requires tick rescheduling, like timer list timers do.
get_nohz_timer_target() doesn't return a remote idle target, but there is a small race window: the elected target has plenty of time to turn dynticks idle between the call to get_nohz_timer_target() and the locking of its base. Hence there is a risk that we enqueue a hrtimer on a dynticks idle destination without kicking it. As a result, the hrtimer might be serviced too late.

Also, a target elected by get_nohz_timer_target() can be in full dynticks mode and thus needs to be kicked as well. And unlike idle dynticks, this concerns both local and remote targets.
To fix this, let's centralize the dynticks kick in __hrtimer_start_range_ns(), so that it is handled for all sorts of hrtimer enqueue APIs. It will be performed by an IPI kick on the target, exactly the way it is currently done for periodic timers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Any more improvements required?
On Wed, Jun 18, 2014 at 06:47:57PM +0530, Viresh Kumar wrote:
On 18 June 2014 18:23, Frederic Weisbecker fweisbec@gmail.com wrote:
Reusing the changelogs of periodic timers kind of works here. But it's a bit awkward to put things that way.
The changelog seem to assume that it's obvious that we must kick the target of a lowres hrtimer whereas we actually never did it.
Correct !!
I have updated logs and pushed them back in my repo I shared with you earlier. I have updated 1/3 as well a bit with what you suggested..
Though I have kept some part from earlier log about how can we get idle/full dynticks CPU from get_nohz_timer_target(). Might be worth mentioning ?
New logs:
hrtimer: Kick lowres dynticks targets on hrtimer_start*() calls

In lowres mode, hrtimers are serviced by the tick instead of a clock event. Being serviced by the tick implies that we must deal with dynticks mode, pretty much like timer list timers do.
So if you want to mention get_nohz_timer_target(), better put it right here to outline that get_nohz_timer_target() can return either a full dynticks CPU or an idle dynticks CPU due to the race window.
There are two requirements to correctly handle dynticks:

1) At target tick stop time, we must not delay the next tick further than the next hrtimer.

2) At hrtimer enqueue time, if the tick of the target is stopped, we must wake up that CPU so that it sees the new hrtimer and recalculates the next tick.

Point 1 is currently well handled through get_nohz_timer_interrupt() and cmp_next_hrtimer_event(). But point 2 isn't handled at all.

Fixing this is easy though, as we have the necessary API ready for that. All we need is to call wake_up_nohz_cpu() on a target when a newly enqueued timer requires tick rescheduling, like timer list timers do.
get_nohz_timer_target() doesn't return a remote idle target, but there is a small race window: the elected target has plenty of time to turn dynticks idle between the call to get_nohz_timer_target() and the locking of its base. Hence there is a risk that we enqueue a hrtimer on a dynticks idle destination without kicking it. As a result, the hrtimer might be serviced too late.

Also, a target elected by get_nohz_timer_target() can be in full dynticks mode and thus needs to be kicked as well. And unlike idle dynticks, this concerns both local and remote targets.

To fix this, let's centralize the dynticks kick in __hrtimer_start_range_ns(), so that it is handled for all sorts of hrtimer enqueue APIs. It will be performed by an IPI kick on the target, exactly the way it is currently done for periodic timers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Any more improvements required?
Nope, looks good, thanks!
We call hrtimer_enqueue_reprogram() only when we are in high resolution mode, so it doesn't need to check that again internally.

Once that check is removed, hrtimer_enqueue_reprogram() becomes a plain wrapper over hrtimer_reprogram() and can be dropped.

Drop hrtimer_enqueue_reprogram() and use hrtimer_reprogram() directly instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 23 +++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 8f0c79c..f72fc67 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -602,6 +602,11 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
  * timers, we have to check, whether it expires earlier than the timer for
  * which the clock event device was armed.
  *
+ * Note, that in case the state has HRTIMER_STATE_CALLBACK set, no reprogramming
+ * and no expiry check happens. The timer gets enqueued into the rbtree. The
+ * reprogramming and expiry check is done in the hrtimer_interrupt or in the
+ * softirq.
+ *
  * Called with interrupts disabled and base->cpu_base.lock held
  */
 static int hrtimer_reprogram(struct hrtimer *timer,
@@ -662,18 +667,6 @@ static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
	base->hres_active = 0;
 }

-/*
- * When High resolution timers are active, try to reprogram. Note, that in case
- * the state has HRTIMER_STATE_CALLBACK set, no reprogramming and no expiry
- * check happens. The timer gets enqueued into the rbtree. The reprogramming
- * and expiry check is done in the hrtimer_interrupt or in the softirq.
- */
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
-					    struct hrtimer_clock_base *base)
-{
-	return base->cpu_base->hres_active && hrtimer_reprogram(timer, base);
-}
-
 static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
 {
	ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
@@ -755,8 +748,8 @@ static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline int hrtimer_switch_to_hres(void) { return 0; }
 static inline void
 hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
-					    struct hrtimer_clock_base *base)
+static inline int hrtimer_reprogram(struct hrtimer *timer,
+				    struct hrtimer_clock_base *base)
 {
	return 0;
 }
@@ -1025,7 +1018,7 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
		 */
		wake_up_nohz_cpu(base->cpu_base->cpu);
	} else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
-			hrtimer_enqueue_reprogram(timer, new_base)) {
+			hrtimer_reprogram(timer, new_base)) {
		/*
		 * Only allow reprogramming if the new base is on this CPU.
		 * (it might still be on another CPU if the timer was pending)