Hi,
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the unused clockevent device should be stopped or its interrupts should be masked.
Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
First patch implements the required infrastructure to start/stop clockevent device. The second patch stops/starts clockevent devices as and when required.
It is also pushed here:
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/oneshot-stopped
@Preeti: I have applied your Reviewed-by only for the first commit as others have changed significantly. Feel free to give it again :)
V1->V2: - Peter & Ingo suggested to to start/stop clockevent device from tick_program_event() instead of the call sites that called tick_program_event().
-- viresh
Viresh Kumar (2): clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state Clockevents: Stop unused clockevent devices
include/linux/clockchips.h | 7 ++++++- kernel/time/clockevents.c | 18 +++++++++++++++++- kernel/time/hrtimer.c | 6 ++---- kernel/time/tick-oneshot.c | 16 ++++++++++++++++ kernel/time/timer_list.c | 6 ++++++ 5 files changed, 47 insertions(+), 6 deletions(-)
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback.
Reviewed-by: Preeti U. Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- include/linux/clockchips.h | 7 ++++++- kernel/time/clockevents.c | 14 +++++++++++++- kernel/time/timer_list.c | 6 ++++++ 3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index caeca76d7b32..ee63865d3cd7 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -49,12 +49,15 @@ enum clock_event_mode { * reached from DETACHED or SHUTDOWN. * ONESHOT: Device is programmed to generate event only once. Can be reached * from DETACHED or SHUTDOWN. + * ONESHOT_STOPPED: Device was programmed in ONESHOT mode and is temporarily + * stopped. */ enum clock_event_state { CLOCK_EVT_STATE_DETACHED, CLOCK_EVT_STATE_SHUTDOWN, CLOCK_EVT_STATE_PERIODIC, CLOCK_EVT_STATE_ONESHOT, + CLOCK_EVT_STATE_ONESHOT_STOPPED, };
/* @@ -102,6 +105,7 @@ enum clock_event_state { * @set_mode: legacy set mode function, only for modes <= CLOCK_EVT_MODE_RESUME. * @set_state_periodic: switch state to periodic, if !set_mode * @set_state_oneshot: switch state to oneshot, if !set_mode + * @set_state_oneshot_stopped: switch state to oneshot_stopped, if !set_mode * @set_state_shutdown: switch state to shutdown, if !set_mode * @tick_resume: resume clkevt device, if !set_mode * @broadcast: function to broadcast events @@ -133,11 +137,12 @@ struct clock_event_device { * State transition callback(s): Only one of the two groups should be * defined: * - set_mode(), only for modes <= CLOCK_EVT_MODE_RESUME. - * - set_state_{shutdown|periodic|oneshot}(), tick_resume(). + * - set_state_{shutdown|periodic|oneshot|oneshot_stopped}(), tick_resume(). */ void (*set_mode)(enum clock_event_mode mode, struct clock_event_device *); int (*set_state_periodic)(struct clock_event_device *); int (*set_state_oneshot)(struct clock_event_device *); + int (*set_state_oneshot_stopped)(struct clock_event_device *); int (*set_state_shutdown)(struct clock_event_device *); int (*tick_resume)(struct clock_event_device *);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 7af614829da1..49ed2ea4294c 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -138,6 +138,17 @@ static int __clockevents_set_state(struct clock_event_device *dev, return -ENOSYS; return dev->set_state_oneshot(dev);
+ case CLOCK_EVT_STATE_ONESHOT_STOPPED: + /* Core internal bug */ + if (WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, + "Current state: %d\n", dev->state)) + return -EINVAL; + + if (dev->set_state_oneshot_stopped) + return dev->set_state_oneshot_stopped(dev); + else + return -ENOSYS; + default: return -ENOSYS; } @@ -449,7 +460,8 @@ static int clockevents_sanity_check(struct clock_event_device *dev) if (dev->set_mode) { /* We shouldn't be supporting new modes now */ WARN_ON(dev->set_state_periodic || dev->set_state_oneshot || - dev->set_state_shutdown || dev->tick_resume); + dev->set_state_shutdown || dev->tick_resume || + dev->set_state_oneshot_stopped);
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED); return 0; diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index e878c2e0ba45..fbc902ccd72a 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -251,6 +251,12 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu) SEQ_printf(m, "\n"); }
+ if (dev->set_state_oneshot_stopped) { + SEQ_printf(m, " oneshot stopped: "); + print_name_offset(m, dev->set_state_oneshot_stopped); + SEQ_printf(m, "\n"); + } + if (dev->tick_resume) { SEQ_printf(m, " resume: "); print_name_offset(m, dev->tick_resume);
To avoid getting spurious interrupts on a tickless CPU, clockevent device can now be stopped by switching to ONESHOT_STOPPED state.
The natural place for handling this transition is tick_program_event().
On 'expires == KTIME_MAX', we skip programming the event and so we need to fix such call sites as well, to always call tick_program_event() irrespective of the expires value.
Once the clockevent device is required again, check if it was earlier put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT before programming its event.
To make sure we haven't missed any corner case, add a WARN() for the case where we try to reprogram clockevent device while we aren't configured in ONESHOT_STOPPED state.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/clockevents.c | 4 ++++ kernel/time/hrtimer.c | 6 ++---- kernel/time/tick-oneshot.c | 16 ++++++++++++++++ 3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 49ed2ea4294c..8ff54495f60b 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -335,6 +335,10 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, if (dev->state == CLOCK_EVT_STATE_SHUTDOWN) return 0;
+ /* We must be in ONESHOT state here */ + WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, "Current state: %d\n", + dev->state); + /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev); diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 721d29b99d10..d1be3eda83fa 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -543,8 +543,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX) - tick_program_event(cpu_base->expires_next, 1); + tick_program_event(cpu_base->expires_next, 1); }
/* @@ -1308,8 +1307,7 @@ void hrtimer_interrupt(struct clock_event_device *dev) raw_spin_unlock(&cpu_base->lock);
/* Reprogramming necessary ? */ - if (expires_next.tv64 == KTIME_MAX || - !tick_program_event(expires_next, 0)) { + if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; } diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c index 67a64b1670bf..f8de75715c2f 100644 --- a/kernel/time/tick-oneshot.c +++ b/kernel/time/tick-oneshot.c @@ -28,6 +28,22 @@ int tick_program_event(ktime_t expires, int force) { struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
+ if (unlikely(expires.tv64 == KTIME_MAX)) { + /* + * We don't need the clock event device any more, stop it. + */ + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); + return 0; + } + + if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED)) { + /* + * We need the clock event again, configure it in ONESHOT mode + * before using it. + */ + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT); + } + return clockevents_program_event(dev, expires, force); }
On 3 April 2015 at 09:04, Viresh Kumar viresh.kumar@linaro.org wrote:
Hi,
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the unused clockevent device should be stopped or its interrupts should be masked.
Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
First patch implements the required infrastructure to start/stop clockevent device. The second patch stops/starts clockevent devices as and when required.
It is also pushed here:
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/oneshot-stopped
@Preeti: I have applied your Reviewed-by only for the first commit as others have changed significantly. Feel free to give it again :)
V1->V2:
- Peter & Ingo suggested to to start/stop clockevent device from tick_program_event() instead of the call sites that called tick_program_event().
Hi Thomas,
Gentle reminder ping...
-- viresh
linaro-kernel@lists.linaro.org