Hi Thomas/Ingo/Peter,
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
First patch implements the required infrastructure to start/stop clockevent device. Third patch stops clockevent devices when no longer required and Second patch starts them again as and when required.
The review order can be 1,3,2 for better understanding. But patch 2 was required before 3 to keep 'git bisect' happy, otherwise clockevent device might never get restarted after stopping it :)
It is also pushed here:
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/oneshot-stopped
Viresh Kumar (3): clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state clockevents: Restart clockevent device before using it again clockevents: Switch state to ONESHOT_STOPPED for unused clockevent devices
include/linux/clockchips.h | 7 ++++++- kernel/time/clockevents.c | 18 +++++++++++++++- kernel/time/hrtimer.c | 51 ++++++++++++++++++++++++++++++++++++++++++---- kernel/time/tick-sched.c | 17 +++++++++++++++- kernel/time/timer_list.c | 6 ++++++ 5 files changed, 92 insertions(+), 7 deletions(-)
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- include/linux/clockchips.h | 7 ++++++- kernel/time/clockevents.c | 14 +++++++++++++- kernel/time/timer_list.c | 6 ++++++ 3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index e20232c3320a..33ad247f8662 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -51,12 +51,15 @@ enum clock_event_mode { * reached from DETACHED or SHUTDOWN. * ONESHOT: Device is programmed to generate event only once. Can be reached * from DETACHED or SHUTDOWN. + * ONESHOT_STOPPED: Device was programmed in ONESHOT mode and is temporarily + * stopped. */ enum clock_event_state { CLOCK_EVT_STATE_DETACHED = 0, CLOCK_EVT_STATE_SHUTDOWN, CLOCK_EVT_STATE_PERIODIC, CLOCK_EVT_STATE_ONESHOT, + CLOCK_EVT_STATE_ONESHOT_STOPPED, };
/* @@ -103,6 +106,7 @@ enum clock_event_state { * @set_mode: legacy set mode function, only for modes <= CLOCK_EVT_MODE_RESUME. * @set_state_periodic: switch state to periodic, if !set_mode * @set_state_oneshot: switch state to oneshot, if !set_mode + * @set_state_oneshot_stopped: switch state to oneshot_stopped, if !set_mode * @set_state_shutdown: switch state to shutdown, if !set_mode * @tick_resume: resume clkevt device, if !set_mode * @broadcast: function to broadcast events @@ -136,12 +140,13 @@ struct clock_event_device { * State transition callback(s): Only one of the two groups should be * defined: * - set_mode(), only for modes <= CLOCK_EVT_MODE_RESUME. - * - set_state_{shutdown|periodic|oneshot}(), tick_resume(). + * - set_state_{shutdown|periodic|oneshot|oneshot_stopped}(), tick_resume(). */ void (*set_mode)(enum clock_event_mode mode, struct clock_event_device *); int (*set_state_periodic)(struct clock_event_device *); int (*set_state_oneshot)(struct clock_event_device *); + int (*set_state_oneshot_stopped)(struct clock_event_device *); int (*set_state_shutdown)(struct clock_event_device *); int (*tick_resume)(struct clock_event_device *);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 73689df1e4b8..04f6c3433f8e 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -138,6 +138,17 @@ static int __clockevents_set_state(struct clock_event_device *dev, return -ENOSYS; return dev->set_state_oneshot(dev);
+ case CLOCK_EVT_STATE_ONESHOT_STOPPED: + /* Core internal bug */ + if (WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, + "Current state: %d\n", dev->state)) + return -EINVAL; + + if (dev->set_state_oneshot_stopped) + return dev->set_state_oneshot_stopped(dev); + else + return -ENOSYS; + default: return -ENOSYS; } @@ -449,7 +460,8 @@ static int clockevents_sanity_check(struct clock_event_device *dev) if (dev->set_mode) { /* We shouldn't be supporting new modes now */ WARN_ON(dev->set_state_periodic || dev->set_state_oneshot || - dev->set_state_shutdown || dev->tick_resume); + dev->set_state_shutdown || dev->tick_resume || + dev->set_state_oneshot_stopped);
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED); return 0; diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index 05aa5590106a..98d77969ba85 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -251,6 +251,12 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu) SEQ_printf(m, "\n"); }
+ if (dev->set_state_oneshot_stopped) { + SEQ_printf(m, " oneshot stopped: "); + print_name_offset(m, dev->set_state_oneshot_stopped); + SEQ_printf(m, "\n"); + } + if (dev->tick_resume) { SEQ_printf(m, " resume: "); print_name_offset(m, dev->tick_resume);
On 03/27/2015 10:44 PM, Viresh Kumar wrote:
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
include/linux/clockchips.h | 7 ++++++- kernel/time/clockevents.c | 14 +++++++++++++- kernel/time/timer_list.c | 6 ++++++ 3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index e20232c3320a..33ad247f8662 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -51,12 +51,15 @@ enum clock_event_mode {
reached from DETACHED or SHUTDOWN.
- ONESHOT: Device is programmed to generate event only once. Can be reached
from DETACHED or SHUTDOWN.
- ONESHOT_STOPPED: Device was programmed in ONESHOT mode and is temporarily
*/
stopped.
enum clock_event_state { CLOCK_EVT_STATE_DETACHED = 0, CLOCK_EVT_STATE_SHUTDOWN, CLOCK_EVT_STATE_PERIODIC, CLOCK_EVT_STATE_ONESHOT,
- CLOCK_EVT_STATE_ONESHOT_STOPPED,
};
/* @@ -103,6 +106,7 @@ enum clock_event_state {
- @set_mode: legacy set mode function, only for modes <= CLOCK_EVT_MODE_RESUME.
- @set_state_periodic: switch state to periodic, if !set_mode
- @set_state_oneshot: switch state to oneshot, if !set_mode
- @set_state_oneshot_stopped: switch state to oneshot_stopped, if !set_mode
- @set_state_shutdown: switch state to shutdown, if !set_mode
- @tick_resume: resume clkevt device, if !set_mode
- @broadcast: function to broadcast events
@@ -136,12 +140,13 @@ struct clock_event_device { * State transition callback(s): Only one of the two groups should be * defined: * - set_mode(), only for modes <= CLOCK_EVT_MODE_RESUME.
* - set_state_{shutdown|periodic|oneshot}(), tick_resume().
*/ void (*set_mode)(enum clock_event_mode mode, struct clock_event_device *); int (*set_state_periodic)(struct clock_event_device *); int (*set_state_oneshot)(struct clock_event_device *);* - set_state_{shutdown|periodic|oneshot|oneshot_stopped}(), tick_resume().
- int (*set_state_oneshot_stopped)(struct clock_event_device *); int (*set_state_shutdown)(struct clock_event_device *); int (*tick_resume)(struct clock_event_device *);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 73689df1e4b8..04f6c3433f8e 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -138,6 +138,17 @@ static int __clockevents_set_state(struct clock_event_device *dev, return -ENOSYS; return dev->set_state_oneshot(dev);
- case CLOCK_EVT_STATE_ONESHOT_STOPPED:
/* Core internal bug */
if (WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT,
"Current state: %d\n", dev->state))
return -EINVAL;
if (dev->set_state_oneshot_stopped)
return dev->set_state_oneshot_stopped(dev);
else
return -ENOSYS;
- default: return -ENOSYS; }
@@ -449,7 +460,8 @@ static int clockevents_sanity_check(struct clock_event_device *dev) if (dev->set_mode) { /* We shouldn't be supporting new modes now */ WARN_ON(dev->set_state_periodic || dev->set_state_oneshot ||
dev->set_state_shutdown || dev->tick_resume);
dev->set_state_shutdown || dev->tick_resume ||
dev->set_state_oneshot_stopped);
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED); return 0;
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c index 05aa5590106a..98d77969ba85 100644 --- a/kernel/time/timer_list.c +++ b/kernel/time/timer_list.c @@ -251,6 +251,12 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu) SEQ_printf(m, "\n"); }
if (dev->set_state_oneshot_stopped) {
SEQ_printf(m, " oneshot stopped: ");
print_name_offset(m, dev->set_state_oneshot_stopped);
SEQ_printf(m, "\n");
}
- if (dev->tick_resume) { SEQ_printf(m, " resume: "); print_name_offset(m, dev->tick_resume);
With all the precursor patches gone in, this series looks good to me.
Reviewed-by: Preeti U. Murthy preeti@linux.vnet.ibm.com
Viresh Kumar viresh.kumar@linaro.org writes:
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Kevin Hilman khilman@linaro.org
with a minor nit...
[...]
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 73689df1e4b8..04f6c3433f8e 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -138,6 +138,17 @@ static int __clockevents_set_state(struct clock_event_device *dev, return -ENOSYS; return dev->set_state_oneshot(dev);
- case CLOCK_EVT_STATE_ONESHOT_STOPPED:
/* Core internal bug */
This comment is not useful at all (nor are all the other ones already in this file.) IMO, the comment should say something like: "ONESHOT_STOPPED is only valid when currently in the ONESHOT state." or something similar.
if (WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT,
"Current state: %d\n", dev->state))
Similarily this output will not be useful, and should say something like: "Can only enter ONESHOT_STOPPED from ONESHOT. Current state: %d\n".
return -EINVAL;
if (dev->set_state_oneshot_stopped)
return dev->set_state_oneshot_stopped(dev);
else
return -ENOSYS;
- default: return -ENOSYS; }
Kevin
Next commit would update clockevents core to switch clockevent devices to ONESHOT_STOPPED state to avoid getting spurious interrupts on a tickless CPU.
Before programming the clockevent device for next event, we must switch its state to ONESHOT.
Switch state back to ONESHOT from ONESHOT_STOPPED at following places:
1.) NOHZ_MODE_LOWRES Mode
Timers & hrtimers are dependent on tick for their working in this mode and the only place from where clockevent device is programmed is the tick-code.
Two routines can restart ticks in LOWRES mode: tick_nohz_restart() and tick_nohz_stop_sched_tick().
2.) NOHZ_MODE_HIGHRES Mode
Tick & timers are dependent on hrtimers for their working in this mode and the only place from where clockevent device is programmed is the hrtimer-code.
Only hrtimer_reprogram() is responsible for programming the clockevent device for next event, if the clockevent device is stopped earlier. And updating that alone is sufficient here.
To make sure we haven't missed any corner case, add a WARN() for the case where we try to reprogram clockevent device while we aren't configured in ONESHOT_STOPPED state.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/clockevents.c | 4 ++++ kernel/time/hrtimer.c | 5 +++++ kernel/time/tick-sched.c | 14 +++++++++++++- 3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 04f6c3433f8e..e95896fd21fd 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -335,6 +335,10 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, if (dev->state == CLOCK_EVT_STATE_SHUTDOWN) return 0;
+ /* We must be in ONESHOT state here */ + WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, "Current state: %d\n", + dev->state); + /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev); diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index bee0c1f78091..045ba7e2be6c 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -566,6 +566,7 @@ static int hrtimer_reprogram(struct hrtimer *timer, { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset); + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); int res;
WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0); @@ -610,6 +611,10 @@ static int hrtimer_reprogram(struct hrtimer *timer, if (cpu_base->hang_detected) return 0;
+ /* Switchback to ONESHOT state */ + if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED)) + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT); + /* * Clockevents returns -ETIME, when the event was in the past. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index a4c4edac4528..47c04edd07df 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -694,8 +694,14 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, /* Check, if the timer was already in the past */ if (hrtimer_active(&ts->sched_timer)) goto out; - } else if (!tick_program_event(expires, 0)) + } else { + /* Switchback to ONESHOT state */ + if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED)) + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT); + + if (!tick_program_event(expires, 0)) goto out; + } /* * We are past the event already. So we crossed a * jiffie boundary. Update jiffies and raise the @@ -873,6 +879,8 @@ ktime_t tick_nohz_get_sleep_length(void)
static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) { + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); + hrtimer_cancel(&ts->sched_timer); hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
@@ -887,6 +895,10 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) if (hrtimer_active(&ts->sched_timer)) break; } else { + /* Switchback to ONESHOT state */ + if (likely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED)) + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT); + if (!tick_program_event( hrtimer_get_expires(&ts->sched_timer), 0)) break;
On 03/27/2015 10:44 PM, Viresh Kumar wrote:
Next commit would update clockevents core to switch clockevent devices to ONESHOT_STOPPED state to avoid getting spurious interrupts on a tickless CPU.
Before programming the clockevent device for next event, we must switch its state to ONESHOT.
Switch state back to ONESHOT from ONESHOT_STOPPED at following places:
1.) NOHZ_MODE_LOWRES Mode
Timers & hrtimers are dependent on tick for their working in this mode and the only place from where clockevent device is programmed is the tick-code.
Two routines can restart ticks in LOWRES mode: tick_nohz_restart() and tick_nohz_stop_sched_tick().
2.) NOHZ_MODE_HIGHRES Mode
Tick & timers are dependent on hrtimers for their working in this mode and the only place from where clockevent device is programmed is the hrtimer-code.
Only hrtimer_reprogram() is responsible for programming the clockevent device for next event, if the clockevent device is stopped earlier. And updating that alone is sufficient here.
To make sure we haven't missed any corner case, add a WARN() for the case where we try to reprogram clockevent device while we aren't configured in ONESHOT_STOPPED state.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
kernel/time/clockevents.c | 4 ++++ kernel/time/hrtimer.c | 5 +++++ kernel/time/tick-sched.c | 14 +++++++++++++- 3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 04f6c3433f8e..e95896fd21fd 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -335,6 +335,10 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, if (dev->state == CLOCK_EVT_STATE_SHUTDOWN) return 0;
- /* We must be in ONESHOT state here */
- WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, "Current state: %d\n",
dev->state);
- /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index bee0c1f78091..045ba7e2be6c 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -566,6 +566,7 @@ static int hrtimer_reprogram(struct hrtimer *timer, { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); int res;
WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0);
@@ -610,6 +611,10 @@ static int hrtimer_reprogram(struct hrtimer *timer, if (cpu_base->hang_detected) return 0;
- /* Switchback to ONESHOT state */
- if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED))
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
- /*
*/
- Clockevents returns -ETIME, when the event was in the past.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index a4c4edac4528..47c04edd07df 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -694,8 +694,14 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, /* Check, if the timer was already in the past */ if (hrtimer_active(&ts->sched_timer)) goto out;
} else if (!tick_program_event(expires, 0))
} else {
/* Switchback to ONESHOT state */
if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED))
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
if (!tick_program_event(expires, 0)) goto out;
/*}
- We are past the event already. So we crossed a
- jiffie boundary. Update jiffies and raise the
@@ -873,6 +879,8 @@ ktime_t tick_nohz_get_sleep_length(void)
static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) {
- struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
- hrtimer_cancel(&ts->sched_timer); hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
@@ -887,6 +895,10 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) if (hrtimer_active(&ts->sched_timer)) break; } else {
/* Switchback to ONESHOT state */
if (likely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED))
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
if (!tick_program_event( hrtimer_get_expires(&ts->sched_timer), 0)) break;
Reviewed-by: Preeti U. Murthy preeti@linux.vnet.ibm.com
On Fri, Mar 27, 2015 at 10:44:28PM +0530, Viresh Kumar wrote:
Only hrtimer_reprogram() is responsible for programming the clockevent device for next event, if the clockevent device is stopped earlier. And updating that alone is sufficient here.
+++ b/kernel/time/hrtimer.c @@ -566,6 +566,7 @@ static int hrtimer_reprogram(struct hrtimer *timer, { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
- struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); int res;
WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0); @@ -610,6 +611,10 @@ static int hrtimer_reprogram(struct hrtimer *timer, if (cpu_base->hang_detected) return 0;
- /* Switchback to ONESHOT state */
- if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED))
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
- /*
*/
- Clockevents returns -ETIME, when the event was in the past.
Should we not do this in tick_program_event() instead? Note that there are a few more places that call that, the two in the hrtimer_interrupt() should be safe because if we're handling the interrupt its cannot be stopped anyhow.
hrtimer_force_reprogram() seems to need the annotation regardless.
Furthermore, by putting it in tick_program_event() you also don't need to fixup tick_nohz_restart().
Or am I completely missing something?
On 2 April 2015 at 19:04, Peter Zijlstra peterz@infradead.org wrote:
On Fri, Mar 27, 2015 at 10:44:28PM +0530, Viresh Kumar wrote:
+++ b/kernel/time/hrtimer.c @@ -566,6 +566,7 @@ static int hrtimer_reprogram(struct hrtimer *timer, {
/* Switchback to ONESHOT state */
if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED))
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
/* * Clockevents returns -ETIME, when the event was in the past. */
Should we not do this in tick_program_event() instead? Note that there are a few more places that call that, the two in the hrtimer_interrupt() should be safe because if we're handling the interrupt its cannot be stopped anyhow.
hrtimer_force_reprogram() seems to need the annotation regardless.
Do you mean that we need the same modification here as well? In order to save myself against any bugs, I have added following to clockevents_program_event():
+ /* We must be in ONESHOT state here */ + WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, "Current state: %d\n", + dev->state);
And I never faced this WARN, or any such reports from Fengguang. So probably after all the hrtimers are gone, we will always call hrtimer_reprogram().
Furthermore, by putting it in tick_program_event() you also don't need to fixup tick_nohz_restart().
Or am I completely missing something?
So yes, if we would have done that in tick_program_event(), it would have been a single place for doing this change..
But, when Thomas ranted [1] at me on this earlier, he said:
" No, we are not doing a state change behind the scene and a magic restore.
2B) Implement the ONESHOT_STOPPED logic and make sure all of the core code is aware of it. "
And so I did it explicitly, wherever it is required.
-- viresh
On Thu, Apr 02, 2015 at 07:20:50PM +0530, Viresh Kumar wrote:
Or am I completely missing something?
So yes, if we would have done that in tick_program_event(), it would have been a single place for doing this change..
But, when Thomas ranted [1] at me on this earlier, he said:
" No, we are not doing a state change behind the scene and a magic restore.
2B) Implement the ONESHOT_STOPPED logic and make sure all of the core code is aware of it. "
lkml.org didn't work for me, alternative link:
http://marc.info/?l=linux-kernel&m=139966616803683&w=2
So I've read that (several times) and I thing Thomas meant something else.
So I think he disliked what you did to the clockevent layer, not so much you touching tick_program_event(). But the last_state thing (which was broken), and you imposing the SHUTDOWN policy for everybody.
But with the optional ONESHOT_STOPPED state both those are gone, and we'd end up with the much simpler patch below.
Further note that tick-oneshot is the natural place to do this, that is the glue layer between the (oneshot) clockevents stuff and the timer stuff. Pulling clockevents into hrtimers feels wrong.
Ingo, do you agree with that after reading Thomas' email?
--- --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -335,6 +335,9 @@ int clockevents_program_event(struct clo if (dev->state == CLOCK_EVT_STATE_SHUTDOWN) return 0;
+ WARN_ONCE(dev->state != CLOCK_EVT_STATE_ONESHOT, + "Current state: %d\n", dev->state); + /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev); --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -543,8 +543,7 @@ hrtimer_force_reprogram(struct hrtimer_c if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX) - tick_program_event(cpu_base->expires_next, 1); + tick_program_event(cpu_base->expires_next, 1); }
/* @@ -1308,8 +1307,7 @@ void hrtimer_interrupt(struct clock_even raw_spin_unlock(&cpu_base->lock);
/* Reprogramming necessary ? */ - if (expires_next.tv64 == KTIME_MAX || - !tick_program_event(expires_next, 0)) { + if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; } --- a/kernel/time/tick-oneshot.c +++ b/kernel/time/tick-oneshot.c @@ -28,6 +28,13 @@ int tick_program_event(ktime_t expires, { struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
+ if (expires.tv64 == KTIME_MAX) { + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); + return 0; + } else if (unlikely(dev->state == CLOCK_EVT_STATE_ONESHOT_STOPPED)) { + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT); + } + return clockevents_program_event(dev, expires, force); }
* Peter Zijlstra peterz@infradead.org wrote:
On Thu, Apr 02, 2015 at 07:20:50PM +0530, Viresh Kumar wrote:
Or am I completely missing something?
So yes, if we would have done that in tick_program_event(), it would have been a single place for doing this change..
But, when Thomas ranted [1] at me on this earlier, he said:
" No, we are not doing a state change behind the scene and a magic restore.
2B) Implement the ONESHOT_STOPPED logic and make sure all of the core code is aware of it. "
lkml.org didn't work for me, alternative link:
http://marc.info/?l=linux-kernel&m=139966616803683&w=2
So I've read that (several times) and I thing Thomas meant something else.
So I think he disliked what you did to the clockevent layer, not so much you touching tick_program_event(). But the last_state thing (which was broken), and you imposing the SHUTDOWN policy for everybody.
But with the optional ONESHOT_STOPPED state both those are gone, and we'd end up with the much simpler patch below.
Further note that tick-oneshot is the natural place to do this, that is the glue layer between the (oneshot) clockevents stuff and the timer stuff. Pulling clockevents into hrtimers feels wrong.
Ingo, do you agree with that after reading Thomas' email?
Yeah, I suppose ...
Thanks,
Ingo
Clockevent device can now be stopped by switching to ONESHOT_STOPPED state, to avoid getting spurious interrupts on a tickless CPU.
Switch state to ONESHOT_STOPPED at following places:
1.) NOHZ_MODE_LOWRES Mode
Timers & hrtimers are dependent on tick for their working in this mode and the only place from where clockevent device is programmed is the tick-code.
In LOWRES mode we skip reprogramming the clockevent device in tick_nohz_stop_sched_tick() if expires == KTIME_MAX. In addition to that we must also switch the clockevent device to ONESHOT_STOPPED state to avoid all spurious interrupts that may follow.
2.) NOHZ_MODE_HIGHRES Mode
Tick & timers are dependent on hrtimers for their working in this mode and the only place from where clockevent device is programmed is the hrtimer-code.
There are two places here from which we reprogram the clockevent device or skip reprogramming it on expires == KTIME_MAX.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/hrtimer.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- kernel/time/tick-sched.c | 3 +++ 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 045ba7e2be6c..89d4b593dfc0 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -543,8 +543,19 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX) + if (cpu_base->expires_next.tv64 != KTIME_MAX) { tick_program_event(cpu_base->expires_next, 1); + } else { + struct clock_event_device *dev = + __this_cpu_read(tick_cpu_device.evtdev); + /* + * Don't need clockevent device anymore, stop it. + * + * We reach here only for NOHZ_MODE_HIGHRES mode and we are + * guaranteed that no timers/hrtimers are enqueued on this cpu. + */ + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); + } }
/* @@ -1312,9 +1323,36 @@ void hrtimer_interrupt(struct clock_event_device *dev) cpu_base->in_hrtirq = 0; raw_spin_unlock(&cpu_base->lock);
- /* Reprogramming necessary ? */ - if (expires_next.tv64 == KTIME_MAX || - !tick_program_event(expires_next, 0)) { + if (expires_next.tv64 == KTIME_MAX) { + struct clock_event_device *dev = + __this_cpu_read(tick_cpu_device.evtdev); + + cpu_base->hang_detected = 0; + + /* + * Don't need clockevent device anymore, stop it. + * + * We reach here only for NOHZ_MODE_HIGHRES mode and we are + * guaranteed that no timers/hrtimers are enqueued on this cpu. + * + * Most of the scenarios will be covered by similar code + * present in hrtimer_force_reprogram(), as we always try to + * evaluate tick requirement on idle/irq exit and cancel + * tick-sched hrtimer when tick isn't required anymore. + * + * It is required here as well as a special case. + * + * Last hrtimer fires on a tickless CPU and doesn't rearm + * itself. tick_nohz_irq_exit() reevaluates next event and it + * gets expires == KTIME_MAX. Because tick was already stopped, + * and last expires == new_expires, we return early. And the + * clockevent device is never stopped. + */ + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); + return; + } + + if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; } diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 47c04edd07df..ff271a26fa4d 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -685,6 +685,9 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); + else + /* stop clock event device */ + clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); goto out; }
On 03/27/2015 10:44 PM, Viresh Kumar wrote:
Clockevent device can now be stopped by switching to ONESHOT_STOPPED state, to avoid getting spurious interrupts on a tickless CPU.
Switch state to ONESHOT_STOPPED at following places:
1.) NOHZ_MODE_LOWRES Mode
Timers & hrtimers are dependent on tick for their working in this mode and the only place from where clockevent device is programmed is the tick-code.
In LOWRES mode we skip reprogramming the clockevent device in tick_nohz_stop_sched_tick() if expires == KTIME_MAX. In addition to that we must also switch the clockevent device to ONESHOT_STOPPED state to avoid all spurious interrupts that may follow.
2.) NOHZ_MODE_HIGHRES Mode
Tick & timers are dependent on hrtimers for their working in this mode and the only place from where clockevent device is programmed is the hrtimer-code.
There are two places here from which we reprogram the clockevent device or skip reprogramming it on expires == KTIME_MAX.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
kernel/time/hrtimer.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- kernel/time/tick-sched.c | 3 +++ 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 045ba7e2be6c..89d4b593dfc0 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -543,8 +543,19 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX)
- if (cpu_base->expires_next.tv64 != KTIME_MAX) { tick_program_event(cpu_base->expires_next, 1);
- } else {
struct clock_event_device *dev =
__this_cpu_read(tick_cpu_device.evtdev);
/*
* Don't need clockevent device anymore, stop it.
*
* We reach here only for NOHZ_MODE_HIGHRES mode and we are
* guaranteed that no timers/hrtimers are enqueued on this cpu.
*/
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
- }
}
/* @@ -1312,9 +1323,36 @@ void hrtimer_interrupt(struct clock_event_device *dev) cpu_base->in_hrtirq = 0; raw_spin_unlock(&cpu_base->lock);
- /* Reprogramming necessary ? */
- if (expires_next.tv64 == KTIME_MAX ||
!tick_program_event(expires_next, 0)) {
- if (expires_next.tv64 == KTIME_MAX) {
struct clock_event_device *dev =
__this_cpu_read(tick_cpu_device.evtdev);
cpu_base->hang_detected = 0;
/*
* Don't need clockevent device anymore, stop it.
*
* We reach here only for NOHZ_MODE_HIGHRES mode and we are
* guaranteed that no timers/hrtimers are enqueued on this cpu.
*
* Most of the scenarios will be covered by similar code
* present in hrtimer_force_reprogram(), as we always try to
* evaluate tick requirement on idle/irq exit and cancel
* tick-sched hrtimer when tick isn't required anymore.
*
* It is required here as well as a special case.
*
* Last hrtimer fires on a tickless CPU and doesn't rearm
* itself. tick_nohz_irq_exit() reevaluates next event and it
* gets expires == KTIME_MAX. Because tick was already stopped,
* and last expires == new_expires, we return early. And the
* clockevent device is never stopped.
*/
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
return;
- }
- if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 47c04edd07df..ff271a26fa4d 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -685,6 +685,9 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer);
else
/* stop clock event device */
}clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); goto out;
Reviewed-by: Preeti U. Murthy preeti@linux.vnet.ibm.com
On Fri, Mar 27, 2015 at 10:44:29PM +0530, Viresh Kumar wrote:
kernel/time/hrtimer.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- kernel/time/tick-sched.c | 3 +++ 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 045ba7e2be6c..89d4b593dfc0 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -543,8 +543,19 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX)
- if (cpu_base->expires_next.tv64 != KTIME_MAX) { tick_program_event(cpu_base->expires_next, 1);
- } else {
struct clock_event_device *dev =
__this_cpu_read(tick_cpu_device.evtdev);
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
- }
} /* @@ -1312,9 +1323,36 @@ void hrtimer_interrupt(struct clock_event_device *dev) cpu_base->in_hrtirq = 0; raw_spin_unlock(&cpu_base->lock);
- /* Reprogramming necessary ? */
- if (expires_next.tv64 == KTIME_MAX ||
!tick_program_event(expires_next, 0)) {
- if (expires_next.tv64 == KTIME_MAX) {
struct clock_event_device *dev =
__this_cpu_read(tick_cpu_device.evtdev);
cpu_base->hang_detected = 0;
clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
return;
- }
- if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 47c04edd07df..ff271a26fa4d 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -685,6 +685,9 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer);
else
}clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); goto out;
Should we teach tick_program_event() about KTIME_MAX instead of adding it to its callers?
On 2 April 2015 at 19:07, Peter Zijlstra peterz@infradead.org wrote:
Should we teach tick_program_event() about KTIME_MAX instead of adding it to its callers?
That's how I would have done, but Thomas (obviously for good reasons) rejected that :).. See reply to 2/3 for further details.
linaro-kernel@lists.linaro.org