This is the next planned step of "add ->set_dev_mode" patchset..
Its not being sent out (before earlier patchset is accepted by all) to receive *more* criticism (I already got enough :)), but to give an overall view of where we are heading.
You can choose to skip reviewing this and concentrate on the first patchset instead unless that is upstreamed :)
Oh man, I am too scared now :)
Okay, here we go:
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from tick-handler for next tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will consider it as a spurious interrupt and just return from hrtimer_interrupt(). In NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()? (This is what I could read from the code, not very sure though. Otherwise, it means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it. But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be masked in order to get these issues fixed.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. Also, NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED (lkml.org/lkml/2014/5/9/508) to solve this problem.
First patch implements the required infrastructure to start/stop clockevent device. Third patch stops clockevent devices when no longer required and Second patch starts them again once required.
The review order can be 1,3,2 for better understanding. Patch 2 was required before 3 to keep 'git bisect' happy :)
Fourth patch is there to catch corner cases where we try to set next event while being in ONESHOT_STOPPED mode. We will do a WARN_ON_ONCE() then. The last patch modifies a sample driver (arm_arch_timer) to demonstrate/test this patchset.
Other drivers would be updated later.
Viresh Kumar (5): clockevents: Introduce CLOCK_EVT_MODE_ONESHOT_STOPPED mode tick-sched: switchback to ONESHOT mode if clockevent device is stopped tick-sched: stop clockevent device when no longer required clockevents: Catch event programming in ONESHOT_STOPPED mode clocksource: arm_arch_timer: Add support for CLOCK_EVT_MODE_ONESHOT_STOPPED
drivers/clocksource/arm_arch_timer.c | 1 + include/linux/clockchips.h | 1 + include/linux/tick.h | 2 ++ kernel/hrtimer.c | 53 +++++++++++++++++++++++++++++++++--- kernel/time/clockevents.c | 17 ++++++++++-- kernel/time/tick-oneshot.c | 20 ++++++++++++++ kernel/time/tick-sched.c | 4 +++ 7 files changed, 92 insertions(+), 6 deletions(-)
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from tick-handler for next tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will consider it as a spurious interrupt and just return from hrtimer_interrupt(). In NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()? (This is what I could read from the code, not very sure though. Otherwise, it means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it. But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be masked in order to get these issues fixed.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. Also, NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED (lkml.org/lkml/2014/5/9/508) to solve this problem.
This patch adds support for this new mode in clockevents core. clockevents_set_mode() would do a WARN() if this mode is supported and we failed to switch to it. If it isn't supported for some platform (->set_dev_mode() returns -ENOSYS), we will not update dev->mode and return early.
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- include/linux/clockchips.h | 1 + include/linux/tick.h | 2 ++ kernel/time/clockevents.c | 14 ++++++++++++-- kernel/time/tick-oneshot.c | 20 ++++++++++++++++++++ 4 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index 08b203e..d8a108a 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -38,6 +38,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT, + CLOCK_EVT_MODE_ONESHOT_STOPPED, CLOCK_EVT_MODE_RESUME, };
diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index eaf5b0d..9348da1 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -107,8 +107,18 @@ void clockevents_set_mode(struct clock_event_device *dev, if (dev->mode != mode) { int ret = dev->set_dev_mode(mode, dev);
- /* Currently available modes shouldn't fail */ - WARN_ONCE(ret, "Requested mode: %d, error: %d\n", mode, ret); + if (ret) { + /* + * Supported Modes shouldn't fail and only + * ONESTOP_STOPPED is optional + */ + if (mode == CLOCK_EVT_MODE_ONESHOT_STOPPED && + ret == -ENOSYS) + return; + else + WARN_ONCE(1, "Requested mode: %d, error: %d\n", + mode, ret); + }
dev->mode = mode;
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c index 8241090..a3404e6 100644 --- a/kernel/time/tick-oneshot.c +++ b/kernel/time/tick-oneshot.c @@ -22,6 +22,26 @@ #include "tick-internal.h"
/** + * tick_stop_event + */ +void tick_stop_event(void) +{ + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); + + /* stop clock event device */ + clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT_STOPPED); +} + +/** + * tick_restart_event + */ +void tick_restart_event(void) +{ + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); + clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT); +} + +/** * tick_program_event */ int tick_program_event(ktime_t expires, int force)
On 9 June 2014 13:49, Viresh Kumar viresh.kumar@linaro.org wrote:
diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif
This isn't enough as we need dummy definitions for tick-sched.c file as well for configurations where ONESHOT isn't present..
Following diff is folded into the original commit:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 7ab92b1..6ca2b75 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -41,6 +41,8 @@ extern void tick_setup_oneshot(struct clock_event_device *newdev, void (*handler)(struct clock_event_device *), ktime_t nextevt); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_oneshot_notify(void); extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *)); extern void tick_resume_oneshot(void); @@ -82,6 +84,14 @@ static inline int tick_program_event(ktime_t expires, int force) { return 0; } +static inline void tick_stop_event(void) +{ + BUG(); +} +static inline void tick_restart_event(void) +{ + BUG(); +} static inline void tick_oneshot_notify(void) { } static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc) {
On Tue, Jun 10, 2014 at 03:10:45PM +0530, Viresh Kumar wrote:
On 9 June 2014 13:49, Viresh Kumar viresh.kumar@linaro.org wrote:
diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif
This isn't enough as we need dummy definitions for tick-sched.c file as well for configurations where ONESHOT isn't present..
Following diff is folded into the original commit:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 7ab92b1..6ca2b75 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -41,6 +41,8 @@ extern void tick_setup_oneshot(struct clock_event_device *newdev, void (*handler)(struct clock_event_device *), ktime_t nextevt); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_oneshot_notify(void); extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *)); extern void tick_resume_oneshot(void); @@ -82,6 +84,14 @@ static inline int tick_program_event(ktime_t expires, int force) { return 0; } +static inline void tick_stop_event(void) +{
BUG();
+} +static inline void tick_restart_event(void) +{
BUG();
+} static inline void tick_oneshot_notify(void) { } static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc) {
If the goal is to ensure these are never called in a given Kconfig off case, IMHO the best is to not define them at all in that case.
That way the mistake is spotted at build time instead of boot time and is way easier to identify.
On 10-Jun-2014, at 6:24 pm, Frederic Weisbecker fweisbec@gmail.com wrote:
On Tue, Jun 10, 2014 at 03:10:45PM +0530, Viresh Kumar wrote:
On 9 June 2014 13:49, Viresh Kumar viresh.kumar@linaro.org wrote: diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif
This isn't enough as we need dummy definitions for tick-sched.c file as well for configurations where ONESHOT isn't present..
Following diff is folded into the original commit:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 7ab92b1..6ca2b75 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -41,6 +41,8 @@ extern void tick_setup_oneshot(struct clock_event_device *newdev, void (*handler)(struct clock_event_device *), ktime_t nextevt); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_oneshot_notify(void); extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *)); extern void tick_resume_oneshot(void); @@ -82,6 +84,14 @@ static inline int tick_program_event(ktime_t expires, int force) { return 0; } +static inline void tick_stop_event(void) +{
BUG();
+} +static inline void tick_restart_event(void) +{
BUG();
+} static inline void tick_oneshot_notify(void) { } static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc) {
If the goal is to ensure these are never called in a given Kconfig off case, IMHO the best is to not define them at all in that case.
That way the mistake is spotted at build time instead of boot time and is way easier to identify.
When I thought again it looks like the dummy implementations are never used as tick-sched is only compiled with ONESHOT configuration. :)
Viresh Kumar viresh.kumar@linaro.org writes:
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next.
Just drop this intro sentence. It adds more confusion than value, IMO.
When no timers/hrtimers are pending to be serviced, the expiry time is set to a
drop "to be serviced"
special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
s/required for indefinite amount of time/pending/
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from tick-handler for next tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will consider it as a spurious interrupt and just return from hrtimer_interrupt(). In NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()? (This is what I could read from the code, not very sure though. Otherwise, it means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it.
Is this really the "ideal" case? Even if it fires one more time, it's still a spurious interrupt that causes an unncessary wakeup, right?
But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be
At which time? The copy/paste from the cover letter here adds confusion.
masked in order to get these issues fixed.
See comment in cover letter.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. Also, NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED (lkml.org/lkml/2014/5/9/508) to solve this problem.
This patch adds support for this new mode in clockevents core. clockevents_set_mode() would do a WARN() if this mode is supported and we failed to switch to it. If it isn't supported for some platform (->set_dev_mode() returns -ENOSYS), we will not update dev->mode and return early.
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
Uh, Why?
They are introduced here without explanation and without any users. This part should be a separate patch, introduced along with their users.
IMO, you should be following the sequence as discussed earlier
- introduce new ->set_dev_mode() - convert some clockevent devices to use it - introduce ONESHOT_STOPPED (and any dependencies) - convert a platform to use it.
The ordering of this is all mixed up in this series and rather confusing to review.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
include/linux/clockchips.h | 1 + include/linux/tick.h | 2 ++ kernel/time/clockevents.c | 14 ++++++++++++-- kernel/time/tick-oneshot.c | 20 ++++++++++++++++++++ 4 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index 08b203e..d8a108a 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -38,6 +38,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT,
- CLOCK_EVT_MODE_ONESHOT_STOPPED, CLOCK_EVT_MODE_RESUME,
}; diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index eaf5b0d..9348da1 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -107,8 +107,18 @@ void clockevents_set_mode(struct clock_event_device *dev, if (dev->mode != mode) { int ret = dev->set_dev_mode(mode, dev);
/* Currently available modes shouldn't fail */
WARN_ONCE(ret, "Requested mode: %d, error: %d\n", mode, ret);
this code is not in mainline (or -next), so I'm not sure what this series applies to. Normally, it's good practice to state in the cover letter what a series applies to.
However, the above removed code has never been upstream AFAICT, so it looks to me like this series applies on top of some earlier version of your patchset, which means this series has never actually been applied to mainline and compile tested. Ugh.
Viresh, these are the kind of things that really get on the nerves of reviewers and maintainers. It's obvious on review that this patch has not been compile tested against upstream, and it's not stated in the cover letter what is should apply to, which makes me wonder how much more time I should spend on it.
Combined with the fact that I don't think you followed my previous advice about breaking up the series into the right sequence, I'm quickly arriving at burnout on this series.
Kevin
Kevin Hilman khilman@linaro.org writes:
Viresh Kumar viresh.kumar@linaro.org writes:
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next.
Just drop this intro sentence. It adds more confusion than value, IMO.
When no timers/hrtimers are pending to be serviced, the expiry time is set to a
drop "to be serviced"
special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
s/required for indefinite amount of time/pending/
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from tick-handler for next tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will consider it as a spurious interrupt and just return from hrtimer_interrupt(). In NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()? (This is what I could read from the code, not very sure though. Otherwise, it means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it.
Is this really the "ideal" case? Even if it fires one more time, it's still a spurious interrupt that causes an unncessary wakeup, right?
But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be
At which time? The copy/paste from the cover letter here adds confusion.
masked in order to get these issues fixed.
See comment in cover letter.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. Also, NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED (lkml.org/lkml/2014/5/9/508) to solve this problem.
This patch adds support for this new mode in clockevents core. clockevents_set_mode() would do a WARN() if this mode is supported and we failed to switch to it. If it isn't supported for some platform (->set_dev_mode() returns -ENOSYS), we will not update dev->mode and return early.
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
Uh, Why?
They are introduced here without explanation and without any users. This part should be a separate patch, introduced along with their users.
IMO, you should be following the sequence as discussed earlier
- introduce new ->set_dev_mode()
- convert some clockevent devices to use it
- introduce ONESHOT_STOPPED (and any dependencies)
- convert a platform to use it.
The ordering of this is all mixed up in this series and rather confusing to review.
sigh, nevermind this part of my review (or the rant below). I misunderstood that this series is supposed to go on top of the ->set_dev_mode() series.
FWIW, the cover letter didn't help me understand that because it's mostly a cut/paste from the set_dev_mode series (and should not be.)
Kevin
On 11 June 2014 04:47, Kevin Hilman khilman@linaro.org wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next.
Just drop this intro sentence. It adds more confusion than value, IMO.
When no timers/hrtimers are pending to be serviced, the expiry time is set to a
drop "to be serviced"
special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
s/required for indefinite amount of time/pending/
Thanks for all above suggestions, will fix those.
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it.
Is this really the "ideal" case?
I thought so as core has asked driver to program device in ONESHOT mode and it should stop by itself after firing once. Many drivers have taken care of this even when they have emulated ONESHOT over PERIODIC mode by disabling clockevent device from irq handler.
Even if it fires one more time, it's still a spurious interrupt that causes an unncessary wakeup, right?
Yes, still a problem but not as worst as getting these spurious interrupts at tick-rate. And so thought maybe we should mention this.
But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be
At which time? The copy/paste from the cover letter here adds confusion.
Hmm..
masked in order to get these issues fixed.
See comment in cover letter.
Ok.
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
Uh, Why?
They are introduced here without explanation and without any users. This part should be a separate patch, introduced along with their users.
Correct.
IMO, you should be following the sequence as discussed earlier
- introduce new ->set_dev_mode()
- convert some clockevent devices to use it
- introduce ONESHOT_STOPPED (and any dependencies)
- convert a platform to use it.
The ordering of this is all mixed up in this series and rather confusing to review.
Okay. Will combine both patchsets for next version and make it as light-weight as possible ..
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index eaf5b0d..9348da1 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -107,8 +107,18 @@ void clockevents_set_mode(struct clock_event_device *dev, if (dev->mode != mode) { int ret = dev->set_dev_mode(mode, dev);
/* Currently available modes shouldn't fail */
WARN_ONCE(ret, "Requested mode: %d, error: %d\n", mode, ret);
this code is not in mainline (or -next), so I'm not sure what this series applies to. Normally, it's good practice to state in the cover letter what a series applies to.
I mentioned this in cover-letter: "This is the next planned step of "add ->set_dev_mode" patchset.."
and didn't mention explicitly that it is based of "set_dev_mode" patchset. Should have mentioned that clearly ..
However, the above removed code has never been upstream AFAICT, so it looks to me like this series applies on top of some earlier version of your patchset, which means this series has never actually been applied to mainline and compile tested. Ugh.
As this was based over set_dev_mode patchset, these problems weren't around. It was actually tested on my exynos-dual A15 board and we could get rid of these spurious interrupts.
Combined with the fact that I don't think you followed my previous advice about breaking up the series into the right sequence, I'm quickly arriving at burnout on this series.
Did I get the sequence wrong?
Yes, i haven't sent a light-weight (i.e. modifying few drivers only) V6 of "set_dev_mode" patchset as I was worried that people may want to review V5 (though I should have sent sent a light-weight version during V5 only).
Also I did made this patchset light-weight by just updating "arm_arch_timer" driver and not touching anything else.
Sorry for the confusion.
Viresh Kumar viresh.kumar@linaro.org writes:
A clockevent device is used to service timers/hrtimers requests and the next event (when it should fire) is decided by the timer/hrtimer expiring next. When no timers/hrtimers are pending to be serviced, the expiry time is set to a special value: KTIME_MAX. This means that no events are required for indefinite amount of time.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from tick-handler for next tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will consider it as a spurious interrupt and just return from hrtimer_interrupt(). In NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()? (This is what I could read from the code, not very sure though. Otherwise, it means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just fire one more time and that's it. But many implementations (like arm_arch_timer, etc) only have PERIODIC mode available and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts at tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be masked in order to get these issues fixed.
A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. Also, NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED (lkml.org/lkml/2014/5/9/508) to solve this problem.
This patch adds support for this new mode in clockevents core. clockevents_set_mode() would do a WARN() if this mode is supported and we failed to switch to it. If it isn't supported for some platform (->set_dev_mode() returns -ENOSYS), we will not update dev->mode and return early.
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
but they aren't used here, so it's better to implment them (and explain/justify them) where they are used.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
include/linux/clockchips.h | 1 + include/linux/tick.h | 2 ++ kernel/time/clockevents.c | 14 ++++++++++++-- kernel/time/tick-oneshot.c | 20 ++++++++++++++++++++ 4 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index 08b203e..d8a108a 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -38,6 +38,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT,
- CLOCK_EVT_MODE_ONESHOT_STOPPED, CLOCK_EVT_MODE_RESUME,
}; diff --git a/include/linux/tick.h b/include/linux/tick.h index b84773c..f9bc979 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -81,6 +81,8 @@ extern struct tick_device *tick_get_device(int cpu); # ifdef CONFIG_HIGH_RES_TIMERS extern int tick_init_highres(void); extern int tick_program_event(ktime_t expires, int force); +extern void tick_stop_event(void); +extern void tick_restart_event(void); extern void tick_setup_sched_timer(void); # endif diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index eaf5b0d..9348da1 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -107,8 +107,18 @@ void clockevents_set_mode(struct clock_event_device *dev, if (dev->mode != mode) { int ret = dev->set_dev_mode(mode, dev);
/* Currently available modes shouldn't fail */
WARN_ONCE(ret, "Requested mode: %d, error: %d\n", mode, ret);
if (ret) {
/*
* Supported Modes shouldn't fail and only
* ONESTOP_STOPPED is optional
*/
This patch needs to be very clear about what "optional" means. Based on the current code, optional seems to mean that can fail and we dont' do anything about it...
if (mode == CLOCK_EVT_MODE_ONESHOT_STOPPED &&
ret == -ENOSYS)
return;
else
WARN_ONCE(1, "Requested mode: %d, error: %d\n",
mode, ret);
}
... However, since we know that it's very likely that unsupported modes will leave the timers disabled, there needs to be some some error recovery here. e.g. it should be checking what mode the clockevent device is in before attempting to change mode, and putting it back into that mode upon failure.
Also, I suggest that you test this error recovery approach for the platform you're testing on *before* the final patch which adds ONESHOT_STOPPED support to the clockevent device, and then document in the cover letter how the error path was tested, and on which platform(s).
Kevin
On 11 June 2014 05:29, Kevin Hilman khilman@linaro.org wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
Two new APIs tick_stop_event() and tick_restart_event() are also implemented.
but they aren't used here, so it's better to implment them (and explain/justify them) where they are used.
Okay.
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index eaf5b0d..9348da1 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -107,8 +107,18 @@ void clockevents_set_mode(struct clock_event_device *dev, if (dev->mode != mode) { int ret = dev->set_dev_mode(mode, dev);
/* Currently available modes shouldn't fail */
WARN_ONCE(ret, "Requested mode: %d, error: %d\n", mode, ret);
if (ret) {
/*
* Supported Modes shouldn't fail and only
* ONESTOP_STOPPED is optional
*/
This patch needs to be very clear about what "optional" means. Based on the current code, optional seems to mean that can fail and we dont' do anything about it...
Yes, comment surely needs some improvement but does it look wrong codewise as well ?
We are returning early (without updating dev->mode) if ONESHOT isn't implemented by driver as it returned -ENOSYS. Otherwise we are always doing a WARN() (even if ONESHOT failed).
If the code looks fine, what about this update to above comment
/* * Switching to any mode *must not* fail. * * Only ONESTOP_STOPPED is a optional and drivers should * return -ENOSYS if they don't implement it. */
if (mode == CLOCK_EVT_MODE_ONESHOT_STOPPED &&
ret == -ENOSYS)
return;
else
WARN_ONCE(1, "Requested mode: %d, error: %d\n",
mode, ret);
}
... However, since we know that it's very likely that unsupported modes will leave the timers disabled, there needs to be some some error recovery here. e.g. it should be checking what mode the clockevent device is in before attempting to change mode, and putting it back into that mode upon failure.
You meant this only for the if block above, right? As only ONESHOT_STOPPED is optional.
Yeah, we should try to revert to ONESHOT (we must be in that mode earlier) mode here.
Also, I suggest that you test this error recovery approach for the platform you're testing on *before* the final patch which adds ONESHOT_STOPPED support to the clockevent device, and then document in the cover letter how the error path was tested, and on which platform(s).
Will do that.
We already have ONESHOT_STOPPED mode available now and can use that to disable spurious interrupts hurting our tickless-ness.
In order to not break 'git bisect', lets modify code to restart of clockevent device first, otherwise between the commits we might never come back from tickless state.
Before reprogramming clockevent device it should be switched back to ONESHOT. It might be switched to ONESHOT_STOPPED mode earlier to avoid getting spurious interrupts on a tickless CPU.
So here are the scenarios I could figure out (relevant for both NO_HZ_IDLE and NO_HZ_FULL cases):
1: Mode: NOHZ_MODE_LOWRES
Ticks are restarted from tick_nohz_restart() when we exit from tickless state. In LOWRES mode ticks are handled directly by programming clockevent device and not with hrtimers.
Restart clockevent device by switching to ONESHOT mode first when we re-enable ticks.
If timers are added to a tickless CPU, we restart ticks first by calling wake_up_nohz_cpu() and so we don't need to restart clockevent device in that case.
We aren't doing anything when hrtimers are added on a tickless CPU and that must be handled separately.
2. Mode: NOHZ_MODE_HIGHRES
In HIGHRES mode, ticks are handled with help of a special hrtimer: tick-sched. hrtimer_start() finally calls hrtimer_reprogram() if we need to reprogram clockevent device. This would happen in two cases:
- hrtimers queue was empty and we are adding the first hrtimer. - New hrtimer's expiry is before all enqueued timers.
We want to reconfigure mode only for the first case. There doesn't exist any infrastructure to identify second case. It wouldn't hurt if we call tick_restart_event() (eventually clockevents_set_mode()) unconditionally. Because we will return without doing anything when current mode is same as new mode.
This will also handle the case where a hrtimer is added on a tickless cpu, as we will restart clockevent device for the first hrtimer.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/hrtimer.c | 6 ++++++ kernel/time/tick-sched.c | 2 ++ 2 files changed, 8 insertions(+)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 3ab2899..3c0bff2 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -645,6 +645,12 @@ static int hrtimer_reprogram(struct hrtimer *timer, return 0;
/* + * clockevent device might be in ONESHOT_STOPPED mode, switchback to + * ONESHOT mode. + */ + tick_restart_event(); + + /* * Clockevents returns -ETIME, when the event was in the past. */ res = tick_program_event(expires, 0); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6558b7a..2c9a145 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -855,6 +855,8 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) if (hrtimer_active(&ts->sched_timer)) break; } else { + /* Switchback clockevent dev to ONESHOT mode */ + tick_restart_event(); if (!tick_program_event( hrtimer_get_expires(&ts->sched_timer), 0)) break;
We already have ONESHOT_STOPPED mode available now and can use that to disable spurious interrupts hurting our tickless CPU.
Thomas suggested not to do this 'behind the scene' and make sure all of the core code is aware of it.
So here are the scenarios I could figure out (relevant for both NO_HZ_IDLE and NO_HZ_FULL cases):
1: Mode: NOHZ_MODE_LOWRES
We don't reprogram clock event device from tick_nohz_stop_sched_tick() when 'expires == KTIME_MAX'. But tick_nohz_handler() have already reprogrammed it for next tick (and will continue doing that for all spurious ticks). So, we must stop clock-event device from here.
Once tickless, we restart tick as soon as request for any timer comes. We need not stop clockevent device again when that timer expires as tick_nohz_stop_sched_tick() will be called again.
What if hrtimer is added on a tickless system? It looks we aren't handling that right now and should be fixed separately.
2. Mode: NOHZ_MODE_HIGHRES
We cancel tick-sched hrtimer from tick_nohz_stop_sched_tick() when 'expires == KTIME_MAX'. But tick_sched_timer() has already reprogrammed clockevent device for next tick and so must be stopped. hrtimer_cancel() would finally call hrtimer_force_reprogram() and we reevaluate need to reprogram clockevent device there. Currently if 'expires == KTIME_MAX', we skip reprogramming clockevent device. At this point, we are guaranteed that no timers/hrtimers are pending and so stopping clockevent device from here shouldn't hurt.
Also if a hrtimer is added and then removed on a tickless CPU, we are guaranteed to stop clockevent device from here.
Now left the case where a hrtimer is added on a tickless CPU and it expires.
From tick_nohz_irq_exit() we will call tick_nohz_stop_sched_tick() which must
reevaluate pending timers to check if tick must be disabled again. We will call hrtimer_cancel() when we decide to go tickless again, BUT the tick-sched hrtimer was never re-queued on a tickless CPU and so we wouldn't reach hrtimer_force_reprogram().
To get this fixed, some special handling is also added to hrtimer_interrupt().
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/hrtimer.c | 47 +++++++++++++++++++++++++++++++++++++++++++---- kernel/time/tick-sched.c | 2 ++ 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 3c0bff2..519c229 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -591,8 +591,22 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) if (cpu_base->hang_detected) return;
- if (cpu_base->expires_next.tv64 != KTIME_MAX) + if (cpu_base->expires_next.tv64 != KTIME_MAX) { tick_program_event(cpu_base->expires_next, 1); + } else { + /* + * Don't need clockevent device anymore, stop it. + * + * We reach here only for NOHZ_MODE_HIGHRES mode and we are + * guaranteed that no timers/hrtimers are enqueued on this cpu. + * + * We will reach here: + * - when tick-sched hrtimer is cancelled to enter into tickless + * or CPU idle state + * - or a hrtimer is cancelled on a tickless cpu + */ + tick_stop_event(); + } }
/* @@ -1371,9 +1385,34 @@ retry: cpu_base->expires_next = expires_next; raw_spin_unlock(&cpu_base->lock);
- /* Reprogramming necessary ? */ - if (expires_next.tv64 == KTIME_MAX || - !tick_program_event(expires_next, 0)) { + if (expires_next.tv64 == KTIME_MAX) { + cpu_base->hang_detected = 0; + + /* + * Don't need clockevent device anymore, stop it. + * + * We reach here only for NOHZ_MODE_HIGHRES and we are + * guaranteed to not have any pending timers/hrtimers on this + * cpu. Most of the scenarios will be covered by similar code + * present in hrtimer_force_reprogram(), as we always try to + * evaluate tick requirement on idle/irq exit and cancel + * tick-sched hrtimer when tick isn't required anymore. + * + * It is required here as well for the extreme corner case. + * + * Suppose last hrtimer fires on a tickless CPU, we reach + * tick_nohz_irq_exit(). While reevaluating we will get expires + * == KTIME_MAX and try to cancel tick-sched hrtimer. BUT, the + * tick-sched hrtimer was never queued on a tickless CPU and so + * we wouldn't reach hrtimer_force_reprogram(). + * + * So, disable Clockevent device here as well. + */ + tick_stop_event(); + return; + } + + if (!tick_program_event(expires_next, 0)) { cpu_base->hang_detected = 0; return; } diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 2c9a145..93f6f2e 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -652,6 +652,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, if (unlikely(expires.tv64 == KTIME_MAX)) { if (ts->nohz_mode == NOHZ_MODE_HIGHRES) hrtimer_cancel(&ts->sched_timer); + else + tick_stop_event(); goto out; }
Clockevent devices switch to ONESHOT_STOPPED mode when they are no longer required, i.e. expires == KTIME_MAX. We have taken care of all the call sites (we were aware of), where clockevent device must be switched back to ONESHOT mode.
To make sure we haven't missed any corner case, do WARN() when we try to program next event while we are still configured in ONESHOT_STOPPED mode.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/clockevents.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index 9348da1..d04da72 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -267,6 +267,9 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, if (dev->mode == CLOCK_EVT_MODE_SHUTDOWN) return 0;
+ /* Must switch to ONESHOT mode before programming next event */ + WARN_ON_ONCE(dev->mode == CLOCK_EVT_MODE_ONESHOT_STOPPED); + /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev);
Viresh Kumar viresh.kumar@linaro.org writes:
Clockevent devices switch to ONESHOT_STOPPED mode when they are no longer required, i.e. expires == KTIME_MAX. We have taken care of all the call sites (we were aware of), where clockevent device must be switched back to ONESHOT mode.
To make sure we haven't missed any corner case, do WARN() when we try to program next event while we are still configured in ONESHOT_STOPPED mode.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
IMO, this one should probably just be folded into 2/5.
Kevin
On 11 June 2014 05:17, Kevin Hilman khilman@linaro.org wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
Clockevent devices switch to ONESHOT_STOPPED mode when they are no longer required, i.e. expires == KTIME_MAX. We have taken care of all the call sites (we were aware of), where clockevent device must be switched back to ONESHOT mode.
To make sure we haven't missed any corner case, do WARN() when we try to program next event while we are still configured in ONESHOT_STOPPED mode.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
IMO, this one should probably just be folded into 2/5.
Right.
arm_arch_timer driver emulates ONESHOT mode over PERIODIC mode and so even if the clockevent device isn't used for some CPU, we will continue getting spurious interrupts from the clockevent device.
Clockevents core supports a CLOCK_EVT_MODE_ONESHOT_STOPPED now which will be called when the clockevent device running in ONESHOT mode is required to be stopped.
Lets add support to handle that in this driver. This was tested on 'Samsung-Exynos' (dual A15) board.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/clocksource/arm_arch_timer.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index 44f1eaa..3589bbc 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -186,6 +186,7 @@ static __always_inline int timer_set_mode(const int access, int mode, switch (mode) { case CLOCK_EVT_MODE_UNUSED: case CLOCK_EVT_MODE_SHUTDOWN: + case CLOCK_EVT_MODE_ONESHOT_STOPPED: ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk); ctrl &= ~ARCH_TIMER_CTRL_ENABLE; arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
linaro-kernel@lists.linaro.org