When expiry is set to KTIME_MAX, we cancel the 'tick-sched' hrtimer in highres mode and skip reprogramming clockevent device in lowres mode. But, the clockevent device is already reprogrammed from tick-handler.
We will surely get interrupted atleast one more time at tick-interval.
In highres mode, as there is no tick-sched hrtimer to service, we wouldn't restart the tick.
But in lowres mode, we unconditionally reschedule the tick every time tick_nohz_handler() is called. So, even if 'tick_stopped' is set to '1', we will never be in full dynticks mode.
To fix this, lets skip rescheduling next tick from tick-handler when we are running tickless.
'tick_stopped' can be set to '1' in one more case, i.e. when expires isn't equal to KTIME_MAX but is greater than equal to two jiffies. We can still avoid reprogramming clock device from the tick in this case as it's going to be reprogrammed from irq_exit() anyway. So we could avoid one useless device write.
Similar piece of code was present initially when dynticks functionality was first added: 79bf2bb3.
But later commit fb02fbc removed this check to keep jiffies updated for the sake of long running softirqs, but was later again reverted in ae99286b due to spurious wakeups. However it was not entirely reverted. The long running softirqs were asked to fix themselves.
Introducing this change again shouldn't result in 'long running softirqs' issue discussed in fb02fbc, as tick_nohz_kick_tick() is still commented out.
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- V1->V2: - Updated comments a bit for this one and added new patch 2/2. - Rebased over 3.16-rc3
Pushed here: git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/lowres-go-tickless
kernel/time/tick-sched.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6558b7a..bb7b736 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -956,6 +956,12 @@ static void tick_nohz_handler(struct clock_event_device *dev) tick_sched_do_timer(now); tick_sched_handle(ts, regs);
+ /* + * Skip reprogramming next event if we are running tickless. + */ + if (unlikely(ts->tick_stopped)) + return; + while (tick_nohz_reprogram(ts, now)) { now = ktime_get(); tick_do_update_jiffies64(now);
In idle or full dynticks mode, the tick can be stopped for few jiffies to avoid unnecessary events.
In highres mode, when the tick fires again we reprogram the 'tick-sched' hrtimer from tick_sched_timer() (by returning HRTIMER_RESTART to hrtimer core).
Because the tick was stopped earlier, irq_exit() would eventually call tick_nohz_full_stop_tick() and it will surely reprogram hrtimer again.
We can avoid double reprogramming of tick-sched hrtimer by returning 'HRTIMER_NORESTART' from tick_sched_timer() when 'tick_stopped' is set to '1'.
Similar piece of code was present initially when dynticks functionality was first added: 79bf2bb3.
But later commit fb02fbc removed this check to keep jiffies updated for the sake of long running softirqs, but was later again reverted in ae99286b due to spurious wakeups. However it was not entirely reverted. The long running softirqs were asked to fix themselves.
Introducing this change again shouldn't result in 'long running softirqs' issue discussed in fb02fbc, as tick_nohz_kick_tick() is still commented out.
Suggested-by: Frederic Weisbecker fweisbec@gmail.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/tick-sched.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index bb7b736..2604eec 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1089,6 +1089,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer) if (regs) tick_sched_handle(ts, regs);
+ /* Do not restart, when we are in idle or full dynticks mode */ + if (unlikely(ts->tick_stopped)) + return HRTIMER_NORESTART; + hrtimer_forward(timer, now, tick_period);
return HRTIMER_RESTART;
On Thu, Jul 03, 2014 at 11:12:52AM +0530, Viresh Kumar wrote:
When expiry is set to KTIME_MAX, we cancel the 'tick-sched' hrtimer in highres mode and skip reprogramming clockevent device in lowres mode. But, the clockevent device is already reprogrammed from tick-handler.
We will surely get interrupted atleast one more time at tick-interval.
In highres mode, as there is no tick-sched hrtimer to service, we wouldn't restart the tick.
But in lowres mode, we unconditionally reschedule the tick every time tick_nohz_handler() is called. So, even if 'tick_stopped' is set to '1', we will never be in full dynticks mode.
To fix this, lets skip rescheduling next tick from tick-handler when we are running tickless.
So the patch looks good but I wonder if the changelog is a bit misleading. It fixes the fact that in lowres mode, the tick is rescheduled at every tick.
But after this patch, don't we still have at one remaining spurious tick in this scenario:
tick_nohz_handler() { reprogram next tick }
irq_exit() { tick_nohz_stop_sched_tick() { expires = KTIME_MAX; //skip reprogramming, but still have a pending tick } }
//useless tick tick_nohz_handle()
We probably also need a cancel for low-res mode as well.
On 4 July 2014 17:43, Frederic Weisbecker fweisbec@gmail.com wrote:
So the patch looks good but I wonder if the changelog is a bit misleading. It fixes the fact that in lowres mode, the tick is rescheduled at every tick.
But after this patch, don't we still have at one remaining spurious tick in this scenario:
tick_nohz_handler() { reprogram next tick } irq_exit() { tick_nohz_stop_sched_tick() { expires = KTIME_MAX; //skip reprogramming, but still have a pending tick } } //useless tick tick_nohz_handle()
Yes, its still there and should have been mentioned clearly in logs.
We probably also need a cancel for low-res mode as well.
How do we cancel this? We are programming the clock-event device directly and there is no way to cancel that.
Similar problem stays in hres mode as well. Though we do cancel hrtimer, but we don't cancel/stop clkevt device.
And that will be fixed by ONESHOT_STOPPED series, Kevin has just sent few patches of ->set_dev_mode() stuff.
Makes sense? Or am I too tired staring at tick core this whole day? I probably caught one more issue, would share only after confirming :)
-- viresh
On Fri, Jul 04, 2014 at 06:03:21PM +0530, Viresh Kumar wrote:
On 4 July 2014 17:43, Frederic Weisbecker fweisbec@gmail.com wrote:
So the patch looks good but I wonder if the changelog is a bit misleading. It fixes the fact that in lowres mode, the tick is rescheduled at every tick.
But after this patch, don't we still have at one remaining spurious tick in this scenario:
tick_nohz_handler() { reprogram next tick } irq_exit() { tick_nohz_stop_sched_tick() { expires = KTIME_MAX; //skip reprogramming, but still have a pending tick } } //useless tick tick_nohz_handle()
Yes, its still there and should have been mentioned clearly in logs.
Yep.
We probably also need a cancel for low-res mode as well.
How do we cancel this? We are programming the clock-event device directly and there is no way to cancel that.
Similar problem stays in hres mode as well. Though we do cancel hrtimer, but we don't cancel/stop clkevt device.
Right, the issue only seem to appear in lowres.
And that will be fixed by ONESHOT_STOPPED series, Kevin has just sent few patches of ->set_dev_mode() stuff.
Makes sense? Or am I too tired staring at tick core this whole day? I probably caught one more issue, would share only after confirming :)
No, that sounds good :)
Note the patch is fine. Just the changelog needs to mention that the issue is only partially fixed.
-- viresh
While we are debating on changelogs, here are a few more structural concerns:
On Thu, Jul 03, 2014 at 11:12:52AM +0530, Viresh Kumar wrote:
When expiry is set to KTIME_MAX, we cancel the 'tick-sched' hrtimer in highres mode and skip reprogramming clockevent device in lowres mode. But, the clockevent device is already reprogrammed from tick-handler.
We will surely get interrupted atleast one more time at tick-interval.
In highres mode, as there is no tick-sched hrtimer to service, we wouldn't restart the tick.
But in lowres mode, we unconditionally reschedule the tick every time tick_nohz_handler() is called. So, even if 'tick_stopped' is set to '1', we will never be in full dynticks mode.
So above we have all the required informations. They are just a bit disorganized. Maybe try to synthesize things a bit. I tend to use the following changelog architecture, lets hope I'm not proposing something even more confusing...
1) Initial situation
When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible.
Now when we actually don't need any tick in the future (ie: expires == KTIME_MAX), low-res and high-res behave differently. In high-res mode we cancel the next tick that was previously scheduled from the tick handler. That's what we want. OTOH in low-res mode we don't do anything and the next tick is still scheduled to jiffies + 1.
2) Resulting issue
As a result when no tick is needed in low-res dynticks mode, we can recursively get a spurious tick every jiffy because then the next tick is always reprogrammed from the tick handler and is never cancelled. And this can happen indefinetly until some subsystem actually needs a precise tick in the future and only then we eventually overwrite the tick handler setting and defer the next one.
3) Solution:
To fix this, lets skip rescheduling next tick from tick-handler when we are running tickless.
'tick_stopped' can be set to '1' in one more case, i.e. when expires isn't equal to KTIME_MAX but is greater than equal to two jiffies. We can still avoid reprogramming clock device from the tick in this case as it's going to be reprogrammed from irq_exit() anyway. So we could avoid one useless device write.
Similar piece of code was present initially when dynticks functionality was first added: 79bf2bb3.
But later commit fb02fbc removed this check to keep jiffies updated for the sake of long running softirqs, but was later again reverted in ae99286b due to spurious wakeups. However it was not entirely reverted. The long running softirqs were asked to fix themselves.
Introducing this change again shouldn't result in 'long running softirqs' issue discussed in fb02fbc, as tick_nohz_kick_tick() is still commented out.
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
V1->V2:
- Updated comments a bit for this one and added new patch 2/2.
- Rebased over 3.16-rc3
Pushed here: git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/lowres-go-tickless
kernel/time/tick-sched.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6558b7a..bb7b736 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -956,6 +956,12 @@ static void tick_nohz_handler(struct clock_event_device *dev) tick_sched_do_timer(now); tick_sched_handle(ts, regs);
- /*
* Skip reprogramming next event if we are running tickless.
*/
- if (unlikely(ts->tick_stopped))
return;
- while (tick_nohz_reprogram(ts, now)) { now = ktime_get(); tick_do_update_jiffies64(now);
-- 2.0.0.rc2
On 4 July 2014 21:44, Frederic Weisbecker fweisbec@gmail.com wrote:
While we are debating on changelogs, here are a few more structural concerns:
So above we have all the required informations. They are just a bit disorganized. Maybe try to synthesize things a bit. I tend to use the following changelog architecture, lets hope I'm not proposing something even more confusing...
Initial situation
When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible.
Now when we actually don't need any tick in the future (ie: expires == KTIME_MAX), low-res and high-res behave differently. In high-res mode we cancel the next tick that was previously scheduled from the tick handler. That's what we want. OTOH in low-res mode we don't do anything and the next tick is still scheduled to jiffies + 1.
Resulting issue
As a result when no tick is needed in low-res dynticks mode, we can recursively get a spurious tick every jiffy because then the next tick is always reprogrammed from the tick handler and is never cancelled. And this can happen indefinetly until some subsystem actually needs a precise tick in the future and only then we eventually overwrite the tick handler setting and defer the next one.
Solution:
+1.
Thanks.
Hi Frederic,
On 07/04/2014 09:44 PM, Frederic Weisbecker wrote:
While we are debating on changelogs, here are a few more structural concerns:
On Thu, Jul 03, 2014 at 11:12:52AM +0530, Viresh Kumar wrote:
When expiry is set to KTIME_MAX, we cancel the 'tick-sched' hrtimer in highres mode and skip reprogramming clockevent device in lowres mode. But, the clockevent device is already reprogrammed from tick-handler.
We will surely get interrupted atleast one more time at tick-interval.
In highres mode, as there is no tick-sched hrtimer to service, we wouldn't restart the tick.
But in lowres mode, we unconditionally reschedule the tick every time tick_nohz_handler() is called. So, even if 'tick_stopped' is set to '1', we will never be in full dynticks mode.
So above we have all the required informations. They are just a bit disorganized. Maybe try to synthesize things a bit. I tend to use the following changelog architecture, lets hope I'm not proposing something even more confusing...
Initial situation
When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible.
Now when we actually don't need any tick in the future (ie: expires == KTIME_MAX), low-res and high-res behave differently. In high-res mode we cancel the next tick that was previously scheduled from the tick handler. That's what we want. OTOH in low-res mode we don't do anything and the next tick is still scheduled to jiffies + 1.
Resulting issue
As a result when no tick is needed in low-res dynticks mode, we can recursively get a spurious tick every jiffy because then the next tick is always reprogrammed from the tick handler and is never cancelled. And this can happen indefinetly until some subsystem actually needs a precise tick in the future and only then we eventually overwrite the tick handler setting and defer the next one.
Solution:
Thank you very much for the above changelog! :) Its true that this issue has to be explained in a detailed fashion since it touches aspects of both low and high resolution modes in subtle ways.
Regards Preeti U Murthy
To fix this, lets skip rescheduling next tick from tick-handler when we are running tickless.
'tick_stopped' can be set to '1' in one more case, i.e. when expires isn't equal to KTIME_MAX but is greater than equal to two jiffies. We can still avoid reprogramming clock device from the tick in this case as it's going to be reprogrammed from irq_exit() anyway. So we could avoid one useless device write.
Similar piece of code was present initially when dynticks functionality was first added: 79bf2bb3.
But later commit fb02fbc removed this check to keep jiffies updated for the sake of long running softirqs, but was later again reverted in ae99286b due to spurious wakeups. However it was not entirely reverted. The long running softirqs were asked to fix themselves.
Introducing this change again shouldn't result in 'long running softirqs' issue discussed in fb02fbc, as tick_nohz_kick_tick() is still commented out.
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
V1->V2:
- Updated comments a bit for this one and added new patch 2/2.
- Rebased over 3.16-rc3
Pushed here: git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/lowres-go-tickless
kernel/time/tick-sched.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6558b7a..bb7b736 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -956,6 +956,12 @@ static void tick_nohz_handler(struct clock_event_device *dev) tick_sched_do_timer(now); tick_sched_handle(ts, regs);
- /*
* Skip reprogramming next event if we are running tickless.
*/
- if (unlikely(ts->tick_stopped))
return;
- while (tick_nohz_reprogram(ts, now)) { now = ktime_get(); tick_do_update_jiffies64(now);
-- 2.0.0.rc2
linaro-kernel@lists.linaro.org