While we are debating on changelogs, here are a few more structural concerns:
On Thu, Jul 03, 2014 at 11:12:52AM +0530, Viresh Kumar wrote:
When expiry is set to KTIME_MAX, we cancel the 'tick-sched' hrtimer in highres mode and skip reprogramming clockevent device in lowres mode. But, the clockevent device is already reprogrammed from tick-handler.
We will surely get interrupted atleast one more time at tick-interval.
In highres mode, as there is no tick-sched hrtimer to service, we wouldn't restart the tick.
But in lowres mode, we unconditionally reschedule the tick every time tick_nohz_handler() is called. So, even if 'tick_stopped' is set to '1', we will never be in full dynticks mode.
So above we have all the required informations. They are just a bit disorganized. Maybe try to synthesize things a bit. I tend to use the following changelog architecture, lets hope I'm not proposing something even more confusing...
1) Initial situation
When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible.
Now when we actually don't need any tick in the future (ie: expires == KTIME_MAX), low-res and high-res behave differently. In high-res mode we cancel the next tick that was previously scheduled from the tick handler. That's what we want. OTOH in low-res mode we don't do anything and the next tick is still scheduled to jiffies + 1.
2) Resulting issue
As a result when no tick is needed in low-res dynticks mode, we can recursively get a spurious tick every jiffy because then the next tick is always reprogrammed from the tick handler and is never cancelled. And this can happen indefinetly until some subsystem actually needs a precise tick in the future and only then we eventually overwrite the tick handler setting and defer the next one.
3) Solution:
To fix this, lets skip rescheduling next tick from tick-handler when we are running tickless.
'tick_stopped' can be set to '1' in one more case, i.e. when expires isn't equal to KTIME_MAX but is greater than equal to two jiffies. We can still avoid reprogramming clock device from the tick in this case as it's going to be reprogrammed from irq_exit() anyway. So we could avoid one useless device write.
Similar piece of code was present initially when dynticks functionality was first added: 79bf2bb3.
But later commit fb02fbc removed this check to keep jiffies updated for the sake of long running softirqs, but was later again reverted in ae99286b due to spurious wakeups. However it was not entirely reverted. The long running softirqs were asked to fix themselves.
Introducing this change again shouldn't result in 'long running softirqs' issue discussed in fb02fbc, as tick_nohz_kick_tick() is still commented out.
Reviewed-by: Preeti U Murthy preeti@linux.vnet.ibm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
V1->V2:
- Updated comments a bit for this one and added new patch 2/2.
- Rebased over 3.16-rc3
Pushed here: git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git tick/lowres-go-tickless
kernel/time/tick-sched.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 6558b7a..bb7b736 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -956,6 +956,12 @@ static void tick_nohz_handler(struct clock_event_device *dev) tick_sched_do_timer(now); tick_sched_handle(ts, regs);
- /*
* Skip reprogramming next event if we are running tickless.
*/
- if (unlikely(ts->tick_stopped))
return;
- while (tick_nohz_reprogram(ts, now)) { now = ktime_get(); tick_do_update_jiffies64(now);
-- 2.0.0.rc2