Hi,
'active_bases' indicates which clock bases have active timers. While it is updated (almost) correctly, it is hardly used. This series is an attempt to improve the code that iterates over all clock bases.

The first patch fixes a bug that only shows up after the second commit; the second commit creates a macro, for_each_active_base(), and uses it in multiple places.
Viresh Kumar (2):
  hrtimer: update '->active_bases' before calling hrtimer_force_reprogram()
  hrtimer: create for_each_active_base() to iterate over active clock-bases
 kernel/time/hrtimer.c | 58 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 26 deletions(-)
'active_bases' indicates which clock bases have active timers. While it is updated (almost) correctly, it is hardly used. The next commit will start using it to make the code more efficient, but before that we need to fix a problem.
While removing hrtimers, in __remove_hrtimer():
- We first remove the hrtimer from the queue.
- Then reprogram the clockevent device if required (hrtimer_force_reprogram()).
- And then finally clear 'active_bases', if no more timers are pending on the current clock base (from which we are removing the hrtimer).
hrtimer_force_reprogram() needs to loop over all active clock bases to find the next expiry event, and while doing so it will use 'active_bases' (after the next commit). And it will find the current base active, as we haven't cleared its bit yet, even if the current clock base has no more hrtimers queued.
To fix this issue, clear active_bases before calling hrtimer_force_reprogram().
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/time/hrtimer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index bee0c1f78091..3152f327c988 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -879,6 +879,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
 	next_timer = timerqueue_getnext(&base->active);
 	timerqueue_del(&base->active, &timer->node);
+	if (!timerqueue_getnext(&base->active))
+		base->cpu_base->active_bases &= ~(1 << base->index);
+
 	if (&timer->node == next_timer) {
 #ifdef CONFIG_HIGH_RES_TIMERS
 		/* Reprogram the clock event device. if enabled */
@@ -892,8 +895,6 @@ static void __remove_hrtimer(struct hrtimer *timer,
 		}
 #endif
 	}
-	if (!timerqueue_getnext(&base->active))
-		base->cpu_base->active_bases &= ~(1 << base->index);
 out:
 	timer->state = newstate;
 }
On Thu, Apr 02, 2015 at 04:21:21PM +0530, Viresh Kumar wrote:
> 'active_bases' indicates which clock-base have active timers. While it is updated (almost) correctly, it is hardly used. Next commit will start using it to make code more efficient, but before that we need to fix a problem.
>
> While removing hrtimers, in __remove_hrtimer():
> - We first remove the hrtimer from the queue.
> - Then reprogram clockevent device if required (hrtimer_force_reprogram()).
> - And then finally clear 'active_bases', if no more timers are pending on the current clock base (from which we are removing the hrtimer).
>
> hrtimer_force_reprogram() needs to loop over all active clock bases to find the next expiry event, and while doing so it will use 'active_bases' (after next commit). And it will find the current base active, as we haven't cleared it until now, even if current clock base has no more hrtimers queued.
>
> To fix this issue, clear active_bases before calling hrtimer_force_reprogram().
This is a small inefficiency right? Not an actual bug or anything.
On 2 April 2015 at 19:17, Peter Zijlstra peterz@infradead.org wrote:
> On Thu, Apr 02, 2015 at 04:21:21PM +0530, Viresh Kumar wrote:
> > 'active_bases' indicates which clock-base have active timers. While it is updated (almost) correctly, it is hardly used. Next commit will start using it to make code more efficient, but before that we need to fix a problem.
> >
> > While removing hrtimers, in __remove_hrtimer():
> > - We first remove the hrtimer from the queue.
> > - Then reprogram clockevent device if required (hrtimer_force_reprogram()).
> > - And then finally clear 'active_bases', if no more timers are pending on the current clock base (from which we are removing the hrtimer).
> >
> > hrtimer_force_reprogram() needs to loop over all active clock bases to find the next expiry event, and while doing so it will use 'active_bases' (after next commit). And it will find the current base active, as we haven't cleared it until now, even if current clock base has no more hrtimers queued.
> >
> > To fix this issue, clear active_bases before calling hrtimer_force_reprogram().
>
> This is a small inefficiency right? Not an actual bug or anything.
So, what's explained in this patch is a BUG, which isn't harming us today.
But overall both patches combined are about improving on the small inefficiency :)
Sorry for the political answer :)
On Thu, Apr 02, 2015 at 07:23:31PM +0530, Viresh Kumar wrote:
> On 2 April 2015 at 19:17, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Apr 02, 2015 at 04:21:21PM +0530, Viresh Kumar wrote:
> > > 'active_bases' indicates which clock-base have active timers. While it is updated (almost) correctly, it is hardly used. Next commit will start using it to make code more efficient, but before that we need to fix a problem.
> > >
> > > While removing hrtimers, in __remove_hrtimer():
> > > - We first remove the hrtimer from the queue.
> > > - Then reprogram clockevent device if required (hrtimer_force_reprogram()).
> > > - And then finally clear 'active_bases', if no more timers are pending on the current clock base (from which we are removing the hrtimer).
> > >
> > > hrtimer_force_reprogram() needs to loop over all active clock bases to find the next expiry event, and while doing so it will use 'active_bases' (after next commit). And it will find the current base active, as we haven't cleared it until now, even if current clock base has no more hrtimers queued.
> > >
> > > To fix this issue, clear active_bases before calling hrtimer_force_reprogram().
> >
> > This is a small inefficiency right? Not an actual bug or anything.
>
> So, what's explained in this patch is a BUG, which isn't harming us today.
So then I'm not seeing how it's a bug. Sure, __hrtimer_get_next_event() will iterate all the bases again, and it will not skip the just-emptied one. But I don't see how that is anything but an inefficiency. By virtue of the base being empty it cannot find an event there, so it's a pointless check.
What am I missing?
On 2 April 2015 at 19:46, Peter Zijlstra peterz@infradead.org wrote:
> So then I'm not seeing how its a bug. Sure __hrtimer_get_next_event() will iterate all the bases again, and it will not skip the just empty one. But I don't see how that is anything but an inefficiency. By virtue of the base being empty it cannot find an event there, so its a pointless check.
>
> What am I missing?
Hmm. It was a bug for me because I was doing this unconditionally:

+		timer = container_of(timerqueue_getnext(&base->active),
+				     struct hrtimer, node);
And this will do a container_of() on NULL, as timerqueue_getnext() can return NULL.

And so it will crash in my case.

But I understand your point, it's an inefficiency only :(
At several places we iterate over all possible clock bases for a particular cpu base, whereas we only need to iterate over the active ones.

We already have a per cpu-base 'active_bases' field, which is updated on addition/removal of hrtimers.

This patch creates for_each_active_base(), which uses 'active_bases' to iterate only over active bases.

This also updates the code that iterates over clock bases.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/time/hrtimer.c | 53 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3152f327c988..dad8274623a0 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -111,6 +111,19 @@ static inline int hrtimer_clockid_to_base(clockid_t clock_id)
 
 /*
+ * for_each_active_base: iterate over all active clock bases
+ * @_index: 'int' variable for internal purpose
+ * @_base: holds pointer to a active clock base
+ * @_cpu_base: cpu base to iterate on
+ * @_active_bases: 'unsigned int' variable for internal purpose
+ */
+#define for_each_active_base(_index, _base, _cpu_base, _active_bases)	\
+	for ((_active_bases) = (_cpu_base)->active_bases;		\
+	     (_index) = ffs(_active_bases),				\
+	     (_base) = (_cpu_base)->clock_base + (_index) - 1, (_index); \
+	     (_active_bases) &= ~(1 << ((_index) - 1)))
+
+/*
  * Get the coarse grained time at the softirq based on xtime and
  * wall_to_monotonic.
  */
@@ -443,19 +456,15 @@ static inline void debug_deactivate(struct hrtimer *timer)
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
-	struct hrtimer_clock_base *base = cpu_base->clock_base;
+	struct hrtimer_clock_base *base;
 	ktime_t expires, expires_next = { .tv64 = KTIME_MAX };
+	struct hrtimer *timer;
+	unsigned int active_bases;
 	int i;
 
-	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
-		struct timerqueue_node *next;
-		struct hrtimer *timer;
-
-		next = timerqueue_getnext(&base->active);
-		if (!next)
-			continue;
-
-		timer = container_of(next, struct hrtimer, node);
+	for_each_active_base(i, base, cpu_base, active_bases) {
+		timer = container_of(timerqueue_getnext(&base->active),
+				     struct hrtimer, node);
 		expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
 		if (expires.tv64 < expires_next.tv64)
 			expires_next = expires;
@@ -1245,6 +1254,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 {
 	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
 	ktime_t expires_next, now, entry_time, delta;
+	struct hrtimer_clock_base *base;
+	unsigned int active_bases;
 	int i, retries = 0;
 
 	BUG_ON(!cpu_base->hres_active);
@@ -1264,15 +1275,10 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	 */
 	cpu_base->expires_next.tv64 = KTIME_MAX;
 
-	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		struct hrtimer_clock_base *base;
+	for_each_active_base(i, base, cpu_base, active_bases) {
 		struct timerqueue_node *node;
 		ktime_t basenow;
 
-		if (!(cpu_base->active_bases & (1 << i)))
-			continue;
-
-		base = cpu_base->clock_base + i;
 		basenow = ktime_add(now, base->offset);
 
 		while ((node = timerqueue_getnext(&base->active))) {
@@ -1435,16 +1441,13 @@ void hrtimer_run_queues(void)
 	struct timerqueue_node *node;
 	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
 	struct hrtimer_clock_base *base;
+	unsigned int active_bases;
 	int index, gettime = 1;
 
 	if (hrtimer_hres_active())
 		return;
 
-	for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
-		base = &cpu_base->clock_base[index];
-		if (!timerqueue_getnext(&base->active))
-			continue;
-
+	for_each_active_base(index, base, cpu_base, active_bases) {
 		if (gettime) {
 			hrtimer_get_softirq_time(cpu_base);
 			gettime = 0;
@@ -1665,6 +1668,8 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
 static void migrate_hrtimers(int scpu)
 {
 	struct hrtimer_cpu_base *old_base, *new_base;
+	struct hrtimer_clock_base *clock_base;
+	unsigned int active_bases;
 	int i;
 
 	BUG_ON(cpu_online(scpu));
@@ -1680,9 +1685,9 @@ static void migrate_hrtimers(int scpu)
 	raw_spin_lock(&new_base->lock);
 	raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
-	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		migrate_hrtimer_list(&old_base->clock_base[i],
-				     &new_base->clock_base[i]);
+	for_each_active_base(i, clock_base, old_base, active_bases) {
+		migrate_hrtimer_list(clock_base,
+				     &new_base->clock_base[clock_base->index]);
 	}
raw_spin_unlock(&old_base->lock);
On Thu, Apr 02, 2015 at 04:21:22PM +0530, Viresh Kumar wrote:
> +#define for_each_active_base(_index, _base, _cpu_base, _active_bases)	\
> +	for ((_active_bases) = (_cpu_base)->active_bases;		\
> +	     (_index) = ffs(_active_bases),				\
> +	     (_base) = (_cpu_base)->clock_base + (_index) - 1, (_index); \
> +	     (_active_bases) &= ~(1 << ((_index) - 1)))
Can't use ffs here, some people end up using asm-generic/bitops/ffs.h and that sucks.
Esp for small vectors like here, the unconditional iteration is faster.
On 2 April 2015 at 19:15, Peter Zijlstra peterz@infradead.org wrote:
> On Thu, Apr 02, 2015 at 04:21:22PM +0530, Viresh Kumar wrote:
> > +#define for_each_active_base(_index, _base, _cpu_base, _active_bases)	\
> > +	for ((_active_bases) = (_cpu_base)->active_bases;		\
> > +	     (_index) = ffs(_active_bases),				\
> > +	     (_base) = (_cpu_base)->clock_base + (_index) - 1, (_index); \
> > +	     (_active_bases) &= ~(1 << ((_index) - 1)))
>
> Can't use ffs here, some people end up using asm-generic/bitops/ffs.h and that sucks.
>
> Esp for small vectors like here, the unconditional iteration is faster.
Okay what about this instead (This is the best I could write :).) ?
+static inline int __next_bit(unsigned int active_bases, int bit)
+{
+	do {
+		if (active_bases & (1 << bit))
+			return bit;
+	} while (++bit < HRTIMER_MAX_CLOCK_BASES);
+
+	/* We should never reach here */
+	return 0;
+}
+
+/*
+ * for_each_active_base: iterate over all active clock bases
+ * @_bit: 'int' variable for internal purpose
+ * @_base: holds pointer to a active clock base
+ * @_cpu_base: cpu base to iterate on
+ * @_active_bases: 'unsigned int' variable for internal purpose
+ */
+#define for_each_active_base(_bit, _base, _cpu_base, _active_bases)	\
+	for ((_active_bases) = (_cpu_base)->active_bases, (_bit) = -1;	\
+	     (_active_bases) &&						\
+	     ((_bit) = __next_bit(_active_bases, ++_bit),		\
+	      (_base) = (_cpu_base)->clock_base + _bit);		\
+	     (_active_bases) &= ~(1 << (_bit)))
+
Tested it well with the help of: http://pastebin.com/cYyB513D, with inputs from 0 to 15.
I will send it formally if it looks fine to you ..