* Ingo Molnar <mingo@kernel.org> wrote:
* Peter Zijlstra <peterz@infradead.org> wrote:
On Thu, Apr 09, 2015 at 08:28:41AM +0200, Ingo Molnar wrote:
Btw., does cpu_base->active_bases even make sense? hrtimer bases are fundamentally per-CPU, and checking whether there are any pending timers is very simple:
base->active->next != NULL
Yeah, that's 3 pointer dereferences from cpu_base, iow you traded a single bit test on an already loaded word for 3 potential cacheline misses.
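To make the two checks under discussion concrete, here is a minimal userspace sketch. The struct layouts and the helper names base_active_bitmask() and base_active_pointer() are simplified, hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLOCK_BASES 4	/* stand-in for HRTIMER_MAX_CLOCK_BASES */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct timer_node {
	struct timer_node *next_node;
};

struct queue_head {
	struct timer_node *next;	/* earliest pending timer, or NULL */
};

struct clock_base {
	struct queue_head active;
};

struct cpu_base {
	unsigned int active_bases;	/* bit i set => clock_base[i] has timers */
	struct clock_base clock_base[MAX_CLOCK_BASES];
};

/* Variant 1: test one bit in a word that lives in cpu_base itself. */
static bool base_active_bitmask(const struct cpu_base *cb, unsigned int index)
{
	return (cb->active_bases & (1U << index)) != 0;
}

/* Variant 2: look at the queue head of the clock base instead. */
static bool base_active_pointer(const struct cpu_base *cb, unsigned int index)
{
	return cb->clock_base[index].active.next != NULL;
}
```

Variant 1 needs an extra word that must be kept in sync on every enqueue and dequeue. Variant 2 needs no extra state but reads memory belonging to the clock base being tested.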
But the clock bases are not aligned to cachelines, and we have 4 of them. So in practice when we access one, we'll load the next one anyway.
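A rough way to picture the alignment argument is the sketch below. It assumes a 64-byte cacheline, a hypothetical 56-byte clock base with no alignment attribute, and an array that starts on a cacheline boundary. None of these numbers are taken from the kernel source; first_line() and last_line() are illustrative helpers.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64u	/* assumed line size */
#define NR_CLOCK_BASES 4	/* stand-in for HRTIMER_MAX_CLOCK_BASES */

/* Hypothetical 56-byte clock base: seven 8-byte fields, not line-aligned. */
struct clock_base {
	uint64_t fields[7];
};

/* Hypothetical container; the array is assumed to start on a line boundary. */
struct cpu_base {
	struct clock_base clock_base[NR_CLOCK_BASES];
};

/* Index of the cacheline holding the first byte of clock_base[i]. */
static size_t first_line(size_t i)
{
	size_t start = offsetof(struct cpu_base, clock_base) +
		       i * sizeof(struct clock_base);

	return start / CACHELINE_SIZE;
}

/* Index of the cacheline holding the last byte of clock_base[i]. */
static size_t last_line(size_t i)
{
	size_t end = offsetof(struct cpu_base, clock_base) +
		     (i + 1) * sizeof(struct clock_base) - 1;

	return end / CACHELINE_SIZE;
}
```

Under these assumed sizes, bases 1 through 3 each straddle two lines and share their first line with the preceding base, so touching one base pulls in part of its neighbour as well.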
Furthermore the simplification is measurable, and a fair bit of it is in various fast paths. I'd rather trade a bit of a cacheline footprint for less overall complexity and faster code.
Plus, look at this code in hrtimer_run_queues():

        for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
                base = &cpu_base->clock_base[index];
                if (!base->active.next)
                        continue;

                if (gettime) {
                        hrtimer_get_softirq_time(cpu_base);
                        gettime = 0;
                }

if at least one base is active (on my fairly standard system all cpus have at least one active hrtimer base all the time - and many cpus have two bases active), then we run hrtimer_get_softirq_time(), which dirties the cachelines of all 4 clock bases:

        base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
        base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;
        base->clock_base[HRTIMER_BASE_BOOTTIME].softirq_time = boot;
        base->clock_base[HRTIMER_BASE_TAI].softirq_time = tai;

so in practice we not only touch every cacheline in every timer interrupt, but we _dirty_ them, even the inactive ones.
So I'd strongly argue in favor of this simplification patch series: it makes the code simpler and faster, and won't impact cache footprint in practice.
Thanks,
Ingo