On Thu, Apr 09, 2015 at 09:09:17AM +0200, Ingo Molnar wrote:
* Peter Zijlstra <peterz@infradead.org> wrote:
On Thu, Apr 09, 2015 at 08:28:41AM +0200, Ingo Molnar wrote:
Btw., does cpu_base->active_bases even make sense? hrtimer bases are fundamentally percpu, and to check whether there are any pending timers is a very simple check:
base->active->next != NULL
Yeah, that's 3 pointer dereferences from cpu_base, iow you traded a single bit test on an already loaded word for 3 potential cacheline misses.
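To make the trade-off concrete, here is a minimal sketch of the two checks being compared. The `sketch_*` types and helpers are hypothetical stand-ins, not the kernel's definitions; only the `active_bases` and `active`/`next` names come from the thread. Since the pahole output in this thread shows `active` embedded in the clock base, the sketch spells the queue check as `.active.next`.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_BASES 4	/* the thread mentions four clock bases */

/* Simplified stand-ins, NOT the kernel's structures. */
struct sketch_node {
	long expires;
};

struct sketch_queue {
	struct sketch_node *next;	/* earliest pending timer, or NULL */
};

struct sketch_clock_base {
	struct sketch_queue active;
};

struct sketch_cpu_base {
	unsigned int active_bases;	/* bit i set: base i has pending timers */
	struct sketch_clock_base clock_base[SKETCH_NR_BASES];
};

/* Bitmask variant: a single bit test on a word that lives in cpu_base. */
bool base_pending_by_mask(const struct sketch_cpu_base *cpu, int idx)
{
	return (cpu->active_bases & (1U << idx)) != 0;
}

/* Queue variant: has to read memory inside the clock base itself. */
bool base_pending_by_queue(const struct sketch_cpu_base *cpu, int idx)
{
	return cpu->clock_base[idx].active.next != NULL;
}

/* Hypothetical enqueue that keeps both views consistent. */
void sketch_enqueue(struct sketch_cpu_base *cpu, int idx,
		    struct sketch_node *n)
{
	struct sketch_queue *q = &cpu->clock_base[idx].active;

	/* Track only the earliest expiry; a real queue keeps every node. */
	if (q->next == NULL || n->expires < q->next->expires)
		q->next = n;
	cpu->active_bases |= 1U << idx;
}
```

Both checks return the same answer as long as every enqueue path updates the mask. What differs is which memory each one has to read.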
But the clock bases are not aligned to cachelines, and we have 4 of them. So in practice when we access one, we'll load the next one anyway.
$ pahole -C hrtimer_clock_base defconfig-build/kernel/time/timer.o
struct hrtimer_clock_base {
	struct hrtimer_cpu_base *  cpu_base;             /*     0     8 */
	int                        index;                /*     8     4 */
	clockid_t                  clockid;              /*    12     4 */
	struct timerqueue_head     active;               /*    16    16 */
	ktime_t                    resolution;           /*    32     8 */
	ktime_t                    (*get_time)(void);    /*    40     8 */
	ktime_t                    softirq_time;         /*    48     8 */
	ktime_t                    offset;               /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */

	/* size: 64, cachelines: 1, members: 8 */
};
They _should_ be aligned :-)
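As a rough illustration of why alignment matters for a 64-byte structure, the sketch below counts how many cachelines an object touches at a given address. It assumes a 64-byte line, matching the pahole output, and uses C11 `alignas` rather than the kernel's `____cacheline_aligned` annotation. The `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHELINE 64	/* assumed line size, as in the pahole output */

/* Hypothetical 64-byte object standing in for one clock base. */
struct sketch_base_aligned {
	alignas(SKETCH_CACHELINE) unsigned char payload[SKETCH_CACHELINE];
};

/* With the alignment request, each object fills exactly one line. */
static_assert(sizeof(struct sketch_base_aligned) == SKETCH_CACHELINE,
	      "one object per cacheline");
static_assert(alignof(struct sketch_base_aligned) == SKETCH_CACHELINE,
	      "object starts on a cacheline boundary");

/* Number of cachelines covered by [addr, addr + size); size must be > 0. */
size_t sketch_lines_touched(uintptr_t addr, size_t size)
{
	uintptr_t first = addr / SKETCH_CACHELINE;
	uintptr_t last = (addr + size - 1) / SKETCH_CACHELINE;

	return (size_t)(last - first + 1);
}
```

A 64-byte object that starts 8 bytes into a line spans two lines, while one that starts on a 64-byte boundary stays within one.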
Furthermore the simplification is measurable, and a fair bit of it is in various fast paths. I'd rather trade a bit of a cacheline footprint for less overall complexity and faster code.
cacheline misses hurt a lot, and the bitmask isn't really complex.