On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
On Fri, Mar 27, 2015 at 10:19:54AM +0530, Viresh Kumar wrote:
On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
Shouldn't this be viewed as a shortcoming of the core timer code?
Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that, but they were rejected for obvious reasons [1].
vmstat_shepherd() is merely rescheduling itself with schedule_delayed_work(). That's a dead bog simple operation and if it's producing suboptimal behaviour then we shouldn't be fixing it with elaborate workarounds in the caller?
I understand that, and that's why I sent it as an RFC to get the discussion started. Does anyone else have another (acceptable) idea for resolving this?
So the issue seems to be that we need base->running_timer in order to tell if a callback is running, right?
We could align the base on 8 bytes to gain an extra bit in the pointer and use that bit to indicate the running state. Then the sites that currently check base->running_timer can spin on that bit while we change the actual base pointer.
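A minimal userspace sketch of the running-bit idea, using C11 atomics in place of the kernel's primitives. It assumes, as the mention of a "third bit" implies, that bits 0 and 1 of the stored base pointer already carry flags (modelled here as two hypothetical flags), and it puts the running state in bit 2. Every identifier below (struct tvec_base_sketch, struct timer_sketch, the SKETCH_* macros and the helpers) is hypothetical and is not the kernel's timer API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for tvec_base; _Alignas(8) makes the struct
 * 8-byte aligned, so the low three bits of a pointer to it are zero. */
struct tvec_base_sketch {
	_Alignas(8) int cpu;
};

/* Bits 0 and 1 model the flags already kept in the pointer;
 * bit 2 is the proposed "callback running" bit. */
#define SKETCH_DEFERRABLE 0x1UL
#define SKETCH_IRQSAFE    0x2UL
#define SKETCH_RUNNING    0x4UL
#define SKETCH_FLAG_MASK  0x7UL

struct timer_sketch {
	_Atomic uintptr_t base; /* base pointer | flag bits */
};

static void timer_sketch_init(struct timer_sketch *t,
			      struct tvec_base_sketch *b, uintptr_t flags)
{
	/* The trick only works if alignment leaves the low bits clear. */
	assert(((uintptr_t)b & SKETCH_FLAG_MASK) == 0);
	atomic_init(&t->base,
		    (uintptr_t)b | (flags & (SKETCH_DEFERRABLE | SKETCH_IRQSAFE)));
}

/* Strip all flag bits to recover the real base pointer. */
static struct tvec_base_sketch *timer_sketch_base(struct timer_sketch *t)
{
	return (struct tvec_base_sketch *)(atomic_load(&t->base) & ~SKETCH_FLAG_MASK);
}

static int timer_sketch_running(struct timer_sketch *t)
{
	return (atomic_load(&t->base) & SKETCH_RUNNING) != 0;
}

/* Set or clear the running bit without touching the pointer or other flags. */
static void timer_sketch_set_running(struct timer_sketch *t, int running)
{
	if (running)
		atomic_fetch_or(&t->base, SKETCH_RUNNING);
	else
		atomic_fetch_and(&t->base, ~(uintptr_t)SKETCH_RUNNING);
}

/* Move the timer to a new base, keeping every flag bit (including
 * RUNNING) so that waiters still see the callback state. */
static void timer_sketch_migrate(struct timer_sketch *t,
				 struct tvec_base_sketch *nb)
{
	uintptr_t old = atomic_load(&t->base);
	uintptr_t desired;

	do {
		desired = (uintptr_t)nb | (old & SKETCH_FLAG_MASK);
	} while (!atomic_compare_exchange_weak(&t->base, &old, desired));
}

/* del_timer_sync()-style wait: spin on the bit in the timer itself
 * instead of on base->running_timer. */
static void timer_sketch_wait_running(struct timer_sketch *t)
{
	while (timer_sketch_running(t))
		; /* kernel code would use cpu_relax() here */
}
```

Because the running bit travels with the timer rather than with its base, the base pointer can change underneath a waiter without the waiter having to know which base is executing the callback.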
Even though tvec_base has ____cacheline_aligned stuck on, most are allocated using kzalloc_node(), which does not actually respect that but does guarantee a minimum u64 alignment, so I think we can use that third bit without too much magic.
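The alignment argument can be checked mechanically: what matters is the alignment the allocator guarantees, not the alignment declared on the type. The sketch below assumes an 8-byte (u64) minimum and uses C11 aligned_alloc() as a userspace stand-in for kzalloc_node(); ARCH_KMALLOC_MINALIGN is assumed to be the corresponding kernel constant. The SKETCH_* macros, free_low_bits() and alloc_base_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed minimum alignment promised by the allocator (u64). */
#define SKETCH_MIN_ALIGN 8UL

/* Two existing flag bits plus the proposed running bit. */
#define SKETCH_FLAG_MASK 0x7UL

/* Compile-time check: every flag bit lies below the guaranteed alignment. */
_Static_assert((SKETCH_FLAG_MASK & ~(SKETCH_MIN_ALIGN - 1)) == 0,
	       "flag bits must lie below the guaranteed alignment");

/* Number of low pointer bits guaranteed to be zero for a
 * power-of-two alignment. */
static unsigned free_low_bits(unsigned long align)
{
	unsigned n = 0;

	while (align > 1 && (align & 1UL) == 0) {
		align >>= 1;
		n++;
	}
	return n;
}

/* Allocate with only the minimum alignment, mirroring an allocator
 * that ignores the type's declared cacheline alignment. */
static void *alloc_base_sketch(size_t size)
{
	void *p;

	/* C11 aligned_alloc wants size to be a multiple of the alignment. */
	size = (size + SKETCH_MIN_ALIGN - 1) & ~(SKETCH_MIN_ALIGN - 1);
	p = aligned_alloc(SKETCH_MIN_ALIGN, size);
	assert(p == NULL || ((uintptr_t)p & SKETCH_FLAG_MASK) == 0);
	return p;
}
```

With an 8-byte guarantee free_low_bits() returns 3, exactly enough for the two existing flags plus the running bit; a 4-byte guarantee would leave only two.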