On Tue, Dec 03, 2013 at 01:57:37PM +0530, Viresh Kumar wrote:
Hi Frederic/Kevin,
I was doing some work that required NO_HZ_FULL on core 1 of a dual-core ARM machine.
I observed that I was able to isolate the second core using cpusets, but whenever the tick occurs, it occurs twice, i.e. the timer count goes up by two every time my core is disturbed.
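For concreteness, isolation of this kind can be set up along the following lines. This is a minimal sketch, not the exact script used here: it assumes the cpuset controller is mounted at /sys/fs/cgroup/cpuset, that the kernel was booted with nohz_full=1 rcu_nocbs=1, and the cpuset names "housekeeping" and "isolated" are purely illustrative.

    # Assumed boot parameters: nohz_full=1 rcu_nocbs=1 (CONFIG_NO_HZ_FULL=y)
    mkdir -p /sys/fs/cgroup/cpuset
    mount -t cgroup -o cpuset none /sys/fs/cgroup/cpuset
    cd /sys/fs/cgroup/cpuset

    # Stop the scheduler from load-balancing across the isolated CPU
    echo 0 > cpuset.sched_load_balance

    # Housekeeping cpuset on CPU 0: move every movable task there
    mkdir housekeeping
    echo 0 > housekeeping/cpuset.cpus
    echo 0 > housekeeping/cpuset.mems
    for t in $(cat tasks); do
        echo $t > housekeeping/tasks 2>/dev/null   # bound kernel threads may refuse
    done

    # Isolated cpuset on CPU 1: run only the test load there
    mkdir isolated
    echo 1 > isolated/cpuset.cpus
    echo 0 > isolated/cpuset.mems
    echo $$ > isolated/tasks       # move this shell so children inherit the cpuset
    stress -c 1 &                  # the single busy task on CPU 1
    echo $$ > housekeeping/tasks   # move the shell back off the isolated CPU

    # Watch how often the per-CPU timer count moves for CPU 1
    watch -n1 'grep arch_timer /proc/interrupts'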
I traced it (output attached) and found this sequence (talking only about core 1 here; the tracing setup is sketched after the list):
- a single task was running on core 1 (placed there using cpusets)
- got an arch_timer interrupt
- started servicing the vmstat work
- so it came out of the NO_HZ_FULL domain, as there was now more than one task on the core
- queued the work again and switched back to the existing single task (stress)
- got another arch_timer interrupt after 5 ms (HZ=200)
Right, looking at the details, the 2nd interrupt is caused by the bdi writeback delayed work (a workqueue item).
- got "tick_stop" event and went into NO_HZ_FULL domain again..
- Got isolated again for long duration..
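A sequence like the one above can be captured with ftrace. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and that the listed trace events exist in the running kernel:

    cd /sys/kernel/debug/tracing
    echo 0 > tracing_on
    echo > trace                                        # clear old data
    echo 2 > tracing_cpumask                            # hex mask: CPU 1 only
    echo 1 > events/irq/irq_handler_entry/enable        # arch_timer entries
    echo 1 > events/timer/tick_stop/enable              # NO_HZ_FULL stop events
    echo 1 > events/sched/sched_switch/enable           # who runs between ticks
    echo 1 > events/workqueue/workqueue_queue_work/enable
    echo 1 > events/workqueue/workqueue_execute_start/enable
    echo 1 > tracing_on
    sleep 10
    echo 0 > tracing_on
    cat trace

The workqueue events should show which work functions (e.g. the vmstat and bdi writeback items) pull the CPU out of dynticks mode.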
So the query is: why don't we check for that at the end of servicing the vmstat work, when migrating back to "stress"?
I fear I don't understand your question. Do you mean: why don't we prevent that bdi writeback work from running when we are in full dynticks mode?
We can't just ignore workqueue and timer callbacks once they are scheduled, otherwise the kernel is going to behave unpredictably.
OTOH, what we can do is work on these per-CPU workqueues and timers and do whatever is necessary to keep them from firing, as explained in detail in Documentation/kernel-per-CPU-kthreads.txt.
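For the vmstat work specifically, one knob in that direction is the update interval of the per-CPU vmstat_update() work (default one second); stretching it reduces how often the isolated CPU is woken. A minimal sketch (120 seconds is just an illustrative value):

    # Raise the vmstat update interval from the default 1s
    sysctl vm.stat_interval=120
    # equivalently
    echo 120 > /proc/sys/vm/stat_interval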
There is also the problem of unbound workqueues, for which we don't have a solution yet. But the idea is that we could tweak their affinity from sysfs.
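For workqueues that are individually exported via WQ_SYSFS, such as the bdi "writeback" workqueue mentioned above, a per-workqueue affinity attribute is already available. A minimal sketch, assuming the workqueue sysfs interface is present on the running kernel (this does not cover unbound workqueues in general):

    # Confine the unbound "writeback" workqueue to CPU 0 (hex CPU mask)
    echo 1 > /sys/devices/virtual/workqueue/writeback/cpumask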
Thanks.
-- viresh