On Wed, 2015-04-22 at 17:29 +0200, Peter Zijlstra wrote:
> Hmm, that sounds unfortunate, this would wreck life for the power aware laptop/tablet etc. people.
> There is already a sysctl to manage this, is that not enough to mitigate this problem on the server side of things?
The thing is: 99% of networking timers never fire.
But when they _do_ fire and the host is under attack, they all fire on an unrelated CPU, and that one cannot keep up.
The added latencies set off monitoring alerts.
See commit 4a8e320c929991c9480 ("net: sched: use pinned timers") for a specific example of the problems this can cause.
When we set a timer to fire in 10 seconds, knowing the _current_ idle state of the CPUs is of no help: it says nothing about which CPUs will be busy when the timer actually fires.
Add to this that softirq processing does not count as making the current CPU non-idle.
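A toy userspace C model of the failure mode described above. It is NOT kernel code: the names (NR_CPUS_MODEL, TIMERS_PER_CPU, arm_target(), simulate_fires()) are hypothetical, and it only loosely mimics the policy of moving a timer armed on an idle-looking CPU to one that looks busy. It assumes the idle state is sampled once, at arm time, and that a CPU busy only with softirq work looks idle and therefore hands its timers away.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT kernel code; all names are hypothetical. */
#define NR_CPUS_MODEL  8
#define TIMERS_PER_CPU 1000

/*
 * Choose the CPU that will own a timer armed on @cpu.
 * looked_idle[i] is the idle state sampled at *arm* time (a CPU that
 * was only running softirqs is assumed to look idle here).
 * With migration disabled (pinned timers) the timer stays local.
 */
static int arm_target(int cpu, int migrate, const int looked_idle[])
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (!migrate || !looked_idle[cpu])
		return cpu;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		if (!looked_idle[i])
			return i;	/* first CPU that looked busy */
	return cpu;			/* everyone looked idle: stay local */
}

/* Count how many expirations each CPU must run once the timers fire. */
static void simulate_fires(int migrate, const int looked_idle[],
			   long fires[])
{
	memset(fires, 0, NR_CPUS_MODEL * sizeof(fires[0]));
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		fires[arm_target(cpu, migrate, looked_idle)] += TIMERS_PER_CPU;
}
```

For example, if only CPU 3 looked busy at arm time, migration leaves CPU 3 running all 8000 expirations while the other seven run none; with pinned timers each CPU runs its own 1000.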
Networking tries hard to use CPU affinities (and all the techniques described in Documentation/networking/scaling.txt), but /proc/sys/kernel/timer_migration adds a fair amount of overhead in many workloads.
get_nohz_timer_target() has to touch 3 cache lines per CPU...
It's in the top 10 of "perf top" profiles on servers with 72 threads.
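A rough C sketch of why this lookup cost scales with the CPU count. It assumes the figure quoted above (about 3 cache lines read per CPU examined), a 64-byte cache line, and a plain linear scan for a CPU that looks busy. The scan is assumed to run only when the arming CPU itself looks idle. The struct and function names are hypothetical; the real get_nohz_timer_target() walks scheduler domains rather than a flat array.

```c
#include <assert.h>

/* Hypothetical per-CPU state, NOT the kernel's struct rq: each field sits
 * on its own cache line, standing in for the ~3 lines read per CPU. */
#define CACHE_LINE    64
#define LINES_PER_CPU 3

struct cpu_state_model {
	_Alignas(CACHE_LINE) int running;		/* line 1 */
	_Alignas(CACHE_LINE) int nr_running;		/* line 2 */
	_Alignas(CACHE_LINE) int pending_wakeups;	/* line 3 */
};

/*
 * Linear scan for the first CPU that looks busy. Each CPU examined is
 * charged LINES_PER_CPU in *lines_touched (the short-circuit test may
 * read fewer lines on a busy CPU; an idle one needs all three).
 * Returns @self if every CPU looks idle: the worst case, a full scan.
 */
static int find_busy_cpu(const struct cpu_state_model *cpus, int nr_cpus,
			 int self, long *lines_touched)
{
	assert(self >= 0 && self < nr_cpus);
	for (int i = 0; i < nr_cpus; i++) {
		*lines_touched += LINES_PER_CPU;
		if (cpus[i].running || cpus[i].nr_running ||
		    cpus[i].pending_wakeups)
			return i;
	}
	return self;
}
```

Under these assumptions, a 72-thread machine on which every CPU looks idle pays 72 × 3 = 216 cache lines (216 × 64 = 13824 bytes) per timer arm, and that cost is paid on each arm even though most of those timers never fire.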
This /proc/sys/kernel/timer_migration should instead have been:
/proc/sys/kernel/timer_on_a_single_cpu_for_laptop_sake