On Fri, 2014-10-03 at 09:50 +0200, Peter Zijlstra wrote:
On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:
A generic boo hiss aimed in the general direction of all of this "let's go look at every possibility on every wakeup" stuff. Less is more.
I hear you, can you see actual slowdown with the patch? While the worst case doesn't change, it does make the average case equal to the worst case iteration -- where we previously would average out at inspecting half the CPUs before finding an idle one, we'd now always inspect all of them in order to compare all idle ones on their properties.
Also, with the latest generation of Haswell Xeons having 18 cores (36 threads) this is one massively painful loop for sure.
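To make the cost argument concrete, here is a minimal userspace sketch, not kernel code. The struct, the function names, and the `exit_latency` ranking property are all hypothetical, chosen only to contrast "stop at the first idle CPU" with "inspect every CPU and compare the idle ones".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct cpu_state {
    bool idle;
    int exit_latency;   /* assumed ranking property, for illustration only */
};

/* Early-exit scan: stops at the first idle CPU. Returns its index or -1. */
static int pick_first_idle(const struct cpu_state *cpus, size_t n,
                           size_t *inspected)
{
    *inspected = 0;
    for (size_t i = 0; i < n; i++) {
        (*inspected)++;
        if (cpus[i].idle)
            return (int)i;
    }
    return -1;
}

/* Full scan: always visits all n CPUs so idle ones can be compared. */
static int pick_best_idle(const struct cpu_state *cpus, size_t n,
                          size_t *inspected)
{
    int best = -1;

    *inspected = 0;
    for (size_t i = 0; i < n; i++) {
        (*inspected)++;
        if (!cpus[i].idle)
            continue;
        if (best < 0 || cpus[i].exit_latency < cpus[best].exit_latency)
            best = (int)i;
    }
    return best;
}
```

With 36 logical CPUs, the early-exit scan touches as many entries as it takes to hit the first idle one, while the full scan touches all 36 on every call, whether or not the better pick pays for it.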
Yeah, the things are getting too damn big. I didn't try the patch and measure anything, my gut instantly said "nope, not worth it".
I'm just not sure what to do about it.. I suppose we can artificially split it into smaller groups, but I bet that'll hurt some, but if we can show it gains more we might still be able to do it. The only real problem is actual numbers/workloads (isn't it always) :/
One thing I suppose we could try is keeping a 'busy' flag at the llc domain which is set when all CPUs are busy (we'll clear it from new_idle); that way we can avoid the entire iteration if we know it's pointless.
On one of those huge packages, heck, even on an 8-core, that could save a substantial number of busy box cycles.
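A rough, single-threaded userspace sketch of the 'busy' flag idea follows. Everything here is an assumption made for illustration: the `llc_domain` struct, the function names, and in particular the choice to set the flag after a scan that finds nothing (the thread only says it is set when all CPUs are busy and cleared from new_idle). A real implementation would also need to deal with concurrent updates, which this ignores.

```c
#include <stdbool.h>
#include <stddef.h>

#define LLC_MAX_CPUS 64

/* Hypothetical stand-in for a last-level-cache scheduling domain. */
struct llc_domain {
    size_t nr_cpus;
    bool cpu_idle[LLC_MAX_CPUS];
    bool all_busy;      /* hint: the last full scan found no idle CPU */
};

/* A CPU going idle clears the hint (the "clear it from new_idle" step). */
static void llc_cpu_enter_idle(struct llc_domain *d, size_t cpu)
{
    d->cpu_idle[cpu] = true;
    d->all_busy = false;
}

/* A CPU picking up work; the hint is left alone here by assumption. */
static void llc_cpu_exit_idle(struct llc_domain *d, size_t cpu)
{
    d->cpu_idle[cpu] = false;
}

/* Wakeup path: skip the whole iteration when the hint says it is pointless. */
static int llc_find_idle(struct llc_domain *d, size_t *inspected)
{
    *inspected = 0;
    if (d->all_busy)
        return -1;

    for (size_t i = 0; i < d->nr_cpus; i++) {
        (*inspected)++;
        if (d->cpu_idle[i])
            return (int)i;
    }

    /* Full scan came up empty: remember that until some CPU goes idle. */
    d->all_busy = true;
    return -1;
}
```

On a saturated box only the first wakeup pays for the full scan; later ones return after a single flag check until a CPU goes idle again.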
-Mike