On Fri, Oct 03, 2014 at 11:37:31AM -0400, Rik van Riel wrote:
> Some more brainstorming points...
> We should probably (lazily/batched?) propagate load information up the sched_group tree. This will be useful for wake_affine, load_balancing, find_idlest_cpu, and select_idle_sibling
> With both find_idlest_cpu and select_idle_sibling walking down the tree from the LLC level, they could probably share code
> Counting both blocked and runnable load may give better long term stability of loads, resulting in a reduction in work preserving behaviour, but an improvement in locality - this could be worthwhile, but it is hard to say in advance
> We can be pretty sure that CPU makers are not going to stop at a mere 18 cores. We need to subdivide things below the LLC level, turning select_idle_sibling and find_idlest_cpu into a tree walk.
> This means whatever selection criteria are used by these need to be propagated up the sched_group tree. This, in turn, means we probably need to restrict ourselves to things that do not get changed/updated too often.
> Am I overlooking anything?
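A minimal sketch of the brainstormed idea follows. It is a toy model in C only, not kernel code. Each node of a hypothetical sched_group-like tree caches the summed load of the CPUs below it, where that load is meant to stand for runnable plus blocked load. Leaf updates are batched until they cross a hypothetical PROPAGATE_THRESHOLD. A single top-down walk then picks the least-loaded child at each level, which is the kind of walk find_idlest_cpu and select_idle_sibling could share. struct sg_node, attach(), update_leaf_load(), find_idlest_cpu_tree() and the threshold value are illustrative assumptions.

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define PROPAGATE_THRESHOLD 64 /* hypothetical batching threshold */

/* Toy topology node: a hypothetical stand-in, NOT struct sched_group. */
struct sg_node {
    struct sg_node *parent;
    struct sg_node *child[MAX_CHILDREN];
    int nr_children;
    int cpu;            /* leaf: CPU id; interior nodes: -1 */
    long load;          /* leaf: exact load (runnable + blocked);
                           interior: cached sum, possibly stale */
    long pending_delta; /* leaf only: change not yet pushed to ancestors */
};

static void attach(struct sg_node *parent, struct sg_node *child)
{
    if (parent->nr_children < MAX_CHILDREN) {
        child->parent = parent;
        parent->child[parent->nr_children++] = child;
    }
}

/* Update a leaf immediately, but only propagate to the ancestors once the
 * accumulated change reaches the threshold (lazy/batched propagation). */
static void update_leaf_load(struct sg_node *leaf, long delta)
{
    leaf->load += delta;
    leaf->pending_delta += delta;
    if (leaf->pending_delta < PROPAGATE_THRESHOLD &&
        leaf->pending_delta > -PROPAGATE_THRESHOLD)
        return; /* ancestors keep a slightly stale sum */
    for (struct sg_node *g = leaf->parent; g != NULL; g = g->parent)
        g->load += leaf->pending_delta;
    leaf->pending_delta = 0;
}

/* One top-down walk from the top-level node, picking the least-loaded
 * child at each level; a candidate for code shared between an
 * "idlest CPU" search and an idle-sibling search. */
static int find_idlest_cpu_tree(const struct sg_node *g)
{
    while (g->nr_children > 0) {
        const struct sg_node *best = g->child[0];
        for (int i = 1; i < g->nr_children; i++)
            if (g->child[i]->load < best->load)
                best = g->child[i];
        g = best;
    }
    return g->cpu;
}
```

The trade-off is visible in the sketch. Interior sums may lag their leaves by up to the threshold per CPU, so the walk can occasionally pick a slightly worse subtree. In exchange, the shared upper-level nodes are written far less often.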
Well, we can certainly try something like that; but your last point seems like a contradiction, seeing how _the_ important point for select_idle_sibling() is the actual idle state, and that by definition is something that can change/update often.
But yes, the only viable option is some artificial breakup of the topology, and we can indeed try to bridge the gap with some caching.
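The tension raised in the reply can be sketched the same way, again as a toy model under stated assumptions. The authoritative idle flag lives only on the leaf and is cheap to flip. Interior nodes carry a cached idle count that is refreshed in a batch, for example from some periodic path. The walk uses the cached hints only to prune subtrees and always re-checks the leaf's live flag. A stale hint can therefore cause a missed idle CPU, but never the selection of a CPU whose flag currently says busy. struct idle_node, add_child(), set_cpu_idle(), refresh_idle_hints() and select_idle_cpu_tree() are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>

#define MAX_CHILDREN 4

/* Hypothetical topology node; not a kernel structure. */
struct idle_node {
    struct idle_node *child[MAX_CHILDREN];
    int nr_children;
    int cpu;       /* leaves: CPU id; interior nodes: -1 */
    bool idle;     /* leaves: authoritative, changes often */
    int idle_hint; /* interior: cached count of idle leaves, may be stale */
};

static void add_child(struct idle_node *p, struct idle_node *c)
{
    if (p->nr_children < MAX_CHILDREN)
        p->child[p->nr_children++] = c;
}

/* Cheap update on idle entry/exit: touches only the leaf, no propagation. */
static void set_cpu_idle(struct idle_node *leaf, bool idle)
{
    leaf->idle = idle;
}

/* Batched refresh of the cached counts, bottom-up.
 * Returns the number of idle leaves under n. */
static int refresh_idle_hints(struct idle_node *n)
{
    if (n->nr_children == 0)
        return n->idle ? 1 : 0;
    int sum = 0;
    for (int i = 0; i < n->nr_children; i++)
        sum += refresh_idle_hints(n->child[i]);
    n->idle_hint = sum;
    return sum;
}

/* Whether a child is worth descending into, according to cached state. */
static bool maybe_idle(const struct idle_node *c)
{
    return c->nr_children == 0 ? c->idle : c->idle_hint > 0;
}

/* Descend using cached hints to prune, but trust only the leaf flag.
 * Returns an idle CPU id, or -1 if none was found. */
static int select_idle_cpu_tree(const struct idle_node *n)
{
    if (n->nr_children == 0)
        return n->idle ? n->cpu : -1;
    for (int i = 0; i < n->nr_children; i++) {
        const struct idle_node *c = n->child[i];
        if (!maybe_idle(c))
            continue; /* hint says nothing idle here: skip the subtree */
        int cpu = select_idle_cpu_tree(c);
        if (cpu >= 0)
            return cpu;
    }
    return -1;
}
```

Between refreshes, a CPU that just went idle is invisible to the walk if its group's cached count is still zero. That is the cost of caching state that, by definition, changes often. How frequently to refresh is the knob that trades update traffic against missed idle CPUs.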