* Arjan van de Ven <arjan@linux.intel.com> wrote:
> > - enumeration of idle states
> > - how long it takes to enter+exit a particular idle state
> > - [ perhaps information about how destructive to CPU caches that particular idle state is. ]
> > - new driver entry point that allows the scheduler to enter any of the enumerated idle states. Platform code will not change this state; all policy decisions, including the choice of idle state, are made at the power saving policy level.
> >
> > All of this combines into a 'cost to enter and exit an idle state' estimation plus a way to enter idle states. It should be presented to the scheduler in a platform independent fashion, but without policy embedded: a low level platform driver interface in essence.
>
> you're missing an aspect.
>
> Deeper idle states on one core allow (on Intel and AMD at least) the other cores to go faster. So it's not as simple as "if I want more performance, go less deep". By going less deep you also reduce overall performance of the system... as well as increase the power usage.
>
> This aspect really really cannot be ignored, it's quite significant today, and going forward it is only going to get more and more significant.

I'm not missing turbo mode, just wanted to keep the above discussion simple. For turbo mode the "go for performance" constraints are simply different, more global. We have similar concerns in the scheduler already - for example system-global scheduling decisions for NUMA balancing.
Turbo mode in fact shows _why_ it's important to decide this on a higher, unified level to achieve best results: as the constraints and interdependencies become more complex, it's not a simple CPU-local CPU-resource utilization decision anymore, but a system-wide one, where broad kinds of scheduling information are needed to make a good guess.
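
To make the driver interface discussed above a bit more concrete, here is a rough sketch in C. Every name in it (idle_state_desc, pick_idle_state, example_states) is hypothetical and the numbers are made up for illustration; none of this is an existing kernel API. It only covers the simple CPU-local part: the platform describes its idle states without embedding policy, and a policy-level caller picks the deepest state whose enter+exit cost fits the expected idle period. The turbo interdependency between cores is deliberately not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, platform-independent description of one idle state. */
struct idle_state_desc {
	const char *name;
	uint64_t enter_exit_ns;   /* time to enter + exit this state */
	int flushes_caches;       /* nonzero if the state is destructive to CPU caches */
	int (*enter)(int cpu);    /* driver entry point; policy lives in the caller */
};

/* Illustrative table, ordered shallow to deep. The values are invented. */
static const struct idle_state_desc example_states[] = {
	{ "shallow", 2000,   0, NULL },
	{ "medium",  40000,  0, NULL },
	{ "deep",    200000, 1, NULL },
};

/*
 * Return the index of the deepest state whose enter+exit cost fits both
 * the predicted idle period and the caller's latency limit. Falls back
 * to state 0 (assumed to be the shallowest) when nothing else fits.
 */
static size_t pick_idle_state(const struct idle_state_desc *tbl, size_t n,
			      uint64_t predicted_idle_ns,
			      uint64_t latency_limit_ns)
{
	size_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].enter_exit_ns > latency_limit_ns)
			continue;
		if (tbl[i].enter_exit_ns > predicted_idle_ns)
			continue;
		best = i;
	}
	return best;
}
```

A system-wide policy would call something like pick_idle_state() with constraints that depend on what the other cores are doing, which is exactly the kind of information a per-CPU platform driver does not have.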
Thanks,
Ingo