On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
My understanding from the recent discussions is that the scheduler should decide directly on the C-state (or rather the deepest C-state possible since we don't want to duplicate the backend logic for synchronising CPUs going up or down). This means that the scheduler needs to know about C-state target residency and wake-up latency (I think we can leave coupled C-states to the backend; there is some complex synchronisation there which I wouldn't duplicate).
Alternatively (my preferred approach), we get the scheduler to predict and pass the expected residency and latency requirements down to a power driver and read back the actual C-states for making task placement decisions. Some of the menu governor prediction logic could be turned into a library and used by the scheduler. Basically what this tries to achieve is better scheduler awareness of the current C-states decided by a cpuidle/power driver based on the scheduler constraints.
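To make the residency/latency hand-off above concrete, here is a minimal sketch of how a power driver might map the scheduler's two hints to the deepest permissible C-state. Everything in it is an assumption for illustration: `struct idle_state`, its field names and `pick_deepest_state()` are hypothetical and not an existing kernel interface, and the states are assumed to be ordered from shallowest to deepest.

```c
#include <stddef.h>

/* Hypothetical description of one idle state; names are illustrative only. */
struct idle_state {
	unsigned int target_residency_us; /* minimum idle time for the state to pay off */
	unsigned int exit_latency_us;     /* worst-case wake-up latency */
};

/*
 * Return the index of the deepest state whose target residency fits the
 * predicted idle time and whose exit latency fits the latency limit.
 * States are assumed to be ordered from shallowest (index 0) to deepest.
 * Returns -1 if no state satisfies both constraints.
 */
static int pick_deepest_state(const struct idle_state *states, size_t n,
			      unsigned int predicted_idle_us,
			      unsigned int latency_limit_us)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (states[i].target_residency_us > predicted_idle_us)
			continue; /* would not stay idle long enough */
		if (states[i].exit_latency_us > latency_limit_us)
			continue; /* would wake up too slowly */
		best = (int)i;
	}
	return best;
}
```

The scheduler would supply `predicted_idle_us` and `latency_limit_us`, and the returned index is the kind of "actual C-state" it could read back when making task placement decisions.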
Ah yes.. so I _think_ the scheduler wants to eventually know about idle topology constraints. But we can get there in a gradual fashion I hope.
Like the package C-states on x86 -- for those to be effective the scheduler needs to pack tasks and keep entire packages idle for as long as possible.
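As a rough illustration of the packing idea, the sketch below prefers the busiest package that still has room, so that fully idle packages are left alone. It is a toy model under stated assumptions, not how the scheduler does placement: `struct package_load`, its fields and `pick_package()` are hypothetical, and "capacity" is simply a task count.

```c
#include <stddef.h>

/* Hypothetical per-package load summary; names are illustrative only. */
struct package_load {
	unsigned int nr_running; /* tasks currently placed on the package */
	unsigned int capacity;   /* tasks the package can hold before it is full */
};

/*
 * Pick the busiest package that still has spare capacity, so that new
 * tasks are packed together and idle packages are not woken needlessly.
 * Ties go to the lowest index. Returns -1 if every package is full.
 */
static int pick_package(const struct package_load *pkgs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (pkgs[i].nr_running >= pkgs[i].capacity)
			continue; /* no room left on this package */
		if (best < 0 || pkgs[i].nr_running > pkgs[best].nr_running)
			best = (int)i;
	}
	return best;
}
```

An idle package is only chosen once every busier package is full, which is what lets it stay in a package C-state for longer.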