On Sat, Jul 13, 2013 at 11:23:51AM +0100, Catalin Marinas wrote:
This looks like a userspace hotplug daemon approach lifted to kernel space :/
The difference is that this is faster. We even had hotplug in mind some years ago for big.LITTLE but it wouldn't give the performance we need (hotplug is incredibly slow even if driven from the kernel).
faster, slower, still horrid :-)
That's what we've been pushing for. From a big.LITTLE perspective, I would probably vote for Vincent's patches but I guess we could probably adapt any of the other options.
But then we got Ingo NAK'ing all these approaches. Taking the best bits from the current load balancing patches would create yet another set of patches which don't fall under Ingo's requirements (at least as I understand them).
Right, so Ingo is currently away as well -- should be back 'today' or tomorrow. But I suspect he mostly fell over the presentation.
I've never known Ingo to object to doing incremental development; in fact he often suggests doing so.
So don't present the packing thing as a power aware scheduler; that presentation suggests it's the complete deal. Give instead a complete description of the problem; tell how the current patch set fits into that and which aspect it solves; and say that further patches will follow to sort the other issues.
That keeps the entire thing much clearer.
Then worry about power thingies.
To quote Ingo: "To create a new low level idle driver mechanism the scheduler could use and integrate proper power saving / idle policy into the scheduler."
That's unless we all agree (including Ingo) that the above requirement is orthogonal to task packing and, as a *separate* project, we look at better integrating the cpufreq/cpuidle with the scheduler, possibly with a new driver model and governors as libraries used by such drivers. In which case the current packing patches shouldn't be NAK'ed but reviewed so that they can be improved further or rewritten.
Right, so the first thing would be to list all the things that need doing:
- integrate idle guestimator
- integrate cpufreq stats
- fix per entity runtime vs cpufreq
- integrate/redo cpufreq
- add packing features
- {all the stuff I forgot}
Then see what is orthogonal and what is most important and get people to agree to an order. Then go..
I agree in general, but there is the intel_pstate.c driver which has its own separate statistics that the scheduler does not track.
Right, the question is how much of that will survive Arjan's next-gen effort.
We could move to invariant task load tracking which uses aperf/mperf (and could do similar things with perf counters on ARM). As I understand from Arjan, the new pstate driver will be different, so we don't know exactly what it requires.
Right, so part of the effort should be understanding what the various parties want/need. As far as I understand the Intel stuff, P states are basically useless and the only useful state to ever program is the max one -- although I'm sure Arjan will eventually explain how that is wrong :-)
We could do optional things; I'm not much for 'requiring' stuff that other archs simply cannot support, or can only support at great effort/cost.
Stealing PMU counters for sched work would be crossing the line for me, that must be optional.