On 6/19/2013 10:00 AM, Morten Rasmussen wrote:
On Wed, Jun 19, 2013 at 04:39:39PM +0100, Arjan van de Ven wrote:
On 6/18/2013 10:47 AM, David Lang wrote:
It's bad enough trying to guess the needs of the processes, but if you also are reduced to guessing the capabilities of the cores, how can anything be made to work?
btw one way to look at this is to assume that (with some minimal hinting) the CPU driver will do the right thing and get you just about the best performance you can get (that is appropriate for the task at hand)... and don't do anything in the scheduler proactively.
If I understand correctly, you mean if your hardware/firmware is fully
hardware, firmware and the driver
in control of the p-state selection and changes it fast enough to match the current load, the scheduler doesn't have to care? By fast enough I mean, faster than the scheduler would notice if a cpu was temporarily overloaded at a low p-state. In that case, you wouldn't need cpufreq/p-state hints, and the scheduler would only move tasks between cpus when cpus are fully loaded at their max p-state.
with the migration hint, I'm pretty sure we'll be there today typically. we'll notice within 10 msec regardless, but the migration hint will take the edge off those 10 msec normally.
I would argue that the "at their max p-state" in your sentence needs to go away, since you don't know what p-state you are actually at except in hindsight. And even then you don't know if you could have gone higher or not.
the hints I have in mind are not all that complex; we have the biggest issues today around task migration (the task migrates to a cold cpu... so a simple notifier chain on the new cpu as it is accepting a task and we can bump it up) and real time tasks (again, simple notifier chain to get you to a predictably high performance level). with those two in place we're a long way better off than we are today in terms of actual problems.
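To make the two hints concrete, here is a minimal userspace sketch of the idea, not kernel code. The event names, the `NotifierChain` and `ToyCpuDriver` classes, and the performance levels are all hypothetical; in particular, bumping a migration target to the midpoint level and a real time wakeup to the top level is only an assumption made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical event names; the thread does not name them.
TASK_MIGRATE_IN = "task_migrate_in"
RT_TASK_WAKEUP = "rt_task_wakeup"


class NotifierChain:
    """A plain list of callbacks, loosely modelled on the notifier chain idea."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[str, int], None]] = []

    def register(self, callback: Callable[[str, int], None]) -> None:
        self._callbacks.append(callback)

    def call(self, event: str, cpu: int) -> None:
        # Deliver the event to every registered listener, in order.
        for callback in self._callbacks:
            callback(event, cpu)


class ToyCpuDriver:
    """Toy driver tracking an abstract performance level per cpu."""

    def __init__(self, n_cpus: int, min_level: int = 0, max_level: int = 10) -> None:
        self.min_level = min_level
        self.max_level = max_level
        self.level: Dict[int, int] = {cpu: min_level for cpu in range(n_cpus)}

    def on_sched_event(self, event: str, cpu: int) -> None:
        if event == TASK_MIGRATE_IN:
            # Assumption: a cold cpu accepting a task gets bumped to at
            # least the midpoint instead of waiting for the next sample.
            midpoint = (self.min_level + self.max_level) // 2
            self.level[cpu] = max(self.level[cpu], midpoint)
        elif event == RT_TASK_WAKEUP:
            # Assumption: real time tasks get a fixed, predictable high level.
            self.level[cpu] = self.max_level


chain = NotifierChain()
driver = ToyCpuDriver(n_cpus=4)
chain.register(driver.on_sched_event)

chain.call(TASK_MIGRATE_IN, 2)  # a task lands on cold cpu 2
chain.call(RT_TASK_WAKEUP, 3)   # a real time task wakes on cpu 3
```

The scheduler side only fires events; what to do about them stays entirely in the driver, which matches the point that the scheduler should not have to care beyond these two notifications.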
For all the talk of ondemand (as ARM still uses that today)... that guy puts you in either the lowest or highest frequency over 95% of the time. Other non-cpufreq solutions like on Intel are a bit more advanced (and will grow more so over time), but even there, in the grand scheme of things, the scheduler shouldn't have to care anymore with those two notifiers in place.
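A simplified model of an ondemand-style decision shows why a bursty workload tends to land at the extremes: a mostly idle sample maps to the lowest frequency, and anything above the threshold jumps straight to the highest. This is a sketch and not the governor's actual code; the function name, the frequency table, and the threshold value of 80 are assumptions chosen for illustration.

```python
from typing import Sequence


def ondemand_like_target(load_pct: float,
                         freqs_khz: Sequence[int],
                         up_threshold: float = 80) -> int:
    """Pick a target frequency from an ascending table for one load sample."""
    if not 0 <= load_pct <= 100:
        raise ValueError("load_pct must be between 0 and 100")

    highest = freqs_khz[-1]

    # Above the threshold: go straight to the top frequency.
    if load_pct > up_threshold:
        return highest

    # Otherwise: lowest table entry that still covers the observed load.
    wanted = highest * load_pct / 100
    for freq in freqs_khz:
        if freq >= wanted:
            return freq
    return highest


# Hypothetical frequency table in kHz, ascending.
FREQS = [600_000, 1_200_000, 1_800_000, 2_400_000]

idle_choice = ondemand_like_target(5, FREQS)   # mostly idle sample
busy_choice = ondemand_like_target(90, FREQS)  # busy sample
```

With samples that are either nearly idle or nearly saturated, the intermediate table entries are rarely picked, which is the behaviour described above.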
You would need more than a few hints to implement more advanced capacity management like proposed for the power scheduler. I believe that Intel would benefit as well from guiding the scheduler to idle the right cpu to enable deeper idle states and/or enable turbo-boost for other cpus.
that's an interesting theory. I've yet to see any way to actually have that do something useful.
yes there is some value in grouping a lot of very short tasks together. not a lot of value, but at least some.
and there is some value in the grouping within a package (to a degree) thing.
(both are basically "statistically, sort left" as policy)
more fine-grained than that (esp tied to P states)... not so much.
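As a rough illustration of the "statistically, sort left" policy mentioned above: place each new task on the lowest-numbered cpu that still has room, so short tasks pile up on the left and the cpus on the right stay idle. The function names, the load units, and the capacity of 100 are hypothetical; this is a toy placement loop, not the scheduler's load balancer.

```python
from typing import List, Sequence


def pick_cpu_sort_left(cpu_load: Sequence[int], task_load: int,
                       capacity: int = 100) -> int:
    """Return the lowest-numbered cpu with room for the task.

    If no cpu has room, fall back to the least loaded one.
    """
    for cpu, load in enumerate(cpu_load):
        if load + task_load <= capacity:
            return cpu
    return min(range(len(cpu_load)), key=lambda c: cpu_load[c])


def place_tasks(task_loads: Sequence[int], n_cpus: int,
                capacity: int = 100) -> List[int]:
    """Place tasks one by one and return the resulting per-cpu load."""
    cpu_load = [0] * n_cpus
    for task_load in task_loads:
        cpu = pick_cpu_sort_left(cpu_load, task_load, capacity)
        cpu_load[cpu] += task_load
    return cpu_load


# Three short tasks and two bigger ones on a 4-cpu system.
result = place_tasks([10, 10, 10, 50, 40], n_cpus=4)
```

Here the first four tasks share cpu 0, the last one spills to cpu 1, and cpus 2 and 3 are left alone so they can stay in a deep idle state.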