On Wed, Jun 19, 2013 at 06:08:29PM +0100, Arjan van de Ven wrote:
On 6/19/2013 10:00 AM, Morten Rasmussen wrote:
On Wed, Jun 19, 2013 at 04:39:39PM +0100, Arjan van de Ven wrote:
On 6/18/2013 10:47 AM, David Lang wrote:
It's bad enough trying to guess the needs of the processes, but if you also are reduced to guessing the capabilities of the cores, how can anything be made to work?
btw one way to look at this is to assume that (with some minimal hinting) the CPU driver will do the right thing and get you just about the best performance you can get (that is appropriate for the task at hand)... and don't do anything in the scheduler proactively.
If I understand correctly, you mean if your hardware/firmware is fully
hardware, firmware and the driver
in control of the p-state selection and changes it fast enough to match the current load, the scheduler doesn't have to care? By fast enough I mean, faster than the scheduler would notice if a cpu was temporarily overloaded at a low p-state. In that case, you wouldn't need cpufreq/p-state hints, and the scheduler would only move tasks between cpus when cpus are fully loaded at their max p-state.
with the migration hint, I'm pretty sure we'll be there today typically.
A hint when a task is moved to a new cpu is too late if the migration shouldn't have happened at all. If the scheduler knows that the cpu is able to switch to a higher p-state it can decide to wait for the p-state change instead of migrating the task and waking up another cpu.
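To make the "wait for the p-state change instead of migrating" idea concrete, here is a minimal sketch in Python. It is not kernel code: the `Cpu` fields, the `should_migrate` helper and the 0.95 threshold are all hypothetical, and it assumes (simplistically) that busy time scales linearly with frequency.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    # All fields are illustrative assumptions, not real kernel structures.
    cur_freq_khz: int        # frequency of the current p-state
    max_avail_freq_khz: int  # highest p-state currently allowed (power/thermal limits)
    busy_fraction: float     # fraction of time busy at the current frequency, 0.0..1.0


def should_migrate(cpu: Cpu, overload_threshold: float = 0.95) -> bool:
    """Return True only if the cpu would still look overloaded at its
    highest available p-state (assuming load scales linearly with frequency)."""
    if cpu.busy_fraction < overload_threshold:
        return False  # not overloaded even at the current p-state
    # Project the busy fraction onto the highest available frequency.
    projected = cpu.busy_fraction * cpu.cur_freq_khz / cpu.max_avail_freq_khz
    return projected >= overload_threshold


# A cpu saturated at half its available frequency: better to wait for the
# p-state change than to wake up another cpu.
print(should_migrate(Cpu(800_000, 1_600_000, 1.0)))
# A cpu saturated at its highest available p-state: migration is justified.
print(should_migrate(Cpu(1_600_000, 1_600_000, 1.0)))
```

Note that `max_avail_freq_khz` is a moving target in practice, which is exactly the objection raised further down: the scheduler may not know it except in hindsight.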
we'll notice within 10 msec regardless, but the migration hint will take the edge off those 10 msec normally.
I'm not sure if 10 msec is fast enough for the scheduler to not notice. Real use-case studies will tell.
I would argue that the "at their max p-state" in your sentence needs to go away. since you don't know what you actually are except in hindsight. And even then you don't know if you could have gone higher or not.
Yes. What I meant was that if your p-state selection is responsive enough the scheduler would only see the cpu as overloaded when it is in its highest available p-state. That may be determined dynamically by power, thermal, and other factors.
the hints I have in mind are not all that complex; we have the biggest issues today around task migration (the task migrates to a cold cpu... so a simple notifier chain on the new cpu as it is accepting a task and we can bump it up), real time tasks (again, simple notifier chain to get you to a predictably high performance level) and we're a long way better than we are today in terms of actual problems.
For all the talk of ondemand (as ARM still uses that today)... that guy puts you in either the lowest or highest frequency over 95% of the time. Other non-cpufreq solutions like on Intel are a bit more advanced (and will grow more so over time), but even there, in the grand scheme of things, the scheduler shouldn't have to care anymore with those two notifiers in place.
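The two notifiers described above (one for a task arriving on a new cpu, one for real-time tasks) could look roughly like the following Python sketch. This is a user-space illustration only: `NotifierChain`, `CpuDriver`, the event names and the "bump one step" policy are all made up for the example and are not an existing kernel interface.

```python
# Hypothetical event names, invented for this sketch.
TASK_MIGRATED_IN = "task_migrated_in"
RT_TASK_RUNNABLE = "rt_task_runnable"


class NotifierChain:
    """A plain list of callbacks invoked in registration order."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def call(self, event, cpu):
        for callback in self._callbacks:
            callback(event, cpu)


class CpuDriver:
    """Toy CPU driver that owns p-state selection and reacts to hints."""

    def __init__(self, pstates_khz):
        self.pstates = sorted(pstates_khz)
        self.current = {}  # cpu id -> current frequency in kHz

    def freq(self, cpu):
        # A cpu we have not touched yet is assumed to sit at the lowest p-state.
        return self.current.get(cpu, self.pstates[0])

    def on_event(self, event, cpu):
        if event == RT_TASK_RUNNABLE:
            # Real-time task: go to a predictably high performance level.
            self.current[cpu] = self.pstates[-1]
        elif event == TASK_MIGRATED_IN:
            # Task landed on a (possibly cold) cpu: bump it up one step.
            # "One step" is an assumption; the thread only says "bump it up".
            idx = self.pstates.index(self.freq(cpu))
            self.current[cpu] = self.pstates[min(idx + 1, len(self.pstates) - 1)]


driver = CpuDriver([800_000, 1_200_000, 1_600_000])
chain = NotifierChain()
chain.register(driver.on_event)

chain.call(TASK_MIGRATED_IN, 1)  # scheduler side: cpu 1 is accepting a task
chain.call(RT_TASK_RUNNABLE, 2)  # scheduler side: an RT task became runnable on cpu 2
```

The point of the design is that the scheduler only emits events; every decision about which p-state to use stays in the driver.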
You would need more than a few hints to implement more advanced capacity management like proposed for the power scheduler. I believe that Intel would benefit as well from guiding the scheduler to idle the right cpu to enable deeper idle states and/or enable turbo-boost for other cpus.
that's an interesting theory. I've yet to see any way to actually have that do something useful.
yes there is some value in grouping a lot of very short tasks together. not a lot of value, but at least some.
and there is some value in the grouping within a package (to a degree) thing.
(both are basically "statistically, sort left" as policy)
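One possible reading of "statistically, sort left" is to prefer the lowest-numbered cpu that still has room, so that short tasks pile up on the left and the rightmost cpus stay idle. The sketch below is only that interpretation; `pick_cpu_sort_left`, the load values and the capacity of 1.0 are hypothetical.

```python
def pick_cpu_sort_left(loads, task_load, capacity=1.0):
    """Return the lowest-numbered cpu that can absorb task_load.

    loads[i] is the current load of cpu i as a fraction of capacity.
    If no cpu has room, fall back to the least loaded one.
    """
    for cpu, load in enumerate(loads):
        if load + task_load <= capacity:
            return cpu
    return min(range(len(loads)), key=loads.__getitem__)


# cpu 0 is nearly full, so the task goes to cpu 1; cpu 2 stays idle.
print(pick_cpu_sort_left([0.9, 0.2, 0.0], 0.3))
```

A real policy would be statistical rather than a strict first-fit like this, and would also have to respect package boundaries.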
The proposed task packing patches have shown significant benefits for scenarios with many short tasks. This is a typical scenario on android.
Morten