Re: [Eas-dev] [RFC internal v2 4/4] sched: cpufreq_sched_cfs: PELT-based cpu frequency scaling

6 May 2015


Hi Mike,
On 06/05/15 01:58, Mike Turquette wrote:
...
On Tue, May 5, 2015 at 3:12 AM, Juri Lelli juri.lelli@arm.com wrote:
...
Hi Ashwin,
On 04/05/15 14:41, Ashwin Chaugule wrote:
...
Hi Juri,
On 29 April 2015 at 05:39, Juri Lelli juri.lelli@arm.com wrote:
...
On 29/04/15 09:32, Michael Turquette wrote:
...
Quoting Juri Lelli (2015-04-28 10:48:27)
...
Hi Mike,
I apologize in advance for the long email, but I'd still want to share
with you today's thoughts :).
On 28/04/15 05:02, Michael Turquette wrote:
> Quoting Juri Lelli (2015-04-27 10:09:50)
[snip]
>>> +
>>> +       wake_up_process(gd->task);
>>
>> So, we always wake up the kthread, even when we know that we won't
>> need a freq change. This might be, I fear, an almost certain source of
>> reasonable complaints and pushback. I understand that we might not want
>> to start optimizing things, but IMHO this point deserves some more
>> thought before posting. Don't you think we could do some level of
>> aggregation before kicking the kthread? In task_tick_fair(), for
>> example, we could just check if we are beyond the 25% threshold and kick
>> the kthread only in that case.
>
> This patch does not check against a threshold. It always requests a rate
> based on the current utilization plus 25%.
>
> On systems with discretized cpu frequencies (opps) we will often target
> the same opp, occasionally crossing the boundary into another opp. On
> systems with continuous cpu frequencies we will continually give
> ourselves "room to grow".
>
Can you give an example of such systems?
CPPC-based systems.
I thought a lot about all of the feedback that my v1 patchset got last
week on eas-dev. Two comments in particular colored my views on
supporting continuous frequency bands and not relying on a threshold.
First is Ashwin's comment here:
https://lists.linaro.org/pipermail/eas-dev/2015-April/000093.html
Second is Morten's reply here:
https://lists.linaro.org/pipermail/eas-dev/2015-April/000094.html
If we decide that we only care about opps then it is easy to create a
threshold for the opp "bucket" that we are currently in. But in a
continuous system creating a threshold is more difficult. E.g. if we
decide to use an 80% threshold for a continuous system, we can
easily determine if our current utilization exceeds this threshold at
our current capacity/frequency. But what is the new frequency target?
Without a table to guide us we have to just make something up!
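To make the contrast concrete, here is a minimal sketch of the "utilization plus 25%" request described above. It is not the code from the patch: the helper names, the 0..1024 utilization scale and the example OPP table are assumptions made for illustration. The point is that the margin formula yields a target rate directly, with or without a table, whereas a bare threshold only tells us that we should move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* assumed full-scale utilization */

/* Hypothetical OPP table in kHz, sorted ascending (illustration only). */
static const uint32_t example_opps_khz[] = { 500000, 1000000, 1500000, 2000000 };
#define EXAMPLE_NR_OPPS (sizeof(example_opps_khz) / sizeof(example_opps_khz[0]))

/* Rate that would serve 'util' plus a 25% margin, clamped to max_khz. */
static uint32_t margin_target_khz(uint32_t util, uint32_t max_khz)
{
	uint64_t target;

	assert(util <= CAPACITY_SCALE);
	/* util + util/4 is the 25% "room to grow"; 64-bit math avoids overflow. */
	target = (uint64_t)(util + (util >> 2)) * max_khz / CAPACITY_SCALE;
	return target > max_khz ? max_khz : (uint32_t)target;
}

/* Discretized systems only: lowest OPP that is >= target_khz. */
static uint32_t opp_ceil_khz(const uint32_t *opps, size_t nr, uint32_t target_khz)
{
	size_t i;

	assert(nr > 0);
	for (i = 0; i < nr; i++)
		if (opps[i] >= target_khz)
			return opps[i];
	return opps[nr - 1];	/* target above the table: use the highest OPP */
}
```

With the example table and a 2 GHz maximum, utilizations of 300 and 400 both resolve to the 1 GHz OPP, while 440 crosses the boundary into 1.5 GHz. On a continuous system the result of margin_target_khz() would be requested as is.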
Right, but I'm still not sure that we want to continuously adapt
to the current usage (plus the margin) as we might introduce too much
overhead. Also, is it really worth it when we have to activate all this
just to save a little more power or go a little faster? This is
really blue sky, but maybe a trade-off would be to try to discretize
such systems (if it makes sense to control them from the scheduler).
Yes, we already have an activation threshold, but I'm not sure this is
enough.
IIUC, the optimization you're getting at is to suppress the CPU freq
requests when it falls within some range of the current OPP? I think
this may hamper certain latency sensitive workloads, since the freq
ramp up could be potentially slowed down. So, there's some merit in
making the request path as quick as possible and allow for continuous
adaptation. I need to look at your patches in more detail, but
eyeballing it seems like you're trying to achieve that.
So, the energy model (and please mind that the patches on top of Mike's
patchset don't have that yet) currently gives you these "capacity
bands". The idea is to try to adapt the OPP selection to the usage you
see on your CPU/cluster. Since the usage signal is subject to
saturation, what I'm trying to do is to avoid this condition by jumping
up to the max available OPP when we realize that we are going to
saturate a particular OPP. After we run for a small interval of time
(say a tick) at that max OPP we can better estimate the real usage and
directly select an OPP ("capacity band") that suits it.
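A rough sketch of that "select from the top" idea, under stated assumptions: the capacity table, the 80% saturation margin and the function names below are hypothetical and are not taken from the patches. Usage and capacity are assumed to share the same 0..1024 scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAT_PCT 80u	/* assumed: a band "fits" while usage stays below 80% of it */

/* Hypothetical per-OPP capacities ("capacity bands"), sorted ascending. */
static const uint32_t example_caps[] = { 256, 512, 768, 1024 };
#define EXAMPLE_NR_CAPS (sizeof(example_caps) / sizeof(example_caps[0]))

/* True if 'usage' leaves enough headroom within capacity 'cap'. */
static int band_fits(uint32_t usage, uint32_t cap)
{
	return (uint64_t)usage * 100 < (uint64_t)cap * SAT_PCT;
}

/*
 * Pick the next OPP index given the current one and the observed usage.
 * If we are about to saturate a non-max OPP, the usage signal is capped by
 * the current capacity and cannot be trusted, so jump to the max OPP.
 * Otherwise select the lowest band that fits the usage.
 */
static size_t select_opp_index(const uint32_t *caps, size_t nr, size_t cur,
			       uint32_t usage)
{
	size_t i;

	assert(nr > 0 && cur < nr);
	if (cur != nr - 1 && !band_fits(usage, caps[cur]))
		return nr - 1;
	for (i = 0; i < nr - 1; i++)
		if (band_fits(usage, caps[i]))
			return i;
	return nr - 1;
}
```

For instance, at index 1 (capacity 512) a usage of 450 triggers the jump to index 3. If the usage measured after a tick at the max OPP turns out to be 600, the next call settles on index 2 (capacity 768).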
I'm not sure about jumping to the max frequency when we detect that
the signal is saturated.
Ondemand has similar behavior to this and many vendors have
implemented out-of-tree solutions that do something like setting the
frequency to an "intermediate" rate (maybe 2/3 of the total
performance band) and then re-evaluate if they need to jump to max
performance after another sampling period.
So at some point you might face the same issue where vendors find this
approach too aggressive and too wasteful of power, thus some
intermediate level will be introduced. I'm not providing you any
solutions here, but I'm saying that designing a policy algorithm that
works well for everyone is super hard.
No doubt about this :).
I got your point, but I guess it should be fairly easy to make this
freq at which we jump somewhat "configurable". Makes sense to me,
considering the variety of shapes power-perf curves can have, for
example.
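One possible shape for such a knob, as a sketch only: the jump target is expressed as a fraction of the performance band. The function name and the num/den parameters are hypothetical, and the band is assumed to be measured between the minimum and maximum frequency rather than from zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frequency to jump to when saturation is detected, as a configurable
 * fraction num/den of the band [min_khz, max_khz].
 * num == den reproduces "jump to max"; 2/3 gives an intermediate rate.
 */
static uint32_t jump_target_khz(uint32_t min_khz, uint32_t max_khz,
				uint32_t num, uint32_t den)
{
	assert(den > 0 && num <= den && min_khz <= max_khz);
	/* 64-bit intermediate so (max - min) * num cannot overflow */
	return min_khz + (uint32_t)((uint64_t)(max_khz - min_khz) * num / den);
}
```

With a 500 MHz to 2 GHz band, a 2/3 setting gives 1.5 GHz, and 1/1 gives the plain jump to 2 GHz.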
...
...
I see your point, though. I think the two approaches differ in how we
get to the desired capacity: ramping up from bottom vs. selecting from
top.
...
From the energy model perspective, can a continuous performance band
be supported at all or is it a hard requirement to have a discretized
table?
I don't think it's a hard requirement (Morten or Dietmar may correct
me here), but just an abstraction of the systems we develop on today.
I guess we would need to compute some formulas at run time, instead of
reading tabular values, if we want to have continuous performance
bands. Food for thought :).
We could also tablify continuous frequency domains based on some
reasonable factor like 50 MHz or something. I guess that factor could
even be supplied by the driver.
Agree. That's what I was thinking with "discretize continuous systems".
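As a minimal sketch of what "tablifying" could look like, assuming the driver reports a continuous band [min_khz, max_khz] and a step: the names below are hypothetical and no existing driver interface is implied. Entries are spaced by the step starting at the minimum, and the last entry is always the true maximum, even when the band is not a multiple of the step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of entries needed to cover [min_khz, max_khz] in step_khz steps. */
static size_t tablified_count(uint32_t min_khz, uint32_t max_khz,
			      uint32_t step_khz)
{
	assert(step_khz > 0 && min_khz <= max_khz);
	/* round the span up to whole steps, plus one entry for min itself */
	return (size_t)(((uint64_t)(max_khz - min_khz) + step_khz - 1) / step_khz) + 1;
}

/* Frequency of the idx-th entry; the last entry is clamped to max_khz. */
static uint32_t tablified_khz(uint32_t min_khz, uint32_t max_khz,
			      uint32_t step_khz, size_t idx)
{
	uint64_t f;

	assert(idx < tablified_count(min_khz, max_khz, step_khz));
	f = (uint64_t)min_khz + (uint64_t)idx * step_khz;
	return f > max_khz ? max_khz : (uint32_t)f;
}
```

A 500 MHz to 1.2 GHz band with a 50 MHz step yields 15 entries. If the maximum were 1.22 GHz instead, there would be 16 entries, with the last one clamped to 1.22 GHz.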
Best,
- Juri
