On Thu, Oct 15, 2015 at 06:24:16PM +0200, Vincent Guittot wrote:
On 15 October 2015 at 18:10, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Thu, Oct 15, 2015 at 10:24:27AM +0100, Patrick Bellasi wrote:
On Thu, Oct 15, 2015 at 09:23:58AM +0200, Vincent Guittot wrote:
On 14 October 2015 at 21:58, Steve Muckle steve.muckle@linaro.org wrote:
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
> From my experience race to idle has never panned out as an energy-efficient strategy, presumably due to the nonlinear increase in power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good energy-efficient strategy. However, is the _main_ goal of sched-DVFS to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency to offer the required performance for a platform at the best possible consumption of energy.
In this case, what should we do for platforms where the lower OPPs are less energy-efficient than some higher OPPs? We just discovered from some discussions at Connect that many platforms adopt that strategy, for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it to be removed from the devicetree configuration of available frequencies for the governor to choose from.
I agree on that point too, and I think this has also been discussed on LKML. Having a low OPP that is less power-efficient than a higher one doesn't make sense from either a power or a performance point of view.
Actually, I think it could make sense from a power standpoint.
For example, if you are under a thermal constraint but still want to make progress with your workload, there are only two possibilities: a) throttle at an energy-efficient OPP, or b) switch to a low-power OPP even if it is less energy-efficient, i.e. reduce the frequency without lowering the voltage.
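To put rough numbers on option (b): a small Python sketch using the classic dynamic-power model P ≈ C·f·V² plus a fixed leakage term. All figures are illustrative, not from any real SoC; the point is just that dropping f without dropping V lowers instantaneous power but raises energy per unit of work:

```python
# Rough CMOS power model: dynamic power scales with C*f*V^2,
# plus a leakage term that depends on V but not on f.
# All numbers below are illustrative, not from any real SoC.

def power_w(freq_ghz, volt, cap=1.0, leak_w=0.2):
    return cap * freq_ghz * volt**2 + leak_w

def energy_per_gcycle_j(freq_ghz, volt):
    # Energy to retire 1e9 cycles = power * time at that frequency
    return power_w(freq_ghz, volt) * (1.0 / freq_ghz)

# OPP A: 1.0 GHz at 1.0 V (a proper DVFS point)
# OPP B: 0.5 GHz at 1.0 V (frequency lowered, voltage NOT lowered)
p_a = power_w(1.0, 1.0)               # higher instantaneous power
p_b = power_w(0.5, 1.0)               # lower power: fits a thermal budget
e_a = energy_per_gcycle_j(1.0, 1.0)   # less energy per unit of work
e_b = energy_per_gcycle_j(0.5, 1.0)   # more energy: less efficient
```

So OPP B is exactly the kind of OPP discussed here: worse energy efficiency, but lower power, which is what a thermally constrained system cares about.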
I have heard the same argument several times from thermal management people. Despite their lower energy efficiency they still rely on their lower power to stay within the thermal budget.
I'm personally more in favour of the first solution, and suitable use of bandwidth control could be enough to provide it.
Agreed, using inefficient OPPs is not really desirable, but it is the only option available at the moment. Ideally, I think we should have hardware-implemented idle-injection (throttling) instead and let the thermal framework specify the duty cycle. Throttling through software could work, but it isn't feasible to do tricks like aligning the throttling across all cpus in a cluster to enter a deeper idle-state and make the throttling even more efficient.
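For the idle-injection direction, the duty cycle the thermal framework would have to specify is simple to derive from the power budget. A hypothetical Python sketch (function name and power figures are mine, not any actual kernel thermal API):

```python
def idle_duty_cycle(p_opp_w, p_idle_w, p_budget_w):
    """Fraction of time to force-idle so that average power meets
    the budget. Solve (1 - d) * p_opp + d * p_idle = p_budget
    for d, clamped to [0, 1]."""
    if p_budget_w >= p_opp_w:
        return 0.0  # running flat out already fits the budget
    d = (p_opp_w - p_budget_w) / (p_opp_w - p_idle_w)
    return min(max(d, 0.0), 1.0)

# Example: the efficient OPP burns 1.2 W, the idle state 0.1 W,
# and thermal management allows a 0.8 W budget.
d = idle_duty_cycle(1.2, 0.1, 0.8)  # roughly a third of the time idle
```

The deeper the idle state reached (lower p_idle_w, e.g. by aligning throttling across a cluster), the smaller the duty cycle needed for the same budget, which is the efficiency argument made above.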
Back to reality... We do have the choice to avoid those inefficient states in the OPP table. We could hide them from sched-DVFS/cpufreq governors when the system isn't thermally constrained and only enable them when strictly needed. I think somebody (Steve?) suggested recently that we could reorder the OPP list by decreasing energy efficiency and always start the search for an appropriate OPP from the beginning. That way we should never pick an inefficient one unless thermal constraints force us to. I think that could be a good start.
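The reordered-list idea could be sketched like this. A toy Python model, not governor code; the (freq_mhz, power_mw) tuple format and the numbers are assumptions for illustration:

```python
def pick_opp(opps, demand_mhz):
    """opps: list of (freq_mhz, power_mw) tuples.
    Order OPPs by decreasing energy efficiency (MHz per mW) and
    return the first one that covers the capacity demand, so an
    inefficient OPP is only chosen when nothing more efficient
    can provide the requested capacity."""
    ordered = sorted(opps, key=lambda fp: fp[0] / fp[1], reverse=True)
    for freq, power in ordered:
        if freq >= demand_mhz:
            return (freq, power)
    # Demand exceeds every OPP: fall back to the fastest one.
    return max(opps) if opps else None

# Toy table: (400, 180) is a low OPP that is LESS efficient
# than the 1000 MHz point (2.22 vs 2.86 MHz/mW).
opps = [(500, 200), (1000, 350), (1500, 700), (400, 180)]
```

With this ordering, a demand of 800 MHz lands on the 1000 MHz point, and the inefficient 400 MHz point is only reached if everything more efficient has been disabled (e.g. by thermal mitigation).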
Looks like we are aligned on the behavior :-) I'm not sure what is meant by reordering the OPP list, but I don't think we really need to order them. We just need to ensure that, at any moment, an enabled OPP is more power-efficient than any OPP with a higher frequency. By default, we enable the OPPs up to the max capacity with respect to that efficiency rule. Then we can let the thermal manager enable/disable some OPPs for thermal mitigation, but it must ensure that the previous rule always holds.
I think we are saying the same thing :-) We want to make sure we don't pick inefficient OPPs unless we are forced to by thermal constraints. The rest is just implementation details.
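As one of those implementation details, the rule being agreed on here can be expressed as a small invariant the thermal manager would have to preserve when enabling/disabling OPPs. A Python sketch (my reading of the rule, not kernel code; OPPs as (freq_mhz, power_mw) tuples):

```python
def efficiency(freq_mhz, power_mw):
    return freq_mhz / power_mw

def invariant_holds(enabled_opps):
    """Every enabled OPP must be more power-efficient than any
    enabled OPP with a higher frequency, i.e. efficiency is
    strictly decreasing as frequency goes up."""
    opps = sorted(enabled_opps)  # ascending by frequency
    for i, (f_lo, p_lo) in enumerate(opps):
        for f_hi, p_hi in opps[i + 1:]:
            if efficiency(f_lo, p_lo) <= efficiency(f_hi, p_hi):
                return False  # a lower OPP is no better: rule broken
    return True
```

A governor could then pick any enabled OPP covering the demand and know it is never trading efficiency away by accident; only an explicit thermal enable of a low-power, low-efficiency OPP would break the check.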
It looks like we can't get rid of those OPPs anytime soon, so we had better not assume they aren't there, and deal with them instead.