Re: [Eas-dev] cpufreq_sched policy for combining requests from multiple sched classes

15 Oct 2015


On 14 October 2015 at 21:58, Steve Muckle <steve.muckle@linaro.org> wrote:
...
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
...
...
From my experience race to idle has never panned out as an
energy-efficient strategy, presumably due to the nonlinear increase in
power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good
energy-efficient strategy. However, is the _main_ goal of sched-DVFS
to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency to
offer the required performance for a platform at the best possible
consumption of energy.
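To make the race-to-idle point above concrete, here is a small worked example. It is only a sketch: the OPP table, the idle power and the workload size are made-up numbers, chosen so that power grows faster than frequency, and `energy_mj` is a hypothetical helper, not anything from cpufreq.

```python
# Hypothetical OPP table: (frequency in MHz, active power in mW).
# The numbers are invented; power grows faster than frequency.
OPPS = [(500, 100), (1000, 300), (1500, 700)]
IDLE_POWER_MW = 10  # assumed idle power


def energy_mj(freq_mhz, power_mw, work_mcycles, period_s):
    """Energy (mJ) to run work_mcycles within period_s, idling afterwards."""
    busy_s = work_mcycles / freq_mhz  # Mcycles / MHz = seconds
    if busy_s > period_s:
        return None  # this OPP cannot finish the work in time
    idle_s = period_s - busy_s
    return power_mw * busy_s + IDLE_POWER_MW * idle_s


WORK_MCYCLES = 400  # assumed amount of work per period
PERIOD_S = 1.0
results = {f: energy_mj(f, p, WORK_MCYCLES, PERIOD_S) for f, p in OPPS}
for freq, energy in results.items():
    print(freq, "MHz:", round(energy, 1), "mJ")
```

With these assumed numbers the lowest OPP that still finishes the work in time uses the least energy, and racing to idle at the highest OPP uses the most. A different power curve or a higher idle power could change the ordering.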
...
In this case, what should we do for platforms where the lower OPPs are
less energy-efficient than some higher OPP?
We just discovered from some discussions at Connect that there are
many platforms adopting that strategy for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it
to be removed from the devicetree configuration of available frequencies
for the governor to choose from.
I agree on that point too, and I think this has also been discussed on
LKML. Having a low OPP that is less power efficient than a higher
one doesn't make sense from either a power or a performance point of view.
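The pruning described above could be sketched as follows. This is an illustration only: `prune_inefficient_opps` is a hypothetical helper, the table is made up, and "less energy efficient" is taken here to mean a higher power/frequency ratio (energy per cycle) than some higher OPP.

```python
def prune_inefficient_opps(opps):
    """Drop any OPP whose energy per cycle is not lower than a higher OPP's.

    opps: list of (freq_mhz, power_mw) tuples, in any order.
    Returns the surviving OPPs sorted by ascending frequency.
    """
    kept = []
    best_cost = float("inf")  # lowest power/freq seen among higher OPPs
    # Walk from the highest frequency down, so every OPP is compared
    # against all OPPs above it.
    for freq, power in sorted(opps, reverse=True):
        cost = power / freq
        if cost < best_cost:
            kept.append((freq, power))
            best_cost = cost
    return sorted(kept)


# Made-up table: the 300 MHz OPP draws more power than the 500 MHz one.
table = [(300, 120), (500, 100), (1000, 300), (1500, 700)]
print(prune_inefficient_opps(table))
```

In this made-up table only the 300 MHz entry is dropped, since the 500 MHz OPP delivers more performance for less power.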
...
...
IMHO one of the "main" goals of sched-DVFS is to contribute to providing
(as much as possible) deterministic behavior. We have the chance to
refactor CPUFreq to better integrate with the scheduler and thus we
should try to exploit this opportunity to improve the overall
determinism of the solution.
From this viewpoint, I think it's not so far from reality that,
if you schedule a task as FIFO or BATCH real-time, you care most about
Sorry I'm not sure what you mean by BATCH real-time - did you mean
SCHED_RR? I'm just aware of two real-time policies, FIFO and RR. AFAICS
BATCH is very similar to regular CFS.
...
latencies, or you _should_ care about latencies.
Specifically, the time to completion of a task. If this is true, then
race-to-idle is the only "deterministic" way to achieve such a goal.
I don't believe determinism is part of the semantics of the RT class
today. RT just offers the capability for strict prioritization of work.
Given that getting EAS/sched-dvfs accepted is such a herculean task I
think any semantic changes should be avoided at least until the
foundation is upstream and being used. Especially if they may have a
significant impact on energy or performance.
...
...
Because of this I think a policy of increasing the OPP when RT tasks
are runnable will cause a net increase in energy consumption,
I would argue that this is hard to define in general. We actually do
not know whether running at a lower OPP would be more or less energy
efficient. It depends on many other (possibly external) factors,
e.g. OPP curve definitions, interaction with I/O devices...
What is quite certain, instead, is that we will increase power consumption.
Agreed it's hard to define or know for sure but in general for the
purposes of energy, I think it's fair to say that usually you should run
at the lowest OPP which meets the performance requirements of the
usecase. This assumes that OPPs which consume more or equal power to
others while providing less performance have been removed. The typical
device configuration out there today supports this conclusion IMO (usage
of ondemand/interactive governor).
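The rule of thumb above (run at the lowest OPP that meets the performance requirement) can be sketched as below. `select_opp` and the frequency list are hypothetical, and the table is assumed to have already had its inefficient OPPs removed.

```python
def select_opp(freqs_mhz, required_mhz):
    """Return the lowest frequency >= required_mhz, else the highest one."""
    candidates = sorted(freqs_mhz)
    for freq in candidates:
        if freq >= required_mhz:
            return freq
    # Demand exceeds what the platform offers: saturate at the top OPP.
    return candidates[-1]


FREQS_MHZ = [500, 1000, 1500]  # made-up frequency table
print(select_opp(FREQS_MHZ, 600))   # a 600 MHz demand needs the 1000 MHz OPP
```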
...
But again, is the goal of sched-DVFS to be energy-efficient?
Partly yes, as energy-efficient as possible while satisfying the demand
for performance.
...
I think that this responsibility should be better assigned to other
players, i.e. scheduling classes.
I'd agree inasmuch as, if a workload wants a strict determinism
guarantee, it should migrate to SCHED_DEADLINE.
...
...
which need not be incurred since RT tasks do not
receive this preferential OPP treatment today.
Do they not receive such preferential treatment just because CPUFreq has
always been completely decoupled from scheduler-specific information?
If we use just the "average CPU idle time" to select the OPP once in a
while, that is, in my view, the reason why FIFO/BATCH don't get any
specific treatment.
I think that on this specific point we had better involve the RT
guys and ask them whether a race-to-idle strategy would better match their
expectations.
If you look at the current implementation, we don't ask for max freq or a
specific freq as soon as an RT task is involved; we use the cpufreq
governor policy as for any other task.
So we should keep the same behavior with sched-dvfs as a first step: the RT
sched class will provide its requirement according to the current RT task
load. Then we can look at improvements, but that falls under
policy, and schedTune could probably help in this area so that we can
"boost" the RT class.
...
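A minimal sketch of the first step proposed above: each sched class reports a capacity request based on its load, the requests are summed, and an optional boost is applied to the RT share. All names here are hypothetical, and the percentage boost is a deliberate simplification that is not meant to reflect how schedTune actually computes its margin. Capacity is assumed to scale linearly with frequency on a 0..1024 scale.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity of the CPU at its highest OPP


def combine_requests(cfs_cap, rt_cap, rt_boost_pct=0):
    """Sum per-class capacity requests, optionally boosting the RT share."""
    boosted_rt = rt_cap + rt_cap * rt_boost_pct // 100
    return min(cfs_cap + boosted_rt, SCHED_CAPACITY_SCALE)


def capacity_to_freq(cap, freqs_mhz):
    """Lowest frequency whose capacity covers cap (linear scaling assumed)."""
    max_freq = max(freqs_mhz)
    for freq in sorted(freqs_mhz):
        if freq * SCHED_CAPACITY_SCALE // max_freq >= cap:
            return freq
    return max_freq


FREQ_TABLE_MHZ = [500, 1000, 1500]  # made-up frequency table

# No boost: RT load is treated like any other load.
plain = capacity_to_freq(combine_requests(200, 100), FREQ_TABLE_MHZ)
# 50% boost on the RT share pushes the request over the lowest OPP.
boosted = capacity_to_freq(combine_requests(200, 100, 50), FREQ_TABLE_MHZ)
print(plain, boosted)
```

With no boost the RT request is simply added to the CFS request, which matches the "same behavior as today" starting point; the boost parameter stands in for whatever policy knob might be layered on later.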
It can be debated whether the limitations of CPUfreq established the
semantics of the RT class or vice-versa, but either way having RT affect
the OPP in this way would be a major semantic/policy change that will
almost certainly have significant repercussions in power profiling.
I agree that a broader discussion would be good before going further.
Beyond just the RT guys though I think a community-wide discussion on
lkml and linux-pm would be appropriate.
...
Maybe I'm wrong, but I have the impression that once you schedule a
task as FIFO/BATCH, sometimes you also need to "hack" into CPUFreq to
ensure a minimum OPP which allows you to meet your tasks' demands in terms
of time-to-completion.
I've not seen this specific issue. The boosting I've seen is typically
associated with CFS tasks. RT tasks on the platforms I've worked with
are usually small enough that they can be satisfied regardless of the OPP.
...
Here the problem is that with the frameworks we have right now people
need to use/combine features of different frameworks to achieve their
goals. This sounds to me like something which could be improved,
provided that we start by splitting responsibilities and letting users know
which tool should be used to achieve a specific goal.
Specifically, if you care about responsiveness and energy-efficiency,
you had better use DEADLINE instead of FIFO/RR. If instead you go for
the latter classes, then you should be aware that you get race-to-idle
behavior, whatever this means from an energy/power standpoint.
If there's broad consensus that this semantic/policy change is what
folks want then I'm all for it, but I'd expect pushback.
