Re: [Eas-dev] cpufreq_sched policy for combining requests from multiple sched classes

15 Oct 2015


      On Wed, Oct 14, 2015 at 12:58:31PM -0700, Steve Muckle wrote:
...
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
...
...
From my experience race to idle has never panned out as an
energy-efficient strategy, presumably due to the nonlinear increase in
power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good
energy-efficient strategy. However, is the _main_ goal of sched-DVFS
to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency to
offer the required performance for a platform at the best possible
consumption of energy.
...
In this case, what should we do for platforms where the lower OPPs are
less energy-efficient than some higher OPPs?
We just discovered from some discussions at Connect that there are
many platforms adopting that strategy for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it
to be removed from the devicetree configuration of available frequencies
for the governor to choose from.
I personally have the same view, but talking with some partners at
Connect I had the impression that this is not always possible.
...
...
IMHO one of the "main" goals of sched-DVFS is to contribute to providing
(as much as possible) deterministic behaviors. We have the chance to
refactor CPUFreq to better integrate with the scheduler and thus we
should try to exploit this opportunity to improve the overall
determinism of the solution.
From this viewpoint, I think it's not so far away from reality that,
if you schedule a task as FIFO or BATCH real-time, you care most about
Sorry I'm not sure what you mean by BATCH real-time - did you mean
SCHED_RR? I'm just aware of two real-time policies, FIFO and RR. AFAICS
BATCH is very similar to regular CFS.
You're right, I actually meant FIFO and RR.
...
...
latencies, or you _should_ care about latencies.
Specifically the time to completion of a task. If this is true the
race-to-idle is the only "deterministic" way to achieve such a goal.
I don't believe determinism is part of the semantics of the RT class
today. RT just offers the capability for strict prioritization of work.
Right, "determinism" is not the proper term.
My focus is on "time-to-completion" (TtC); what I meant is that the
shortest TtC can be achieved just by running at the highest OPP.
However, I agree with you that these semantics are not clearly stated
today for FIFO/RR tasks. I'm just wondering if it could make sense.
...
Given that getting EAS/sched-dvfs accepted is such a herculean task I
think any semantic changes should be avoided at least until the
foundation is upstream and being used. Especially if they may have a
significant impact on energy or performance.
Ok, I agree with that tactic. However, I still lean towards the idea that
a long-term strategy should be to better define the role of each
scheduling class from a performance/power standpoint too.
...
...
...
Because of this I think a policy of increasing the OPP when RT tasks
are runnable will cause a net increase in energy consumption,
I would argue that this is hard to define in general. We actually do
not know if running at a lower OPP could be more/less energy
efficient. It depends on many other (possibly external) factors,
e.g. OPP curves definition, interaction with I/O devices...
What is quite sure, instead, is that we will increase power consumption.
Agreed it's hard to define or know for sure but in general for the
purposes of energy, I think it's fair to say that usually you should run
at the lowest OPP which meets the performance requirements of the
usecase. This assumes that OPPs which consume more or equal power to
others while providing less performance have been removed. The typical
device configuration out there today supports this conclusion IMO (usage
of ondemand/interactive governor).
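
The rule described above (drop OPPs that draw the same or more power than a faster one, then pick the lowest OPP that meets demand) can be sketched as follows. This is only an illustration: the `OPP` tuple, the function names, and the power figures are hypothetical and do not come from cpufreq or any devicetree binding.

```python
from typing import List, NamedTuple


class OPP(NamedTuple):
    freq_khz: int   # operating frequency
    power_mw: int   # hypothetical active power at this frequency


def prune_dominated(opps: List[OPP]) -> List[OPP]:
    """Drop every OPP that draws as much or more power than a faster OPP."""
    kept = []
    min_power = None
    # Walk from fastest to slowest, tracking the lowest power seen so far.
    for opp in sorted(opps, key=lambda o: o.freq_khz, reverse=True):
        if min_power is None or opp.power_mw < min_power:
            kept.append(opp)
            min_power = opp.power_mw
    return sorted(kept, key=lambda o: o.freq_khz)


def lowest_sufficient(opps: List[OPP], required_khz: int) -> OPP:
    """Return the slowest remaining OPP that meets the demand.

    Falls back to the fastest OPP when the demand exceeds it.
    Assumes the table is not empty.
    """
    table = prune_dominated(opps)
    for opp in table:
        if opp.freq_khz >= required_khz:
            return opp
    return table[-1]


# Made-up table: the 300 MHz OPP draws more power than the 600 MHz one.
example = [OPP(300000, 120), OPP(600000, 110), OPP(1200000, 400)]
choice = lowest_sufficient(example, 200000)
```

With this made-up table the 300 MHz entry is pruned, so even a 200 MHz demand is served at 600 MHz. Note that the sketch compares raw power only; it says nothing about OPPs that are less energy-efficient per unit of work, which is the case raised in the Connect discussions.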
Maybe I'm wrong but, as already mentioned, at Connect I had the impression
that this assumption is not always true, for reasons that partners sometimes
cannot or don't want to explain.
...
...
But again, is the goal of sched-DVFS to be energy-efficient?
Partly yes, as energy-efficient as possible while satisfying the demand
for performance.
...
I think that this responsibility should be better assigned to other
players, i.e. scheduling classes.
I'd agree inasmuch as, if a workload wants a strict determinism
guarantee, it should migrate to SCHED_DEADLINE.
...
...
which need not be incurred since RT tasks do not
receive this preferential OPP treatment today.
Do they not receive such preferential treatment just because CPUFreq has
always been completely decoupled from scheduler-specific information?
If we use just the "average CPU idle time" to select the OPP once in a
while, that is, in my opinion, the reason why FIFO/BATCH don't get any
specific treatment.
I think that on this specific point we should get the RT guys
involved and ask them if a race-to-idle strategy could better match their
expectations.
It can be debated whether the limitations of CPUfreq established the
semantics of the RT class or vice-versa, but either way having RT affect
the OPP in this way would be a major semantic/policy change that will
almost certainly have significant repercussions in power profiling.
I agree that a broader discussion would be good before going further.
Beyond just the RT guys though I think a community-wide discussion on
lkml and linux-pm would be appropriate.
Right, any specific proposal?
...
...
Maybe I'm wrong but I have the impression that once you schedule a
task as FIFO/BATCH, sometimes you also need to "hack" into CPUFreq to
ensure a minimum OPP which allows you to meet your tasks' demands in terms
of time-to-completion.
I've not seen this specific issue. The boosting I've seen is typically
associated with CFS tasks. RT tasks on the platforms I've worked with
are usually small enough that they can be satisfied regardless of the OPP.
Ok, that's an interesting point. If this is the general use-case for
FIFO/RT tasks, i.e. they can always be "completed fast enough" by
running at the lowest OPP, then asking sched-DVFS for the cumulative
RT load should always work.
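
One way to read "asking sched-DVFS for the cumulative RT load" is that each scheduling class posts a capacity request and the requests are summed before a frequency is chosen. A minimal sketch under that assumption follows; the function names, the per-class dictionary, and the plain summation are all hypothetical and are not taken from the cpufreq_sched patches.

```python
SCHED_CAPACITY_SCALE = 1024  # assumed fixed-point value meaning "100% of the CPU"


def combined_request(per_class: dict) -> int:
    """Sum the capacity requested by each sched class, clamped to full capacity."""
    total = sum(per_class.values())
    return min(total, SCHED_CAPACITY_SCALE)


def target_freq_khz(per_class: dict, max_freq_khz: int) -> int:
    """Scale the maximum frequency by the combined request, rounding up."""
    cap = combined_request(per_class)
    return (cap * max_freq_khz + SCHED_CAPACITY_SCALE - 1) // SCHED_CAPACITY_SCALE


# Small RT load on top of a moderate CFS load (made-up numbers).
light = target_freq_khz({"cfs": 300, "rt": 100}, 1200000)
# Requests that together exceed the CPU's capacity are clamped.
saturated = target_freq_khz({"cfs": 900, "rt": 300}, 1200000)
```

Under this model a small RT request only nudges the target frequency upwards, in contrast to a race-to-idle policy, which would jump to the highest OPP whenever an RT task is runnable.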
...
...
Here the problem is that with the frameworks we have right now people
need to use/combine features of different frameworks to achieve their
goals. This sounds to me like something which could be improved,
provided that we start by splitting responsibilities and letting users know
which tool should be used to achieve a specific goal.
Specifically, if you care about responsiveness and energy-efficiency,
you should use DEADLINE instead of FIFO/RR. If instead you go for
the latter classes, then you should be aware that you get a race-to-idle
behavior, whatever this means from an energy/power standpoint.
If there's broad consensus that this semantic/policy change is what
folks want then I'm all for it, but I'd expect pushback.
Ok, so let's have this discussion on the list, perhaps once we have the
initial rework which integrates FIFO/RR without changing the
current semantics.
Cheers Patrick
-- 
#include <best/regards.h>

Patrick Bellasi
