On Wed, Oct 14, 2015 at 12:58:31PM -0700, Steve Muckle wrote:
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
From my experience race to idle has never panned out as an energy-efficient strategy, presumably due to the nonlinear increase in power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good energy-efficient strategy. However, is the _main_ goal of sched-DVFS to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency to offer the required performance for a platform at the best possible consumption of energy.
In this case, what should we do for platforms where the lower OPPs are less energy-efficient than some higher OPPs? We just discovered from some discussions at Connect that there are many platforms adopting that strategy for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it to be removed from the devicetree configuration of available frequencies for the governor to choose from.
I personally have the same view, but talking with some partners at Connect I had the impression that this is not always possible.
IMHO one of the "main" goals of sched-DVFS is to help provide (as much as possible) deterministic behaviors. We have the chance to refactor CPUFreq to better integrate with the scheduler and thus we should try to exploit this opportunity to improve the overall determinism of the solution.
From this viewpoint, I think it's not so far away from reality that, if you schedule a task as FIFO or BATCH real-time, you care most about
Sorry I'm not sure what you mean by BATCH real-time - did you mean SCHED_RR? I'm just aware of two real-time policies, FIFO and RR. AFAICS BATCH is very similar to regular CFS.
You're right, I actually meant FIFO and RR.
latencies, or you _should_ care about latencies. Specifically the time to completion of a task. If this is true the race-to-idle is the only "deterministic" way to achieve such a goal.
I don't believe determinism is part of the semantics of the RT class today. RT just offers the capability for strict prioritization of work.
Right, "determinism" is not the proper term. My focus is on "time-to-completion" (TtC); what I meant is that the shortest TtC can be achieved only by running at the highest OPP.
However, I agree with you that these semantics are not clearly stated today for FIFO/RR tasks. I'm just wondering if it could make sense.
Given that getting EAS/sched-dvfs accepted is such a herculean task I think any semantic changes should be avoided at least until the foundation is upstream and being used. Especially if they may have a significant impact on energy or performance.
Ok, I agree with that tactic. However, I still lean towards the idea that a long-term strategy should be to better define the role of each scheduling class from a performance/power standpoint too.
Because of this I think a policy of increasing the OPP when RT tasks are runnable will cause a net increase in energy consumption,
I would argue that this is hard to define in general. We actually do not know if running at a lower OPP could be more/less energy efficient. It depends on many other (possibly external) factors, e.g. OPP curves definition, interaction with I/O devices... What I am quite sure of, instead, is that we will increase power consumption.
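To make the "it depends on the OPP curves" point concrete, here is a minimal sketch of the energy needed to complete a fixed amount of work within a period, modelled as busy power times busy time plus idle power for the remainder. The two OPP tables, the idle power, and the function names are all hypothetical values chosen for illustration, not measurements from any platform.

```python
# Illustrative model only: every number below is a made-up assumption.
# An OPP is a (frequency_hz, busy_power_watts) pair.

def energy_per_period(cycles, freq_hz, busy_power_w, idle_power_w, period_s):
    """Energy (joules) to run `cycles` of work at one OPP, then idle."""
    busy_s = cycles / freq_hz
    if busy_s > period_s:
        raise ValueError("work does not fit in the period at this frequency")
    return busy_power_w * busy_s + idle_power_w * (period_s - busy_s)

def cheapest_opp(opps, cycles, idle_power_w, period_s):
    """Return the OPP that completes the work in time with the least energy."""
    feasible = [(f, p) for f, p in opps if cycles / f <= period_s]
    return min(
        feasible,
        key=lambda fp: energy_per_period(cycles, fp[0], fp[1],
                                         idle_power_w, period_s),
    )

# Hypothetical curve where power grows faster than frequency:
# the lowest OPP is the cheapest, so race-to-idle loses.
CONVEX_OPPS = [(500e6, 0.10), (1000e6, 0.30), (1500e6, 0.70)]

# Hypothetical curve where the lowest OPP is inefficient
# (e.g. it shares a voltage with the next one): a middle OPP wins.
LEAKY_OPPS = [(500e6, 0.40), (1000e6, 0.50), (1500e6, 0.90)]

if __name__ == "__main__":
    for name, table in (("convex", CONVEX_OPPS), ("leaky", LEAKY_OPPS)):
        freq, _ = cheapest_opp(table, cycles=5e6, idle_power_w=0.01,
                               period_s=0.02)
        print(f"{name}: cheapest OPP is {freq / 1e6:.0f} MHz")
```

With the same workload, the energy-optimal OPP differs between the two tables, while the highest OPP draws the most power in both.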
Agreed it's hard to define or know for sure but in general for the purposes of energy, I think it's fair to say that usually you should run at the lowest OPP which meets the performance requirements of the use case. This assumes that OPPs which consume more or equal power to others while providing less performance have been removed. The typical device configuration out there today supports this conclusion IMO (usage of ondemand/interactive governor).
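The rule of thumb above can be sketched as two steps: first drop every OPP for which a faster OPP draws the same or less power, then pick the lowest remaining OPP that meets the demand. This is only an illustration under stated assumptions: the OPP table is hypothetical, the function names are invented here, and a real platform would express the pruning in its devicetree rather than at run time.

```python
# Illustrative sketch only: the table and helper names are hypothetical.
# An OPP is a (frequency_hz, power_watts) pair.

def prune_dominated(opps):
    """Drop OPPs for which some faster OPP draws the same or less power."""
    kept = []
    for freq, power in opps:
        dominated = any(f > freq and p <= power for f, p in opps)
        if not dominated:
            kept.append((freq, power))
    return sorted(kept)

def lowest_sufficient_opp(opps, required_hz):
    """Pick the lowest OPP meeting the demand; saturate at the highest."""
    for freq, power in sorted(opps):
        if freq >= required_hz:
            return (freq, power)
    return max(opps)

# Hypothetical table: the 300 MHz OPP draws more power than the 600 MHz one.
RAW_OPPS = [(300e6, 0.25), (600e6, 0.20), (900e6, 0.45), (1200e6, 0.80)]

if __name__ == "__main__":
    usable = prune_dominated(RAW_OPPS)
    print(usable)                                # 300 MHz entry is gone
    print(lowest_sufficient_opp(usable, 700e6))  # selects the 900 MHz OPP
```

Note that this prunes on power alone, as described above. An OPP that survives the pruning can still cost more energy per unit of work than a higher one, which is the case raised earlier in the thread.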
Maybe I'm wrong but, as I already said, at Connect I had the impression that this assumption is not always true, for reasons that sometimes they cannot or don't want to explain.
But again, is the goal of sched-DVFS to be energy-efficient?
Partly yes, as energy-efficient as possible while satisfying the demand for performance.
I think that this responsibility should be better assigned to other players, i.e. scheduling classes.
I'd agree inasmuch as if a workload wants a strict determinism guarantee it should migrate to SCHED_DEADLINE.
which need not be incurred since RT tasks do not receive this preferential OPP treatment today.
Do they not receive such preferential treatment just because CPUFreq has always been completely decoupled from scheduler-specific information? If we use just the "average CPU idle time" to select the OPP once in a while, that, in my view, is the reason why FIFO/BATCH don't get a specific treatment.
I think that on this specific point we should get the RT guys involved and ask them whether a race-to-idle strategy would better match their expectations.
It can be debated whether the limitations of CPUfreq established the semantics of the RT class or vice-versa, but either way having RT affect the OPP in this way would be a major semantic/policy change that will almost certainly have significant repercussions in power profiling.
I agree that a broader discussion would be good before going further. Beyond just the RT guys though I think a community-wide discussion on lkml and linux-pm would be appropriate.
Right, any specific proposal?
Maybe I'm wrong but I have the impression that once you schedule a task as FIFO/BATCH, sometimes you also need to "hack" into CPUFreq to ensure a minimum OPP that lets you meet your tasks' demands in terms of time-to-completion.
I've not seen this specific issue. The boosting I've seen is typically associated with CFS tasks. RT tasks on the platforms I've worked with are usually small enough that they can be satisfied regardless of the OPP.
Ok, that's an interesting point. If this is the general use-case for FIFO/RT tasks, i.e. they can always be "completed fast enough" by running at the lowest OPP, then asking sched-DVFS for the cumulative RT load should always work.
Here the problem is that with the frameworks we have right now people need to use/combine features of different frameworks to achieve their goals. This sounds to me like something which could be improved, provided that we start by splitting responsibilities and letting users know which tool should be used to achieve a specific goal.
Specifically, if you care about responsiveness and energy-efficiency, you should use DEADLINE instead of FIFO/RR. If you go for the latter classes, then you should be aware that you get a race-to-idle behavior, whatever this means from an energy/power standpoint.
If there's broad consensus that this semantic/policy change is what folks want then I'm all for it, but I'd expect pushback.
Ok, so let's have this discussion on the list, perhaps once we have the initial rework which integrates FIFO/RR without changing the current semantics.
Cheers,
Patrick