On 14 October 2015 at 21:58, Steve Muckle <steve.muckle@linaro.org> wrote:
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
From my experience race to idle has never panned out as an energy-efficient strategy, presumably due to the nonlinear increase in power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good energy-efficient strategy. However, is the _main_ goal of sched-DVFS to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency so as to offer the performance a platform requires at the lowest possible energy consumption.
In this case, what should we do for platforms where the lower OPPs are less energy-efficient than some higher OPPs? We just discovered from some discussions at Connect that there are many platforms adopting that strategy for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it to be removed from the devicetree configuration of available frequencies for the governor to choose from.
I agree on that point too, and I think this has also been discussed on LKML. Having a low OPP that is less power efficient than a higher one doesn't make any sense from either a power or a performance point of view.
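As a rough illustration of the pruning described above, here is a minimal sketch (plain Python, not kernel or devicetree code). It assumes "less energy efficient" means a higher energy cost per cycle, i.e. power divided by frequency. The Opp type, the prune_inefficient_opps() helper and the power figures are all hypothetical and chosen only for the example.

```python
from typing import List, NamedTuple


class Opp(NamedTuple):
    """Hypothetical operating performance point: frequency and power draw."""
    freq_khz: int
    power_mw: float


def prune_inefficient_opps(opps: List[Opp]) -> List[Opp]:
    """Drop every OPP for which some higher-frequency OPP costs no more
    energy per cycle (power / frequency)."""
    ordered = sorted(opps, key=lambda o: o.freq_khz)
    kept = []
    best_cost_above = float("inf")
    # Walk from the highest frequency down, tracking the cheapest
    # energy-per-cycle seen so far among the higher OPPs.
    for opp in reversed(ordered):
        cost = opp.power_mw / opp.freq_khz
        if cost < best_cost_above:
            kept.append(opp)
            best_cost_above = cost
    kept.reverse()
    return kept


# Made-up table: the 200 MHz point costs more per cycle than 400 MHz.
table = [
    Opp(200_000, 150.0),
    Opp(400_000, 200.0),
    Opp(800_000, 500.0),
    Opp(1_200_000, 1100.0),
]
usable = prune_inefficient_opps(table)
```

With this made-up table only the 200 MHz entry is dropped, since running at 400 MHz does the same work for less energy.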
IMHO one of the "main" goals of sched-DVFS is to help provide (as much as possible) deterministic behaviors. We have the chance to refactor CPUFreq to better integrate with the scheduler and thus we should try to exploit this opportunity to improve the overall determinism of the solution.
From this viewpoint, I think it's not so far from reality that, if you schedule a task as FIFO or BATCH real-time, you care most about
Sorry I'm not sure what you mean by BATCH real-time - did you mean SCHED_RR? I'm just aware of two real-time policies, FIFO and RR. AFAICS BATCH is very similar to regular CFS.
latencies, or you _should_ care about latencies. Specifically the time to completion of a task. If this is true the race-to-idle is the only "deterministic" way to achieve such a goal.
I don't believe determinism is part of the semantics of the RT class today. RT just offers the capability for strict prioritization of work.
Given that getting EAS/sched-dvfs accepted is such a herculean task I think any semantic changes should be avoided at least until the foundation is upstream and being used. Especially if they may have a significant impact on energy or performance.
Because of this I think a policy of increasing the OPP when RT tasks are runnable will cause a net increase in energy consumption,
I would argue that this is hard to define in general. We actually do not know whether running at a lower OPP would be more or less energy efficient. It depends on many other (possibly external) factors, e.g. the definition of the OPP curves, interaction with I/O devices... What is quite certain, instead, is that we will increase power consumption.
Agreed it's hard to define or know for sure but in general for the purposes of energy, I think it's fair to say that usually you should run at the lowest OPP which meets the performance requirements of the usecase. This assumes that OPPs which consume more or equal power to others while providing less performance have been removed. The typical device configuration out there today supports this conclusion IMO (usage of ondemand/interactive governor).
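The rule of thumb above (run at the lowest OPP that meets the performance requirement) can be sketched as follows. This is only an illustration: select_lowest_sufficient_freq() is a hypothetical helper, the frequencies are made up, and the table is assumed to have already had its inefficient OPPs removed.

```python
from typing import Sequence


def select_lowest_sufficient_freq(freqs_khz: Sequence[int],
                                  required_khz: int) -> int:
    """Return the lowest available frequency that satisfies the demand,
    or the maximum frequency if the demand exceeds every OPP."""
    if not freqs_khz:
        raise ValueError("empty OPP table")
    for freq in sorted(freqs_khz):
        if freq >= required_khz:
            return freq
    return max(freqs_khz)


# Made-up frequency table and demand.
available = [400_000, 800_000, 1_200_000]
chosen = select_lowest_sufficient_freq(available, 500_000)
```

A demand of 500 MHz selects the 800 MHz OPP here. Whether a higher OPP would ever be preferable is exactly the policy question under discussion.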
But again, is the goal of sched-DVFS to be energy-efficient?
Partly yes, as energy-efficient as possible while satisfying the demand for performance.
I think that this responsibility should be better assigned to other players, i.e. scheduling classes.
I'd agree inasmuch as, if a workload wants a strict determinism guarantee, it should migrate to SCHED_DEADLINE.
which need not be incurred since RT tasks do not receive this preferential OPP treatment today.
Do they not receive such preferential treatment just because CPUFreq has always been completely decoupled from scheduler-specific information? If we use just the "average CPU idle time" to select the OPP once in a while, that is, in my opinion, the reason why FIFO/BATCH don't get specific treatment.
I think that on this specific point we should get the RT guys involved and ask them whether a race-to-idle strategy would better match their expectations.
If you look at the current implementation, we don't ask for max freq or a specific freq as soon as an RT task is involved; we use the cpufreq governor policy as for any other task. So we should keep the same behavior with sched-dvfs as a first step: the RT sched-class will provide its requirement according to the current RT task load. Then we will look at possible improvements, but this falls under policy, and schedTune could probably help in this area so that we can "boost" the RT class.
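To make the "first step" above concrete, here is a minimal sketch in which each scheduling class reports its own load and an optional boost is applied to the RT share. It is not the sched-dvfs or schedTune implementation. The function names and the boost formula (filling a percentage of the remaining headroom) are assumptions for illustration. Only the 1024 capacity scale matches the kernel's SCHED_CAPACITY_SCALE.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point capacity scale used by the scheduler


def boosted(util: int, boost_pct: int) -> int:
    """Hypothetical boost: add boost_pct percent of the remaining headroom.
    An idle class (util == 0) is never boosted."""
    if util <= 0:
        return 0
    margin = (SCHED_CAPACITY_SCALE - util) * boost_pct // 100
    return util + margin


def cpu_capacity_request(cfs_util: int, rt_util: int,
                         rt_boost_pct: int = 0) -> int:
    """Sum the per-class requirements, clamped to the capacity scale.
    With rt_boost_pct == 0, RT load is treated like any other load."""
    total = cfs_util + boosted(rt_util, rt_boost_pct)
    return min(total, SCHED_CAPACITY_SCALE)


# Made-up utilization values.
plain = cpu_capacity_request(cfs_util=300, rt_util=100)
with_boost = cpu_capacity_request(cfs_util=300, rt_util=100, rt_boost_pct=50)
```

With no boost the RT load simply adds to the CFS load, which matches the "same behavior as today" starting point. The boost knob is where a separate policy could raise the request for the RT class.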
It can be debated whether the limitations of CPUfreq established the semantics of the RT class or vice-versa, but either way having RT affect the OPP in this way would be a major semantic/policy change that will almost certainly have significant repercussions in power profiling.
I agree that a broader discussion would be good before going further. Beyond just the RT guys though, I think a community-wide discussion on lkml and linux-pm would be appropriate.
Maybe I'm wrong but I have the impression that once you schedule a task as FIFO/BATCH, sometimes you also need to "hack" into CPUFreq to ensure a minimum OPP which allows you to meet your tasks' demands in terms of time-to-completion.
I've not seen this specific issue. The boosting I've seen is typically associated with CFS tasks. RT tasks on the platforms I've worked with are usually small enough that they can be satisfied regardless of the OPP.
Here the problem is that with the frameworks we have right now, people need to combine features of different frameworks to achieve their goals. This sounds to me like something which could be improved, provided that we start by splitting responsibilities and letting users know which tool should be used to achieve a specific goal.
Specifically, if you care about responsiveness and energy-efficiency, you would be better off using DEADLINE instead of FIFO/RR. If you go for the latter classes, then you should be aware that you get a race-to-idle behavior, whatever this means from an energy/power standpoint.
If there's broad consensus that this semantic/policy change is what folks want then I'm all for it, but I'd expect pushback.