On Thu, Oct 15, 2015 at 06:24:16PM +0200, Vincent Guittot wrote:
On 15 October 2015 at 18:10, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Thu, Oct 15, 2015 at 10:24:27AM +0100, Patrick Bellasi wrote:
On Thu, Oct 15, 2015 at 09:23:58AM +0200, Vincent Guittot wrote:
On 14 October 2015 at 21:58, Steve Muckle steve.muckle@linaro.org wrote:
On 10/14/2015 01:58 AM, Patrick Bellasi wrote:
> From my experience race to idle has never panned out as an energy-efficient strategy, presumably due to the nonlinear increase in power cost as performance increases.
I agree with you that "race-to-idle" is not (always) a good energy-efficient strategy. However, is the _main_ goal of sched-DVFS to be energy-efficient?
I'd say the primary goal of sched-dvfs is to manage CPU frequency to offer the required performance for a platform at the best possible consumption of energy.
In this case, what should we do for platforms where the lower OPPs are less energy-efficient than some higher OPPs? We just discovered from some discussions at Connect that many platforms adopt that strategy, for various reasons.
If a lower OPP is less energy efficient than a higher one, I'd expect it to be removed from the devicetree configuration of available frequencies for the governor to choose from.
I agree on that point too, and I think this has also been discussed on LKML. Having a low OPP that is less power-efficient than a higher one doesn't make sense from either a power or a performance point of view.
Actually, I think it could make sense from a power standpoint.
For example, if you are under a thermal constraint but still want to make progress with your workload, there are only two possibilities: a) throttle at an energy-efficient OPP, or b) switch to a low-power OPP even if it is less energy-efficient, i.e. reduce the frequency without lowering the voltage.
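To put rough numbers on option (b): a small Python sketch using the classic dynamic-power model P ≈ C·f·V² plus a fixed leakage term. All figures are illustrative, not from any real SoC; the point is just that dropping f without dropping V lowers instantaneous power but raises energy per unit of work:

```python
# Rough CMOS power model: dynamic power scales with C*f*V^2,
# plus a leakage term that depends on V but not on f.
# All numbers below are illustrative, not from any real SoC.

def power_w(freq_ghz, volt, cap=1.0, leak_w=0.2):
    return cap * freq_ghz * volt**2 + leak_w

def energy_per_gcycle_j(freq_ghz, volt):
    # Energy to retire 1e9 cycles = power * time at that frequency
    return power_w(freq_ghz, volt) * (1.0 / freq_ghz)

# OPP A: 1.0 GHz at 1.0 V (a proper DVFS point)
# OPP B: 0.5 GHz at 1.0 V (frequency lowered, voltage NOT lowered)
p_a = power_w(1.0, 1.0)               # higher instantaneous power
p_b = power_w(0.5, 1.0)               # lower power: fits a thermal budget
e_a = energy_per_gcycle_j(1.0, 1.0)   # less energy per unit of work
e_b = energy_per_gcycle_j(0.5, 1.0)   # more energy: less efficient
```

So OPP B is exactly the kind of OPP discussed here: worse energy efficiency, but lower power, which is what a thermally constrained system cares about.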
I have heard the same argument several times from thermal management people. Despite their lower energy efficiency they still rely on their lower power to stay within the thermal budget.
I'm personally more in favour of the first solution, and suitable use of bandwidth control could be enough to provide it.
Agreed, using inefficient OPPs is not really desirable, but it is the only option available at the moment. Ideally, I think we should have hardware-implemented idle-injection (throttling) instead and let the thermal framework specify the duty cycle. Throttling through software could work, but it isn't feasible to do tricks like aligning the throttling across all cpus in a cluster to enter a deeper idle-state and make the throttling even more efficient.
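For the idle-injection direction, the duty cycle the thermal framework would have to specify is simple to derive from the power budget. A hypothetical Python sketch (function name and power figures are mine, not any actual kernel thermal API):

```python
def idle_duty_cycle(p_opp_w, p_idle_w, p_budget_w):
    """Fraction of time to force-idle so that average power meets
    the budget. Solve (1 - d) * p_opp + d * p_idle = p_budget
    for d, clamped to [0, 1]."""
    if p_budget_w >= p_opp_w:
        return 0.0  # running flat out already fits the budget
    d = (p_opp_w - p_budget_w) / (p_opp_w - p_idle_w)
    return min(max(d, 0.0), 1.0)

# Example: the efficient OPP burns 1.2 W, the idle state 0.1 W,
# and thermal management allows a 0.8 W budget.
d = idle_duty_cycle(1.2, 0.1, 0.8)  # roughly a third of the time idle
```

The deeper the idle state reached (lower p_idle_w, e.g. by aligning throttling across a cluster), the smaller the duty cycle needed for the same budget, which is the efficiency argument made above.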
Back to reality... We do have the choice to avoid those inefficient states in the OPP table. We could hide them from sched-DVFS/cpufreq governors when the system isn't thermally constrained and only enable them when strictly needed. I think somebody (Steve?) suggested recently that we could reorder the OPP list by decreasing energy efficiency and always start the search for an appropriate OPP from the beginning. That way we should never pick an inefficient one unless thermal constraints force us to. I think that could be a good start.
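The reordered-list idea could be sketched like this. A toy Python model, not governor code; the (freq_mhz, power_mw) tuple format and the numbers are assumptions for illustration:

```python
def pick_opp(opps, demand_mhz):
    """opps: list of (freq_mhz, power_mw) tuples.
    Order OPPs by decreasing energy efficiency (MHz per mW) and
    return the first one that covers the capacity demand, so an
    inefficient OPP is only chosen when nothing more efficient
    can provide the requested capacity."""
    ordered = sorted(opps, key=lambda fp: fp[0] / fp[1], reverse=True)
    for freq, power in ordered:
        if freq >= demand_mhz:
            return (freq, power)
    # Demand exceeds every OPP: fall back to the fastest one.
    return max(opps) if opps else None

# Toy table: (400, 180) is a low OPP that is LESS efficient
# than the 1000 MHz point (2.22 vs 2.86 MHz/mW).
opps = [(500, 200), (1000, 350), (1500, 700), (400, 180)]
```

With this ordering, a demand of 800 MHz lands on the 1000 MHz point, and the inefficient 400 MHz point is only reached if everything more efficient has been disabled (e.g. by thermal mitigation).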
Looks like we are aligned on the behavior :-) I'm not sure what is meant by reordering the OPP list, but I don't think we really need to order them. We just need to ensure that, at any moment, an enabled OPP is more power-efficient than any OPP with a higher frequency. By default, we enable the OPPs up to the max capacity with respect to that efficiency rule. Then we can let the thermal manager enable/disable some OPPs for thermal mitigation, but it must ensure that the previous rule always holds.
I think we are saying the same thing :-) We want to make sure we don't pick inefficient OPPs unless we are forced to by thermal constraints. The rest is just implementation details.
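As one of those implementation details, the rule being agreed on here can be expressed as a small invariant the thermal manager would have to preserve when enabling/disabling OPPs. A Python sketch (my reading of the rule, not kernel code; OPPs as (freq_mhz, power_mw) tuples):

```python
def efficiency(freq_mhz, power_mw):
    return freq_mhz / power_mw

def invariant_holds(enabled_opps):
    """Every enabled OPP must be more power-efficient than any
    enabled OPP with a higher frequency, i.e. efficiency is
    strictly decreasing as frequency goes up."""
    opps = sorted(enabled_opps)  # ascending by frequency
    for i, (f_lo, p_lo) in enumerate(opps):
        for f_hi, p_hi in opps[i + 1:]:
            if efficiency(f_lo, p_lo) <= efficiency(f_hi, p_hi):
                return False  # a lower OPP is no better: rule broken
    return True
```

A governor could then pick any enabled OPP covering the demand and know it is never trading efficiency away by accident; only an explicit thermal enable of a low-power, low-efficiency OPP would break the check.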
It looks like we can't get rid of those OPPs anytime soon, so we had better not assume they aren't there, and deal with them instead.