On Tue, 2011-10-11 at 15:08 +0530, Amit Kucheria wrote:
That shouldn't be done using cpu_power, we have sched_smt_power_savings and sched_mc_power_savings for stuff like that.
AFAICT, sched_mc assumes all cores have the same capacity - which is certainly true of the x86 architecture. But on ARM you can see hybrid cores[1] designed using different fab technology, so that some cores can run at 'n' GHz and some at 'm' GHz. The idea being that when there isn't much to do (e.g. periodic keep-alives for messaging, email, etc.) you don't wake up the higher power-consuming cores.
From TFA[1], "Sheeva was already capable of 1.2GHz, but the new design can go up to 1.5GHz. But only two of the 628's Sheeva cores run at the full 1.5GHz. The third one is down-clocked to 624MHz, an interesting design choice that saves on power but adds some extra utility. In a sense, the 628 could be called a 2.5-core design."
Cute :-)
Are we mistaken in thinking that sched_mc cannot currently handle this use case? How would we 'tune' sched_mc to do this w/o playing with cpu_power?
Yeah, sched_mc wants some TLC there.
Although I would really like to kill all those different sched_*_power_savings knobs and reduce it to one.
If cpu_power is higher than 1024, the cpu is no longer seen as out of capacity by load_balance as soon as a short process is running, and the main result is that the small tasks will stay on the same cpu. This configuration is mainly useful for ARM dual-core systems when we want to power-gate one cpu. I use cyclictest to simulate such a use case.
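To make the effect concrete, here is a rough standalone illustration, not the actual load-balancer code: capacity is derived from cpu_power in units of 1024, so a cpu_power above 1024 leaves apparent headroom even with a task running. The 2048 value and the capacity_of() helper below are made up for the example.

#include <stdio.h>

#define SCHED_POWER_SCALE	1024

/* capacity in "whole tasks": cpu_power rounded to the nearest 1024 */
static unsigned int capacity_of(unsigned int cpu_power)
{
	return (cpu_power + SCHED_POWER_SCALE / 2) / SCHED_POWER_SCALE;
}

int main(void)
{
	unsigned int powers[] = { 1024, 2048 };

	for (int i = 0; i < 2; i++) {
		unsigned int cap = capacity_of(powers[i]);
		unsigned int nr_running = 1;	/* one short task on this cpu */

		printf("cpu_power=%u capacity=%u -> %s\n",
		       powers[i], cap,
		       nr_running < cap ?
			"still looks under capacity, small tasks stay here" :
			"looks full, tasks get spread out");
	}
	return 0;
}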
Yeah, but that's wrong.
What is wrong - the use case simulation using cyclictest? Can you suggest better tools?
Using cpu_power to do power saving load-balancing like that.
So ideally cpu_power is simply a factor in the weight balance decision such that:
  cpu_weight_i      cpu_weight_j
  ------------  ~=  ------------
  cpu_power_i       cpu_power_j
This yields that under sufficient[*] load, e.g. 5 equal-weight tasks on your 2.5-core thingy, you'd get 2:2:1.
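For a quick worked example, the snippet below splits N equal-weight tasks in proportion to cpu_power; the cpu_power values are my own guesses (1024 for the two full-speed cores, ~426 for the 624MHz one, scaled as 624/1500 of 1024), not anything the kernel computes:

#include <stdio.h>

int main(void)
{
	unsigned int power[] = { 1024, 1024, 426 };	/* assumed cpu_power values */
	int ncpus = 3, ntasks = 5, assigned = 0;
	unsigned int total = 0;
	int tasks[3];

	for (int i = 0; i < ncpus; i++)
		total += power[i];

	/* ideal share per cpu: ntasks * power_i / total, rounded to nearest */
	for (int i = 0; i < ncpus; i++) {
		tasks[i] = (ntasks * power[i] + total / 2) / total;
		assigned += tasks[i];
	}
	/* dump any rounding slack on the first cpu */
	tasks[0] += ntasks - assigned;

	for (int i = 0; i < ncpus; i++)
		printf("cpu%d: power=%u tasks=%d\n", i, power[i], tasks[i]);

	return 0;	/* prints 2, 2, 1 for the values above */
}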
The decision on what to do on under-utilized systems should be separate from this.
Currently the load-balancer doesn't know about 'short' running processes at all; we just have nr_running and weight, and it doesn't know/care how long those tasks will be around for.
Now for some of the cgroup crap we track a time-weighted weight average, and pjt was talking about pulling that up into the normal code to get rid of our multitude of different ways to calculate actual load. [**]
(/me pokes pjt with a sharp stick, where those patches at!?)
But that only gets you half-way there, you also need to compute an effective time-weighted load per task to go with that. Now while all that is quite feasible, the problem is overhead. We very much already are way too expensive and should be cutting back, not keep adding more and more accounting.
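For reference, a minimal sketch of the kind of decayed, time-weighted per-task load being discussed; the 1ms period, the ~1002/1024 per-period decay (halving roughly every 32 periods) and all the names are assumptions for illustration, not taken from pjt's patches:

#include <stdio.h>

#define PERIOD_NS	1000000ULL	/* 1ms accounting period (assumed) */
#define DECAY_NUM	1002		/* decay ~= 1002/1024 per period (assumed) */
#define DECAY_SHIFT	10

struct task_load {
	unsigned long long runnable_sum;	/* decayed time spent runnable (ns) */
	unsigned long long period_sum;		/* decayed wall time (ns) */
};

/* account 'delta' ns of wall time, of which the task was runnable for 'runnable' ns */
static void update_task_load(struct task_load *tl,
			     unsigned long long delta,
			     unsigned long long runnable)
{
	unsigned long long periods = delta / PERIOD_NS;

	/* decay the old sums once per elapsed period */
	while (periods--) {
		tl->runnable_sum = (tl->runnable_sum * DECAY_NUM) >> DECAY_SHIFT;
		tl->period_sum   = (tl->period_sum   * DECAY_NUM) >> DECAY_SHIFT;
	}
	tl->runnable_sum += runnable;
	tl->period_sum   += delta;
}

/* effective load: static weight scaled by the runnable fraction */
static unsigned long effective_load(const struct task_load *tl, unsigned long weight)
{
	if (!tl->period_sum)
		return weight;
	return (unsigned long)(weight * tl->runnable_sum / tl->period_sum);
}

int main(void)
{
	struct task_load tl = { 0, 0 };

	/* a task runnable 100us out of every 1ms contributes ~1/10 of its weight */
	for (int i = 0; i < 100; i++)
		update_task_load(&tl, PERIOD_NS, 100000);

	printf("effective load of a weight-1024 task: %lu\n", effective_load(&tl, 1024));
	return 0;
}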
[*] Sufficient such that the weight problem is feasible. eg. 3 equal tasks on 2 equal cores can never be statically balanced, 2 unequal tasks on 2 equal cores (or v.v.) can't ever be balanced.
[**] I suspect this might solve the over-balancing problem triggered by tasks woken from the tick that also does the load-balance pass. This load-balance pass runs in softirq context and thus preempts running all those just-woken tasks, giving the impression the CPU is very busy, while in fact most of those tasks will instantly go back to sleep after finding nothing to do.