On 22 April 2015 at 13:10, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Tue, Apr 21, 2015 at 05:58:03PM +0100, Mike Turquette wrote:
Quoting Juri Lelli (2015-04-16 09:46:47)
On 16/04/15 06:29, Michael Turquette wrote:
+#define UP_THRESHOLD 95
Is this a leftover? In the changelog you say that you moved away from thresholds. Anyway, since we scale utilization by frequency, I'm not sure we can live without some sort of up_threshold. The problem is that if you are running a task flat out on a CPU at a certain frequency, say the lowest one, you'll always get a usage for that CPU that corresponds to the CPU's current capacity at that frequency. Since you use the usage signal to decide when to ramp up, you will never ramp up in this situation: the signal won't cross the capacity at the lower frequency.
Juri & Morten,
Yes, the UP_THRESHOLD constant is a leftover.
We discussed the issue of usage being capped at the current capacity in our call yesterday, but I have some doubts. Let's forget big.LITTLE for a moment and talk about an SMP system. On my pandaboard I clearly see usage values taken directly from get_cpu_usage() that scale up and down through the whole range (and as a result the selected cpu frequencies cover the whole range).
Let me clarify that 'capped' was the wrong word. It is converging towards the current capacity. Sorry for the confusion.
cfs.utilization_load_avg is the sum of the PELT utilization of all tasks on the rq. Utilization tracks running time, which means that the sum can only temporarily, and under special circumstances (such as task migration and fork), go above 100% (1024) if we ignore frequency invariance. If it goes above, it will converge back to 100% over time. That happens fairly quickly for forked tasks, as their avg_period is small in the early life of a new task.
In Vincent's patch set, my patch 'sched: Make sched entity usage tracking scale-invariant' changes this a bit. In __update_entity_runnable_avg() we now scale the PELT utilization signal by freq_curr/freq_max. The sum (cfs.utilization_load_avg) therefore also converges towards freq_curr/freq_max (*1024). For example, running at 300 MHz with freq_max = 1000 MHz, the sum converges towards 307. Without any migrations or new tasks, the utilization will be in the range 0..307 no matter how many tasks are on the rq. Just as before, the sum may temporarily go above that if new tasks are forked or tasks are migrated to the rq.
Let's take an example where you have an existing task waking up with a low utilization, say 100. It could be a webpage rendering thread that did minor updates to some already-loaded webpage last time it was scheduled, but this time it is being scheduled to render a new webpage. The task's PELT utilization is added to cfs.utilization_load_avg when it is enqueued, so the sum is now 100. freq_curr = 300 MHz. The task will start rendering the webpage and run for quite a while, during which it will build up its PELT utilization. It will ramp up quickly in the beginning and converge towards 307 due to the freq_curr/freq_max scaling in __update_entity_runnable_avg(). Due to the properties of the geometric series it will converge more and more slowly the closer it gets to 307. PJT defined:
#define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_MAX_AVG */
where a period here is 1024 us. So if you don't have any other tasks causing any noise it may take quite a while to get to 307. Worst case 345 ms. If you do have noise you may not see this delay, but I wouldn't rely on it for determining when to increase the frequency.
In Vincent's patches get_cpu_usage() returns a somewhat modified metric.
static int get_cpu_usage(int cpu)
{
	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
	unsigned long capacity = capacity_orig_of(cpu);

	if (usage >= SCHED_LOAD_SCALE)
		return capacity;

	return (usage * capacity) >> SCHED_LOAD_SHIFT;
}
The utilization is scaled and capped by cpu capacity. capacity_orig_of(cpu) is 1024 for non-SMT and non-big.LITTLE systems, in which case get_cpu_usage() just enforces an upper limit of 1024 on cfs.utilization_load_avg. For such systems get_cpu_usage() can be compared to normalized frequency (freq_curr*1024/freq_max). If you are running at 300 MHz, your normalized frequency is 300*1024/1000 = 307 and get_cpu_usage() will eventually return 307 if you have at least one always-running task on the cpu.
In mainline Linux, capacity != 1024 for SMT systems (determined by 1178/#hw_threads) and for big.LITTLE systems with the clock-frequency property set in DT (which enables Vincent's capacity scaling code in topology.c; it is enabled in exynos5420.dtsi). In this case get_cpu_usage() scales utilization to the range 0..capacity_orig_of(cpu).
If we take the example from before but now have an SMT system with two hw-threads per core, capacity_orig_of() = 589. If you have an always-running task and you are at 300 MHz, cfs.utilization_load_avg = 307 (as before), but get_cpu_usage() returns 307*589/1024 = 176. 307 is still the convergence target and the sum won't go above it unless other tasks show up, and due to the capacity scaling in get_cpu_usage() the returned usage will never go above 176. If you were running at 1000 MHz (freq_max), get_cpu_usage() would return 589. You would never go above 589 despite your normalized frequency being freq_curr*1024/freq_max = 1000*1024/1000 = 1024. So here you would be comparing usage on one scale (0..589) to frequency 'capacity' on another scale (0..1024). That is broken in my opinion. The same scaling must be applied on both sides: either apply capacity_orig_of() scaling to the frequency or have a non-scaling version of get_cpu_usage().
The issue is the same for big.LITTLE systems. If you enable Vincent's cpu_efficiency code for TC2 by setting the clock-frequency properties in the DT (as they are set in the LSK tree), the A7 capacity_orig_of() = 606.
While I don't want big.LITTLE to be part of the sched/dvfs integration discussion, IMHO, we are working towards a goal of better scheduling and power management on all systems including big.LITTLE. So I think we should keep those in mind too and avoid cutting corners where we know it will cause trouble for some systems. I'm not asking for you to do big.LITTLE specific modifications or even mention it in the patch set, I'm just asking for minor changes that allows us to extend this to work for big.LITTLE as well.
I agree with Morten that you have to use capacity_orig_of(CPU) instead of SCHED_CAPACITY_SCALE when you compare the compute capacity of a frequency with the current usage of the CPU.
get_cpu_usage() is in the range [0..capacity_orig_of(CPU)], so you have to scale the compute capacity of the frequency point to the same range. As Morten points out, in SMP systems capacity_orig_of(CPU) is SCHED_CAPACITY_SCALE, but directly using this default value is a shortcut.
Regards, Vincent
My current testing involves short running tasks that are quickly queued and dequeued, not a long running task as you suggest. Is there a different behavior in the way cfs.utilization_load_avg is used depending on task length?
PELT utilization tracks the running time of the tasks. cfs.utilization_load_avg is the sum of the PELT utilization of all tasks on the rq. The PELT utilization builds up when a task is running and decays when it is blocked/sleeping. Keep in mind that PELT utilization is initialized to max but with a very short history, so the utilization value is very sensitive in the early life of a task.
Can you please explain why you feel that the return value of get_cpu_usage will not exceed the current capacity? I do not observe this behavior. Do you see this when testing only my branch? Or do you see it when merging my branch with the eas v3 series?
I think it is covered above. I haven't tested the patches myself, but Juri has confirmed that get_cpu_usage() is converging towards freq_curr*1024/freq_max using the user-space governor.
Vincent,
The value of cfs.utilization_load_avg is already normalized against the max possible capacity, right? I do not believe that the return value of get_cpu_usage is capped at the current capacity, but please let me know if I have a misunderstanding.
As said above, it is not capped but converging towards freq_curr*capacity_orig_of(cpu)/freq_max.
I hope that answers your questions, please let me know if it doesn't.
Thanks, Morten