Re: [Eas-dev] [PATCH RFC 1/3] sched/fair: Optimize energy calculation with task oriented

31 Jan 2018


Hi all,
On Fri, Jan 19, 2018 at 12:34:20PM +0800, Leo Yan wrote:
...
Let's first look at an example in which a small-utilization task is
woken up and we need to calculate the energy for two candidate CPUs.
A candidate CPU cannot decide the final OPP by itself because it is
bound to the other CPUs in the same clock domain, so in the end we need
to calculate the energy of all CPUs. Let's use the CPU topology below
as the example:
Cluster_0               Cluster_1
CPU_0                   CPU_4
CPU_1                   CPU_5
CPU_2                   CPU_6
CPU_3                   CPU_7


The current code always calculates the energy for all CPUs in the bound
clock domains. If the candidate CPUs are CPU_0 and CPU_4, the formula
for the energy calculation is as below (a term marked with ` is the
energy after the task has been placed on the candidate CPU):
E(CPU_0) = E(CPU_0)` + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)`
         + E(CPU_4)  + E(CPU_5) + E(CPU_6) + E(CPU_7) + E(CLS_1)
E(CPU_4) = E(CPU_0)  + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)
         + E(CPU_4)` + E(CPU_5) + E(CPU_6) + E(CPU_7) + E(CLS_1)`
E_Diff(CPU_0 - CPU_4) = E(CPU_0) - E(CPU_4)
But from the formulas above we can easily see that the energy
calculations for CPU_1/2/3/5/6/7 are redundant. If we only account for
the energy the task consumes after being placed onto one specific CPU
(rather than computing the energy of all CPUs), the energy calculation
can be optimized as:
E(CPU_0) = E(CPU_0)` + E(CLS_0)` - E(CPU_0) - E(CLS_0)
E(CPU_4) = E(CPU_4)` + E(CLS_1)` - E(CPU_4) - E(CLS_1)
E_Diff(CPU_0 - CPU_4) = E(CPU_0) - E(CPU_4)
So the number of energy calculation iterations is reduced from 20 to 8;
this can significantly reduce the energy calculation overhead.
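A minimal sketch of the two formulas above, written in Python rather
than kernel C. The energy model here (cpu_energy(), cluster_energy(),
the utilization and cost numbers) is made up for illustration and is
not the EAS energy model; it only assumes that placing the task changes
the energy of the destination CPU and of its cluster and nothing else.
Under that assumption both methods give the same E_Diff, and the
counter shows how many E(...) terms each one evaluates.

```python
# Hypothetical toy model: two clusters of four CPUs, one clock domain each.
CLUSTERS = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
UTIL = {0: 100, 1: 200, 2: 50, 3: 0, 4: 300, 5: 0, 6: 150, 7: 80}  # made up
COST = {0: 1, 1: 3}  # made-up energy cost per unit of utilization
TASK_UTIL = 20

calls = 0  # number of E(...) terms evaluated


def cluster_of(cpu):
    return 0 if cpu < 4 else 1


def cpu_energy(cpu, extra=0):
    """E(CPU_x), optionally with the task's utilization added."""
    global calls
    calls += 1
    return (UTIL[cpu] + extra) * COST[cluster_of(cpu)]


def cluster_energy(cls, extra=0):
    """E(CLS_x), optionally with the task's utilization added."""
    global calls
    calls += 1
    return 10 * COST[cls] + sum(UTIL[c] for c in CLUSTERS[cls]) + extra


def full_energy(dst):
    """Old method: sum every CPU and cluster with the task on dst."""
    total = 0
    for cls, cpus in CLUSTERS.items():
        for cpu in cpus:
            total += cpu_energy(cpu, TASK_UTIL if cpu == dst else 0)
        total += cluster_energy(cls, TASK_UTIL if cls == cluster_of(dst) else 0)
    return total


def task_energy(dst):
    """Task oriented method: after minus before, for dst and its cluster."""
    cls = cluster_of(dst)
    after = cpu_energy(dst, TASK_UTIL) + cluster_energy(cls, TASK_UTIL)
    before = cpu_energy(dst) + cluster_energy(cls)
    return after - before


def count_calls(fn, a, b):
    """Return (E_Diff(a - b), number of terms evaluated)."""
    global calls
    calls = 0
    diff = fn(a) - fn(b)
    return diff, calls
```

Calling count_calls(full_energy, 0, 4) and count_calls(task_energy, 0, 4)
returns the same difference, with 20 and 8 term evaluations
respectively.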
With the task oriented calculation there is one case in which the
energy calculation might take longer than with the previous method. For
instance, suppose the candidate CPUs are CPU_0 and CPU_1, and placing
the task on either CPU increases the CPU OPP. In this case, the old
code uses the method below for the energy calculation:
E(CPU_0) = E(CPU_0)` + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)
E(CPU_1) = E(CPU_0) + E(CPU_1)` + E(CPU_2) + E(CPU_3) + E(CLS_0)
E_Diff(CPU_1 - CPU_0) = E(CPU_1) - E(CPU_0)
Because the OPP increase impacts the other CPUs in the same clock
domain, the task oriented method needs to calculate the energy of all
related CPUs:
E(CPU_0) = E(CPU_0)` + E(CPU_1)' + E(CPU_2)' + E(CPU_3)' + E(CLS_0)`
         - E(CPU_0)  - E(CPU_1)  - E(CPU_2)  - E(CPU_3)  - E(CLS_0)
E(CPU_1) = E(CPU_0)` + E(CPU_1)' + E(CPU_2)' + E(CPU_3)' + E(CLS_0)`
         - E(CPU_0)  - E(CPU_1)  - E(CPU_2)  - E(CPU_3)  - E(CLS_0)
E_Diff(CPU_1 - CPU_0) = E(CPU_1) - E(CPU_0)
We can use more complex methods for optimization, e.g. first calculate
the CPU_0 OPP and the CPU_1 OPP and directly select the CPU with the
most power efficient OPP. Or we can reuse the energy data from before
task placement for the two candidates. These methods can be used for
later optimization.
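To make the same-cluster case concrete, here is a minimal Python sketch
(not kernel code). The OPP table, the utilization values and the
domain_energy() model are invented for illustration; the only property
it relies on is that all four CPUs share one OPP, chosen from the
busiest CPU. It counts the energy terms for the old method, for the
task oriented method, and for the second idea, where the energy before
placement is computed once and shared by both candidates.

```python
# Hypothetical OPP table for CLS_0:
# (capacity, CPU cost per unit of utilization, cluster cost)
OPPS = [(200, 1, 5), (400, 2, 9), (1024, 4, 20)]
UTIL = {0: 190, 1: 180, 2: 60, 3: 10}  # made-up utilization of CPU_0..CPU_3
TASK_UTIL = 30  # placing it on CPU_0 or CPU_1 pushes the domain to the next OPP

terms = 0  # number of E(...) terms evaluated


def pick_opp(util):
    """Lowest OPP whose capacity covers the busiest CPU in the domain."""
    busiest = max(util.values())
    for opp in OPPS:
        if opp[0] >= busiest:
            return opp
    return OPPS[-1]


def domain_energy(util):
    """E(CPU_0) + ... + E(CPU_3) + E(CLS_0): five terms per call."""
    global terms
    _, cpu_cost, cls_cost = pick_opp(util)
    terms += len(util) + 1
    return sum(u * cpu_cost for u in util.values()) + cls_cost


def placed(dst):
    """Utilization map with the task added to dst."""
    util = dict(UTIL)
    util[dst] += TASK_UTIL
    return util


def old_diff(a, b):
    return domain_energy(placed(a)) - domain_energy(placed(b))


def task_oriented_diff(a, b):
    e_a = domain_energy(placed(a)) - domain_energy(UTIL)
    e_b = domain_energy(placed(b)) - domain_energy(UTIL)
    return e_a - e_b


def task_oriented_diff_cached(a, b):
    before = domain_energy(UTIL)  # computed once, reused for both candidates
    e_a = domain_energy(placed(a)) - before
    e_b = domain_energy(placed(b)) - before
    return e_a - e_b


def count_terms(fn, a, b):
    """Return (E_Diff(a - b), number of terms evaluated)."""
    global terms
    terms = 0
    diff = fn(a, b)
    return diff, terms
```

In this toy model the three variants return the same E_Diff for CPU_1
versus CPU_0 while evaluating 10, 20 and 15 terms respectively, so
reusing the before-placement data recovers part of the extra cost.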
As a side effect, this patch also resolves an energy calculation
consistency issue: in some cases the energy is calculated for one
cluster and in other cases for multiple clusters, so the semantics of
the energy data are not consistent across scenarios. This patch fixes
the issue by always calculating task based energy.
To achieve the optimization, this patch utilizes the 'eenv->sg_cap' and
'eenv->sg_top' parameters. The parameter 'eenv->sg_cap' is only about
the shared CPU capacity attribute, so in effect it describes the clock
domain shared between CPUs; from this parameter we can work out the
final OPP selection. The parameter 'eenv->sg_top' defines which CPUs we
care about: if the frequency is not changed after placing the woken
task, it is set to the first level scheduling group (meaning the single
CPU), so the energy calculation can be limited to this single CPU.
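A rough sketch of that selection logic, in Python for brevity rather
than kernel C. The names predict_opp() and select_sg_top() and the OPP
capacities are made up here and are not the ones in the patch; the
sketch also assumes the OPP is picked from the busiest CPU in the clock
domain. It returns the set of CPUs whose energy has to be evaluated.

```python
# Hypothetical capacities of the OPPs in one clock domain, lowest first.
OPP_CAPS = [200, 400, 1024]


def predict_opp(util_by_cpu):
    """Index of the lowest OPP that covers the busiest CPU in the domain."""
    busiest = max(util_by_cpu.values())
    for idx, cap in enumerate(OPP_CAPS):
        if cap >= busiest:
            return idx
    return len(OPP_CAPS) - 1


def select_sg_top(sg_cap_util, dst, task_util):
    """Pick the CPUs to include in the energy calculation.

    sg_cap_util maps every CPU of the clock domain (the role played by
    'sg_cap') to its current utilization. The return value plays the
    role of 'sg_top'.
    """
    before = predict_opp(sg_cap_util)
    after_util = dict(sg_cap_util)
    after_util[dst] += task_util
    after = predict_opp(after_util)
    if after == before:
        # Frequency unchanged: only the destination CPU matters.
        return {dst}
    # Frequency changed: every CPU in the clock domain is affected.
    return set(sg_cap_util)
```

For example, select_sg_top({0: 100, 1: 50, 2: 0, 3: 0}, 0, 20) stays
within the lowest OPP and yields only CPU_0, while
select_sg_top({0: 190, 1: 50, 2: 0, 3: 0}, 0, 30) crosses into the next
OPP and yields all four CPUs.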
On Hikey960, after fixing the LITTLE CPU frequency to 1402000kHz and
the big CPU frequency to 1421000kHz, and using a 10s ftrace log of the
home screen scenario, the energy calculation duration is improved as
below:
For the energy calculation between a LITTLE CPU and a big CPU, the
duration decreases from 34660ns to 16565ns (a 52% decrease); for the
energy calculation between two CPUs in the same cluster, the duration
decreases from 24342ns to 21093ns (a 13% decrease).
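The percentages follow directly from the measured durations; a small
Python helper (the function name is just for illustration) shows the
arithmetic:

```python
def reduction_percent(before_ns, after_ns):
    """Relative decrease of a duration, in percent."""
    return 100.0 * (before_ns - after_ns) / before_ns


little_vs_big = reduction_percent(34660, 16565)   # LITTLE CPU vs big CPU
same_cluster = reduction_percent(24342, 21093)    # two CPUs in one cluster
```

Rounded to the nearest integer these give 52 and 13.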
Thanks a lot to Daniel for the review and suggestions. This patch is
big and hard to digest, so I will split it into two smaller patches
(or even more, smaller patches if I can) for easier reviewing:
- The first patch adds CPU frequency prediction and drops the
  redundant CPUs from the energy calculation;
- The second patch introduces the task energy calculation.
I will prepare a new patch set for this. FYI.
[...]
Thanks,
Leo Yan
