[Eas-dev] [PATCH 13/15] sched/fair: Optimize energy computation with task oriented

17 Mar 2018

Let's firstly see one example for a small utilization task is waken up
and need calculate energy for two candidate CPUs.  From hardware design
perspective, each candidate CPU cannot decide OPP by itself due it binds
with other CPUs in the same clock domain, e.g. it binds clock domain
in the cluster with other CPUs, at the end we need to compute energy for
all CPUs in the cluster.
Let's use below CPU topology as the example:
Cluster_0               Cluster_1
    CPU_0                   CPU_4
    CPU_1                   CPU_5
    CPU_2                   CPU_6
    CPU_3                   CPU_7
Current code always calculate the energy for all CPUs in bound clock
domain, if the candidates are CPU_0 and CPU_4, the formula for energy
computation is as below:
E(CPU_x) stands the CPU_x energy, notation E(CPU_x)` means the CPU_x
energy after take placed on it, TE(CPU_x) means the total energy for
all computation for all CPUs when task is placed on CPU_x,
E_DIFF(CPU_x - CPU_y) means the energy difference between CPU_x and
CPU_y, it equals to TE(CPU_x) - TE(CPU_y).
TE(CPU_0) = E(CPU_0)` + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)`
          + E(CPU_4)  + E(CPU_5) + E(CPU_6) + E(CPU_7) + E(CLS_1)
TE(CPU_4) = E(CPU_0)  + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)
          + E(CPU_4)` + E(CPU_5) + E(CPU_6) + E(CPU_7) + E(CLS_1)`
E_DIFF(CPU_0 - CPU_4) = TE(CPU_0) - TE(CPU_4)
...
From upper formula we easily get to know CPU_1/2/3/5/6/7 energy
computations are redundant, on the other hand if we only consider the
energy consumption by waken task, the energy difference is between
the target CPU energy data before and after task placed on it.  This
method have one benefit is it can avoid to compute all CPUs energy in
the same cluster, and only can focus the energy change introduced by
waken task on the target CPU, which this method is called 'task
oriented computation' in this patch, the energy computation can be
optimized as:
TE(CPU_0) = E(CPU_0)` + E(CLS_0)` - E(CPU_0) - E(CLS_0)
TE(CPU_4) = E(CPU_4)` + E(CLS_1)` - E(CPU_4) - E(CLS_1)
E_DIFF(CPU_0 - CPU_4) = TE(CPU_0) - TE(CPU_4)
As the result, the computation iteration can be reduced from 20 times
to 8 times; so this can significantly reduce calculation overload.
After using task oriented computation, it has one case the computation
might take longer time than previous method.  For instance, when
candidates are CPU_0 and CPU1, and after place task on either CPU the
OPP will be increased, in this case, the old method uses below
computation:
TE(CPU_0) = E(CPU_0)` + E(CPU_1) + E(CPU_2) + E(CPU_3) + E(CLS_0)
TE(CPU_1) = E(CPU_0) + E(CPU_1)` + E(CPU_2) + E(CPU_3) + E(CLS_0)
E_DIFF(CPU_0 - CPU_1) = TE(CPU_0) - TE(CPU_1)
Using task oriented computation, because the OPP increasing impacts
other CPUs in the same cluster, so it needs to calculate all related
CPUs energy:
TE(CPU_0) = E(CPU_0)` + E(CPU_1)' + E(CPU_2)' + E(CPU_3)' + E(CLS_0)`
          - E(CPU_0)  - E(CPU_1)  - E(CPU_2)  - E(CPU_3)  - E(CLS_0)
TE(CPU_1) = E(CPU_0)` + E(CPU_1)' + E(CPU_2)' + E(CPU_3)' + E(CLS_0)`
          - E(CPU_0)  - E(CPU_1)  - E(CPU_2)  - E(CPU_3)  - E(CLS_0)
E_DIFF(CPU_0 - CPU_1) = TE(CPU_0) - TE(CPU_1)
We can use more complex method for optimization, e.g. we can extend
eenv structure to cache CPU energy data so can reuse them for two
candidates.  This can be used for later optimization.
As side effect, this patch also resolves energy calculation consistent
issue, e.g. for some cases the energy calculation is for one cluster,
some cases the energy calculation is for multiple clusters; so the
energy data semantics are not consistent for different scenarios.
Computation inconsistent issue might introduce trouble for PE filter.
This patch fixes issue by always calculating task based energy.
To achieve the optimization, this patch utilizes 'eenv->sg_cap' and
'eenv->sg_top' parameters; the parameter 'eenv->sg_cap' is only about
the CPU capacity shared attribution, so eventually it's to describe the
clock domain shared within CPUs, from this parameter we can get to know
the final OPP selection; we need utilize parameter 'eenv->sg_top' to
define which CPU we take care about, if the frequency is not changed
after placing waken task then it will set the first level scheduling
group to it (means one the single CPU) so the energy computation is
limited to this single CPU, otherwise it rolls back to compute all CPUs
in the same clock domain.
Change-Id: Ifb64ad77c6173388abf13e255e2ed8e8586a38bc
Signed-off-by: Leo Yan leo.yan@linaro.org
---
 kernel/sched/fair.c | 76 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 49eee75..4811fce 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5473,7 +5473,7 @@ struct energy_env {
    	/* Estimated energy variation wrt previous CPU */
    	int	nrg_delta;
-	} cpu[NR_CPUS];
+	} cpu[NR_CPUS*2];
/* The morst energy efficient CPU for the specified energy_env::p */
    int			next_cpu;
@@ -5791,45 +5791,55 @@ static int compute_energy(struct energy_env *eenv, int candidate)
  */
 static int compute_task_energy(struct energy_env *eenv, int cpu)
 {
-	struct sched_domain *sd, *sd_cap;
-	struct sched_group *sg;
-	int first_cpu;
+	struct sched_domain *sd;
+	unsigned int prev_cap_idx, next_cap_idx;
+	int cmp_idx, ret;
-	sd = rcu_dereference(per_cpu(sd_ea, cpu));
+	sd = rcu_dereference(per_cpu(sd_scs, cpu));
    if (!sd)
    	return -1; /* Error */
-	sg = sd->groups;
-	do {
-		/* Skip SGs which do not contains a candidate CPU */
-		if (!cpumask_intersects(&eenv->cpus_mask, sched_group_cpus(sg)))
-			continue;
-
-		eenv->sg_top = sg;
-
-		first_cpu = cpumask_first(sched_group_cpus(sg));
-
-		/*
-		 * The CPU capacity sharing attribution is decided by hardhware
-		 * design so we can decide the sg_cp value at the beginning
-		 * for specific CPU.
-		 */
-		sd_cap = rcu_dereference(per_cpu(sd_scs, first_cpu));
-		if (sd_cap && sd_cap->parent)
-			eenv->sg_cap = sd_cap->parent->groups;
-		else
-			eenv->sg_cap = sd_cap->groups;
-
-		find_new_capacity(eenv, cpu);
+	/*
+	 * The CPU capacity sharing attribution is decided by hardhware
+	 * design so we can decide the sg_cp value at the beginning
+	 * for specific CPU.
+	 */
+	if (sd && sd->parent)
+		eenv->sg_cap = sd->parent->groups;
+	else
+		eenv->sg_cap = sd->groups;
+
+	/* Estimate capacity index before task placement */
+	cmp_idx = NR_CPUS + cpu;
+	prev_cap_idx = find_new_capacity(eenv, cmp_idx);
+	next_cap_idx = find_new_capacity(eenv, cpu);
+
+	/*
+	 * Computation is iteration sched_group from bottom to up level for
+	 * energy accumulation, 'sg_top' is top most sched_group:
+	 * - If the CPU frequency has no change before and after task placed
+	 *   onto the target CPU, set 'sg_top' to sched_group for the target
+	 *   CPU; this means only to calculate the energy for this single CPU
+	 *   and ignore other CPUs in the same clock domain.
+	 * - If found the OPP frequency is changed after task placement then
+	 *   need to calculate all CPUs who bound in the same clock domain,
+	 *   so set 'sg_top' to shared capacity scheduling group.
+	 */
+	if (prev_cap_idx != next_cap_idx)
+		eenv->sg_top = eenv->sg_cap;
+	else
+		eenv->sg_top = sd->groups;
-		/* energy is unscaled to reduce rounding errors */
-		if (compute_energy(eenv, cpu) == -EINVAL) {
-			eenv->next_cpu = eenv->prev_cpu;
-			return -EINVAL;
-		}
+	/* energy is unscaled to reduce rounding errors */
+	ret = compute_energy(eenv, cmp_idx);
+	if (ret < 0)
+		return ret;
-	} while (sg = sg->next, sg != sd->groups);
+	ret = compute_energy(eenv, cpu);
+	if (ret < 0)
+		return ret;
+	eenv->cpu[cpu].energy -= eenv->cpu[cmp_idx].energy;
    return 0;
 }
--
1.9.1

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

[Eas-dev] [PATCH 13/15] sched/fair: Optimize energy computation with task oriented