Hi Patrick,
This patch series has two main purposes.
The first purpose is to adjust the range of the capacity index so that the capacity index and the energy index cover a similar range. This helps tasks fall into a more reasonable PE filter region. This is done by patch 0001.
The second purpose is to support negative boost values in the PE filter, so that SchedTune's algorithm consistently handles both positive and negative boost values. As we know, if we set a positive boost value, the PE filter cut rotates to the right side, giving more chance to the (PB) region and less chance to the (PC) region, so we finally get the filter region below:
            ^
    (O)     |       / (PB)
            |      /
            |     /
            |    /  `-> cut
            |   /
            |  /
            | /
            |/
  --------------------------->
           /|
          / |
         /  |
        /   |
       /    |
  (PC)/     |       (SO)
On the other hand, if we set a negative boost value, the PE filter cut should rotate to the left side, giving the filter region below. This is done by patches 0002~0006.
            ^
    (O) \   |       (PB)
         \  |
          \ |
           \|
  --------------------------->
            |\
            | \
            |  \
            |   \
      (PC)  |    \  (SO)
Patch 0007 is used to verify the PE filter table with LISA. I did some testing on HiKey with TraceAnalysis::plotEDiffSpace() for PE filtering and TraceAnalysis::plotTasks() for boosting signals; these tests passed.
v2 -> v1:
 * Refine patch 0001 to discount cap_delta in energy_diff();
 * Fix a bug and a typo in patch 0003;
 * Refine patch 0004 to open optimal and sub-optimal region checking
   when CONFIG_CGROUP_SCHEDTUNE is disabled;
 * Add patch 0006 to support negative values for sysctl_sched_cfs_boost;
 * Add patch 0007 to trace energy_diff properly.
Leo Yan (7):
  sched/fair: discount capacity index for PE filter
  sched/tune: minor fix for gain table
  sched/tune: polish for PE gain table index
  sched/tune: open optimal and sub-optimal regions for checking
  sched/tune: add PE filter support for negative boosting
  sched/tune: let sysctl_sched_cfs_boost support negative value
  DEBUG: sched/tune: move energy_diff trace point
 include/linux/sched/sysctl.h |   6 +--
 kernel/sched/fair.c          |  29 +++++++---
 kernel/sched/tune.c          | 124 +++++++++++++++++--------------------------
 kernel/sysctl.c              |   5 +-
 4 files changed, 76 insertions(+), 88 deletions(-)
-- 1.9.1
When calculating the PE filter we use the normalized capacity index and energy index to check which filter region a migration falls into. The cap_delta value is generated directly from the difference between the two capacity indexes before and after the task's migration; whether the migrated task is a very small task or a big one, cap_delta is the same. On the other hand, nrg_delta is calculated based on task utilization, so we can observe that the range of cap_delta is much larger than the range of nrg_delta.
Let's think about one case, e.g. a task with util_avg = 10; if other tasks share scheduling with this task, and all the tasks together accumulate util_avg = 1024 (100% utilization), then this task can only benefit from the capacity change in proportion:
cap_delta * task(util_avg) / 1024;
The rest of cap_delta will be consumed by the other tasks on the CPU. So we can say this value is the task's performance boost in the worst case.

On the other hand, we need to distinguish how performance boosting affects tasks with different utilization. Usually a big task with high utilization benefits more prominently than a low-utilization task. By discounting with util_avg we reflect this effect.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e693a95..793b6e8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5150,6 +5150,16 @@ energy_diff(struct energy_env *eenv)
 	if (boost == 0)
 		return eenv->nrg.diff;
 
+	/*
+	 * The capacity index delta is meaningful for a task with 100%
+	 * utilization; if the task has small utilization then we need to
+	 * discount the delta by the ratio between the task utilization
+	 * and 100% (1024):
+	 *   cap.delta = cap.delta * task_util / 1024
+	 */
+	eenv->cap.delta =
+		(eenv->cap.delta * eenv->util_delta) >> SCHED_CAPACITY_SHIFT;
+
 	/* Compute normalized energy diff */
 	nrg_delta = normalize_energy(eenv->nrg.diff);
 	eenv->nrg.delta = nrg_delta;
-- 
1.9.1
The boost = 0 case is handled directly by energy_diff(); so for any boost > 0 we should give the task a chance to migrate when the PE filter falls into the performance boost region. This patch adjusts the gain table so that a task can be boosted even when its boost value is < 10%.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 5680748..6489f25 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -38,11 +38,11 @@ struct threshold_params {
  */
 static struct threshold_params
 threshold_gains[] = {
-	{ 0, 5 }, /* < 10% */
-	{ 1, 5 }, /* < 20% */
-	{ 2, 5 }, /* < 30% */
-	{ 3, 5 }, /* < 40% */
-	{ 4, 5 }, /* < 50% */
+	{ 1, 5 }, /* < 10% */
+	{ 2, 5 }, /* < 20% */
+	{ 3, 5 }, /* < 30% */
+	{ 4, 5 }, /* < 40% */
+	{ 5, 5 }, /* < 50% */
 	{ 5, 4 }, /* < 60% */
 	{ 5, 3 }, /* < 70% */
 	{ 5, 2 }, /* < 80% */
-- 
1.9.1
Currently we use the same index for the PB and PC regions, so it's not necessary to define two index values for the PE region cuts.

This patch polishes the related code to use only one index to refer to the gain table; it is used for both the PB and PC region cuts.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 57 +++++++++++++++++------------------------------------
 1 file changed, 18 insertions(+), 39 deletions(-)
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 6489f25..103b27f 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -19,11 +19,8 @@ unsigned int sysctl_sched_cfs_boost __read_mostly;
 
 extern struct target_nrg schedtune_target_nrg;
 
-/* Performance Boost region (B) threshold params */
-static int perf_boost_idx;
-
-/* Performance Constraint region (C) threshold params */
-static int perf_constrain_idx;
+/* Performance-Energy (P-E) thresholds index */
+static int pe_threshold_idx;
 
 /**
  * Performance-Energy (P-E) Space thresholds constants
@@ -51,18 +48,10 @@ threshold_gains[] = {
 };
 
 static int
-__schedtune_accept_deltas(int nrg_delta, int cap_delta,
-			  int perf_boost_idx, int perf_constrain_idx)
+__schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 {
 	int payoff = -INT_MAX;
-	int gain_idx = -1;
-
-	/* Performance Boost (B) region */
-	if (nrg_delta >= 0 && cap_delta > 0)
-		gain_idx = perf_boost_idx;
-	/* Performance Constraint (C) region */
-	else if (nrg_delta < 0 && cap_delta <= 0)
-		gain_idx = perf_constrain_idx;
+	int gain_idx = threshold_idx;
 
 	/* Default: reject schedule candidate */
 	if (gain_idx == -1)
@@ -120,11 +109,8 @@ struct schedtune {
 	/* Boost value for tasks on that SchedTune CGroup */
 	int boost;
 
-	/* Performance Boost (B) region threshold params */
-	int perf_boost_idx;
-
-	/* Performance Constraint (C) region threshold params */
-	int perf_constrain_idx;
+	/* Power-efficiency gain table index */
+	int threshold_idx;
 
 	/* Hint to bias scheduling of tasks on that SchedTune CGroup
 	 * towards idle CPUs */
@@ -158,8 +144,7 @@ static inline struct schedtune *parent_st(struct schedtune *st)
 static struct schedtune
 root_schedtune = {
 	.boost	= 0,
-	.perf_boost_idx = 0,
-	.perf_constrain_idx = 0,
+	.threshold_idx = 0,
 	.prefer_idle = 0,
 };
 
@@ -168,8 +153,7 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 			struct task_struct *task)
 {
 	struct schedtune *ct;
-	int perf_boost_idx;
-	int perf_constrain_idx;
+	int idx;
 
 	/* Optimal (O) region */
 	if (nrg_delta < 0 && cap_delta > 0) {
@@ -186,12 +170,10 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 	/* Get task specific perf Boost/Constraints indexes */
 	rcu_read_lock();
 	ct = task_schedtune(task);
-	perf_boost_idx = ct->perf_boost_idx;
-	perf_constrain_idx = ct->perf_constrain_idx;
+	idx = ct->threshold_idx;
 	rcu_read_unlock();
 
-	return __schedtune_accept_deltas(nrg_delta, cap_delta,
-			perf_boost_idx, perf_constrain_idx);
+	return __schedtune_accept_deltas(nrg_delta, cap_delta, idx);
 }
 
 /*
@@ -586,7 +568,7 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	    s64 boost)
 {
 	struct schedtune *st = css_st(css);
-	unsigned threshold_idx;
+	unsigned int idx;
 	int boost_pct;
 
 	if (boost < -100 || boost > 100)
@@ -599,15 +581,13 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	threshold_idx = clamp(boost_pct, 0, 99) / 10;
-	st->perf_boost_idx = threshold_idx;
-	st->perf_constrain_idx = threshold_idx;
+	idx = clamp(boost_pct, 0, 99) / 10;
+	st->threshold_idx = idx;
 
 	st->boost = boost;
 	if (css == &root_schedtune.css) {
 		sysctl_sched_cfs_boost = boost;
-		perf_boost_idx = threshold_idx;
-		perf_constrain_idx = threshold_idx;
+		pe_threshold_idx = idx;
 	}
 
 	/* Update CPU boost */
@@ -759,7 +739,7 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 	}
 
 	return __schedtune_accept_deltas(nrg_delta, cap_delta,
-			perf_boost_idx, perf_constrain_idx);
+			pe_threshold_idx);
 }
 
 #endif /* CONFIG_CGROUP_SCHEDTUNE */
@@ -770,7 +750,7 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 			       loff_t *ppos)
 {
 	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	unsigned threshold_idx;
+	unsigned idx;
 	int boost_pct;
 
 	if (ret || !write)
@@ -786,9 +766,8 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	threshold_idx = clamp(boost_pct, 0, 99) / 10;
-	perf_boost_idx = threshold_idx;
-	perf_constrain_idx = threshold_idx;
+	idx = clamp(boost_pct, 0, 99) / 10;
+	pe_threshold_idx = idx;
 
 	return 0;
 }
-- 
1.9.1
In the current code, if SchedTune finds the migration falls into the (O) region or the (SO) region, it bails out directly. This works well for boost > 0: a boosted task will always migrate in the (O) region and always be rejected in the (SO) region.

On the other hand, the PE filter formula can also handle the (O) and (SO) regions well, so we can rely on it to get the correct result there too. This is a prerequisite for opening these two regions for checking.

Furthermore, there is another reason to check these two regions: boost < 0. If a task has a negative boost value, this hints that the task should be biased toward a CPU with lower capacity; in the extreme case, boost = -100, the scheduler should _ONLY_ care about reducing capacity rather than power, and such a migration may fall into the (SO) region.

So this patch enables checking for these two regions.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 24 ------------------------
 1 file changed, 24 deletions(-)
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 103b27f..d863cd6 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -155,18 +155,6 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 	struct schedtune *ct;
 	int idx;
 
-	/* Optimal (O) region */
-	if (nrg_delta < 0 && cap_delta > 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, 1, 0);
-		return INT_MAX;
-	}
-
-	/* Suboptimal (S) region */
-	if (nrg_delta > 0 && cap_delta < 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, -1, 5);
-		return -INT_MAX;
-	}
-
 	/* Get task specific perf Boost/Constraints indexes */
 	rcu_read_lock();
 	ct = task_schedtune(task);
@@ -726,18 +714,6 @@ int schedtune_accept_deltas(int nrg_delta, int cap_delta,
 			    struct task_struct *task)
 {
-	/* Optimal (O) region */
-	if (nrg_delta < 0 && cap_delta > 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, 1, 0);
-		return INT_MAX;
-	}
-
-	/* Suboptimal (S) region */
-	if (nrg_delta > 0 && cap_delta < 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, -1, 5);
-		return -INT_MAX;
-	}
-
 	return __schedtune_accept_deltas(nrg_delta, cap_delta,
 					 pe_threshold_idx);
 }
-- 
1.9.1
In the current code, the PE filter doesn't support negative boosting; it simply treats it the same as boost = 0.

So this patch adds PE filter support for negative boosting: when the boost value is negative, the cut regions rotate to the left; for boost = -100, the cut regions rotate completely below the X-axis, so the filter only considers placing the task on lower capacity and doesn't care about energy at all.

With the PE filter supporting both positive and negative boost values, we have more confidence in the algorithm's integrity.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index d863cd6..ba1fdfb 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -47,15 +47,34 @@ threshold_gains[] = {
 	{ 5, 0 }  /* <= 100% */
 };
 
+static struct threshold_params
+negative_threshold_gains[] = {
+	{ -1, 5 }, /* < 10% */
+	{ -2, 5 }, /* < 20% */
+	{ -3, 5 }, /* < 30% */
+	{ -4, 5 }, /* < 40% */
+	{ -5, 5 }, /* < 50% */
+	{ -5, 4 }, /* < 60% */
+	{ -5, 3 }, /* < 70% */
+	{ -5, 2 }, /* < 80% */
+	{ -5, 1 }, /* < 90% */
+	{ -5, 0 }  /* <= 100% */
+};
+
 static int
 __schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 {
 	int payoff = -INT_MAX;
-	int gain_idx = threshold_idx;
-
-	/* Default: reject schedule candidate */
-	if (gain_idx == -1)
-		return payoff;
+	int gain_idx;
+	struct threshold_params *gain_table;
+
+	if (threshold_idx > 0) {
+		gain_idx = threshold_idx;
+		gain_table = threshold_gains;
+	} else {
+		gain_idx = -threshold_idx;
+		gain_table = negative_threshold_gains;
+	}
 
 	/*
 	 * Evaluate "Performance Boost" vs "Energy Increase"
@@ -86,8 +105,8 @@ __schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 	 * for both the B and C regions, we can use the same payoff formula
 	 * where a positive value represents the accept condition.
 	 */
-	payoff  = cap_delta * threshold_gains[gain_idx].nrg_gain;
-	payoff -= nrg_delta * threshold_gains[gain_idx].cap_gain;
+	payoff  = cap_delta * gain_table[gain_idx].nrg_gain;
+	payoff -= nrg_delta * gain_table[gain_idx].cap_gain;
 
 	return payoff;
 }
@@ -569,7 +588,7 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	idx = clamp(boost_pct, 0, 99) / 10;
+	idx = clamp(boost_pct, -99, 99) / 10;
 	st->threshold_idx = idx;
 
 	st->boost = boost;
@@ -742,7 +761,7 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	idx = clamp(boost_pct, 0, 99) / 10;
+	idx = clamp(boost_pct, -99, 99) / 10;
 	pe_threshold_idx = idx;
 
 	return 0;
-- 
1.9.1
In the previous code sysctl_sched_cfs_boost is limited to values in the range [0..100], so a negative value cannot be set via the proc node /proc/sys/kernel/sched_cfs_boost.

So this patch changes sysctl_sched_cfs_boost to 'int' type and extends its sysctl range to [-100..100].
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 include/linux/sched/sysctl.h | 6 +++---
 kernel/sched/tune.c          | 2 +-
 kernel/sysctl.c              | 5 +++--
 3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d68e88c..0d29f58 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -88,16 +88,16 @@ extern unsigned int sysctl_sched_cfs_bandwidth_slice;
 #endif
 
 #ifdef CONFIG_SCHED_TUNE
-extern unsigned int sysctl_sched_cfs_boost;
+extern int sysctl_sched_cfs_boost;
 int sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 				   void __user *buffer, size_t *length,
 				   loff_t *ppos);
-static inline unsigned int get_sysctl_sched_cfs_boost(void)
+static inline int get_sysctl_sched_cfs_boost(void)
 {
 	return sysctl_sched_cfs_boost;
 }
 #else
-static inline unsigned int get_sysctl_sched_cfs_boost(void)
+static inline int get_sysctl_sched_cfs_boost(void)
 {
 	return 0;
 }
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index ba1fdfb..b2eefe7 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -15,7 +15,7 @@ static bool schedtune_initialized = false;
 #endif
 
-unsigned int sysctl_sched_cfs_boost __read_mostly;
+int sysctl_sched_cfs_boost __read_mostly;
 
 extern struct target_nrg schedtune_target_nrg;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d964422..91272da 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -126,7 +126,8 @@ static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
 static int __maybe_unused four = 4;
 static unsigned long one_ul = 1;
-static int one_hundred = 100;
+static int __maybe_unused one_hundred = 100;
+static int __maybe_unused neg_one_hundred = -100;
 #ifdef CONFIG_PRINTK
 static int ten_thousand = 10000;
 #endif
@@ -504,7 +505,7 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 #endif
 		.proc_handler	= &sysctl_sched_cfs_boost_handler,
-		.extra1		= &zero,
+		.extra1		= &neg_one_hundred,
 		.extra2		= &one_hundred,
 	},
 #endif
-- 
1.9.1
The energy_diff trace point records payoff, so we can see the PE filter calculation result. In the current code the trace point is emitted in __energy_diff(), which runs before the PE filter calculation, so we always record a stale payoff value from the stack; as a result, tools like LISA produce wrong PE filter analysis results.

So this patch moves the energy_diff trace point into energy_aware_wake_cpu(), where we can gather all the correct info; the trace point also keeps working when CONFIG_SCHED_TUNE is disabled.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 793b6e8..785bb8d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5093,12 +5093,6 @@ static inline int __energy_diff(struct energy_env *eenv)
 	eenv->nrg.diff = eenv->nrg.after - eenv->nrg.before;
 	eenv->payoff = 0;
 
-	trace_sched_energy_diff(eenv->task,
-			eenv->src_cpu, eenv->dst_cpu, eenv->util_delta,
-			eenv->nrg.before, eenv->nrg.after, eenv->nrg.diff,
-			eenv->cap.before, eenv->cap.after, eenv->cap.delta,
-			eenv->nrg.delta, eenv->payoff);
-
 	return eenv->nrg.diff;
 }
 
@@ -5752,7 +5746,7 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 	int target_max_cap = INT_MAX;
 	int target_cpu = task_cpu(p);
 	unsigned long task_util_boosted, new_util;
-	int i;
+	int diff, i;
 
 	if (sysctl_sched_sync_hint_enable && sync) {
 		int cpu = smp_processor_id();
@@ -5843,6 +5837,7 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 	}
 
 	if (target_cpu != task_cpu(p)) {
+
 		struct energy_env eenv = {
 			.util_delta	= task_util(p),
 			.src_cpu	= task_cpu(p),
@@ -5854,7 +5849,15 @@ static int energy_aware_wake_cpu(struct task_struct *p, int target, int sync)
 		if (cpu_overutilized(task_cpu(p)))
 			return target_cpu;
 
-		if (energy_diff(&eenv) >= 0)
+		diff = energy_diff(&eenv);
+
+		trace_sched_energy_diff(eenv.task,
+				eenv.src_cpu, eenv.dst_cpu, eenv.util_delta,
+				eenv.nrg.before, eenv.nrg.after, eenv.nrg.diff,
+				eenv.cap.before, eenv.cap.after, eenv.cap.delta,
+				eenv.nrg.delta, eenv.payoff);
+
+		if (diff >= 0)
 			return task_cpu(p);
 	}
-- 1.9.1
On 23-Sep 10:58, Leo Yan wrote:
> Hi Patrick,
Hi Leo, here are some comments to be further discussed in the tomorrow's hacking session.
> This patch mainly have two purpose.
> 
> The first one purpose is to adjust the range for capacity index so let capacity index and energy index have similiar range between each other. This helps task to fall into more reasonable PE filter region. So this is finished by patch 1.
Do we have some PESpace plots to compare the filter behaviors pre and post this patch?
> The second purpose is to support negative boosting value in PE filter, so schedTune has integrity of algorithm which can support both for positive and negative boosting values.
That's right; so far we use negative boosting only for OPP biasing but not for CPU biasing. But let's start by defining what goal we are after. Do we want to use negative boosting to somehow bias the selection of a little CPU even for a big task?
> As we know, if we set boost value as positive value, then the PE filter region will rotate to right side so give more chance for (PB) region and reduce chance for (PC) region, so finally we can get filter region as below:
> 
>             ^
>     (O)     |       / (PB)
>             |      /
>             |     /
>             |    /  `-> cut
>             |   /
>             |  /
>             | /
>             |/
>   --------------------------->
>            /|
>           / |
>          /  |
>         /   |
>        /    |
>   (PC)/     |       (SO)
> On the other than, if set boosting as negative value, then it should rotate the PE filter region to left side, so we can get filter region as below. This is finished by patch 0002~0006.
> 
>             ^
>     (O) \   |       (PB)
>          \  |
>           \ |
>            \|
>   --------------------------->
>             |\
>             | \
>             |  \
>             |   \
>       (PC)  |    \  (SO)
If I understand it correctly, this means that the more we negative-boost a task, the more we are willing to accept scheduling candidates that are part of the Suboptimal region. Is that right?

IMO we should always avoid SO candidates, because we know that for these scheduling candidates we will spend more energy for lower performance.

I also have some doubts about the filtering related to the Optimal region. Why would we want to avoid getting better performance while also saving energy?

Provided that we want to bias CPU selection for negative-boosted tasks (if that is our goal), I'm wondering whether we could do that by exploiting this information in energy_aware_wake_cpu(), even before getting to energy_diff() and accept_deltas().
> Patch 0007 is used to verify PE filter table with LISA. I did some testing on Hikey for TraceAnalysis::plotEDiffSpace() for PE filtering and TraceAnalysis::plotTasks() for boosting signals; have passed these testing.
Would be nice to review these data tomorrow.
> v2 -> v1:
>  * Refine for patch 0001 to discount cap_delta in function energy_diff();
>  * Fix bug and typo in patch 0003;
>  * Refine patch 0004, so open optimal and sub-optimal regions checking
>    when disabled configuration CONFIG_CGROUP_SCHEDTUNE;
>  * Add patch 0006 to support negative value for sysctl_sched_cfs_boost;
>  * Add patch 0007 to trace energy_diff properly.
> 
> Leo Yan (7):
>   sched/fair: discount capacity index for PE filter
>   sched/tune: minor fix for gain table
>   sched/tune: polish for PE gain table index
>   sched/tune: open optimal and sub-optimal regions for checking
>   sched/tune: add PE filter support for negative boosting
>   sched/tune: let sysctl_sched_cfs_boost support negative value
>   DEBUG: sched/tune: move energy_diff trace point
> 
>  include/linux/sched/sysctl.h |   6 +--
>  kernel/sched/fair.c          |  29 +++++++---
>  kernel/sched/tune.c          | 124 +++++++++++++++++--------------------------
>  kernel/sysctl.c              |   5 +-
>  4 files changed, 76 insertions(+), 88 deletions(-)
> 
> -- 
> 1.9.1
-- #include <best/regards.h>
Patrick Bellasi
Hi Patrick,
On Wed, Sep 28, 2016 at 12:31:07AM +0100, Patrick Bellasi wrote:
[...]
> > This patch mainly have two purpose.
> > 
> > The first one purpose is to adjust the range for capacity index so let capacity index and energy index have similiar range between each other. This helps task to fall into more reasonable PE filter region. So this is finished by patch 1.
> 
> Do we have some PESpace plots to compare the filter behaviors pre and post this patch?
Not yet.
> > The second purpose is to support negative boosting value in PE filter, so schedTune has integrity of algorithm which can support both for positive and negative boosting values.
> 
> That's right, so fare we use negative boosting only for OPP biasing but not for CPU biasing. But let start by defining what is the goal we are after. Do we want to use negative boosting to somehow bias the selection of a little CPU also for a big task?
Yes. A negative boosting value will impact task placement but not OPP selection.
> > As we know, if we set boost value as positive value, then the PE filter region will rotate to right side so give more chance for (PB) region and reduce chance for (PC) region, so finally we can get filter region as below:
> > 
> >             ^
> >     (O)     |       / (PB)
> >             |      /
> >             |     /
> >             |    /  `-> cut
> >             |   /
> >             |  /
> >             | /
> >             |/
> >   --------------------------->
> >            /|
> >           / |
> >          /  |
> >         /   |
> >        /    |
> >   (PC)/     |       (SO)
> > 
> > On the other than, if set boosting as negative value, then it should rotate the PE filter region to left side, so we can get filter region as below. This is finished by patch 0002~0006.
> > 
> >             ^
> >     (O) \   |       (PB)
> >          \  |
> >           \ |
> >            \|
> >   --------------------------->
> >             |\
> >             | \
> >             |  \
> >             |   \
> >       (PC)  |    \  (SO)
> 
> If I understand it correctly this means that the more we negative boost a task the more we are willing to accept scheduling candidates which are part of the Suboptimal region. Is that right?
Yes.
> IMO we should always avoid SO candidates, because we know that for these scheduling candidates we will spend more energy for lower performances.
> 
> I have also some doubts about the filtering related to the Optional region. Why we would like to avoid to get better performances while also saving energy?
If we set a negative boosting value, then from my understanding it is capacity-suppression oriented; so each region has the definition below:
- For the (O) region, it means if we increase capacity, the energy should
  decrease by a certain amount;
- For the (SO) region, it means if we decrease capacity, we can even give a
  chance for a limited energy increase, because we really want to see the
  capacity reduced; (think of the corner case where the little core's highest
  OPP has lower capacity but better power efficiency than the big core's
  lowest OPP, so in this case it is possible to migrate to the little core)
- For the (PC) region, it means capacity and energy both decrease; this is
  the best case.
So boost = -100 means we completely accept any capacity reduction, whatever the energy difference.
> Provided that we want to bias CPU selection for negative boosted tasks (it that is our goal) I'm wondering if we cannot do that by exploiting this information in the energy_aware_wake_cpu, even before getting to the energy_diff and accept_deltas.
> 
> > Patch 0007 is used to verify PE filter table with LISA. I did some testing on Hikey for TraceAnalysis::plotEDiffSpace() for PE filtering and TraceAnalysis::plotTasks() for boosting signals; have passed these testing.
> 
> Would be nice to review these data tomorrow.
Yes.
> > v2 -> v1:
> >  * Refine for patch 0001 to discount cap_delta in function energy_diff();
> >  * Fix bug and typo in patch 0003;
> >  * Refine patch 0004, so open optimal and sub-optimal regions checking
> >    when disabled configuration CONFIG_CGROUP_SCHEDTUNE;
> >  * Add patch 0006 to support negative value for sysctl_sched_cfs_boost;
> >  * Add patch 0007 to trace energy_diff properly.
> > 
> > Leo Yan (7):
> >   sched/fair: discount capacity index for PE filter
> >   sched/tune: minor fix for gain table
> >   sched/tune: polish for PE gain table index
> >   sched/tune: open optimal and sub-optimal regions for checking
> >   sched/tune: add PE filter support for negative boosting
> >   sched/tune: let sysctl_sched_cfs_boost support negative value
> >   DEBUG: sched/tune: move energy_diff trace point
> > 
> >  include/linux/sched/sysctl.h |   6 +--
> >  kernel/sched/fair.c          |  29 +++++++---
> >  kernel/sched/tune.c          | 124 +++++++++++++++++--------------------------
> >  kernel/sysctl.c              |   5 +-
> >  4 files changed, 76 insertions(+), 88 deletions(-)
> > 
> > -- 
> > 1.9.1
-- #include <best/regards.h>
Patrick Bellasi