Hi Patrick,
This patch series refines and enhances SchedTune.
It has two main purposes. One is to adjust the range of the capacity index so that the capacity index and the energy index cover a similar range; this helps tasks fall into more reasonable PE filter regions. This is done by patch 1.
Another target is to support negative boost values in the PE filter, so that SchedTune's algorithm consistently handles both positive and negative boost values. This is done by patches 2~5.
Please note, this patch set is mainly intended for discussion. I have _NOT_ done any testing on my side.
Leo Yan (5):
  sched/fair: discount capacity index for PE filter
  sched/tune: minor fix for gain table
  sched/tune: polish for PE gain table index
  sched/tune: open optimal and sub-optimal regions for checking
  sched/tune: add PE filter support for negative boosting

 kernel/sched/fair.c |  10 +++++
 kernel/sched/tune.c | 111 +++++++++++++++++++++++-----------------------------
 2 files changed, 58 insertions(+), 63 deletions(-)

--
1.9.1
When evaluating the PE filter we use the normalized capacity index and energy index to check which filter region a migration falls into. The cap_delta value is simply the difference between the two capacity indexes before and after the task's migration, so a very small task and a big task produce exactly the same cap_delta. The nrg_delta, by contrast, is calculated from the task's utilization; as a result the range of cap_delta is much larger than the range of nrg_delta.

Consider one case: a task has util_avg = 10 and shares the CPU with other tasks, and all tasks together accumulate util_avg = 1024 (100% utilization). In this case the task can actually only get a performance benefit from cap_delta of:

    cap_delta * task_util_avg / 1024;

The rest of cap_delta is consumed by the other tasks on the CPU, so this value represents the task's performance boost in the worst case.

On the other hand, we need to distinguish how performance boosting affects tasks with different CPU utilization. A big task with high utilization usually sees a more prominent effect than a low-utilization task. Discounting by util_avg reflects this as well.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/fair.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 17dcd8e..661982f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5096,6 +5096,16 @@ static inline int __energy_diff(struct energy_env *eenv)
 	eenv->nrg.diff = eenv->nrg.after - eenv->nrg.before;
 	eenv->payoff = 0;
 
+	/*
+	 * The capacity index delta is meaningful for a task with 100%
+	 * utilization; if the task has smaller utilization we discount
+	 * the delta by the ratio of the task utilization to 100% (1024):
+	 *   cap.delta = cap.delta * task_util / 1024
+	 */
+	eenv->cap.delta =
+		(eenv->cap.delta * eenv->util_delta) >> SCHED_CAPACITY_SHIFT;
+
 	trace_sched_energy_diff(eenv->task,
 			eenv->src_cpu, eenv->dst_cpu, eenv->util_delta,
 			eenv->nrg.before, eenv->nrg.after, eenv->nrg.diff,
--
1.9.1
When boost = 0, migrations are handled directly by energy_diff(); so for any boost > 0 we should give the task a chance to migrate when the PE filter falls into the performance boost region. This patch adjusts the gain table so a task can be boosted even when its boost value is < 10%.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index bd7f319..dbe1825 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -38,11 +38,11 @@ struct threshold_params {
  */
 static struct threshold_params
 threshold_gains[] = {
-	{ 0, 5 }, /*   < 10% */
-	{ 1, 5 }, /*   < 20% */
-	{ 2, 5 }, /*   < 30% */
-	{ 3, 5 }, /*   < 40% */
-	{ 4, 5 }, /*   < 50% */
+	{ 1, 5 }, /*   < 10% */
+	{ 2, 5 }, /*   < 20% */
+	{ 3, 5 }, /*   < 30% */
+	{ 4, 5 }, /*   < 40% */
+	{ 5, 5 }, /*   < 50% */
 	{ 5, 4 }, /*   < 60% */
 	{ 5, 3 }, /*   < 70% */
 	{ 5, 2 }, /*   < 80% */
--
1.9.1
Currently we use the same index for the PB and PC regions, so there is no need to define two index values for the PE region cuts.

This patch polishes the related code to use a single index into the gain table, shared by both the PB and PC region cuts.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 58 +++++++++++++++++------------------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index dbe1825..4bca506 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -19,11 +19,8 @@ unsigned int sysctl_sched_cfs_boost __read_mostly;
 
 extern struct target_nrg schedtune_target_nrg;
 
-/* Performance Boost region (B) threshold params */
-static int perf_boost_idx;
-
-/* Performance Constraint region (C) threshold params */
-static int perf_constrain_idx;
+/* Performance-Energy (P-E) thresholds index */
+static int threshold_idx;
 
 /**
  * Performance-Energy (P-E) Space thresholds constants
@@ -51,18 +48,10 @@ threshold_gains[] = {
 };
 
 static int
-__schedtune_accept_deltas(int nrg_delta, int cap_delta,
-			  int perf_boost_idx, int perf_constrain_idx)
+__schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 {
 	int payoff = -INT_MAX;
-	int gain_idx = -1;
-
-	/* Performance Boost (B) region */
-	if (nrg_delta >= 0 && cap_delta > 0)
-		gain_idx = perf_boost_idx;
-	/* Performance Constraint (C) region */
-	else if (nrg_delta < 0 && cap_delta <= 0)
-		gain_idx = perf_constrain_idx;
+	int gain_idx = threshold_idx;
 
 	/* Default: reject schedule candidate */
 	if (gain_idx == -1)
@@ -120,11 +109,8 @@ struct schedtune {
 	/* Boost value for tasks on that SchedTune CGroup */
 	int boost;
 
-	/* Performance Boost (B) region threshold params */
-	int perf_boost_idx;
-
-	/* Performance Constraint (C) region threshold params */
-	int perf_constrain_idx;
+	/* Power-efficiency gain table index */
+	int threshold_idx;
 
 	/* Hint to bias scheduling of tasks on that SchedTune CGroup
 	 * towards idle CPUs */
@@ -158,8 +144,7 @@ static inline struct schedtune *parent_st(struct schedtune *st)
 static struct schedtune
 root_schedtune = {
 	.boost	= 0,
-	.perf_boost_idx = 0,
-	.perf_constrain_idx = 0,
+	.threshold_idx = 0,
 	.prefer_idle = 0,
 };
 
@@ -168,8 +153,7 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 			struct task_struct *task)
 {
 	struct schedtune *ct;
-	int perf_boost_idx;
-	int perf_constrain_idx;
+	int idx;
 
 	/* Optimal (O) region */
 	if (nrg_delta < 0 && cap_delta > 0) {
@@ -186,12 +170,10 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 	/* Get task specific perf Boost/Constraints indexes */
 	rcu_read_lock();
 	ct = task_schedtune(task);
-	perf_boost_idx = ct->perf_boost_idx;
-	perf_constrain_idx = ct->perf_constrain_idx;
+	idx = ct->threshold_idx;
 	rcu_read_unlock();
 
-	return __schedtune_accept_deltas(nrg_delta, cap_delta,
-			perf_boost_idx, perf_constrain_idx);
+	return __schedtune_accept_deltas(nrg_delta, cap_delta, idx);
 }
 
 /*
@@ -586,7 +568,7 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	    s64 boost)
 {
 	struct schedtune *st = css_st(css);
-	unsigned threshold_idx;
+	unsigned int idx;
 	int boost_pct;
 
 	if (boost < -100 || boost > 100)
@@ -599,15 +581,13 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	threshold_idx = clamp(boost_pct, 0, 99) / 10;
-	st->perf_boost_idx = threshold_idx;
-	st->perf_constrain_idx = threshold_idx;
+	idx = clamp(boost_pct, 0, 99) / 10;
+	st->threshold_idx = idx;
 
 	st->boost = boost;
 	if (css == &root_schedtune.css) {
 		sysctl_sched_cfs_boost = boost;
-		perf_boost_idx = threshold_idx;
-		perf_constrain_idx = threshold_idx;
+		threshold_idx = idx;
 	}
 
 	/* Update CPU boost */
@@ -756,8 +736,7 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 		return -INT_MAX;
 	}
 
-	return __schedtune_accept_deltas(nrg_delta, cap_delta,
-			perf_boost_idx, perf_constrain_idx);
+	return __schedtune_accept_deltas(nrg_delta, cap_delta, threshold_idx);
 }
 
 #endif /* CONFIG_CGROUP_SCHEDTUNE */
@@ -768,7 +747,7 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 			       loff_t *ppos)
 {
 	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	unsigned threshold_idx;
+	unsigned int idx;
 	int boost_pct;
 
 	if (ret || !write)
@@ -784,9 +763,8 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	threshold_idx = clamp(boost_pct, 0, 99) / 10;
-	perf_boost_idx = threshold_idx;
-	perf_constrain_idx = threshold_idx;
+	idx = clamp(boost_pct, 0, 99) / 10;
+	threshold_idx = idx;
 
 	return 0;
 }
--
1.9.1
In the current code, if SchedTune sees that a migration falls into the Optimal (O) or Suboptimal (SO) region, it bails out directly. This works well for boost > 0: a boosted task always migrates in the (O) region and is rejected in the (SO) region.

On the other hand, the PE filter formula handles the (O) and (SO) regions correctly as well, so we can rely on it to get the right result. This is a prerequisite for opening these two regions for checking.

Furthermore, another reason to check these two regions is boost < 0. A task with a negative boost value hints that it should be biased towards a CPU with lower capacity; in the extreme case of boost = -100, the scheduler should _ONLY_ care about reducing capacity rather than power, and such a migration may well fall into the (SO) region.

So this patch enables checking for these two regions.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 4bca506..5c1844a 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -155,18 +155,6 @@ schedtune_accept_deltas(int nrg_delta, int cap_delta,
 	struct schedtune *ct;
 	int idx;
 
-	/* Optimal (O) region */
-	if (nrg_delta < 0 && cap_delta > 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, 1, 0);
-		return INT_MAX;
-	}
-
-	/* Suboptimal (S) region */
-	if (nrg_delta > 0 && cap_delta < 0) {
-		trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, -1, 5);
-		return -INT_MAX;
-	}
-
 	/* Get task specific perf Boost/Constraints indexes */
 	rcu_read_lock();
 	ct = task_schedtune(task);
--
1.9.1
In the current code, the PE filter doesn't support negative boosting; it simply handles that case the same as boost = 0.

This patch adds PE filter support for negative boosting: when the boost value is negative, the cut regions are rotated downwards; for boost = -100, the cut regions rotate entirely below the X-axis, so the filter only considers placing the task with lower capacity and no longer cares about energy at all.

With the PE filter supporting both positive and negative boost values, we can have more confidence in the integrity of the algorithm.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 kernel/sched/tune.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 5c1844a..fef0fc9 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -47,15 +47,34 @@ threshold_gains[] = {
 	{ 5, 0 }  /* <= 100% */
 };
 
+static struct threshold_params
+negative_threshold_gains[] = {
+	{ -1, 5 }, /*   < 10% */
+	{ -2, 5 }, /*   < 20% */
+	{ -3, 5 }, /*   < 30% */
+	{ -4, 5 }, /*   < 40% */
+	{ -5, 5 }, /*   < 50% */
+	{ -5, 4 }, /*   < 60% */
+	{ -5, 3 }, /*   < 70% */
+	{ -5, 2 }, /*   < 80% */
+	{ -5, 1 }, /*   < 90% */
+	{ -5, 0 }  /* <= 100% */
+};
+
 static int
 __schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 {
 	int payoff = -INT_MAX;
-	int gain_idx = threshold_idx;
-
-	/* Default: reject schedule candidate */
-	if (gain_idx == -1)
-		return payoff;
+	int gain_idx;
+	struct threshold_params *gain_table;
+
+	if (threshold_idx >= 0) {
+		gain_idx = threshold_idx;
+		gain_table = threshold_gains;
+	} else {
+		gain_idx = -threshold_idx;
+		gain_table = negative_threshold_gains;
+	}
 
 	/*
 	 * Evaluate "Performance Boost" vs "Energy Increase"
@@ -86,8 +105,8 @@ __schedtune_accept_deltas(int nrg_delta, int cap_delta, int threshold_idx)
 	 * for both the B and C regions, we can use the same payoff formula
 	 * where a positive value represents the accept condition.
 	 */
-	payoff  = cap_delta * threshold_gains[gain_idx].nrg_gain;
-	payoff -= nrg_delta * threshold_gains[gain_idx].cap_gain;
+	payoff  = cap_delta * gain_table[gain_idx].nrg_gain;
+	payoff -= nrg_delta * gain_table[gain_idx].cap_gain;
 
 	return payoff;
 }
@@ -569,7 +588,7 @@ boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	idx = clamp(boost_pct, 0, 99) / 10;
+	idx = clamp(boost_pct, -99, 99) / 10;
 	st->threshold_idx = idx;
 
 	st->boost = boost;
@@ -751,7 +770,7 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 	 * The current implementatio uses the same cuts for both
 	 * B and C regions.
 	 */
-	idx = clamp(boost_pct, 0, 99) / 10;
+	idx = clamp(boost_pct, -99, 99) / 10;
 	threshold_idx = idx;
 
 	return 0;
--
1.9.1
On 06-Sep 14:13, Leo Yan wrote:
> Hi Patrick,
Hi Leo,
> This patch series refines and enhances SchedTune.
Thanks for the posting. I'm currently working on a consolidation and refactoring of the SchedTune patches which should also address the main issue we have so far, i.e. the performance index definition, to better support PE-Space filtering.
I'll have a look at these patches of yours as well to see which bits can eventually be merged into this new consolidated patch set.
My goal would be to have a refreshed SchedTune series on hand for the Connect week. I'm targeting v4.4, thus we should be in a good position to run some tests on a HiKey 96Board.
> It has two main purposes. One is to adjust the range of the capacity index so that the capacity index and the energy index cover a similar range; this helps tasks fall into more reasonable PE filter regions. This is done by patch 1.
> Another target is to support negative boost values in the PE filter, so that SchedTune's algorithm consistently handles both positive and negative boost values. This is done by patches 2~5.
> Please note, this patch set is mainly intended for discussion. I have _NOT_ done any testing on my side.
Please do consider that next week I'm out of office, thus I'll do my best to come back to you with some comments on this series the week after.
Cheers,
Patrick
> Leo Yan (5):
>   sched/fair: discount capacity index for PE filter
>   sched/tune: minor fix for gain table
>   sched/tune: polish for PE gain table index
>   sched/tune: open optimal and sub-optimal regions for checking
>   sched/tune: add PE filter support for negative boosting
>
>  kernel/sched/fair.c |  10 +++++
>  kernel/sched/tune.c | 111 +++++++++++++++++++++++-----------------------------
>  2 files changed, 58 insertions(+), 63 deletions(-)
>
> --
> 1.9.1
--
#include <best/regards.h>

Patrick Bellasi
Hi Patrick,
On Wed, Sep 07, 2016 at 06:30:37PM +0100, Patrick Bellasi wrote:
[...]
> My goal would be to have a refreshed SchedTune series on hand for the Connect week. I'm targeting v4.4, thus we should be in a good position to run some tests on a HiKey 96Board.
Yeah, I prepared these patches based on AOSP v4.4.
> > Please note, this patch set is mainly intended for discussion. I have _NOT_ done any testing on my side.
>
> Please do consider that next week I'm out of office, thus I'll do my best to come back to you with some comments on this series the week after.
Sure, I will test it on HiKey AOSP next week :)

Thanks,
Leo Yan