Hi Tixy, Mark & Alex,
These are the candidate patches from us for the next 3.10 LSK, as mentioned by Basil.
Best Regards, Chris
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
---
 include/linux/sched.h | 4 ++++
 kernel/sched/core.c   | 6 +++---
 kernel/sched/fair.c   | 2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a3c8b27..f862feb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -950,6 +950,10 @@ struct sched_avg {
 	u32 usage_avg_sum;
 };
 
+#ifdef CONFIG_SCHED_HMP
+#define hmp_task_should_forkboost(task) ((task->parent && task->parent->pid > 2))
+#endif
+
 #ifdef CONFIG_SCHEDSTATS
 struct sched_statistics {
 	u64 wait_start;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c481b49..9408d23 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1635,9 +1635,9 @@ static void __sched_fork(struct task_struct *p)
 #ifdef CONFIG_SCHED_HMP
 	/* keep LOAD_AVG_MAX in sync with fair.c if load avg series is changed */
 #define LOAD_AVG_MAX 47742
-	if (p->mm) {
-		p->se.avg.hmp_last_up_migration = 0;
-		p->se.avg.hmp_last_down_migration = 0;
+	p->se.avg.hmp_last_up_migration = 0;
+	p->se.avg.hmp_last_down_migration = 0;
+	if (hmp_task_should_forkboost(p)) {
 		p->se.avg.load_avg_ratio = 1023;
 		p->se.avg.load_avg_contrib =
 			(1023 * scale_load_down(p->se.load.weight));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 97ed132..90c8a81 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4385,7 +4385,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 #ifdef CONFIG_SCHED_HMP
 	/* always put non-kernel forking tasks on a big domain */
-	if (p->mm && (sd_flag & SD_BALANCE_FORK)) {
+	if (unlikely(sd_flag & SD_BALANCE_FORK) && hmp_task_should_forkboost(p)) {
 		new_cpu = hmp_select_faster_cpu(p, prev_cpu);
 		if (new_cpu != NR_CPUS) {
 			hmp_next_up_delay(&p->se, new_cpu);
On Tue, 2014-08-12 at 14:50 +0100, Chris Redpath wrote:
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
Signed-off-by: Chris Redpath chris.redpath@arm.com
This is the same as the patch from June and looks OK to me. Though at the time I said:
I wondered why '2', when init is PID 1. So I ran 'ps' on a PC and on an ARM board running Android and saw that 2 is kthreadd, so '2' is correct, but I can't help thinking it would be nice to have a comment to that effect on the above #define, so, OK if I add this?
/*
 * We want to avoid boosting any processes forked from init (PID 1)
 * and kthreadd (assumed to be PID 2).
 */
And you replied 'Yes, absolutely fine by me'.
So I'll do that this time :-)
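For reference, the #define with that comment added would end up looking roughly like this (illustration only, assuming the comment goes in unchanged):

#ifdef CONFIG_SCHED_HMP
/*
 * We want to avoid boosting any processes forked from init (PID 1)
 * and kthreadd (assumed to be PID 2).
 */
#define hmp_task_should_forkboost(task) ((task->parent && task->parent->pid > 2))
#endif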
On 08/12/2014 09:50 PM, Chris Redpath wrote:
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
It looks OK to me.
But in my mind, a service task can still be moved to a little CPU later on while it's running, can't it?
Signed-off-by: Chris Redpath chris.redpath@arm.com
On 14/08/14 03:57, Alex Shi wrote:
On 08/12/2014 09:50 PM, Chris Redpath wrote:
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
It looks OK to me.
But in my mind, a service task can still be moved to a little CPU later on while it's running, can't it?
Yes, the issue is that some services set up their timers early in their lifetime, as called out in the description. If the task happens to be running on a big CPU at that point, the timer is allocated there and tends to stay there unless we hotplug that CPU later and force those timers to be moved.
All we do here is not give those tasks the same fork-boost that we give user-initiated tasks. They will still move between big and little CPUs according to computational load, as all other tasks do.
Signed-off-by: Chris Redpath chris.redpath@arm.com
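To summarise the behaviour being described, here is a condensed sketch of the two fork-time paths after this patch (reconstructed from the hunks above, not the files verbatim):

/* __sched_fork(): every new task has its HMP migration timestamps reset,
 * but only children of "user" parents (PID > 2, i.e. not init or kthreadd)
 * start with a maxed-out load signal. */
p->se.avg.hmp_last_up_migration = 0;
p->se.avg.hmp_last_down_migration = 0;
if (hmp_task_should_forkboost(p)) {
	p->se.avg.load_avg_ratio = 1023;
	p->se.avg.load_avg_contrib =
		(1023 * scale_load_down(p->se.load.weight));
}

/* select_task_rq_fair(): only boosted tasks are placed on a big CPU at
 * fork; afterwards every task migrates up/down on measured load as usual. */
if (unlikely(sd_flag & SD_BALANCE_FORK) && hmp_task_should_forkboost(p))
	new_cpu = hmp_select_faster_cpu(p, prev_cpu);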
Frequently in HMP, the big CPUs are only active with one task per CPU and there may be idle CPUs in the big cluster. This patch avoids triggering an idle balance in situations where none of the active CPUs in the current HMP domain have > 1 tasks running.
When packing is enabled, only enforce this behaviour when we are not in the smallest domain - there we idle balance whenever a CPU is over the up_threshold regardless of tasks in case one needs to be moved.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
---
 kernel/sched/fair.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90c8a81..41d0cbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6537,16 +6537,16 @@ static int nohz_test_cpu(int cpu)
  * Decide if the tasks on the busy CPUs in the
  * littlest domain would benefit from an idle balance
  */
-static int hmp_packing_ilb_needed(int cpu)
+static int hmp_packing_ilb_needed(int cpu, int ilb_needed)
 {
 	struct hmp_domain *hmp;
-	/* always allow ilb on non-slowest domain */
+	/* allow previous decision on non-slowest domain */
 	if (!hmp_cpu_is_slowest(cpu))
-		return 1;
+		return ilb_needed;
 
 	/* if disabled, use normal ILB behaviour */
 	if (!hmp_packing_enabled)
-		return 1;
+		return ilb_needed;
 
 	hmp = hmp_cpu_domain(cpu);
 	for_each_cpu_and(cpu, &hmp->cpus, nohz.idle_cpus_mask) {
@@ -6558,19 +6558,34 @@ static int hmp_packing_ilb_needed(int cpu)
 }
 #endif
 
+DEFINE_PER_CPU(cpumask_var_t, ilb_tmpmask);
+
 static inline int find_new_ilb(int call_cpu)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);
 #ifdef CONFIG_SCHED_HMP
-	int ilb_needed = 1;
+	int ilb_needed = 0;
+	int cpu;
+	struct cpumask* tmp = per_cpu(ilb_tmpmask, smp_processor_id());
 
 	/* restrict nohz balancing to occur in the same hmp domain */
 	ilb = cpumask_first_and(nohz.idle_cpus_mask,
 			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus);
 
+	/* check to see if it's necessary within this domain */
+	cpumask_andnot(tmp,
+			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus,
+			nohz.idle_cpus_mask);
+	for_each_cpu(cpu, tmp) {
+		if (cpu_rq(cpu)->nr_running > 1) {
+			ilb_needed = 1;
+			break;
+		}
+	}
+
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 	if (ilb < nr_cpu_ids)
-		ilb_needed = hmp_packing_ilb_needed(ilb);
+		ilb_needed = hmp_packing_ilb_needed(ilb, ilb_needed);
 #endif
 
 	if (ilb_needed && ilb < nr_cpu_ids && idle_cpu(ilb))
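For readability, here is a condensed sketch of find_new_ilb() with this patch applied, preprocessor conditionals dropped and assuming both CONFIG_SCHED_HMP and CONFIG_SCHED_HMP_LITTLE_PACKING are enabled (reconstructed from the hunks above; the final fallback to nr_cpu_ids is assumed from the surrounding mainline code):

static inline int find_new_ilb(int call_cpu)
{
	int ilb;
	int ilb_needed = 0;
	int cpu;
	struct cpumask *tmp = per_cpu(ilb_tmpmask, smp_processor_id());

	/* only consider nohz-idle CPUs in the caller's HMP domain */
	ilb = cpumask_first_and(nohz.idle_cpus_mask,
			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus);

	/* an idle balance is only worthwhile if some busy CPU in the
	 * domain has more than one runnable task */
	cpumask_andnot(tmp,
			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus,
			nohz.idle_cpus_mask);
	for_each_cpu(cpu, tmp) {
		if (cpu_rq(cpu)->nr_running > 1) {
			ilb_needed = 1;
			break;
		}
	}

	/* with little-packing enabled, the smallest domain may still want
	 * an idle balance even when no CPU has more than one task */
	if (ilb < nr_cpu_ids)
		ilb_needed = hmp_packing_ilb_needed(ilb, ilb_needed);

	if (ilb_needed && ilb < nr_cpu_ids && idle_cpu(ilb))
		return ilb;

	return nr_cpu_ids;	/* assumed fallback, as in mainline 3.10 */
}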
On Tue, 2014-08-12 at 14:50 +0100, Chris Redpath wrote:
Frequently in HMP, the big CPUs are only active with one task per CPU and there may be idle CPUs in the big cluster. This patch avoids triggering an idle balance in situations where none of the active CPUs in the current HMP domain have > 1 tasks running.
When packing is enabled, only enforce this behaviour when we are not in the smallest domain - there we idle balance whenever a CPU is over the up_threshold regardless of tasks in case one needs to be moved.
Signed-off-by: Chris Redpath chris.redpath@arm.com
This looks sane to me, though I have one comment about the implementation, see inline comment below.
 kernel/sched/fair.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90c8a81..41d0cbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6537,16 +6537,16 @@ static int nohz_test_cpu(int cpu)
  * Decide if the tasks on the busy CPUs in the
  * littlest domain would benefit from an idle balance
  */
-static int hmp_packing_ilb_needed(int cpu)
+static int hmp_packing_ilb_needed(int cpu, int ilb_needed)
 {
 	struct hmp_domain *hmp;
-	/* always allow ilb on non-slowest domain */
+	/* allow previous decision on non-slowest domain */
 	if (!hmp_cpu_is_slowest(cpu))
-		return 1;
+		return ilb_needed;
 
 	/* if disabled, use normal ILB behaviour */
 	if (!hmp_packing_enabled)
-		return 1;
+		return ilb_needed;
 
 	hmp = hmp_cpu_domain(cpu);
 	for_each_cpu_and(cpu, &hmp->cpus, nohz.idle_cpus_mask) {
@@ -6558,19 +6558,34 @@ static int hmp_packing_ilb_needed(int cpu)
 }
 #endif
 
+DEFINE_PER_CPU(cpumask_var_t, ilb_tmpmask);
+
 static inline int find_new_ilb(int call_cpu)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);
 #ifdef CONFIG_SCHED_HMP
-	int ilb_needed = 1;
+	int ilb_needed = 0;
+	int cpu;
+	struct cpumask* tmp = per_cpu(ilb_tmpmask, smp_processor_id());
Why do we need a percpu static variable ilb_tmpmask? It seems to only be used once, here in this function, so could we not instead just have a local stack based temporary variable like:
struct cpumask tmp;
or have I missed something?
 	/* restrict nohz balancing to occur in the same hmp domain */
 	ilb = cpumask_first_and(nohz.idle_cpus_mask,
 			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus);
 
+	/* check to see if it's necessary within this domain */
+	cpumask_andnot(tmp,
+			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus,
+			nohz.idle_cpus_mask);
+	for_each_cpu(cpu, tmp) {
+		if (cpu_rq(cpu)->nr_running > 1) {
+			ilb_needed = 1;
+			break;
+		}
+	}
+
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 	if (ilb < nr_cpu_ids)
-		ilb_needed = hmp_packing_ilb_needed(ilb);
+		ilb_needed = hmp_packing_ilb_needed(ilb, ilb_needed);
 #endif
 
 	if (ilb_needed && ilb < nr_cpu_ids && idle_cpu(ilb))
On 12/08/14 17:26, Jon Medhurst (Tixy) wrote:
On Tue, 2014-08-12 at 14:50 +0100, Chris Redpath wrote:
Frequently in HMP, the big CPUs are only active with one task per CPU and there may be idle CPUs in the big cluster. This patch avoids triggering an idle balance in situations where none of the active CPUs in the current HMP domain have > 1 tasks running.
When packing is enabled, only enforce this behaviour when we are not in the smallest domain - there we idle balance whenever a CPU is over the up_threshold regardless of tasks in case one needs to be moved.
Signed-off-by: Chris Redpath chris.redpath@arm.com
This looks sane to me, though I have one comment about the implementation, see inline comment below.
 kernel/sched/fair.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90c8a81..41d0cbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6537,16 +6537,16 @@ static int nohz_test_cpu(int cpu)
  * Decide if the tasks on the busy CPUs in the
  * littlest domain would benefit from an idle balance
  */
-static int hmp_packing_ilb_needed(int cpu)
+static int hmp_packing_ilb_needed(int cpu, int ilb_needed)
 {
 	struct hmp_domain *hmp;
-	/* always allow ilb on non-slowest domain */
+	/* allow previous decision on non-slowest domain */
 	if (!hmp_cpu_is_slowest(cpu))
-		return 1;
+		return ilb_needed;
 
 	/* if disabled, use normal ILB behaviour */
 	if (!hmp_packing_enabled)
-		return 1;
+		return ilb_needed;
 
 	hmp = hmp_cpu_domain(cpu);
 	for_each_cpu_and(cpu, &hmp->cpus, nohz.idle_cpus_mask) {
@@ -6558,19 +6558,34 @@ static int hmp_packing_ilb_needed(int cpu)
 }
 #endif
 
+DEFINE_PER_CPU(cpumask_var_t, ilb_tmpmask);
+
 static inline int find_new_ilb(int call_cpu)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);
 #ifdef CONFIG_SCHED_HMP
-	int ilb_needed = 1;
+	int ilb_needed = 0;
+	int cpu;
+	struct cpumask* tmp = per_cpu(ilb_tmpmask, smp_processor_id());
Why do we need a percpu static variable ilb_tmpmask? It seems to only be used once, here in this function, so could we not instead just have a local stack based temporary variable like:
struct cpumask tmp;
or have I missed something?
We could do that, but this is called during sched tick so I wanted to avoid creating the cpumask on the stack. Do you think that's a reasonable thing to do or do you think it'd be just as quick as calling smp_processor_id()?
The per-cpu variable stems from having a cached mask and not wanting to share it during ticks.
 	/* restrict nohz balancing to occur in the same hmp domain */
 	ilb = cpumask_first_and(nohz.idle_cpus_mask,
 			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus);
 
+	/* check to see if it's necessary within this domain */
+	cpumask_andnot(tmp,
+			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus,
+			nohz.idle_cpus_mask);
+	for_each_cpu(cpu, tmp) {
+		if (cpu_rq(cpu)->nr_running > 1) {
+			ilb_needed = 1;
+			break;
+		}
+	}
+
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 	if (ilb < nr_cpu_ids)
-		ilb_needed = hmp_packing_ilb_needed(ilb);
+		ilb_needed = hmp_packing_ilb_needed(ilb, ilb_needed);
 #endif
 
 	if (ilb_needed && ilb < nr_cpu_ids && idle_cpu(ilb))
On Wed, 2014-08-13 at 10:09 +0100, Chris Redpath wrote:
On 12/08/14 17:26, Jon Medhurst (Tixy) wrote:
On Tue, 2014-08-12 at 14:50 +0100, Chris Redpath wrote:
Frequently in HMP, the big CPUs are only active with one task per CPU and there may be idle CPUs in the big cluster. This patch avoids triggering an idle balance in situations where none of the active CPUs in the current HMP domain have > 1 tasks running.
When packing is enabled, only enforce this behaviour when we are not in the smallest domain - there we idle balance whenever a CPU is over the up_threshold regardless of tasks in case one needs to be moved.
Signed-off-by: Chris Redpath chris.redpath@arm.com
This looks sane to me, though I have one comment about the implementation, see inline comment below.
 kernel/sched/fair.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90c8a81..41d0cbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6537,16 +6537,16 @@ static int nohz_test_cpu(int cpu)
  * Decide if the tasks on the busy CPUs in the
  * littlest domain would benefit from an idle balance
  */
-static int hmp_packing_ilb_needed(int cpu)
+static int hmp_packing_ilb_needed(int cpu, int ilb_needed)
 {
 	struct hmp_domain *hmp;
-	/* always allow ilb on non-slowest domain */
+	/* allow previous decision on non-slowest domain */
 	if (!hmp_cpu_is_slowest(cpu))
-		return 1;
+		return ilb_needed;
 
 	/* if disabled, use normal ILB behaviour */
 	if (!hmp_packing_enabled)
-		return 1;
+		return ilb_needed;
 
 	hmp = hmp_cpu_domain(cpu);
 	for_each_cpu_and(cpu, &hmp->cpus, nohz.idle_cpus_mask) {
@@ -6558,19 +6558,34 @@ static int hmp_packing_ilb_needed(int cpu)
 }
 #endif
 
+DEFINE_PER_CPU(cpumask_var_t, ilb_tmpmask);
+
 static inline int find_new_ilb(int call_cpu)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);
 #ifdef CONFIG_SCHED_HMP
-	int ilb_needed = 1;
+	int ilb_needed = 0;
+	int cpu;
+	struct cpumask* tmp = per_cpu(ilb_tmpmask, smp_processor_id());
Why do we need a percpu static variable ilb_tmpmask? It seems to only be used once, here in this function, so could we not instead just have a local stack based temporary variable like:
struct cpumask tmp;
or have I missed something?
We could do that, but this is called during sched tick so I wanted to avoid creating the cpumask on the stack. Do you think that's a reasonable thing to do or do you think it'd be just as quick as calling smp_processor_id()?
Well, that depends on whether you think HMP will be used on an enormous supercluster with gazillions of cores. I thought it was just being used for mobile devices running Android? In which case, presumably those kernels are compiled with NR_CPUS <= 32, so a cpumask will be a single 32-bit word and probably won't use any extra stack compared to what you have. Then again, in the scheme of things, it probably doesn't make much difference, so we might as well stick with what you have.
The per-cpu variable stems from having a cached mask and not wanting to share it during ticks.
You say 'cached' mask, which makes me think the value is being saved for use later, but it's not, is it? It's a temporary value (the name 'tmp' is a clue) that is calculated in find_new_ilb() and used immediately, and nothing else looks at the calculated value again (it will get overwritten by the next call to find_new_ilb() on that cpu).
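For what it's worth, the stack-based variant discussed above would look roughly like this inside find_new_ilb() (sketch only; the posted patch keeps the per-cpu ilb_tmpmask, and the sizing assumption is the one above: NR_CPUS <= 32 without CONFIG_CPUMASK_OFFSTACK, so the mask is a single word):

	struct cpumask tmp;	/* roughly one word of stack on such configs */

	/* busy CPUs = CPUs in this HMP domain that are not nohz-idle */
	cpumask_andnot(&tmp,
			&((struct hmp_domain *)hmp_cpu_domain(call_cpu))->cpus,
			nohz.idle_cpus_mask);
	for_each_cpu(cpu, &tmp) {
		if (cpu_rq(cpu)->nr_running > 1) {
			ilb_needed = 1;
			break;
		}
	}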
Anyway, I can see nothing wrong with the patch, and if no-one else raises issues we can ask for it to be added to LSK once ARM have finished their testing and give me the go-ahead...