This is a series of fixes for big.LITTLE MP.
Patches 1, 2, 3, 6 & 8 are defect fixes. Patches 4 & 5 are to include packing in the TC2 build. Patch 7 is an old patch which accidentally got omitted from LSK.
Contents
----------------------------------------------------------------
[PATCH 1/8] sched: reset blocked load decay_count during synchronization
[PATCH 2/8] sched: update runqueue clock before migrations away
[PATCH 3/8] sched: hmp: Make idle balance behaviour normal when packing disabled
[PATCH 4/8] sched: hmp: Change TC2 packing config to disabled default if present
[PATCH 5/8] config: Make packing present on TC2
[PATCH 6/8] sched: hmp: Fix potential task_struct memory leak
[PATCH 7/8] HMP: Restrict irq_default_affinity to hmp_slow_cpu_mask
[PATCH 8/8] HMP: Fix rt task allowed cpu mask restriction code on 1x1 system
 kernel/irq/irqdesc.c              |  8 +++++++
 kernel/sched/core.c               |  5 ++++-
 kernel/sched/fair.c               | 66 ++++++++++++++++++++++++++++++++++++++++++++---------
 linaro/configs/big-LITTLE-MP.conf |  1 +
 4 files changed, 69 insertions(+), 11 deletions(-)
From: Chris Redpath chris.redpath@arm.com
If an entity happens to sleep for less than one tick duration the tracked load associated with that entity can be decayed by an unexpectedly large amount if it is later migrated to a different CPU. This can interfere with correct scheduling when entity load is used for decision making.
The reason for this is that when an entity is dequeued and enqueued quickly, such that se.avg.decay_count and cfs_rq.decay_counter do not differ when that entity is enqueued again, __synchronize_entity_decay skips the calculation step and also skips clearing the decay_count. At a later time that entity may be migrated and its load will be decayed incorrectly.
All users of this function expect decay_count to be zero'ed after use.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/fair.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a771fa6..01cc427 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1460,12 +1460,9 @@ static inline u64 __synchronize_entity_decay(struct sched_entity *se)
 	u64 decays = atomic64_read(&cfs_rq->decay_counter);
 
 	decays -= se->avg.decay_count;
-	if (!decays)
-		return 0;
-
-	se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib, decays);
+	if (decays)
+		se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib, decays);
 	se->avg.decay_count = 0;
-
 	return decays;
 }
 
From: Chris Redpath chris.redpath@arm.com
If we migrate a sleeping task away from a CPU which has the tick stopped, then both the clock_task and decay_counter will be out of date for that CPU and we will not decay load correctly regardless of how often we update the blocked load.
This is only an issue for tasks which are not on a runqueue (because otherwise that CPU would be awake) and simultaneously the CPU the task previously ran on has had the tick stopped.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 01cc427..86e82c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4379,6 +4379,7 @@ unlock:
  * load-balance).
  */
 #ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu);
 /*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -4390,7 +4391,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
 	/*
 	 * Load tracking: accumulate removed load so that it can be processed
 	 * when we next update owning cfs_rq under rq->lock. Tasks contribute
@@ -4398,6 +4398,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 	 * be negative here since on-rq tasks have decay-count == 0.
 	 */
 	if (se->avg.decay_count) {
+		/*
+		 * If we migrate a sleeping task away from a CPU
+		 * which has the tick stopped, then both the clock_task
+		 * and decay_counter will be out of date for that CPU
+		 * and we will not decay load correctly.
+		 */
+		if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
+			struct rq *rq = cpu_rq(task_cpu(p));
+			unsigned long flags;
+			/*
+			 * Current CPU cannot be holding rq->lock in this
+			 * circumstance, but another might be. We must hold
+			 * rq->lock before we go poking around in its clocks
+			 */
+			raw_spin_lock_irqsave(&rq->lock, flags);
+			update_rq_clock(rq);
+			update_cfs_rq_blocked_load(cfs_rq, 0);
+			raw_spin_unlock_irqrestore(&rq->lock, flags);
+		}
 		se->avg.decay_count = -__synchronize_entity_decay(se);
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
@@ -6329,6 +6348,17 @@ static struct {
 	atomic_t nr_cpus;
 	unsigned long next_balance;	/* in jiffy units */
 } nohz ____cacheline_aligned;
+/*
+ * nohz_test_cpu used when load tracking is enabled. FAIR_GROUP_SCHED
+ * dependency below may be removed when load tracking guards are
+ * removed.
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu)
+{
+	return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
+}
+#endif
 
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 /*
@@ -6481,6 +6511,18 @@ static int __cpuinit sched_ilb_notifier(struct notifier_block *nfb,
 		return NOTIFY_DONE;
 	}
 }
+#else
+/*
+ * nohz_test_cpu used when load tracking is enabled. FAIR_GROUP_SCHED
+ * dependency below may be removed when load tracking guards are
+ * removed.
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu)
+{
+	return 0;
+}
+#endif
 #endif
 
 static DEFINE_SPINLOCK(balancing);
On Tue, 2014-01-21 at 13:03 +0000, Jon Medhurst wrote:
From: Chris Redpath chris.redpath@arm.com
If we migrate a sleeping task away from a CPU which has the tick stopped, then both the clock_task and decay_counter will be out of date for that CPU and we will not decay load correctly regardless of how often we update the blocked load.
This is only an issue for tasks which are not on a runqueue (because otherwise that CPU would be awake) and simultaneously the CPU the task previously ran on has had the tick stopped.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org

 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 01cc427..86e82c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4379,6 +4379,7 @@ unlock:
  * load-balance).
  */
 #ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu);
This forward declaration is for two versions of the functions defined later, but with the length of the file and all the #if's it's not totally obvious that there aren't combinations of options that will lead to this function not getting defined. A suggestion to make this a bit more robust and clean, would be to define the empty function at this location as well, e.g.
#ifdef CONFIG_NO_HZ_COMMON
static int nohz_test_cpu(int cpu);
#else
static inline int nohz_test_cpu(int cpu) { return 0; }
#endif
Then no need to add this dummy version later on.
 /*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -4390,7 +4391,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
A spurious line removal seems to have crept in above.
 	/*
 	 * Load tracking: accumulate removed load so that it can be processed
 	 * when we next update owning cfs_rq under rq->lock. Tasks contribute
@@ -4398,6 +4398,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 	 * be negative here since on-rq tasks have decay-count == 0.
 	 */
 	if (se->avg.decay_count) {
+		/*
+		 * If we migrate a sleeping task away from a CPU
+		 * which has the tick stopped, then both the clock_task
+		 * and decay_counter will be out of date for that CPU
+		 * and we will not decay load correctly.
+		 */
+		if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
+			struct rq *rq = cpu_rq(task_cpu(p));
+			unsigned long flags;
+			/*
+			 * Current CPU cannot be holding rq->lock in this
+			 * circumstance, but another might be. We must hold
+			 * rq->lock before we go poking around in its clocks
+			 */
+			raw_spin_lock_irqsave(&rq->lock, flags);
+			update_rq_clock(rq);
+			update_cfs_rq_blocked_load(cfs_rq, 0);
+			raw_spin_unlock_irqrestore(&rq->lock, flags);
+		}
 		se->avg.decay_count = -__synchronize_entity_decay(se);
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
@@ -6329,6 +6348,17 @@ static struct {
 	atomic_t nr_cpus;
 	unsigned long next_balance;	/* in jiffy units */
 } nohz ____cacheline_aligned;
Nit-pick, there possibly should be a blank line here.
+/*
+ * nohz_test_cpu used when load tracking is enabled. FAIR_GROUP_SCHED
+ * dependency below may be removed when load tracking guards are
+ * removed.
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu)
+{
+	return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
+}
+#endif
 
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 /*
[...]
On 21/01/14 13:48, Jon Medhurst (Tixy) wrote:
On Tue, 2014-01-21 at 13:03 +0000, Jon Medhurst wrote:
[...]
This forward declaration is for two versions of the functions defined later, but with the length of the file and all the #if's it's not totally obvious that there aren't combinations of options that will lead to this function not getting defined. A suggestion to make this a bit more robust and clean, would be to define the empty function at this location as well, e.g.
#ifdef CONFIG_NO_HZ_COMMON
static int nohz_test_cpu(int cpu);
#else
static inline int nohz_test_cpu(int cpu) { return 0; }
#endif
Then no need to add this dummy version later on.
OK, good spot.
[...]
A spurious line removal seems to have crept in above.
ditto
[...]
Also, this patch (while agreed to be correct) was clearly flagged as a non-optimal way to fix the issue. There isn't another way right now, but I'm on the hook to provide one there. I have a partial solution but I don't like that much either.
Is it OK to add a revert patch for this to linaro-stable later, once the proper solution is available in mainline?
[...]
Do you want me to resend this patch updated or will you do it Tixy?
--Chris
On Tue, 2014-01-21 at 13:53 +0000, Chris Redpath wrote:
[...]
Also, this patch (while agreed to be correct) was clearly flagged as a non-optimal way to fix the issue. There isn't another way right now, but I'm on the hook to provide one there. I have a partial solution but I don't like that much either.
Is it OK to add a revert patch for this to linaro-stable later, once the proper solution is available in mainline?
I'll leave that to the LSK maintainers to confirm.
Do you want me to resend this patch updated or will you do it Tixy?
I'm happy to update the patch, will follow up to this with modified version...
On Tue, Jan 21, 2014 at 02:20:50PM +0000, Jon Medhurst (Tixy) wrote:
On Tue, 2014-01-21 at 13:53 +0000, Chris Redpath wrote:
Is it OK to add a revert patch for this to linaro-stable later, once the proper solution is available in mainline?
I'll leave that to the LSK maintainers to confirm.
I guess we have to do that, yes.
From: Chris Redpath chris.redpath@arm.com
If we migrate a sleeping task away from a CPU which has the tick stopped, then both the clock_task and decay_counter will be out of date for that CPU and we will not decay load correctly regardless of how often we update the blocked load.
This is only an issue for tasks which are not on a runqueue (because otherwise that CPU would be awake) and simultaneously the CPU the task previously ran on has had the tick stopped.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
This is an updated version of this patch following comments in http://lists.linaro.org/pipermail/linaro-kernel/2014-January/010675.html
 kernel/sched/fair.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 01cc427..e3c4d25 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4379,6 +4379,16 @@ unlock:
  * load-balance).
  */
 #ifdef CONFIG_FAIR_GROUP_SCHED
+
+#ifdef CONFIG_NO_HZ_COMMON
+static int nohz_test_cpu(int cpu);
+#else
+static inline int nohz_test_cpu(int cpu)
+{
+	return 0;
+}
+#endif
+
 /*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -4398,6 +4408,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 	 * be negative here since on-rq tasks have decay-count == 0.
 	 */
 	if (se->avg.decay_count) {
+		/*
+		 * If we migrate a sleeping task away from a CPU
+		 * which has the tick stopped, then both the clock_task
+		 * and decay_counter will be out of date for that CPU
+		 * and we will not decay load correctly.
+		 */
+		if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
+			struct rq *rq = cpu_rq(task_cpu(p));
+			unsigned long flags;
+			/*
+			 * Current CPU cannot be holding rq->lock in this
+			 * circumstance, but another might be. We must hold
+			 * rq->lock before we go poking around in its clocks
+			 */
+			raw_spin_lock_irqsave(&rq->lock, flags);
+			update_rq_clock(rq);
+			update_cfs_rq_blocked_load(cfs_rq, 0);
+			raw_spin_unlock_irqrestore(&rq->lock, flags);
+		}
 		se->avg.decay_count = -__synchronize_entity_decay(se);
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
@@ -6330,6 +6359,18 @@ static struct {
 	unsigned long next_balance;	/* in jiffy units */
 } nohz ____cacheline_aligned;
 
+/*
+ * nohz_test_cpu used when load tracking is enabled. FAIR_GROUP_SCHED
+ * dependency below may be removed when load tracking guards are
+ * removed.
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static int nohz_test_cpu(int cpu)
+{
+	return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
+}
+#endif
+
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
 /*
  * Decide if the tasks on the busy CPUs in the
From: Chris Redpath chris.redpath@arm.com
The presence of packing permanently changed the idle balance behaviour. Do not restrict idle balance on the smallest CPUs when packing is present but disabled.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 86e82c8..fb8059c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6372,6 +6372,10 @@ static int hmp_packing_ilb_needed(int cpu)
 	if (!hmp_cpu_is_slowest(cpu))
 		return 1;
 
+	/* if disabled, use normal ILB behaviour */
+	if (!hmp_packing_enabled)
+		return 1;
+
 	hmp = hmp_cpu_domain(cpu);
 	for_each_cpu_and(cpu, &hmp->cpus, nohz.idle_cpus_mask) {
 		/* only idle balance if a CPU is loaded over threshold */
From: Chris Redpath chris.redpath@arm.com
Since TC2 power curves don't really have a utilisation hotspot where packing makes sense, if it is present for a TC2 system at least make it default to disabled.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fb8059c..a2d2b34 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3709,12 +3709,13 @@ unsigned int hmp_next_up_threshold = 4096;
 unsigned int hmp_next_down_threshold = 4096;
 
 #ifdef CONFIG_SCHED_HMP_LITTLE_PACKING
-unsigned int hmp_packing_enabled = 1;
 #ifndef CONFIG_ARCH_VEXPRESS_TC2
+unsigned int hmp_packing_enabled = 1;
 unsigned int hmp_full_threshold = (NICE_0_LOAD * 9) / 8;
 #else
 /* TC2 has a sharp consumption curve @ around 800Mhz, so
  * we aim to spread the load around that frequency. */
+unsigned int hmp_packing_enabled;
 unsigned int hmp_full_threshold = 650;	/* 80% of the 800Mhz freq * NICE_0_LOAD */
 #endif
 #endif
From: Chris Redpath chris.redpath@arm.com
The scheduler will default packing to disabled, but this config change includes the feature so that we can test it more easily.
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 linaro/configs/big-LITTLE-MP.conf | 1 +
 1 file changed, 1 insertion(+)
diff --git a/linaro/configs/big-LITTLE-MP.conf b/linaro/configs/big-LITTLE-MP.conf
index 0bbc603..ced3cf9 100644
--- a/linaro/configs/big-LITTLE-MP.conf
+++ b/linaro/configs/big-LITTLE-MP.conf
@@ -9,3 +9,4 @@ CONFIG_HMP_FAST_CPU_MASK=""
 CONFIG_HMP_SLOW_CPU_MASK=""
 CONFIG_HMP_VARIABLE_SCALE=y
 CONFIG_HMP_FREQUENCY_INVARIANT_SCALE=y
+CONFIG_SCHED_HMP_LITTLE_PACKING=y
From: Chris Redpath chris.redpath@arm.com
We use get_task_struct to increment the ref count on a task_struct so that even if the task dies with a pending migration we are still able to read the memory without causing a fault.
In the case of non-running tasks, we forgot to decrement the ref count when we are done with the task.
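To make the pairing explicit, here is a minimal sketch of the pattern the fix restores; it is illustrative only, and the example_* helpers are hypothetical rather than the actual HMP migration code:

/*
 * Illustrative sketch (not the HMP scheduler code): every
 * get_task_struct() taken when a migration is set up must be balanced
 * by a put_task_struct() on every exit path, including the early
 * bail-out cases, otherwise the task_struct is leaked.
 */
static void example_run_migration(struct task_struct *p)
{
	get_task_struct(p);		/* pin the task while we use it */

	if (!example_migration_still_valid(p))
		goto out;		/* early exit must still drop the ref */

	example_do_migration(p);
out:
	put_task_struct(p);		/* balances the get on all paths */
}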
Signed-off-by: Chris Redpath chris.redpath@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/fair.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a2d2b34..e661499 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7061,13 +7061,13 @@ static void hmp_migrate_runnable_task(struct rq *rq)
 	 * with the source rq.
 	 */
 	if (src_rq->active_balance)
-		return;
+		goto out;
 
 	if (src_rq->nr_running <= 1)
-		return;
+		goto out;
 
 	if (task_rq(p) != src_rq)
-		return;
+		goto out;
 	/*
 	 * Not sure if this applies here but one can never
 	 * be too cautious
@@ -7102,6 +7102,8 @@ static void hmp_migrate_runnable_task(struct rq *rq)
 
 	rcu_read_unlock();
 	double_unlock_balance(src_rq, dst_rq);
+out:
+	put_task_struct(p);
 }
 
 static DEFINE_SPINLOCK(hmp_force_migration);
From: Dietmar Eggemann dietmar.eggemann@arm.com
This patch limits the default affinity mask for all irqs to the cluster of the little cpus.
This patch has the positive side effect that an irq thread which has its IRQTF_RUNTHREAD set inside irq_thread() -> irq_wait_for_interrupt() will not overwrite its struct task_struct->cpus_allowed with a full cpu mask of desc->irq_data.affinity in irq_thread_check_affinity(), essentially reverting the patch "HMP: experimental: Force all rt tasks to start on little domain." for this irq thread.
Signed-off-by: Dietmar Eggemann dietmar.eggemann@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/irq/irqdesc.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 473b2b6..20ecfb0 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -36,12 +36,20 @@ static int __init irq_affinity_setup(char *str)
 }
 __setup("irqaffinity=", irq_affinity_setup);
 
+extern struct cpumask hmp_slow_cpu_mask;
+
 static void __init init_irq_default_affinity(void)
 {
 #ifdef CONFIG_CPUMASK_OFFSTACK
 	if (!irq_default_affinity)
 		zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
 #endif
+#ifdef CONFIG_SCHED_HMP
+	if (!cpumask_empty(&hmp_slow_cpu_mask)) {
+		cpumask_copy(irq_default_affinity, &hmp_slow_cpu_mask);
+		return;
+	}
+#endif
 	if (cpumask_empty(irq_default_affinity))
 		cpumask_setall(irq_default_affinity);
 }
From: Dietmar Eggemann dietmar.eggemann@arm.com
There is an error scenario where, on a 1x1 HMP system (the weight of the hmp_slow_cpu_mask is 1), the short-cut of restricting the allowed cpu mask of an rt task leads to triggering a kernel bug in the rt sched class set_cpus_allowed function, set_cpus_allowed_rt().
If the task is on the run-queue, the weight of the required cpu mask is 1, and this differs from the p->nr_cpus_allowed value, this back-end function interprets it as the task changing from migratable to not migratable and decrements the rt_nr_migratory counter. There is a BUG_ON(!rq->rt.rt_nr_migratory) check in this code path which triggers in this situation.
To circumvent this issue, set the number of allowed cpus for a task p to the weight of the hmp_slow_cpu_mask before calling do_set_cpus_allowed() in __setscheduler(). It will be set to this value in do_set_cpus_allowed() after the call to the sched class related backend function anyway. By doing this, set_cpus_allowed_rt() returns without trying to update the rt_nr_migratory counter.
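For context, the check in set_cpus_allowed_rt() that this workaround relies on looks roughly like the sketch below; this is a paraphrase from memory, not a verbatim copy of kernel/sched/rt.c, and the sketch_ prefix marks it as illustrative:

/*
 * Illustrative paraphrase of the early return in set_cpus_allowed_rt():
 * if the "migratable" state implied by the new mask already matches
 * p->nr_cpus_allowed, the function bails out before it touches
 * rt_nr_migratory, so the BUG_ON(!rq->rt.rt_nr_migratory) in the
 * decrement path is never reached.
 */
static void sketch_set_cpus_allowed_rt(struct task_struct *p,
				       const struct cpumask *new_mask)
{
	int weight = cpumask_weight(new_mask);

	/* only update accounting if migratability actually changes */
	if ((p->nr_cpus_allowed > 1) == (weight > 1))
		return;

	/*
	 * ... otherwise rt_nr_migratory is incremented or decremented,
	 * and the decrement path contains the BUG_ON described above.
	 */
}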
This patch has been tested with a test device driver requiring a threaded irq handler on a TC2 system with a reduced cpu mask (1 Cortex A15, 1 Cortex A7).
Signed-off-by: Dietmar Eggemann dietmar.eggemann@arm.com
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 kernel/sched/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 134d815..277e355 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3854,8 +3854,11 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
 		p->sched_class = &rt_sched_class;
 #ifdef CONFIG_SCHED_HMP
 		if (!cpumask_empty(&hmp_slow_cpu_mask))
-			if (cpumask_equal(&p->cpus_allowed, cpu_all_mask))
+			if (cpumask_equal(&p->cpus_allowed, cpu_all_mask)) {
+				p->nr_cpus_allowed =
+					cpumask_weight(&hmp_slow_cpu_mask);
 				do_set_cpus_allowed(p, &hmp_slow_cpu_mask);
+			}
 #endif
 	} else
Hi Mark
If this series is OK for LSK then here is a pull request for these from the 'clean' MP branch. Note, the last 3 commits listed below are additional ones which are already in LSK and add no delta. If you want me to supply the patches in a different way, please let me know.
The following changes since commit 9a0758156e5f7f2f609617eb342e476378ef63f2:
sched: hmp: Fix build breakage when not using CONFIG_SCHED_HMP (2013-11-22 14:15:38 +0000)
are available in the git repository at:
git://git.linaro.org/arm/big.LITTLE/mp.git for-lsk
for you to fetch changes up to 3c5d96e49313fdc1d663bae6f6baf926ae221f82:
HMP: Fix rt task allowed cpu mask restriction code on 1x1 system (2014-01-21 15:17:52 +0000)
----------------------------------------------------------------
Chris Redpath (6):
      sched: reset blocked load decay_count during synchronization
      sched: update runqueue clock before migrations away
      sched: hmp: Make idle balance behaviour normal when packing disabled
      sched: hmp: Change TC2 packing config to disabled default if present
      config: Make packing present on TC2
      sched: hmp: Fix potential task_struct memory leak

Dietmar Eggemann (2):
      HMP: Restrict irq_default_affinity to hmp_slow_cpu_mask
      HMP: Fix rt task allowed cpu mask restriction code on 1x1 system

Jon Medhurst (1):
      config: Add config fragments for big LITTLE MP

Kamalesh Babulal (1):
      sched/debug: Add load-tracking statistics to task

Thomas Gleixner (1):
      genirq: Add default affinity mask command line option
 Documentation/kernel-parameters.txt |  9 ++++++++
 kernel/irq/irqdesc.c                | 29 +++++++++++++++++++++++--
 kernel/sched/core.c                 |  5 ++++-
 kernel/sched/debug.c                |  6 ++++++
 kernel/sched/fair.c                 | 63 +++++++++++++++++++++++++++++++++++++++++++--------
 linaro/configs/big-LITTLE-MP.conf   | 12 +++++++++++
 6 files changed, 112 insertions(+), 12 deletions(-)
 create mode 100644 linaro/configs/big-LITTLE-MP.conf
On Tue, Jan 21, 2014 at 01:03:27PM +0000, Jon Medhurst wrote:
This is a series of fixes for big.LITTLE MP.
Patches 1, 2, 3, 6 & 8 are defect fixes. Patches 4 & 5 are to include packing in the TC2 build. Patch 7 is an old patch which accidentally got omitted from LSK.
These all look good to me now, Alex? IIRC one of them was a fix already present upstream, if that is the case please put the upstream commit ID in the changelog (cherry-pick -xs of the original commit will DTRT and is probably easiest).
On Tue, 2014-01-21 at 15:47 +0000, Mark Brown wrote:
On Tue, Jan 21, 2014 at 01:03:27PM +0000, Jon Medhurst wrote:
This is a series of fixes for big.LITTLE MP.
Patches 1, 2, 3, 6 & 8 are defect fixes. Patches 4 & 5 are to include packing in the TC2 build. Patch 7 is an old patch which accidentally got omitted from LSK.
These all look good to me now, Alex? IIRC one of them was a fix already present upstream,
Not that I can see for any of these 8 patches. There was a 9th patch which Chris originally sent but that was already in LSK. Looking at how that happened I can now see that Alex added it [1] along with another one [2]. So I've now got to decide if that second one is an 'MP' patch as well, this is all getting even more difficult to keep track of and maintain.
[1] Commit 916ff46934c350d6dee3c02da62950539559922c [2] Commit f0862cb749c537c32de82912d30110098ddc6cd6
On 01/22/2014 12:53 AM, Jon Medhurst (Tixy) wrote:
On Tue, 2014-01-21 at 15:47 +0000, Mark Brown wrote:
On Tue, Jan 21, 2014 at 01:03:27PM +0000, Jon Medhurst wrote:
This is a series of fixes for big.LITTLE MP.
Patches 1, 2, 3, 6 & 8 are defect fixes. Patches 4 & 5 are to include packing in the TC2 build. Patch 7 is an old patch which accidentally got omitted from LSK.
These all look good to me now, Alex? IIRC one of them was a fix already present upstream,
All looks fine to me on the for-lsk branch of the mp tree.
Not that I can see for any of these 8 patches. There was a 9th patch which Chris originally sent but that was already in LSK. Looking at how that happened I can now see that Alex added it [1] along with another one [2]. So I've now got to decide if that second one is an 'MP' patch as well, this is all getting even more difficult to keep track of and maintain.
Yes, a bit, when LSK has many input sources. Like, just from ARM, I have 3 input trees. :)
[remote "tixy"]
	url = git://git.linaro.org/people/tixy/kernel.git
	fetch = +refs/heads/*:refs/remotes/tixy/*
[remote "landing-arm"]
	url = git://git.linaro.org/landing-teams/working/arm/kernel.git
	fetch = +refs/heads/*:refs/remotes/landing-arm/*
[remote "mp"]
	url = git://git.linaro.org/arm/big.LITTLE/mp.git
[1] Commit 916ff46934c350d6dee3c02da62950539559922c [2] Commit f0862cb749c537c32de82912d30110098ddc6cd6
On Wed, 2014-01-22 at 10:36 +0800, Alex Shi wrote:
On 01/22/2014 12:53 AM, Jon Medhurst (Tixy) wrote:
[..]
Looking at how
that happened I can now see that Alex added it [1] along with another one [2]. So I've now got to decide if that second one is an 'MP' patch as well, this is all getting even more difficult to keep track of and maintain.
Yes, a bit, when LSK has many input sources. Like, just from ARM, I have 3 input trees. :)
[remote "tixy"]
	url = git://git.linaro.org/people/tixy/kernel.git
	fetch = +refs/heads/*:refs/remotes/tixy/*
[remote "landing-arm"]
	url = git://git.linaro.org/landing-teams/working/arm/kernel.git
	fetch = +refs/heads/*:refs/remotes/landing-arm/*
[remote "mp"]
	url = git://git.linaro.org/arm/big.LITTLE/mp.git
Yes, but if I have updates for LSK I inform you and you pull them, you don't have to monitor any branches on the off chance I've added something. Whereas to maintain the MP patchset topic I have to go hunting for anything that someone might have added to LSK and cherry-pick it. I guess that's not a problem if all those patches always go into the same place, but it sure doesn't help keep an understandable git history.
On 01/22/2014 05:26 PM, Jon Medhurst (Tixy) wrote:
On Wed, 2014-01-22 at 10:36 +0800, Alex Shi wrote:
On 01/22/2014 12:53 AM, Jon Medhurst (Tixy) wrote:
[..]
Looking at how
that happened I can now see that Alex added it [1] along with another one [2]. So I've now got to decide if that second one is an 'MP' patch as well, this is all getting even more difficult to keep track of and maintain.
Yes, a bit, when LSK has many input sources. Like, just from ARM, I have 3 input trees. :)
[remote "tixy"]
	url = git://git.linaro.org/people/tixy/kernel.git
	fetch = +refs/heads/*:refs/remotes/tixy/*
[remote "landing-arm"]
	url = git://git.linaro.org/landing-teams/working/arm/kernel.git
	fetch = +refs/heads/*:refs/remotes/landing-arm/*
[remote "mp"]
	url = git://git.linaro.org/arm/big.LITTLE/mp.git
Yes, but if I have updates for LSK I inform you and you pull them, you don't have to monitor any branches on the off chance I've added something. Whereas to maintain the MP patchset topic I have to go hunting for anything that someone might have added to LSK and cherry-pick it. I guess that's not a problem if all those patches always go into the same place, but it sure doesn't help keep an understandable git history.
Tixy, I appreciate what you did for lsk!
BTW, why do we need to keep every topic branch fast forward only? I mean, even in the upstream kernel, important trees like tip/linux-next are all rebased often.
On 22 January 2014 09:42, Alex Shi alex.shi@linaro.org wrote:
BTW, why do we need to keep every topic branch fast forward only? I mean, even in the upstream kernel, important trees like tip/linux-next are all rebased often.
We don't, but once things have been merged into the LSK itself we need to keep that fast forward only which in turn means that it's at the very least much easier to keep the branches that feed into it fast forward only. If something has only been in -test or not even merged into that then it's definitely OK to just rebase.
On 01/22/2014 07:09 PM, Mark Brown wrote:
On 22 January 2014 09:42, Alex Shi alex.shi@linaro.org wrote:
BTW, why do we need to keep every topic branch fast forward only? I mean, even in the upstream kernel, important trees like tip/linux-next are all rebased often.
We don't, but once things have been merged into the LSK itself we need to keep that fast forward only which in turn means that it's at the very least much easier to keep the branches that feed into it fast forward only. If something has only been in -test or not even merged into that then it's definitely OK to just rebase.
Thanks for the nice explanation! It's clear the LSK needs to be kept fast forward only.
I mean the LSK has many input trees, and if every source likes to maintain a separate, functional tree while they also cherry-pick the same patches, then when a revert is needed it is a bit ugly. That's what Tixy means by it being a bit difficult to maintain. If it is an issue, I want to know why we need to keep all input trees fast forward only. Are there specific requirements for the fast forward?
On 22 January 2014 14:39, Alex Shi alex.shi@linaro.org wrote:
I mean the LSK has many input trees, and if every source likes to maintain a separate, functional tree while they also cherry-pick the same patches, then when a revert is needed it is a bit ugly. That's what Tixy means by it being a bit difficult to maintain. If it is an issue, I want to know why we need to keep all input trees fast forward only. Are there specific requirements for the fast forward?
It makes life a lot easier making sure the right thing happens when things are merged into the LSK itself if the inputs themselves are fast forward only too (at least the versions of them that are merged into the LSK) and it makes the history of the LSK clearer since otherwise we get multiple copies of the same patch appearing in history (as has happened for the big.LITTLE branch).
On Tue, 2014-01-21 at 16:53 +0000, Jon Medhurst (Tixy) wrote:
On Tue, 2014-01-21 at 15:47 +0000, Mark Brown wrote:
On Tue, Jan 21, 2014 at 01:03:27PM +0000, Jon Medhurst wrote:
This is a series of fixes for big.LITTLE MP.
Patches 1, 2, 3, 6 & 8 are defect fixes. Patches 4 & 5 are to include packing in the TC2 build. Patch 7 is an old patch which accidentally got omitted from LSK.
These all look good to me now, Alex? IIRC one of them was a fix already present upstream,
Not that I can see for any of these 8 patches. There was a 9th patch which Chris originally sent but that was already in LSK. Looking at how that happened I can now see that Alex added it [1] along with another one [2]. So I've now got to decide if that second one is an 'MP' patch as well, this is all getting even more difficult to keep track of and maintain.
[1] Commit 916ff46934c350d6dee3c02da62950539559922c [2] Commit f0862cb749c537c32de82912d30110098ddc6cd6
Seems that people are talking about reverting [2] from upstream, see http://lists.linaro.org/pipermail/linaro-kernel/2014-January/010694.html
So perhaps we should revert it in LSK too?