Hi Tixy,
I have 2 new patches for you. Patch 1 is a bug I found on a platform where all the little CPUs can be unplugged. Patch 2 is one that got missed during the initial LSK population. This patch stops us giving Android services (started by init) an initial boost to an A15 and lets them be scheduled as normal given the tracked load. The main reason for this is so that when they allocate timers etc, they are allocated on little CPUs.
Basil will send our test results out separately.
Best Regards, Chris
If someone hotplugs all the little CPUs while another CPU is handling a wakeup, we can potentially return new_cpu == NR_CPUS from hmp_select_slower_cpu (which is called internally by hmp_best_little_cpu as well). We will use this to deref the per_cpu rq array in hmp_next_down_delay which can go boom.
Signed-off-by: Chris Redpath chris.redpath@arm.com --- kernel/sched/fair.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 71da724..483dee8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4483,7 +4483,11 @@ unlock: #else new_cpu = hmp_select_slower_cpu(p, prev_cpu); #endif - if (new_cpu != prev_cpu) { + /* + * we might have no suitable CPU + * in which case new_cpu == NR_CPUS + */ + if (new_cpu < NR_CPUS && new_cpu != prev_cpu) { hmp_next_down_delay(&p->se, new_cpu); trace_sched_hmp_migrate(p, new_cpu, HMP_MIGRATE_WAKEUP); return new_cpu;
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
Signed-off-by: Chris Redpath chris.redpath@arm.com --- include/linux/sched.h | 4 ++++ kernel/sched/core.c | 6 +++--- kernel/sched/fair.c | 2 +- 3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index a3c8b27..f862feb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -950,6 +950,10 @@ struct sched_avg { u32 usage_avg_sum; };
+#ifdef CONFIG_SCHED_HMP +#define hmp_task_should_forkboost(task) ((task->parent && task->parent->pid > 2)) +#endif + #ifdef CONFIG_SCHEDSTATS struct sched_statistics { u64 wait_start; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f583951..c327e57 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1635,9 +1635,9 @@ static void __sched_fork(struct task_struct *p) #ifdef CONFIG_SCHED_HMP /* keep LOAD_AVG_MAX in sync with fair.c if load avg series is changed */ #define LOAD_AVG_MAX 47742 - if (p->mm) { - p->se.avg.hmp_last_up_migration = 0; - p->se.avg.hmp_last_down_migration = 0; + p->se.avg.hmp_last_up_migration = 0; + p->se.avg.hmp_last_down_migration = 0; + if (hmp_task_should_forkboost(p)) { p->se.avg.load_avg_ratio = 1023; p->se.avg.load_avg_contrib = (1023 * scale_load_down(p->se.load.weight)); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 483dee8..81b5ef9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4384,7 +4384,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
#ifdef CONFIG_SCHED_HMP /* always put non-kernel forking tasks on a big domain */ - if (p->mm && (sd_flag & SD_BALANCE_FORK)) { + if (unlikely(sd_flag & SD_BALANCE_FORK) && hmp_task_should_forkboost(p)) { new_cpu = hmp_select_faster_cpu(p, prev_cpu); if (new_cpu != NR_CPUS) { hmp_next_up_delay(&p->se, new_cpu);
On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
Signed-off-by: Chris Redpath chris.redpath@arm.com
include/linux/sched.h | 4 ++++ kernel/sched/core.c | 6 +++--- kernel/sched/fair.c | 2 +- 3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index a3c8b27..f862feb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -950,6 +950,10 @@ struct sched_avg { u32 usage_avg_sum; }; +#ifdef CONFIG_SCHED_HMP +#define hmp_task_should_forkboost(task) ((task->parent && task->parent->pid > 2))
I wondered why '2', when init is PID 1. So I ran 'ps' on PC and on ARM board running Android and saw 2 is kthreadd, so '2' is correct, but can't help but think it would be nice to have a comment to that affect on the above #define, so, OK if I add this?
/* * We want to avoid boosting any processes forked from init (PID 1) * and kthreadd (assumed to be PID 2). */
+#endif
#ifdef CONFIG_SCHEDSTATS struct sched_statistics { u64 wait_start; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f583951..c327e57 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1635,9 +1635,9 @@ static void __sched_fork(struct task_struct *p) #ifdef CONFIG_SCHED_HMP /* keep LOAD_AVG_MAX in sync with fair.c if load avg series is changed */ #define LOAD_AVG_MAX 47742
- if (p->mm) {
p->se.avg.hmp_last_up_migration = 0;
p->se.avg.hmp_last_down_migration = 0;
- p->se.avg.hmp_last_up_migration = 0;
- p->se.avg.hmp_last_down_migration = 0;
The above change means that these values are now set to zero for kernel threads whereas before they were left uninitialised. That seems like the right thing, but wondered if that meant there was a bug before?
- if (hmp_task_should_forkboost(p)) { p->se.avg.load_avg_ratio = 1023; p->se.avg.load_avg_contrib = (1023 * scale_load_down(p->se.load.weight));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 483dee8..81b5ef9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4384,7 +4384,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) #ifdef CONFIG_SCHED_HMP /* always put non-kernel forking tasks on a big domain */
The above comment is completely true any more, how about we just drop it, as the if statement is now pretty self explanatory?
- if (p->mm && (sd_flag & SD_BALANCE_FORK)) {
- if (unlikely(sd_flag & SD_BALANCE_FORK) && hmp_task_should_forkboost(p)) { new_cpu = hmp_select_faster_cpu(p, prev_cpu); if (new_cpu != NR_CPUS) { hmp_next_up_delay(&p->se, new_cpu);
On 11/06/14 14:52, Jon Medhurst (Tixy) wrote:
On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches.
Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern.
Signed-off-by: Chris Redpath chris.redpath@arm.com
include/linux/sched.h | 4 ++++ kernel/sched/core.c | 6 +++--- kernel/sched/fair.c | 2 +- 3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index a3c8b27..f862feb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -950,6 +950,10 @@ struct sched_avg { u32 usage_avg_sum; };
+#ifdef CONFIG_SCHED_HMP +#define hmp_task_should_forkboost(task) ((task->parent && task->parent->pid > 2))
I wondered why '2', when init is PID 1. So I ran 'ps' on PC and on ARM board running Android and saw 2 is kthreadd, so '2' is correct, but can't help but think it would be nice to have a comment to that affect on the above #define, so, OK if I add this?
Yes, absolutely fine by me.
/*
- We want to avoid boosting any processes forked from init (PID 1)
- and kthreadd (assumed to be PID 2).
*/
+#endif
- #ifdef CONFIG_SCHEDSTATS struct sched_statistics { u64 wait_start;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f583951..c327e57 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1635,9 +1635,9 @@ static void __sched_fork(struct task_struct *p) #ifdef CONFIG_SCHED_HMP /* keep LOAD_AVG_MAX in sync with fair.c if load avg series is changed */ #define LOAD_AVG_MAX 47742
- if (p->mm) {
p->se.avg.hmp_last_up_migration = 0;
p->se.avg.hmp_last_down_migration = 0;
- p->se.avg.hmp_last_up_migration = 0;
- p->se.avg.hmp_last_down_migration = 0;
The above change means that these values are now set to zero for kernel threads whereas before they were left uninitialised. That seems like the right thing, but wondered if that meant there was a bug before?
I think it should be zero-init anyway but I haven't checked.
- if (hmp_task_should_forkboost(p)) { p->se.avg.load_avg_ratio = 1023; p->se.avg.load_avg_contrib = (1023 * scale_load_down(p->se.load.weight));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 483dee8..81b5ef9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4384,7 +4384,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
#ifdef CONFIG_SCHED_HMP /* always put non-kernel forking tasks on a big domain */
The above comment is completely true any more, how about we just drop it, as the if statement is now pretty self explanatory?
Agreed.
- if (p->mm && (sd_flag & SD_BALANCE_FORK)) {
- if (unlikely(sd_flag & SD_BALANCE_FORK) && hmp_task_should_forkboost(p)) { new_cpu = hmp_select_faster_cpu(p, prev_cpu); if (new_cpu != NR_CPUS) { hmp_next_up_delay(&p->se, new_cpu);
--Chris
On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
Hi Tixy,
I have 2 new patches for you. Patch 1 is a bug I found on a platform where all the little CPUs can be unplugged. Patch 2 is one that got missed during the initial LSK population.
Thanks, they looks straight forward enough to me. I just made several minor comments on the second one; mostly comments about comments :-)
On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
Hi Tixy,
I have 2 new patches for you. Patch 1 is a bug I found on a platform where all the little CPUs can be unplugged. Patch 2 is one that got missed during the initial LSK population. This patch stops us giving Android services (started by init) an initial boost to an A15 and lets them be scheduled as normal given the tracked load. The main reason for this is so that when they allocate timers etc, they are allocated on little CPUs.
Basil will send our test results out separately.
First boot after trying these patches out, I hit a lockdep warning [2] which looks like one reported recently [1]. Perhaps not boosting process forked from init now means that these end up getting migrated during boot when they weren't before and so triggering this pre-existing problem?
As for the actual lockdep issue, looks like a problem with last month's patch "hmp: Use idle pull to perform forced up-migrations", to do with how hmp_keepalive_delay() uses cpuidle_driver_ref(). Right now I'm working on high priority Juno related matters, so won't be investigating further for some time...
[1] http://lists.linaro.org/pipermail/linaro-kernel/2014-June/015528.html
[2] [ 11.964254] ================================= [ 11.977079] [ INFO: inconsistent lock state ] [ 11.989905] 3.10.42-00217-gd77abb1 #1 Not tainted [ 12.003754] --------------------------------- [ 12.016578] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 12.034274] swapper/3/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 12.049405] (cpuidle_driver_lock){+.?...}, at: [<c0300979>] cpuidle_driver_ref+0x15/0x38 [ 12.073549] {SOFTIRQ-ON-W} state was registered at: [ 12.087911] [<c006199f>] __lock_acquire+0x44f/0x920 [ 12.102798] [<c00624d5>] lock_acquire+0x65/0xbc [ 12.116658] [<c040b60b>] _raw_spin_lock+0x23/0x30 [ 12.131030] [<c030082f>] cpuidle_register_driver+0x23/0xc8 [ 12.147708] [<c062db51>] bl_idle_init+0x85/0x100 [ 12.161823] [<c000867d>] do_one_initcall+0xd1/0x114 [ 12.176708] [<c060e9e1>] kernel_init_freeable+0x109/0x180 [ 12.193132] [<c03ffbd9>] kernel_init+0x11/0x110 [ 12.206994] [<c000cda9>] ret_from_fork+0x11/0x1c [ 12.221111] irq event stamp: 29780 [ 12.231117] hardirqs last enabled at (29780): [<c040b7c5>] _raw_spin_unlock_irqrestore+0x25/0x34 [ 12.257364] hardirqs last disabled at (29779): [<c040b6c5>] _raw_spin_lock_irqsave+0x19/0x3c [ 12.282241] softirqs last enabled at (29770): [<c0023721>] irq_enter+0x61/0x64 [ 12.303791] softirqs last disabled at (29771): [<c0023565>] do_softirq+0x5d/0x60 [ 12.325595] [ 12.325595] other info that might help us debug this: [ 12.344827] Possible unsafe locking scenario: [ 12.344827] [ 12.362264] CPU0 [ 12.369450] ---- [ 12.376636] lock(cpuidle_driver_lock); [ 12.388196] <Interrupt> [ 12.395895] lock(cpuidle_driver_lock); [ 12.407967] [ 12.407967] *** DEADLOCK *** [ 12.407967] [ 12.425407] 1 lock held by swapper/3/0: [ 12.436693] #0: (hmp_force_migration){+.....}, at: [<c00498fd>] hmp_idle_pull+0x45/0x388 [ 12.461099] [ 12.461099] stack backtrace: [ 12.473927] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.42-00217-gd77abb1 #1 [ 12.495473] [<c00124f1>] (unwind_backtrace+0x1/0x9c) from [<c000ff79>] (show_stack+0x11/0x14) [ 12.520604] [<c000ff79>] (show_stack+0x11/0x14) from [<c040504d>] (print_usage_bug+0x25d/0x268) [ 12.546247] [<c040504d>] (print_usage_bug+0x25d/0x268) from [<c0060c37>] (mark_lock+0x173/0x5e0) [ 12.572145] [<c0060c37>] (mark_lock+0x173/0x5e0) from [<c006196f>] (__lock_acquire+0x41f/0x920) [ 12.597785] [<c006196f>] (__lock_acquire+0x41f/0x920) from [<c00624d5>] (lock_acquire+0x65/0xbc) [ 12.623684] [<c00624d5>] (lock_acquire+0x65/0xbc) from [<c040b60b>] (_raw_spin_lock+0x23/0x30) [ 12.649069] [<c040b60b>] (_raw_spin_lock+0x23/0x30) from [<c0300979>] (cpuidle_driver_ref+0x15/0x38) [ 12.675993] [<c0300979>] (cpuidle_driver_ref+0x15/0x38) from [<c0049b67>] (hmp_idle_pull+0x2af/0x388) [ 12.703174] [<c0049b67>] (hmp_idle_pull+0x2af/0x388) from [<c004c273>] (run_rebalance_domains+0x11b/0x140) [ 12.731635] [<c004c273>] (run_rebalance_domains+0x11b/0x140) from [<c00233cb>] (__do_softirq+0xdf/0x1d0) [ 12.759585] [<c00233cb>] (__do_softirq+0xdf/0x1d0) from [<c0023565>] (do_softirq+0x5d/0x60) [ 12.784202] [<c0023565>] (do_softirq+0x5d/0x60) from [<c00237a3>] (irq_exit+0x7f/0xa4) [ 12.807537] [<c00237a3>] (irq_exit+0x7f/0xa4) from [<c00114fb>] (handle_IPI+0x167/0x1ac) [ 12.831386] [<c00114fb>] (handle_IPI+0x167/0x1ac) from [<c0008515>] (gic_handle_irq+0x51/0x54) [ 12.856770] [<c0008515>] (gic_handle_irq+0x51/0x54) from [<c000c7ff>] (__irq_svc+0x3f/0x64) [ 12.881382] Exception stack(0xef0d5f28 to 0xef0d5f70) [ 12.896260] 5f20: c0300539 00000004 00000000 00000001 ef0d4010 00000001 [ 12.920361] 5f40: c1c89788 c06ab148 00000001 c06ab0fc 1ff9fba3 00000001 ef0ce100 ef0d5f70 [ 12.944462] 5f60: c0300539 c0061230 20000173 ffffffff [ 12.959341] [<c000c7ff>] (__irq_svc+0x3f/0x64) from [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c) [ 12.986777] [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c) from [<c0300539>] (cpuidle_enter_state+0x3d/0xb0) [ 13.016775] [<c0300539>] (cpuidle_enter_state+0x3d/0xb0) from [<c0300631>] (cpuidle_idle_call+0x85/0x140) [ 13.044981] [<c0300631>] (cpuidle_idle_call+0x85/0x140) from [<c000d951>] (arch_cpu_idle+0xd/0x30) [ 13.071392] [<c000d951>] (arch_cpu_idle+0xd/0x30) from [<c0054c71>] (cpu_startup_entry+0x131/0x16c) [ 13.098058] [<c0054c71>] (cpu_startup_entry+0x131/0x16c) from [<800081b5>] (0x800081b5)
Hi Chris,
Yes this happens only with IKS runs as I can see from the detailed logs. Tixy, can you confirm if your config was also IKS?
Since the WA2 runs generated all results successfully in IKS mode Bangalore missed to look at the detailed test logs from Jenkins.
Thanks Basil Eljuse...
-----Original Message----- From: Jon Medhurst (Tixy) [mailto:tixy@linaro.org] Sent: 11 June 2014 15:39 To: Chris Redpath Cc: mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Basil Eljuse; Alex Shi; Naresh Kamboju Subject: Re: May/June patches for HMP
On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
Hi Tixy,
I have 2 new patches for you. Patch 1 is a bug I found on a platform where all the little CPUs can be unplugged. Patch 2 is one that got missed during the initial LSK population. This patch stops us giving Android services (started by init) an initial boost to an A15 and lets them be scheduled as normal given the tracked load. The main reason for this is so that when they allocate timers etc, they are allocated on little CPUs.
Basil will send our test results out separately.
First boot after trying these patches out, I hit a lockdep warning [2] which looks like one reported recently [1]. Perhaps not boosting process forked from init now means that these end up getting migrated during boot when they weren't before and so triggering this pre-existing problem?
As for the actual lockdep issue, looks like a problem with last month's patch "hmp: Use idle pull to perform forced up-migrations", to do with how hmp_keepalive_delay() uses cpuidle_driver_ref(). Right now I'm working on high priority Juno related matters, so won't be investigating further for some time...
[1] http://lists.linaro.org/pipermail/linaro-kernel/2014-June/015528.html
[2] [ 11.964254] ================================= [ 11.977079] [ INFO: inconsistent lock state ] [ 11.989905] 3.10.42-00217-gd77abb1 #1 Not tainted [ 12.003754] --------------------------------- [ 12.016578] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 12.034274] swapper/3/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 12.049405] (cpuidle_driver_lock){+.?...}, at: [<c0300979>] cpuidle_driver_ref+0x15/0x38 [ 12.073549] {SOFTIRQ-ON-W} state was registered at: [ 12.087911] [<c006199f>] __lock_acquire+0x44f/0x920 [ 12.102798] [<c00624d5>] lock_acquire+0x65/0xbc [ 12.116658] [<c040b60b>] _raw_spin_lock+0x23/0x30 [ 12.131030] [<c030082f>] cpuidle_register_driver+0x23/0xc8 [ 12.147708] [<c062db51>] bl_idle_init+0x85/0x100 [ 12.161823] [<c000867d>] do_one_initcall+0xd1/0x114 [ 12.176708] [<c060e9e1>] kernel_init_freeable+0x109/0x180 [ 12.193132] [<c03ffbd9>] kernel_init+0x11/0x110 [ 12.206994] [<c000cda9>] ret_from_fork+0x11/0x1c [ 12.221111] irq event stamp: 29780 [ 12.231117] hardirqs last enabled at (29780): [<c040b7c5>] _raw_spin_unlock_irqrestore+0x25/0x34 [ 12.257364] hardirqs last disabled at (29779): [<c040b6c5>] _raw_spin_lock_irqsave+0x19/0x3c [ 12.282241] softirqs last enabled at (29770): [<c0023721>] irq_enter+0x61/0x64 [ 12.303791] softirqs last disabled at (29771): [<c0023565>] do_softirq+0x5d/0x60 [ 12.325595] [ 12.325595] other info that might help us debug this: [ 12.344827] Possible unsafe locking scenario: [ 12.344827] [ 12.362264] CPU0 [ 12.369450] ---- [ 12.376636] lock(cpuidle_driver_lock); [ 12.388196] <Interrupt> [ 12.395895] lock(cpuidle_driver_lock); [ 12.407967] [ 12.407967] *** DEADLOCK *** [ 12.407967] [ 12.425407] 1 lock held by swapper/3/0: [ 12.436693] #0: (hmp_force_migration){+.....}, at: [<c00498fd>] hmp_idle_pull+0x45/0x388 [ 12.461099] [ 12.461099] stack backtrace: [ 12.473927] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.42-00217-gd77abb1 #1 [ 12.495473] [<c00124f1>] (unwind_backtrace+0x1/0x9c) from [<c000ff79>] (show_stack+0x11/0x14) [ 12.520604] [<c000ff79>] (show_stack+0x11/0x14) from [<c040504d>] (print_usage_bug+0x25d/0x268) [ 12.546247] [<c040504d>] (print_usage_bug+0x25d/0x268) from [<c0060c37>] (mark_lock+0x173/0x5e0) [ 12.572145] [<c0060c37>] (mark_lock+0x173/0x5e0) from [<c006196f>] (__lock_acquire+0x41f/0x920) [ 12.597785] [<c006196f>] (__lock_acquire+0x41f/0x920) from [<c00624d5>] (lock_acquire+0x65/0xbc) [ 12.623684] [<c00624d5>] (lock_acquire+0x65/0xbc) from [<c040b60b>] (_raw_spin_lock+0x23/0x30) [ 12.649069] [<c040b60b>] (_raw_spin_lock+0x23/0x30) from [<c0300979>] (cpuidle_driver_ref+0x15/0x38) [ 12.675993] [<c0300979>] (cpuidle_driver_ref+0x15/0x38) from [<c0049b67>] (hmp_idle_pull+0x2af/0x388) [ 12.703174] [<c0049b67>] (hmp_idle_pull+0x2af/0x388) from [<c004c273>] (run_rebalance_domains+0x11b/0x140) [ 12.731635] [<c004c273>] (run_rebalance_domains+0x11b/0x140) from [<c00233cb>] (__do_softirq+0xdf/0x1d0) [ 12.759585] [<c00233cb>] (__do_softirq+0xdf/0x1d0) from [<c0023565>] (do_softirq+0x5d/0x60) [ 12.784202] [<c0023565>] (do_softirq+0x5d/0x60) from [<c00237a3>] (irq_exit+0x7f/0xa4) [ 12.807537] [<c00237a3>] (irq_exit+0x7f/0xa4) from [<c00114fb>] (handle_IPI+0x167/0x1ac) [ 12.831386] [<c00114fb>] (handle_IPI+0x167/0x1ac) from [<c0008515>] (gic_handle_irq+0x51/0x54) [ 12.856770] [<c0008515>] (gic_handle_irq+0x51/0x54) from [<c000c7ff>] (__irq_svc+0x3f/0x64) [ 12.881382] Exception stack(0xef0d5f28 to 0xef0d5f70) [ 12.896260] 5f20: c0300539 00000004 00000000 00000001 ef0d4010 00000001 [ 12.920361] 5f40: c1c89788 c06ab148 00000001 c06ab0fc 1ff9fba3 00000001 ef0ce100 ef0d5f70 [ 12.944462] 5f60: c0300539 c0061230 20000173 ffffffff [ 12.959341] [<c000c7ff>] (__irq_svc+0x3f/0x64) from [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c) [ 12.986777] [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c) from [<c0300539>] (cpuidle_enter_state+0x3d/0xb0) [ 13.016775] [<c0300539>] (cpuidle_enter_state+0x3d/0xb0) from [<c0300631>] (cpuidle_idle_call+0x85/0x140) [ 13.044981] [<c0300631>] (cpuidle_idle_call+0x85/0x140) from [<c000d951>] (arch_cpu_idle+0xd/0x30) [ 13.071392] [<c000d951>] (arch_cpu_idle+0xd/0x30) from [<c0054c71>] (cpu_startup_entry+0x131/0x16c) [ 13.098058] [<c0054c71>] (cpu_startup_entry+0x131/0x16c) from [<800081b5>] (0x800081b5)
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
On Wed, 2014-06-11 at 15:51 +0100, Basil Eljuse wrote:
Hi Chris,
Yes this happens only with IKS runs as I can see from the detailed logs. Tixy, can you confirm if your config was also IKS?
It has HMP active and IKS support compiled in but not active, which should be same config as Linaro's LSK Android releases are made with, except I also had CONFIG_PROVE_LOCKING enabled.
I've just compiled without IKS and see the same problem.
I also tried again without the 2 new patches sent today and don't get the lockdep warning, and everything seems consistently reproducible to me.
Not sure which Android file system I have, think its a week or so old one from the CI jobs for the tip kernel (but only difference to LSK builds should be the kernel anyway, and I'm building my own one of those).
The kernel source I'm using is the latest lsk android branch: https://git.linaro.org/kernel/linux-linaro-stable.git/commit/ddb020c86c1de65... with the two new MP patches applied.
And the kernel config I'm using is that generated by running
ARCH=arm scripts/kconfig/merge_config.sh \ linaro/configs/linaro-base.conf \ linaro/configs/android.conf \ linaro/configs/big-LITTLE-MP.conf \ linaro/configs/big-LITTLE-IKS.conf \ linaro/configs/vexpress.conf \ linaro/configs/debug.conf
I've attached my boot logs, including the output of "cat /proc/config.gz" so we can see for definite the config used to build the kernel.
My personal scripts also set CONFIG_GATOR to =y rather than =m which is shown in these logs, but I tried again without that and see the same results.
Hi Basil
What do you want us to do about the two new patches? Just take the first one as that's an obvious and safe fix, and leave out the second one for this month? Or, as the real problem is with last month's MP patches, perhaps include the second patch as well?
Me, and most people here, are totally maxed out working on Juno and arm64 things for the 14.06 release so I can't see anyone being able to spare time investigating the MP patches. So I think it's either a case of applying them as is, or not.
Sorry Tixy, Me too drowned in juno validation tasks.
Just take the first one as that's an obvious and safe fix,
We agree to go with this option for current month.
And this would let Chris focus on for a proper fix for the idle pull related potential deadlock issues.
Thanks Basil Eljuse...
-----Original Message----- From: Jon Medhurst (Tixy) [mailto:tixy@linaro.org] Sent: 12 June 2014 14:46 To: Basil Eljuse Cc: Chris Redpath; mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Alex Shi; Naresh Kamboju Subject: Re: May/June patches for HMP
Hi Basil
What do you want us to do about the two new patches? Just take the first one as that's an obvious and safe fix, and leave out the second one for this month? Or, as the real problem is with last month's MP patches, perhaps include the second patch as well?
Me, and most people here, are totally maxed out working on Juno and arm64 things for the 14.06 release so I can't see anyone being able to spare time investigating the MP patches. So I think it's either a case of applying them as is, or not.
-- Tixy
On Wed, 2014-06-11 at 16:56 +0100, Jon Medhurst (Tixy) wrote:
On Wed, 2014-06-11 at 15:51 +0100, Basil Eljuse wrote:
Hi Chris,
Yes this happens only with IKS runs as I can see from the detailed logs. Tixy, can you confirm if your config was also IKS?
It has HMP active and IKS support compiled in but not active, which should be same config as Linaro's LSK Android releases are made with, except I also had CONFIG_PROVE_LOCKING enabled.
I've just compiled without IKS and see the same problem.
I also tried again without the 2 new patches sent today and don't get the lockdep warning, and everything seems consistently reproducible to me.
Not sure which Android file system I have, think its a week or so old one from the CI jobs for the tip kernel (but only difference to LSK builds should be the kernel anyway, and I'm building my own one of those).
The kernel source I'm using is the latest lsk android branch: https://git.linaro.org/kernel/linux-linaro-stable.git/commit/ddb020c86c1de65... with the two new MP patches applied.
And the kernel config I'm using is that generated by running
ARCH=arm scripts/kconfig/merge_config.sh \ linaro/configs/linaro-base.conf \ linaro/configs/android.conf \ linaro/configs/big-LITTLE-MP.conf \ linaro/configs/big-LITTLE-IKS.conf \ linaro/configs/vexpress.conf \ linaro/configs/debug.conf
I've attached my boot logs, including the output of "cat /proc/config.gz" so we can see for definite the config used to build the kernel.
My personal scripts also set CONFIG_GATOR to =y rather than =m which is shown in these logs, but I tried again without that and see the same results.
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
Hi guys,
I appreciate that we're all a big bogged down at the moment :) but I will try to get a fix sorted out for next month for the cpufreq driver lock issue. I don't mind not releasing the second patch, we have been running without it for quite some time anyway.
I'm not sure that we'll ever see the potential lock-up since each CPU only looks at the cpufreq data once. I thought it was actually impossible to cause, but if lockdep reckons so I'll have to look again.
--Chris ________________________________________ From: Basil Eljuse Sent: 12 June 2014 17:30 To: Jon Medhurst (Tixy) Cc: Chris Redpath; mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Alex Shi; Naresh Kamboju Subject: RE: May/June patches for HMP
Sorry Tixy, Me too drowned in juno validation tasks.
Just take the first one as that's an obvious and safe fix,
We agree to go with this option for current month.
And this would let Chris focus on for a proper fix for the idle pull related potential deadlock issues.
Thanks Basil Eljuse...
-----Original Message----- From: Jon Medhurst (Tixy) [mailto:tixy@linaro.org] Sent: 12 June 2014 14:46 To: Basil Eljuse Cc: Chris Redpath; mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Alex Shi; Naresh Kamboju Subject: Re: May/June patches for HMP
Hi Basil
What do you want us to do about the two new patches? Just take the first one as that's an obvious and safe fix, and leave out the second one for this month? Or, as the real problem is with last month's MP patches, perhaps include the second patch as well?
Me, and most people here, are totally maxed out working on Juno and arm64 things for the 14.06 release so I can't see anyone being able to spare time investigating the MP patches. So I think it's either a case of applying them as is, or not.
-- Tixy
On Wed, 2014-06-11 at 16:56 +0100, Jon Medhurst (Tixy) wrote:
On Wed, 2014-06-11 at 15:51 +0100, Basil Eljuse wrote:
Hi Chris,
Yes this happens only with IKS runs as I can see from the detailed logs. Tixy, can you confirm if your config was also IKS?
It has HMP active and IKS support compiled in but not active, which should be same config as Linaro's LSK Android releases are made with, except I also had CONFIG_PROVE_LOCKING enabled.
I've just compiled without IKS and see the same problem.
I also tried again without the 2 new patches sent today and don't get the lockdep warning, and everything seems consistently reproducible to me.
Not sure which Android file system I have, think its a week or so old one from the CI jobs for the tip kernel (but only difference to LSK builds should be the kernel anyway, and I'm building my own one of those).
The kernel source I'm using is the latest lsk android branch: https://git.linaro.org/kernel/linux-linaro-stable.git/commit/ddb020c86c1de65... with the two new MP patches applied.
And the kernel config I'm using is that generated by running
ARCH=arm scripts/kconfig/merge_config.sh \ linaro/configs/linaro-base.conf \ linaro/configs/android.conf \ linaro/configs/big-LITTLE-MP.conf \ linaro/configs/big-LITTLE-IKS.conf \ linaro/configs/vexpress.conf \ linaro/configs/debug.conf
I've attached my boot logs, including the output of "cat /proc/config.gz" so we can see for definite the config used to build the kernel.
My personal scripts also set CONFIG_GATOR to =y rather than =m which is shown in these logs, but I tried again without that and see the same results.
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
On Thu, 2014-06-12 at 18:25 +0100, Chris Redpath wrote: [...]
I'm not sure that we'll ever see the potential lock-up since each CPU only looks at the cpufreq data once.
That was my initial feeling too, but I didn't want to start thinking about it properly and digging though code because of all the other things on our plates...
Sorry that the test results I shall share on Monday (thought of doing it today) because I could not catchup with Robin yet. Tomorrow I am out of office! Sorry!
Thanks Basil Eljuse...
-----Original Message----- From: Jon Medhurst (Tixy) [mailto:tixy@linaro.org] Sent: 12 June 2014 19:02 To: Chris Redpath Cc: Basil Eljuse; mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Alex Shi; Naresh Kamboju Subject: Re: May/June patches for HMP
On Thu, 2014-06-12 at 18:25 +0100, Chris Redpath wrote: [...]
I'm not sure that we'll ever see the potential lock-up since each CPU only looks at the cpufreq data once.
That was my initial feeling too, but I didn't want to start thinking about it properly and digging though code because of all the other things on our plates...
-- Tixy
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
No worries, it is on my list :)
--Chris ________________________________________ From: Jon Medhurst (Tixy) [tixy@linaro.org] Sent: 12 June 2014 19:02 To: Chris Redpath Cc: Basil Eljuse; mark.brown@linaro.org; linaro-kernel@lists.linaro.org; Alex Shi; Naresh Kamboju Subject: Re: May/June patches for HMP
On Thu, 2014-06-12 at 18:25 +0100, Chris Redpath wrote: [...]
I'm not sure that we'll ever see the potential lock-up since each CPU only looks at the cpufreq data once.
That was my initial feeling too, but I didn't want to start thinking about it properly and digging though code because of all the other things on our plates...
-- Tixy
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
linaro-kernel@lists.linaro.org