On Wed, 2014-06-11 at 14:08 +0100, Chris Redpath wrote:
> Hi Tixy,
>
> I have 2 new patches for you. Patch 1 is a bug I found on a platform where all the little CPUs can be unplugged. Patch 2 is one that got missed during the initial LSK population. This patch stops us giving Android services (started by init) an initial boost to an A15 and lets them be scheduled as normal given the tracked load. The main reason for this is so that when they allocate timers etc., they are allocated on little CPUs.
>
> Basil will send our test results out separately.
On the first boot after trying these patches out, I hit a lockdep warning [2] which looks like one reported recently [1]. Perhaps not boosting processes forked from init now means that they get migrated during boot when they weren't before, and so trigger this pre-existing problem?
As for the actual lockdep issue, it looks like a problem with last month's patch "hmp: Use idle pull to perform forced up-migrations", to do with how hmp_keepalive_delay() uses cpuidle_driver_ref(). Right now I'm working on high-priority Juno-related matters, so I won't be investigating further for some time...
[1] http://lists.linaro.org/pipermail/linaro-kernel/2014-June/015528.html
[2]
[ 11.964254] =================================
[ 11.977079] [ INFO: inconsistent lock state ]
[ 11.989905] 3.10.42-00217-gd77abb1 #1 Not tainted
[ 12.003754] ---------------------------------
[ 12.016578] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 12.034274] swapper/3/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 12.049405] (cpuidle_driver_lock){+.?...}, at: [<c0300979>] cpuidle_driver_ref+0x15/0x38
[ 12.073549] {SOFTIRQ-ON-W} state was registered at:
[ 12.087911] [<c006199f>] __lock_acquire+0x44f/0x920
[ 12.102798] [<c00624d5>] lock_acquire+0x65/0xbc
[ 12.116658] [<c040b60b>] _raw_spin_lock+0x23/0x30
[ 12.131030] [<c030082f>] cpuidle_register_driver+0x23/0xc8
[ 12.147708] [<c062db51>] bl_idle_init+0x85/0x100
[ 12.161823] [<c000867d>] do_one_initcall+0xd1/0x114
[ 12.176708] [<c060e9e1>] kernel_init_freeable+0x109/0x180
[ 12.193132] [<c03ffbd9>] kernel_init+0x11/0x110
[ 12.206994] [<c000cda9>] ret_from_fork+0x11/0x1c
[ 12.221111] irq event stamp: 29780
[ 12.231117] hardirqs last enabled at (29780): [<c040b7c5>] _raw_spin_unlock_irqrestore+0x25/0x34
[ 12.257364] hardirqs last disabled at (29779): [<c040b6c5>] _raw_spin_lock_irqsave+0x19/0x3c
[ 12.282241] softirqs last enabled at (29770): [<c0023721>] irq_enter+0x61/0x64
[ 12.303791] softirqs last disabled at (29771): [<c0023565>] do_softirq+0x5d/0x60
[ 12.325595]
[ 12.325595] other info that might help us debug this:
[ 12.344827] Possible unsafe locking scenario:
[ 12.344827]
[ 12.362264] CPU0
[ 12.369450] ----
[ 12.376636] lock(cpuidle_driver_lock);
[ 12.388196] <Interrupt>
[ 12.395895] lock(cpuidle_driver_lock);
[ 12.407967]
[ 12.407967] *** DEADLOCK ***
[ 12.407967]
[ 12.425407] 1 lock held by swapper/3/0:
[ 12.436693] #0: (hmp_force_migration){+.....}, at: [<c00498fd>] hmp_idle_pull+0x45/0x388
[ 12.461099]
[ 12.461099] stack backtrace:
[ 12.473927] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.42-00217-gd77abb1 #1
[ 12.495473] [<c00124f1>] (unwind_backtrace+0x1/0x9c) from [<c000ff79>] (show_stack+0x11/0x14)
[ 12.520604] [<c000ff79>] (show_stack+0x11/0x14) from [<c040504d>] (print_usage_bug+0x25d/0x268)
[ 12.546247] [<c040504d>] (print_usage_bug+0x25d/0x268) from [<c0060c37>] (mark_lock+0x173/0x5e0)
[ 12.572145] [<c0060c37>] (mark_lock+0x173/0x5e0) from [<c006196f>] (__lock_acquire+0x41f/0x920)
[ 12.597785] [<c006196f>] (__lock_acquire+0x41f/0x920) from [<c00624d5>] (lock_acquire+0x65/0xbc)
[ 12.623684] [<c00624d5>] (lock_acquire+0x65/0xbc) from [<c040b60b>] (_raw_spin_lock+0x23/0x30)
[ 12.649069] [<c040b60b>] (_raw_spin_lock+0x23/0x30) from [<c0300979>] (cpuidle_driver_ref+0x15/0x38)
[ 12.675993] [<c0300979>] (cpuidle_driver_ref+0x15/0x38) from [<c0049b67>] (hmp_idle_pull+0x2af/0x388)
[ 12.703174] [<c0049b67>] (hmp_idle_pull+0x2af/0x388) from [<c004c273>] (run_rebalance_domains+0x11b/0x140)
[ 12.731635] [<c004c273>] (run_rebalance_domains+0x11b/0x140) from [<c00233cb>] (__do_softirq+0xdf/0x1d0)
[ 12.759585] [<c00233cb>] (__do_softirq+0xdf/0x1d0) from [<c0023565>] (do_softirq+0x5d/0x60)
[ 12.784202] [<c0023565>] (do_softirq+0x5d/0x60) from [<c00237a3>] (irq_exit+0x7f/0xa4)
[ 12.807537] [<c00237a3>] (irq_exit+0x7f/0xa4) from [<c00114fb>] (handle_IPI+0x167/0x1ac)
[ 12.831386] [<c00114fb>] (handle_IPI+0x167/0x1ac) from [<c0008515>] (gic_handle_irq+0x51/0x54)
[ 12.856770] [<c0008515>] (gic_handle_irq+0x51/0x54) from [<c000c7ff>] (__irq_svc+0x3f/0x64)
[ 12.881382] Exception stack(0xef0d5f28 to 0xef0d5f70)
[ 12.896260] 5f20: c0300539 00000004 00000000 00000001 ef0d4010 00000001
[ 12.920361] 5f40: c1c89788 c06ab148 00000001 c06ab0fc 1ff9fba3 00000001 ef0ce100 ef0d5f70
[ 12.944462] 5f60: c0300539 c0061230 20000173 ffffffff
[ 12.959341] [<c000c7ff>] (__irq_svc+0x3f/0x64) from [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c)
[ 12.986777] [<c0061230>] (trace_hardirqs_on_caller+0xa8/0x16c) from [<c0300539>] (cpuidle_enter_state+0x3d/0xb0)
[ 13.016775] [<c0300539>] (cpuidle_enter_state+0x3d/0xb0) from [<c0300631>] (cpuidle_idle_call+0x85/0x140)
[ 13.044981] [<c0300631>] (cpuidle_idle_call+0x85/0x140) from [<c000d951>] (arch_cpu_idle+0xd/0x30)
[ 13.071392] [<c000d951>] (arch_cpu_idle+0xd/0x30) from [<c0054c71>] (cpu_startup_entry+0x131/0x16c)
[ 13.098058] [<c0054c71>] (cpu_startup_entry+0x131/0x16c) from [<800081b5>] (0x800081b5)
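
For anyone not used to reading these reports, here is a rough userspace sketch of the rule the warning in [2] complains about: the same lock class was taken once in process context with softirqs enabled (cpuidle_register_driver()) and once from softirq context (cpuidle_driver_ref() via hmp_idle_pull()). This is only a toy model, not lockdep's real implementation; struct lock_class as defined here and model_acquire() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of per-lock softirq usage state. */
struct lock_class {
    const char *name;
    bool used_with_softirqs_on; /* like {SOFTIRQ-ON-W}: taken with softirqs enabled */
    bool used_in_softirq;       /* like {IN-SOFTIRQ-W}: taken from softirq context */
};

/*
 * Record one acquisition of the lock.
 * Returns false once the lock has been seen both in softirq context and
 * in process context with softirqs enabled, i.e. a softirq could interrupt
 * the holder on the same CPU and spin on the lock forever.
 */
static bool model_acquire(struct lock_class *lock, bool in_softirq,
                          bool softirqs_enabled)
{
    assert(lock != NULL);

    if (in_softirq)
        lock->used_in_softirq = true;
    else if (softirqs_enabled)
        lock->used_with_softirqs_on = true;

    return !(lock->used_in_softirq && lock->used_with_softirqs_on);
}
```

In this model, an acquisition with (in_softirq=false, softirqs_enabled=true) followed by one with in_softirq=true returns false, which corresponds to the sequence in the log. A lock that is only ever taken with softirqs disabled in process context stays consistent.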