Hi Ke,
Thank you very much for your feedback!
After a quick investigation I noticed that select_energy_cpu_brute() can actually return an offline CPU in some corner cases (if prev_cpu is offline for example). That was not an issue when select_energy_cpu_brute() was called only from select_task_rq_fair() as select_task_rq() would safely call select_fallback_rq() in that case. However, since:
9e293db sched: EAS: upmigrate misfit current task
select_energy_cpu_brute() is now called outside of the wakeup path, and an active load balance is triggered unconditionally on the CPU that was selected, which might be offline. I'm not an expert in load balancing, but I suspect this isn't the right thing to do. I'll investigate a little more and try to come up with a fix if this is confirmed to be the root cause.
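To make the concern concrete, here is a small user-space C sketch of the kind of guard I have in mind. It is not kernel code and not a proposed patch: every toy_* name and the toy_online_mask value are hypothetical stand-ins, and the assumption is simply that the caller checks whether the selected CPU is online before starting an active balance towards it (in the kernel that check would presumably be based on cpu_online()).

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for the online CPU mask: bit n set means CPU n is online.
 * 0xb = 1011b, so CPUs 0, 1 and 3 are online and CPU 2 is offline. */
unsigned int toy_online_mask = 0xb;

/* Toy equivalent of an "is this CPU online?" check. */
bool toy_cpu_online(int cpu)
{
    return cpu >= 0 && cpu < TOY_NR_CPUS && (toy_online_mask & (1u << cpu));
}

/* Hypothetical selector that models the corner case: it simply hands back
 * prev_cpu, even when prev_cpu is offline. */
int toy_select_energy_cpu(int prev_cpu)
{
    return prev_cpu;
}

/* Returns the CPU an active balance could target, or -1 when the selected
 * CPU is offline and no active balance should be started. */
int toy_pick_push_target(int prev_cpu)
{
    int cpu = toy_select_energy_cpu(prev_cpu);

    if (!toy_cpu_online(cpu))
        return -1;

    return cpu;
}
```

With this mask, toy_pick_push_target(2) refuses the offline CPU 2 and returns -1, while toy_pick_push_target(3) returns 3. Whether skipping the balance or falling back to another CPU is the right behaviour in the real code is exactly what I still need to check.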
I hope that's useful!
Regards,
Quentin
On Tuesday 09 Jan 2018 at 11:15:03 (+0800), Ke Wang wrote:
Hi Joonwoo, Chris,
When porting EAS1.4 to our platform, which is SMP (4*A7, k4.4), we frequently encountered kernel panics after applying the following patches:
- 9e293db sched: EAS: upmigrate misfit current task
- dc626b2 sched: avoid pushing tasks to an offline CPU
- 2da014c sched: Extend active balance to accept 'push_task' argument
After applying these three patches, leaving EAS disabled, and running a stability test that includes some random CPU plug-in/plug-out, a kernel panic sometimes occurred, always with the same stack as below:
[ 214.742695] c1 ------------[ cut here ]------------
[ 214.742709] c1 kernel BUG at /space/builder/repo/sprdroid8.1_trunk/kernel/kernel/smpboot.c:136!
[ 214.742718] c1 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 214.748750] c0 Modules linked in: mtty marlin2_fm mali(O)
[ 214.748785] c1 CPU: 1 PID: 18 Comm: migration/2 Tainted: G W O 4.4.83-00912-g370f62c #1
[ 214.748795] c1 Hardware name: Generic DT based system
[ 214.748805] c1 task: ef2d9680 task.stack: ee862000
[ 214.748821] c1 PC is at smpboot_thread_fn+0x168/0x270
[ 214.748832] c1 LR is at smpboot_thread_fn+0xe4/0x270
[ 214.748843] c1 pc : [<c014d71c>] lr : [<c014d698>] psr: 200e0113
sp : ee863f38 ip : ee863f38 fp : ee863f5c
[ 214.748854] c1 r10: 00000000 r9 : 00000000 r8 : 00000000
[ 214.748862] c1 r7 : 00000001 r6 : c111a814 r5 : ee846140 r4 : ee862000
[ 214.748871] c1 r3 : 00000001 r2 : ee863f28 r1 : 00000000 r0 : 00000002
[ 214.748881] c1 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 214.748890] c1 Control: 10c5387d Table: 9b9e406a DAC: 00000051
...
[ 214.821339] c1 [<c014d71c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 214.821363] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 214.821378] c1 Code: e5950000 e5943010 e1500003 0a000000 (e7f001f2)
kernel/kernel/smpboot.c:136: BUG_ON(td->cpu != smp_processor_id());
It seems that the oops was caused by migration/2 actually running on CPU1.
Do you have any suggestions for this? Thanks in advance.