Hi All,
LKFT has occasionally hit a kernel bug on the x15 platform, which is an ARMv7 board.
The bug was caught on kernel commit f82786d (v4.9.55), but the same problem could
happen upstream too, since the function call chain has not changed much.
The function call chain is vector___pabt_svc -> do_PrefetchAbort ->
do_page_fault -> might_sleep()
The tricky thing is that the LKFT team cannot reproduce the bug. But from the kernel
log we know that irqs_disabled() is 128, and that would be the only reason we got
the report: the check cannot return early while irqs_disabled() = 128.
The preempt_offset and preempt_count are both 0 here.
Line 7726 in kernel/sched/core.c, in function ___might_sleep():
	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
	     !is_idle_task(current)) ||
	    system_state != SYSTEM_RUNNING || oops_in_progress)
		return;
I have no further ideas on this issue. Any hints are appreciated!
Regards
Alex
BUG: sleeping function called from invalid context at /srv/oe/build/tmp-rpb-glibc/work-shared/am57xx-evm/kernel-source/arch/arm/mm/fault.c:303
[ 53.264908] in_atomic(): 0, irqs_disabled(): 128, pid: 1691, name: ftracetest
[ 53.272074] 1 lock held by ftracetest/1691:
[ 53.276273] #0: (&mm->mmap_sem){++++++}, at: [<c0d60cfc>] do_page_fault+0x90/0x428
[ 53.284095] irq event stamp: 12924
[ 53.287514] hardirqs last enabled at (12923): [<c0307f10>] no_work_pending+0x4/0x30
[ 53.295289] hardirqs last disabled at (12924): [<c0d605a0>] __pabt_svc+0x60/0xa0
[ 53.302718] softirqs last enabled at (11474): [<c034c5d0>] __do_softirq+0x280/0x5ac
[ 53.310494] softirqs last disabled at (11433): [<c034cc98>] irq_exit+0xf4/0x158
[ 53.317837] CPU: 0 PID: 1691 Comm: ftracetest Not tainted 4.9.55-dirty #1
[ 53.324652] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 53.330857] [<c03114d8>] (unwind_backtrace) from [<c030cb18>] (show_stack+0x10/0x14)
[ 53.338644] [<c030cb18>] (show_stack) from [<c067e604>] (dump_stack+0xa4/0xd0)
[ 53.345908] [<c067e604>] (dump_stack) from [<c0373808>] (___might_sleep+0x1ac/0x2a0)
[ 53.353694] [<c0373808>] (___might_sleep) from [<c0d60ec8>] (do_page_fault+0x25c/0x428)
[ 53.361739] [<c0d60ec8>] (do_page_fault) from [<c03013e8>] (do_PrefetchAbort+0x38/0x9c)
[ 53.369780] [<c03013e8>] (do_PrefetchAbort) from [<c0d605a8>] (__pabt_svc+0x68/0xa0)
[ 53.377557] Exception stack(0xec6fbfa8 to 0xec6fbff0)
[ 53.382629] bfa0: 00000001 00000001 ffffffff 00000000 0010ac68 00000007
[ 53.390845] bfc0: 00000001 0000003f 00000009 0000000c fffffffa be9d27a4 000e31fc ec6fbff8
[ 53.399055] bfe0: b6e6d49c b6e6d49c 40070093 ffffffff
[ 53.404137] [<c0d605a8>] (__pabt_svc) from [<b6e6d49c>] (0xb6e6d49c)
Hi All,
So I’ve spent a little time checking into how one might take advantage of ‘useful’ test cases which are part of various distro packages.
Specifically, this usually means running the package's make check or a similar target.
These are often wired into the various package build systems that distros have. The nice thing about using this mechanism is:
1) the package has already been baselined.
2) packages run a range of tests, from ones that are completely useless for testing the kernel (like checking a package's internal APIs) to ones which actually trigger kernel activity, like running valgrind or driving some network activity.
So in my case I’ve been using gentoo to dork around since gentoo has a pretty nice process and happily is able to email testing results. More about why this is a good attribute in a moment.
The use scenario ends up looking like:
RC released -> build/deploy new kernel -> emerge package_foo; email package_foo result; emerge package_bar; email package_bar result; post process results and integrate into qa-reports.
Where this might be viewed as kind of weird is that this class of testing really doesn’t belong in LAVA. One doesn’t need (or want) to do a whole fresh image deployment. I know what you’re thinking: the system will be dirty after a run is made, so one MUST do a redeploy or the results will be invalid.
That might be the case for some environments, but it’s not the case for gentoo. Package build and test is already ‘clean’ for the purposes of a system under test because in the land of gentoo we sandbox. If we didn’t, distro package testing would be incredibly painful. So really the only ‘danger’ is that the local environment does need to build and boot the kernel under test, but that’s not a big deal, since if something fails to build/boot, well, kernelci will complain. This isn’t about testing that type of situation.
Anyway, I should have some test 4.14 results in a few hours as examples. I’m thinking an upstream report is probably super super simple, unless a regression pops up.
Comments?
Tom