On Thu, 11 Apr 2024 at 20:12, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
On Thu, 11 Apr 2024 at 09:55, David Gow <davidgow@google.com> wrote:
On Wed, 10 Apr 2024 at 23:23, Will Deacon <will@kernel.org> wrote:
On Wed, Apr 10, 2024 at 03:57:10PM +0530, Naresh Kamboju wrote:
The following kernel crash was noticed on the Linux next-20240410 tag while running KUnit tests on qemu-arm64 and qemu-x86_64.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
<trim>
Ok, so 'task_struct->vfork_done' is NULL. Looks like this code was added recently, so adding Mickaël to cc.
Thanks. This looks like a race condition where the KUnit test kthread can terminate before we wait on it.
Mickaël, does this seem like a correct fix to you?
From: David Gow <davidgow@google.com>
Date: Thu, 11 Apr 2024 12:07:47 +0800
Subject: [PATCH] kunit: Fix race condition in try-catch completion
KUnit's try-catch infrastructure now uses vfork_done, which is always set to a valid completion when a kthread is created, but which is set to NULL once the thread terminates. This creates a race condition: the kthread may exit before we wait on it, in which case we pass a NULL vfork_done to wait_for_completion_timeout().

Keep a copy of vfork_done, taken before we call wake_up_process() and therefore still valid, and wait on that instead.
Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: David Gow <davidgow@google.com>
I tested this patch on top of Linux next, and it fixes the reported issues.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
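The race described in the commit message can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming POSIX threads; every name in it (struct demo_completion, struct demo_task, demo_run() and so on) is hypothetical and only loosely modeled on vfork_done and struct completion, not taken from the kernel. The exiting thread clears the shared pointer and signals the completion, so the waiter takes its own copy before starting the thread, in the same way the patch copies vfork_done before wake_up_process():

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical stand-in for a task whose vfork_done is cleared on exit. */
struct demo_task {
	pthread_mutex_t lock;
	struct demo_completion *vfork_done;
};

static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void demo_wait(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Thread body: on exit, clear the shared pointer and signal the
 * completion, loosely mirroring how vfork_done is handled at exit. */
static void *demo_thread_fn(void *arg)
{
	struct demo_task *task = arg;
	struct demo_completion *c;

	pthread_mutex_lock(&task->lock);
	c = task->vfork_done;
	task->vfork_done = NULL;
	if (c)
		demo_complete(c);
	pthread_mutex_unlock(&task->lock);
	return NULL;
}

/* Returns 0 once the thread has finished and been waited on. */
int demo_run(void)
{
	struct demo_completion done;
	struct demo_task task;
	struct demo_completion *task_done;
	pthread_t tid;
	int ret = -1;

	pthread_mutex_init(&done.lock, NULL);
	pthread_cond_init(&done.cond, NULL);
	done.done = false;
	pthread_mutex_init(&task.lock, NULL);
	task.vfork_done = &done;

	/* The fix: snapshot the pointer before the thread can run and exit. */
	task_done = task.vfork_done;

	if (pthread_create(&tid, NULL, demo_thread_fn, &task) == 0) {
		/* Re-reading task.vfork_done here could observe NULL if the
		 * thread has already exited; the snapshot stays valid. */
		demo_wait(task_done);
		pthread_join(tid, NULL);
		assert(task.vfork_done == NULL); /* cleared by the exiting thread */
		ret = done.done ? 0 : -1;
	}

	pthread_mutex_destroy(&task.lock);
	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return ret;
}
```

Re-reading task.vfork_done after pthread_create() instead of using the snapshot would reintroduce the window in which the pointer may already be NULL. Unlike this sketch, kunit_try_catch_run() waits with wait_for_completion_timeout(), so a hung test ends in -ETIMEDOUT rather than blocking forever.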
 lib/kunit/try-catch.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index fa687278ccc9..fc6cd4d7e80f 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -63,6 +63,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
 {
 	struct kunit *test = try_catch->test;
 	struct task_struct *task_struct;
+	struct completion *task_done;
 	int exit_code, time_remaining;
 
 	try_catch->context = context;
@@ -75,13 +76,14 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
 		return;
 	}
 	get_task_struct(task_struct);
+	task_done = task_struct->vfork_done;
 	wake_up_process(task_struct);
 	/*
 	 * As for a vfork(2), task_struct->vfork_done (pointing to the
 	 * underlying kthread->exited) can be used to wait for the end of a
 	 * kernel thread.
 	 */
-	time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
+	time_remaining = wait_for_completion_timeout(task_done,
 						     kunit_test_timeout());
 	if (time_remaining == 0) {
 		try_catch->try_result = -ETIMEDOUT;
--
Previously, running the KUnit tests caused a kernel panic. With this patch applied, I now see only the following:

Unable to handle kernel paging request at virtual address
KASAN: null-ptr-deref in range
pc : kunit_test_null_dereference (lib/kunit/kunit-test.c:119)
lr : kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:31)

This Oops comes from kunit_test_null_dereference, which deliberately dereferences a NULL pointer to exercise KUnit's fault handling. The KUnit tests run to completion and the system is stable; the kernel did not panic.
kunit test log:
------
<6>[ 76.784878] # Subtest: kunit_fault
<6>[ 76.785527] # module: kunit_test
<6>[ 76.785785] 1..1
<1>[ 76.794318] Unable to handle kernel paging request at virtual address dfff800000000000
<1>[ 76.796137] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
<1>[ 76.796970] Mem abort info:
<1>[ 76.797685]   ESR = 0x0000000096000005
<1>[ 76.798868]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 76.800355]   SET = 0, FnV = 0
<1>[ 76.800893]   EA = 0, S1PTW = 0
<1>[ 76.801715]   FSC = 0x05: level 1 translation fault
<1>[ 76.802654] Data abort info:
<1>[ 76.803713]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[ 76.804362]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 76.805278]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 76.806302] [dfff800000000000] address between user and kernel address ranges
<0>[ 76.808597] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
<4>[ 76.809876] Modules linked in:
<4>[ 76.812055] CPU: 1 PID: 567 Comm: kunit_try_catch Tainted: G B N 6.9.0-rc3-next-20240410 #1
<4>[ 76.812987] Hardware name: linux,dummy-virt (DT)
<4>[ 76.814123] pstate: 12400009 (nzcV daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[ 76.814947] pc : kunit_test_null_dereference (lib/kunit/kunit-test.c:119)
<4>[ 76.815862] lr : kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:31)
<4>[ 76.816765] sp : ffff800083137dc0
<4>[ 76.817473] x29: ffff800083137e20 x28: 0000000000000000 x27: 0000000000000000
<4>[ 76.818684] x26: 0000000000000000 x25: 0000000000000000 x24: fff00000c1b30c00
<4>[ 76.819798] x23: ffffa76fb372e348 x22: ffffa76fb3736550 x21: fff00000c1b30c08
<4>[ 76.820900] x20: 1ffff00010626fb8 x19: ffff8000800879f0 x18: 0000000000000068
<4>[ 76.822008] x17: 0000000000000000 x16: fff00000da132180 x15: ffffa76fb36f3b04
<4>[ 76.823125] x14: ffffa76fb2e3cc28 x13: 1ffe0000181547e4 x12: fffd80001832511a
<4>[ 76.824229] x11: 1ffe000018325119 x10: fffd800018325119 x9 : ffffa76fb372e3d0
<4>[ 76.825409] x8 : ffff800083137cb8 x7 : 0000000000000000 x6 : 0000000041b58ab3
<4>[ 76.826532] x5 : ffff700010626fb8 x4 : 00000000f1f1f1f1 x3 : 0000000000000003
<4>[ 76.827653] x2 : dfff800000000000 x1 : fff00000c1928000 x0 : ffff8000800879f0
<4>[ 76.828829] Call trace:
<4>[ 76.829410]  kunit_test_null_dereference (lib/kunit/kunit-test.c:119)
<4>[ 76.830294]  kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:31)
<4>[ 76.831168]  kthread (kernel/kthread.c:389)
<4>[ 76.831870]  ret_from_fork (arch/arm64/kernel/entry.S:861)
<0>[ 76.833252] Code: b90004a3 d5384101 52800063 aa0003f3 (39c00042)
All code
========
   0:   b90004a3    str     w3, [x5, #4]
   4:   d5384101    mrs     x1, sp_el0
   8:   52800063    mov     w3, #0x3        // #3
   c:   aa0003f3    mov     x19, x0
  10:*  39c00042    ldrsb   w2, [x2]        <-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:   39c00042    ldrsb   w2, [x2]
<4>[ 76.834489] ---[ end trace 0000000000000000 ]---
Links:
 - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2exQ84OHGOd...

--
Linaro LKFT
https://lkft.linaro.org