On 10/1/24 18:20, Vlastimil Babka wrote:
Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
The test_kfree_rcu test invokes kfree_rcu(), and inspection of the boot sequence showed that kunit_run_all_tests(), the runner for built-in kunit tests, is called before system_state is set to SYSTEM_RUNNING and before rcu_end_inkernel_boot() is called, so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when slub_kunit is built-in should avoid the issue.
An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot sequence, was tried, but it broke tests with functions marked as __init, because free_initmem() has already been done by that point.
Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups.
Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing.
Thanks!
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: rcu@vger.kernel.org
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
 lib/slub_kunit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
 
 static void test_kfree_rcu(struct kunit *test)
 {
-	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
-				sizeof(struct test_kfree_rcu_struct),
-				SLAB_NO_MERGE);
-	struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
+	struct kmem_cache *s;
+	struct test_kfree_rcu_struct *p;
+
+	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+	s = test_kmem_cache_create("TestSlub_kfree_rcu",
+				sizeof(struct test_kfree_rcu_struct),
+				SLAB_NO_MERGE);
+	p = kmem_cache_alloc(s, GFP_KERNEL);
 
 	kfree_rcu(p, rcu);
 	kmem_cache_destroy(s);
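
For readers unfamiliar with the IS_BUILTIN() check that gates the skip, here is a minimal userspace sketch of how such a compile-time test can tell =y apart from =m. It is only an illustration: the DEMO_* macros are modelled on the helpers in include/linux/kconfig.h but renamed and slightly adapted for standard C, CONFIG_DEMO_MODULAR_TEST is a hypothetical option, and the two #defines stand in for what Kconfig would generate. It is not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch modelled on include/linux/kconfig.h (names are NOT the kernel's).
 * If the option expands to 1, DEMO_ARG_PLACEHOLDER_1 inserts "0," and
 * shifts a literal 1 into the second argument slot; otherwise the
 * pasted token is junk and the second argument is 0.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_2(x)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED_3(arg1_or_junk) \
	DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

/* =y defines CONFIG_FOO; =m defines only CONFIG_FOO_MODULE. */
#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)

/* Assumption: simulate CONFIG_SLUB_KUNIT_TEST=y (built-in). */
#define CONFIG_SLUB_KUNIT_TEST 1
/* Assumption: hypothetical option configured as =m (module). */
#define CONFIG_DEMO_MODULAR_TEST_MODULE 1

/* True when the built-in configuration would take the skip path. */
bool slub_test_skips_kfree_rcu(void)
{
	return DEMO_IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST);
}

/* A modular build does not satisfy the built-in check, so no skip. */
bool modular_test_skips_kfree_rcu(void)
{
	return DEMO_IS_BUILTIN(CONFIG_DEMO_MODULAR_TEST);
}

/* Small self-check that can be called from a test harness or main(). */
void demo_self_check(void)
{
	assert(slub_test_skips_kfree_rcu());
	assert(!modular_test_skips_kfree_rcu());
	assert(DEMO_IS_MODULE(CONFIG_DEMO_MODULAR_TEST) == 1);
	assert(DEMO_IS_MODULE(CONFIG_SLUB_KUNIT_TEST) == 0);
}
```

Because the check resolves to a constant 0 or 1 at compile time, a modular build of the test keeps running the kfree_rcu() path unchanged, and only the built-in case, where the test runs early in boot, takes the skip.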