The SLUB changes for 6.12 included new kunit tests that resulted in noisy warnings, of a kind we normally suppress, and in a boot lockup in some configurations when the kunit tests are built-in.
The warnings are addressed in Patch 1.
I couldn't reproduce the lockups, but inspecting the boot initialization order makes me suspect that test_kfree_rcu() calls kfree_rcu() too early, before RCU finishes initialization. Moving the execution later was tried, but that broke tests that mark their code as __init, so Patch 2 skips the test when the slub kunit tests are built-in.
So these are now fixes for 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()").
The plan is to take the fixes via slab tree for a 6.12 rcX.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- patch 2 skips the test when built-in instead of moving kunit execution later
- Link to v1: https://lore.kernel.org/r/20240930-b4-slub-kunit-fix-v1-0-32ca9dbbbc11@suse....
---
Vlastimil Babka (2):
      mm, slab: suppress warnings in test_leak_destroy kunit test
      slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

 lib/slub_kunit.c | 18 ++++++++++++------
 mm/slab.h        |  6 ++++++
 mm/slab_common.c |  5 +++--
 mm/slub.c        |  5 +++--
 4 files changed, 24 insertions(+), 10 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240930-b4-slub-kunit-fix-6fba4d1c1742
Best regards,
Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
The test_kfree_rcu test invokes kfree_rcu(), and boot sequence inspection showed that the runner for built-in kunit tests, kunit_run_all_tests(), is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot, was tried, but it broke tests with functions marked as __init because free_initmem() had already been done.
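The ordering argument above can be illustrated with a small userspace model. This is only a sketch and not kernel code: the `boot_phase` enum and all helper names are hypothetical, and the rule that kfree_rcu() is unsafe before rcu_end_inkernel_boot() is the suspicion stated in this patch, not a confirmed fact.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boot phases, listed in the order described in the patch. */
enum boot_phase {
	PHASE_BUILTIN_KUNIT,	/* where kunit_run_all_tests() runs today */
	PHASE_INITMEM_FREED,	/* after free_initmem() */
	PHASE_RCU_BOOT_ENDED,	/* after SYSTEM_RUNNING and rcu_end_inkernel_boot() */
};

static_assert(PHASE_BUILTIN_KUNIT < PHASE_INITMEM_FREED &&
	      PHASE_INITMEM_FREED < PHASE_RCU_BOOT_ENDED,
	      "phases must stay in boot order");

/* Assumption: a test calling kfree_rcu() is safe only once RCU boot has ended. */
bool kfree_rcu_test_safe(enum boot_phase phase)
{
	return phase >= PHASE_RCU_BOOT_ENDED;
}

/* A test whose code is marked __init can only run before initmem is freed. */
bool init_marked_test_safe(enum boot_phase phase)
{
	return phase < PHASE_INITMEM_FREED;
}

/* Is there any single phase in which both kinds of test could run? */
bool common_safe_phase_exists(void)
{
	for (int p = PHASE_BUILTIN_KUNIT; p <= PHASE_RCU_BOOT_ENDED; p++)
		if (kfree_rcu_test_safe(p) && init_marked_test_safe(p))
			return true;
	return false;
}
```

Under these assumptions no single phase suits both kinds of test, which is why skipping the one affected test is simpler than moving the runner.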
Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: rcu@vger.kernel.org
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 lib/slub_kunit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
 
 static void test_kfree_rcu(struct kunit *test)
 {
-	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
-				sizeof(struct test_kfree_rcu_struct),
-				SLAB_NO_MERGE);
-	struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
+	struct kmem_cache *s;
+	struct test_kfree_rcu_struct *p;
+
+	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+	s = test_kmem_cache_create("TestSlub_kfree_rcu",
+				   sizeof(struct test_kfree_rcu_struct),
+				   SLAB_NO_MERGE);
+	p = kmem_cache_alloc(s, GFP_KERNEL);
 
 	kfree_rcu(p, rcu);
 	kmem_cache_destroy(s);
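For readers unfamiliar with the IS_BUILTIN() guard used in the diff, the following userspace sketch shows how such a check can resolve to a compile-time 1 or 0. It is a simplified re-implementation of the idea behind include/linux/kconfig.h, written from memory with different helper names, so treat it as an illustration and not as the kernel's exact code. The two CONFIG_DEMO_* options are hypothetical.

```c
#include <assert.h>

/*
 * Simplified re-implementation of the IS_BUILTIN()/IS_MODULE() idea.
 * If the option expands to 1, the placeholder inserts an extra "0,"
 * argument, which shifts the literal 1 into the second position.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...)	val
#define IS_DEFINED(x)				IS_DEFINED_PASTE(x)
#define IS_DEFINED_PASTE(val)			IS_DEFINED_PICK(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_PICK(arg1_or_junk)		TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_BUILTIN(option)			IS_DEFINED(option)
#define IS_MODULE(option)			IS_DEFINED(option##_MODULE)

/* Models "=y": the plain symbol is defined to 1 (hypothetical option). */
#define CONFIG_DEMO_BUILTIN_TEST 1
/* Models "=m": only the _MODULE variant is defined (hypothetical option). */
#define CONFIG_DEMO_MODULAR_TEST_MODULE 1

static_assert(IS_BUILTIN(CONFIG_DEMO_BUILTIN_TEST) == 1, "=y is built-in");
static_assert(IS_BUILTIN(CONFIG_DEMO_MODULAR_TEST) == 0, "=m is not built-in");
static_assert(IS_MODULE(CONFIG_DEMO_MODULAR_TEST) == 1, "=m is a module");

/* Mirrors the guard in the patch: nonzero means the test would be skipped. */
int skip_when_builtin(void)
{
	return IS_BUILTIN(CONFIG_DEMO_BUILTIN_TEST);
}

int skip_when_modular(void)
{
	return IS_BUILTIN(CONFIG_DEMO_MODULAR_TEST);
}
```

Because the check is a constant expression, the compiler can drop the skipped branch entirely in a modular build, so the guard costs nothing when the test runs as a module.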
On 10/1/24 18:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
> The test_kfree_rcu test invokes kfree_rcu(), and boot sequence inspection showed that the runner for built-in kunit tests, kunit_run_all_tests(), is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
> An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot, was tried, but it broke tests with functions marked as __init because free_initmem() had already been done.
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups.
Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing.
Thanks!
On 10/2/24 03:26, Vlastimil Babka wrote:
> On 10/1/24 18:20, Vlastimil Babka wrote:
>> Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
>> The test_kfree_rcu test invokes kfree_rcu(), and boot sequence inspection showed that the runner for built-in kunit tests, kunit_run_all_tests(), is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
>> An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot, was tried, but it broke tests with functions marked as __init because free_initmem() had already been done.
>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
> I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups.
Everything works with your patches applied, so we are good.
> Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing.
I have a bunch of debug options enabled. Configuration (generated using "make savedefconfig") for x86_64 is attached.
Thanks,
Guenter
On 10/2/24 15:52, Guenter Roeck wrote:
> On 10/2/24 03:26, Vlastimil Babka wrote:
>> On 10/1/24 18:20, Vlastimil Babka wrote:
>>> Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
>>> The test_kfree_rcu test invokes kfree_rcu(), and boot sequence inspection showed that the runner for built-in kunit tests, kunit_run_all_tests(), is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
>>> An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot, was tried, but it broke tests with functions marked as __init because free_initmem() had already been done.
>>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
>> I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups.
> Everything works with your patches applied, so we are good.
Thanks for testing! Queued for -next now and will send to Linus later if all's good.
>> Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing.
> I have a bunch of debug options enabled. Configuration (generated using "make savedefconfig") for x86_64 is attached.
Hmm, didn't see the hang with that (using virtme-ng) on v6.12-rc1. Guess there's something more to it. Oh well.
> Thanks,
> Guenter
On 10/1/24 09:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
> The test_kfree_rcu test invokes kfree_rcu(), and boot sequence inspection showed that the runner for built-in kunit tests, kunit_run_all_tests(), is called before setting system_state to SYSTEM_RUNNING and calling rcu_end_inkernel_boot(), so this seems like a likely cause. So while I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
> An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot, was tried, but it broke tests with functions marked as __init because free_initmem() had already been done.
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: rcu@vger.kernel.org
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Cc: David Gow <davidgow@google.com>
> Cc: Rae Moar <rmoar@google.com>
> Cc: linux-kselftest@vger.kernel.org
> Cc: kunit-dev@googlegroups.com
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
This results in:
    KTAP version 1
    # Subtest: slub_test
    # module: slub_kunit
    1..8
    # test_clobber_zone: pass:1 fail:0 skip:0 total:1
    ok 1 test_clobber_zone
    # test_next_pointer: pass:1 fail:0 skip:0 total:1
    ok 2 test_next_pointer
    # test_first_word: pass:1 fail:0 skip:0 total:1
    ok 3 test_first_word
    # test_clobber_50th_byte: pass:1 fail:0 skip:0 total:1
    ok 4 test_clobber_50th_byte
    # test_clobber_redzone_free: pass:1 fail:0 skip:0 total:1
    ok 5 test_clobber_redzone_free
    # test_kmalloc_redzone_access: pass:1 fail:0 skip:0 total:1
    ok 6 test_kmalloc_redzone_access
    # test_kfree_rcu: pass:0 fail:0 skip:1 total:1
    ok 7 test_kfree_rcu # SKIP can't do kfree_rcu() when test is built-in
    # test_leak_destroy: pass:1 fail:0 skip:0 total:1
    ok 8 test_leak_destroy
# slub_test: pass:7 fail:0 skip:1 total:8
Tested-by: Guenter Roeck <linux@roeck-us.net>
Thanks,
Guenter
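As an aside on reading the output above: a skipped test still reports "ok" and is only distinguished by the "# SKIP" directive after its name. The sketch below shows one way a log-checking script could classify such result lines. The `ktap_classify` function and its enum are hypothetical helpers, and the parsing is deliberately simplified (for example, it treats any "not ok" line as a failure without looking for directives).

```c
#include <assert.h>
#include <string.h>

enum ktap_result { KTAP_PASS, KTAP_FAIL, KTAP_SKIP, KTAP_OTHER };

/* Classify a single line of KTAP output (simplified, hypothetical helper). */
enum ktap_result ktap_classify(const char *line)
{
	assert(line != NULL);

	/* Subtest lines are indented; ignore leading whitespace. */
	while (*line == ' ' || *line == '\t')
		line++;

	if (strncmp(line, "not ok ", 7) == 0)
		return KTAP_FAIL;
	if (strncmp(line, "ok ", 3) != 0)
		return KTAP_OTHER;	/* version line, plan, or "#" diagnostic */

	/* A "# SKIP" directive after the test name marks a skipped test. */
	return strstr(line, "# SKIP") ? KTAP_SKIP : KTAP_PASS;
}
```

Applied to the log above, line 7 would be classified as skipped and the other seven result lines as passed, matching the "pass:7 fail:0 skip:1 total:8" summary.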