On 10/2/24 03:26, Vlastimil Babka wrote:
On 10/1/24 18:20, Vlastimil Babka wrote:
Guenter Roeck reports that the new slub kunit tests added by commit 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()") cause a lockup on boot on several architectures when the kunit tests are configured to be built-in and not modules.
The test_kfree_rcu test invokes kfree_rcu(), and inspection of the boot sequence showed that kunit_run_all_tests(), the runner for built-in kunit tests, is called before system_state is set to SYSTEM_RUNNING and before rcu_end_inkernel_boot() is called, so this seems a likely cause. While I was unable to reproduce the problem myself, skipping the test when the slub_kunit module is built-in should avoid the issue.
An alternative fix, moving the call to kunit_run_all_tests() a bit later in the boot sequence, was tried, but it broke tests that use functions marked __init, because free_initmem() has already run by then.
Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.ne...
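For readers following along, the skip described in the commit message can be modelled as below. This is only a userspace sketch, not the patch itself (the diff is not quoted in this thread). In the kernel the check would presumably be IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST) followed by kunit_skip(); here the flag `test_is_builtin`, the type `struct test_result`, and the function `model_test_kfree_rcu()` are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome type; KUnit has its own status reporting. */
enum test_status { TEST_PASSED, TEST_SKIPPED };

struct test_result {
	enum test_status status;
	const char *reason;	/* NULL unless the test was skipped */
};

/*
 * Userspace model of the guard. `test_is_builtin` stands in for what
 * would presumably be IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST) in the kernel
 * (an assumption, not taken from the patch).
 */
static struct test_result model_test_kfree_rcu(bool test_is_builtin)
{
	/*
	 * Built-in tests run before system_state is SYSTEM_RUNNING and
	 * before rcu_end_inkernel_boot(), so bail out before anything
	 * RCU-related would be touched.
	 */
	if (test_is_builtin)
		return (struct test_result){
			TEST_SKIPPED,
			"kfree_rcu() test skipped when built-in",
		};

	/*
	 * The real test would create a cache, allocate an object,
	 * kfree_rcu() it and destroy the cache; that part is omitted.
	 */
	return (struct test_result){ TEST_PASSED, NULL };
}
```

The point of the early return is that the decision depends only on how the test was built, not on the runtime RCU state, so the modular case is unaffected.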
I hope you can confirm it helps, because the commit added two tests and I've only skipped one of them, as it's the one using kfree_rcu(), which is suspected. But the other is responsible for the (now suppressed) kmem_cache_destroy() warning, and maybe I'm missing something and it was actually that one causing the lockups.
Everything works with your patches applied, so we are good.
Since you mentioned the boot lockups happened on some x86_64 too, do you have a .config of the lockup case? I've tried tweaking some rcu options but still nothing.
I have a bunch of debug options enabled. Configuration (generated using "make savedefconfig") for x86_64 is attached.
Thanks,
Guenter