On Tue, Aug 05, 2025 at 03:37:38PM +0530, Naresh Kamboju wrote:
On Mon, 4 Aug 2025 at 13:26, Harry Yoo <harry.yoo@oracle.com> wrote:
On Sat, Aug 02, 2025 at 03:45:51PM +0530, Naresh Kamboju wrote:
While validating Linux next on the Radxa Rock Pi 4B platform, we observed kernel crashes and deadlock warnings when running LTP syscall and controller tests under specific PREEMPT_RT configurations. These issues appear to be regressions introduced in next-20250729.
- CONFIG_EXPERT=y
- CONFIG_PREEMPT_RT=y
- CONFIG_LAZY_PREEMPT=y
Regression Analysis:
- New regression? Yes
- Reproducibility? Intermittent
First seen on next-20250729
Good: next-20250728
Bad: next-20250729 and next-20250801

Test regression: next-20250729 rock Pi 4b Internal error Oops kmem_cache_alloc_bulk_noprof
Test regression: next-20250729 rock Pi 4b WARNING kernel locking rtmutex.c at __rt_mutex_slowlock_locked
Test regression: next-20250729 rock Pi 4b WARNING kernel rcu tree_plugin.h at rcu_note_context_switch
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Thanks for the report, Naresh!
Based on the stack trace, I think a use-after-free or buffer overflow bug could be triggering this.
Could you please try to reproduce it with KASAN enabled to confirm that it is the case?
I have recompiled the kernel with KASAN enabled and rerun the KUNIT tests, along with the LTP syscall tests, in an effort to reproduce the previously reported issue.
While the LTP syscall tests did not reproduce the problem,
Thanks for checking it! It is unfortunate that the error is not reproduced with KASAN :(
We can still try slab_debug=FPU or slab_debug=FPUZ boot parameter. If we're lucky, that may help narrow down who corrupted the freelist. Could you please give it a try if it’s not too much trouble? It won't require rebuilding the kernel as SLUB_DEBUG is already enabled.
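For reference, here is a rough sketch of what each letter in those slab_debug values turns on, following the flag descriptions in the kernel's SLUB documentation. The helper names (describe_slab_debug, slab_debug_from_cmdline) are made up for illustration and are not part of any kernel tooling. The second helper could be fed the contents of /proc/cmdline to confirm the parameter was actually picked up at boot.

```python
# Flag letters as described in the kernel's SLUB documentation.
# Only the letters relevant to this thread are listed here.
SLAB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "P": "poisoning (object and padding)",
    "U": "user tracking (alloc and free)",
    "Z": "red zoning",
}


def describe_slab_debug(value):
    """Return a description for each flag letter in a slab_debug= value.

    Hypothetical helper: anything after a comma (an optional cache
    name list) is ignored, and unknown letters are reported as such.
    """
    flags = value.split(",", 1)[0]
    return [SLAB_DEBUG_FLAGS.get(c, "unknown flag " + repr(c)) for c in flags]


def slab_debug_from_cmdline(cmdline):
    """Extract the slab_debug= value from a kernel command line string."""
    for token in cmdline.split():
        if token.startswith("slab_debug="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    for line in describe_slab_debug("FPUZ"):
        print(line)
```

User tracking (U) is the flag most likely to help here, since it records the last alloc and free call sites for an object.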
...and a few questions to help investigate it further:
- Is this triggered only with (Rock Pi 4B) AND (PREEMPT_RT=y) AND (LAZY_PREEMPT=y), and not on other boards or on the same board with a different preemption model?
- With the infrastructure you're using, would it be reasonable to do a bisection?
Unfortunately, if the freelist chain is already corrupted when we allocate objects, it's hard to tell who corrupted it without further information.
I consistently observed a null pointer dereference during KUNIT testing, specifically in the kunit_fault test, as shown in the log below.
I’ve seen this same crash across several kernel versions, and it is always reproducible when running KUNIT tests.
Could you please confirm if this behavior is expected from the kunit_fault test, or if it indicates an issue that requires further investigation?
I can confirm that this is expected behavior. The test case deliberately dereferences a NULL pointer and checks whether the task was killed because of it. The test case was added recently (since v6.10).
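To illustrate the idea only, here is a userspace analogy of that pattern. It is not the KUnit test itself, which runs inside the kernel. A child process reads from address 0 on purpose, and the parent checks that the child was killed by SIGSEGV. The function name is made up for this sketch, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program: read from address 0 through ctypes, which faults on purpose.
CHILD = "import ctypes; ctypes.string_at(0)"


def child_killed_by_null_deref():
    """Run the faulting child and report whether SIGSEGV killed it.

    Userspace analogy only (hypothetical helper, POSIX assumed).
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child died from signal N.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    print("killed by SIGSEGV:", child_killed_by_null_deref())
```

As with the kernel test, the fault itself is the expected outcome, so an Oops in the log from kunit_fault is not a sign of a bug by itself.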
Thanks for your assistance!