On Sat, Aug 02, 2025 at 03:45:51PM +0530, Naresh Kamboju wrote:
While validating Linux next on the Radxa Rock Pi 4B platform, we observed kernel crashes and deadlock warnings when running LTP syscall and controller tests under specific PREEMPT_RT configurations. These issues appear to be regressions introduced in next-20250729.
- CONFIG_EXPERT=y
- CONFIG_PREEMPT_RT=y
- CONFIG_LAZY_PREEMPT=y
Regression Analysis:
- New regression? Yes
- Reproducibility? Intermittent
First seen on next-20250729
Good: next-20250728
Bad: next-20250729 and next-20250801
Test regression: next-20250729 rock Pi 4b Internal error Oops kmem_cache_alloc_bulk_noprof
Test regression: next-20250729 rock Pi 4b WARNING kernel locking rtmutex.c at __rt_mutex_slowlock_locked
Test regression: next-20250729 rock Pi 4b WARNING kernel rcu tree_plugin.h at rcu_note_context_switch
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Thanks for the report, Naresh!
Based on the stack trace, I think a use-after-free or buffer overflow bug could be triggering this.
Could you please try to reproduce it with KASAN enabled to confirm that it is the case?
## Test log
[ 527.570253] Unable to handle kernel paging request at virtual address 003f0020f94020a1
[ 527.570274] Mem abort info:
[ 527.570277] ESR = 0x0000000096000004
[ 527.570282] EC = 0x25: DABT (current EL), IL = 32 bits
[ 527.570288] SET = 0, FnV = 0
[ 527.570292] EA = 0, S1PTW = 0
[ 527.570297] FSC = 0x04: level 0 translation fault
[ 527.570302] Data abort info:
[ 527.570305] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 527.570310] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 527.570316] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 527.570322] [003f0020f94020a1] address between user and kernel address ranges
[ 527.570330] Internal error: Oops: 0000000096000004 [#1] SMP
[ 527.570336] Modules linked in: brcmfmac rockchip_dfi brcmutil cfg80211 snd_soc_hdmi_codec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_audio_graph_card hci_uart snd_soc_rockchip_i2s snd_soc_es8316 snd_soc_spdif_tx snd_soc_simple_card_utils btqca rtc_rk808 rockchipdrm btbcm snd_soc_core dw_hdmi_qp bluetooth snd_compress reset_gpio analogix_dp snd_pcm_dmaengine panfrost hantro_vpu dw_mipi_dsi rfkill rockchip_rga drm_shmem_helper drm_dp_aux_bus snd_pcm gpu_sched dw_hdmi pwrseq_core videobuf2_dma_sg v4l2_vp9 snd_timer drm_display_helper v4l2_h264 v4l2_jpeg phy_rockchip_pcie snd v4l2_mem2mem cec videobuf2_dma_contig soundcore videobuf2_memops drm_client_lib videobuf2_v4l2 drm_dma_helper videobuf2_common rockchip_saradc drm_kms_helper industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host coresight_cpu_debug fuse drm backlight
[ 527.570493] CPU: 3 UID: 0 PID: 34254 Comm: mkdir Not tainted 6.16.0-next-20250801 #1 PREEMPT_RT
[ 527.570502] Hardware name: Radxa ROCK Pi 4B (DT)
[ 527.570506] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 527.570512] pc : kmem_cache_alloc_bulk_noprof (mm/slub.c:5343 (discriminator 1) mm/slub.c:5403 (discriminator 1))
[ 527.570527] lr : kmem_cache_alloc_bulk_noprof (include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/kfence.h:127 mm/slub.c:5307 mm/slub.c:5403)
[ 527.570533] sp : ffff80008e24b8f0
[ 527.570536] x29: ffff80008e24b930 x28: 00ff000000584610 x27: ffff800082b30538
[ 527.570545] x26: ffff8000816b64dc x25: 0000000000000cc0 x24: 0000000000000000
[ 527.570554] x23: 0000000000000004 x22: ffff0000f7579d20 x21: 0000000000000001
[ 527.570563] x20: 0000000000000001 x19: ffff000000405b00 x18: ffff80008e24bcd0
[ 527.570572] x17: 0000000000000000 x16: ffff800081e18420 x15: 0000ffffa2670fff
[ 527.570582] x14: 0000000000000000 x13: 1fffe000017942e1 x12: 0000ffffa2470fff
[ 527.570591] x11: ffff00000bca1708 x10: 0000000000000001 x9 : ffff8000816e41a4
[ 527.570600] x8 : ffff80008e24b850 x7 : fefefefefefefefe x6 : ffff800082b30000
[ 527.570608] x5 : d63f0020f9402021 x4 : ffff0000f7579d58 x3 : 0000000000000000
[ 527.570617] x2 : 0000000000000000 x1 : 0000000000000100 x0 : 0000000000000080
[ 527.570627] Call trace:
[ 527.570631] kmem_cache_alloc_bulk_noprof (mm/slub.c:5343
It crashes in get_freepointer() because a bad pointer is passed to it.
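As a sanity check, the register dump quoted above is consistent with that reading: the trapping instruction is `ldr x0, [x5, x0]`, and adding x5 and x0 from the dump reproduces the reported fault address in its low 56 bits. The sketch below only redoes that arithmetic. The helper names are made up for illustration, and the top-byte handling is my assumption about how the address gets printed (tag byte replaced by a sign extension of bit 55), not something the log itself states.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of `ldr x0, [x5, x0]`: base register plus offset register. */
uint64_t ldr_effective_address(uint64_t base, uint64_t offset)
{
	return base + offset;
}

/*
 * Assumption: the fault address is printed with its top byte replaced by a
 * sign extension of bit 55. This helper is illustrative, not kernel code.
 */
uint64_t strip_top_byte(uint64_t addr)
{
	uint64_t low = addr & 0x00ffffffffffffffULL;

	if (addr & (1ULL << 55))
		low |= 0xff00000000000000ULL;
	return low;
}

/* Values copied from the log: x5, x0, and the reported fault address. */
void check_against_log(void)
{
	const uint64_t x5 = 0xd63f0020f9402021ULL;   /* pointer being dereferenced */
	const uint64_t x0 = 0x0000000000000080ULL;   /* offset register */
	const uint64_t reported = 0x003f0020f94020a1ULL;

	assert(strip_top_byte(ldr_effective_address(x5, x0)) == reported);
}
```

Every other pointer-like register in the dump starts with ffff, so x5 does not look like a kernel address at all, which matches the "address between user and kernel address ranges" line.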
I think it is reasonable to suspect that the freelist chain is corrupt due to a use-after-free or buffer overflow (either in maple tree code, or something else that shares the cache with maple tree nodes).
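To illustrate the suspicion, here is a toy userspace model of a freelist that, like SLUB, stores the link to the next free object inside the free object itself. It is a minimal sketch and not the SLUB code: `toy_cache`, the `toy_*` helpers, the object size and the free-pointer offset are all hypothetical. It shows how a write through a stale pointer leaves a bogus freelist head behind, so the fault only shows up on a later allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE  256
#define NR_OBJS   4
#define FP_OFFSET 128	/* hypothetical offset of the free pointer in an object */

struct toy_cache {
	unsigned char slab[NR_OBJS][OBJ_SIZE];
	void *freelist;	/* first free object, or NULL */
};

/* Read the "next free object" link stored inside a free object. */
static void *toy_get_freepointer(void *object)
{
	void *next;

	memcpy(&next, (unsigned char *)object + FP_OFFSET, sizeof(next));
	return next;
}

static void toy_set_freepointer(void *object, void *next)
{
	memcpy((unsigned char *)object + FP_OFFSET, &next, sizeof(next));
}

/* True if p points at the start of an object inside the slab. */
static int toy_in_slab(const struct toy_cache *c, const void *p)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)c->slab;

	return a >= lo && a < lo + sizeof(c->slab) && (a - lo) % OBJ_SIZE == 0;
}

static void toy_cache_init(struct toy_cache *c)
{
	c->freelist = NULL;
	for (int i = NR_OBJS - 1; i >= 0; i--) {
		toy_set_freepointer(c->slab[i], c->freelist);
		c->freelist = c->slab[i];
	}
}

static void *toy_alloc(struct toy_cache *c)
{
	void *object = c->freelist;

	if (object)
		c->freelist = toy_get_freepointer(object);	/* trusts the link */
	return object;
}

static void toy_free(struct toy_cache *c, void *object)
{
	assert(toy_in_slab(c, object));
	toy_set_freepointer(object, c->freelist);
	c->freelist = object;
}

/* Returns 1 if a use-after-free write leaves a bogus freelist head behind. */
int toy_demo_use_after_free(void)
{
	static struct toy_cache c;
	unsigned char *obj;
	void *again;

	toy_cache_init(&c);
	obj = toy_alloc(&c);
	toy_free(&c, obj);		/* obj is back on the freelist */
	memset(obj, 0xfe, OBJ_SIZE);	/* stale write clobbers the link */
	again = toy_alloc(&c);		/* succeeds, but loads the clobbered link */

	return again == (void *)obj && !toy_in_slab(&c, c.freelist);
}
```

In the toy, the allocation right after the stale write still succeeds. It is the following one that would dereference the garbage head, which is why the faulting task need not be the one that caused the corruption.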
(discriminator 1) mm/slub.c:5403 (discriminator 1)) (P)
[ 527.570639] mas_alloc_nodes (lib/maple_tree.c:1278)
[ 527.570651] mas_node_count_gfp (lib/maple_tree.c:1339)
[ 527.570661] mas_preallocate (lib/maple_tree.c:5538 (discriminator 1))
[ 527.570667] __split_vma (mm/vma.c:528 (discriminator 1))
[ 527.570677] vma_modify (mm/vma.c:1633)
[ 527.570685] vma_modify_flags (mm/vma.c:1650)
[ 527.570694] mprotect_fixup (mm/mprotect.c:819)
[ 527.570704] do_mprotect_pkey (mm/mprotect.c:993)
[ 527.570713] __arm64_sys_mprotect (mm/mprotect.c:1011)
[ 527.570722] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:54)
[ 527.570731] el0_svc_common.constprop.0 (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2))
[ 527.570737] do_el0_svc (arch/arm64/kernel/syscall.c:152)
[ 527.570744] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:169 (discriminator 1) arch/arm64/kernel/entry-common.c:182 (discriminator 1) arch/arm64/kernel/entry-common.c:880 (discriminator 1))
[ 527.570752] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:899)
[ 527.570760] el0t_64_sync (arch/arm64/kernel/entry.S:596)
[ 527.570772] Code: 1400000c f94002c5 b4000aa5 b9402a60 (f86068a0)
All code
========
0: 1400000c b 0x30
4: f94002c5 ldr x5, [x22]
8: b4000aa5 cbz x5, 0x15c
c: b9402a60 ldr w0, [x19, #40]
10:* f86068a0 ldr x0, [x5, x0] <-- trapping instruction
Code starting with the faulting instruction
0: f86068a0 ldr x0, [x5, x0]
[ 527.570778] ---[ end trace 0000000000000000 ]---
[ 527.570800] ------------[ cut here ]------------
[ 527.570803] rtmutex deadlock detected
[ 527.570827] WARNING: kernel/locking/rtmutex.c:1674 at __rt_mutex_slowlock_locked.constprop.0+0x800/0x8e0, CPU#3: mkdir/34254
[ 527.570841] Modules linked in: brcmfmac rockchip_dfi brcmutil cfg80211 snd_soc_hdmi_codec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_audio_graph_card hci_uart snd_soc_rockchip_i2s snd_soc_es8316 snd_soc_spdif_tx snd_soc_simple_card_utils btqca rtc_rk808 rockchipdrm btbcm snd_soc_core dw_hdmi_qp bluetooth snd_compress reset_gpio analogix_dp snd_pcm_dmaengine panfrost hantro_vpu dw_mipi_dsi rfkill rockchip_rga drm_shmem_helper drm_dp_aux_bus snd_pcm gpu_sched dw_hdmi pwrseq_core videobuf2_dma_sg v4l2_vp9 snd_timer drm_display_helper v4l2_h264 v4l2_jpeg phy_rockchip_pcie snd v4l2_mem2mem cec videobuf2_dma_contig soundcore videobuf2_memops drm_client_lib videobuf2_v4l2 drm_dma_helper videobuf2_common rockchip_saradc drm_kms_helper industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host coresight_cpu_debug fuse drm backlight
[ 527.570965] CPU: 3 UID: 0 PID: 34254 Comm: mkdir Tainted: G D 6.16.0-next-20250801 #1 PREEMPT_RT
[ 527.570973] Tainted: [D]=DIE
[ 527.570976] Hardware name: Radxa ROCK Pi 4B (DT)
[ 527.570979] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 527.570985] pc : __rt_mutex_slowlock_locked.constprop.0 (kernel/locking/rtmutex.c:1674 (discriminator 1) kernel/locking/rtmutex.c:1734 (discriminator 1) kernel/locking/rtmutex.c:1760 (discriminator 1))
[ 527.570993] lr : __rt_mutex_slowlock_locked.constprop.0 (kernel/locking/rtmutex.c:1674 (discriminator 1) kernel/locking/rtmutex.c:1734 (discriminator 1) kernel/locking/rtmutex.c:1760 (discriminator 1))
[ 527.571001] sp : ffff80008e24b2e0
[ 527.571004] x29: ffff80008e24b370 x28: ffff000007c1a600 x27: ffff000007c1a601
[ 527.571013] x26: ffff80008e24b318 x25: ffff000007c1a600 x24: 00000000ffffffdd
[ 527.571023] x23: 00000000ffffffdd x22: 0000000000000001 x21: ffff000007c1a601
[ 527.571032] x20: ffff80008e24b2f0 x19: ffff00001d756e30 x18: 0000000000000000
[ 527.571041] x17: 65676e6172207373 x16: 6572646461206c65 x15: 0000000000000000
[ 527.571050] x14: 0000000000000020 x13: 0a64657463657465 x12: 64206b636f6c6461
[ 527.571059] x11: 656820747563205b x10: ffff800082adac08 x9 : ffff80008015020c
[ 527.571068] x8 : ffff80008e24af08 x7 : 0000000000000001 x6 : ffff800082a5e000
[ 527.571077] x5 : ffff000007c1a600 x4 : 0000000000000000 x3 : 0000000000000027
[ 527.571086] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000007c1a600
[ 527.571094] Call trace:
[ 527.571097] __rt_mutex_slowlock_locked.constprop.0 (kernel/locking/rtmutex.c:1674 (discriminator 1) kernel/locking/rtmutex.c:1734 (discriminator 1) kernel/locking/rtmutex.c:1760 (discriminator 1)) (P)
[ 527.571107] __rwbase_read_lock (kernel/locking/rwbase_rt.c:114)
[ 527.571117] down_read (kernel/locking/rwsem.c:1540)
[ 527.571126] acct_collect (arch/arm64/include/asm/jump_label.h:36 include/linux/mmap_lock.h:41 include/linux/mmap_lock.h:454 kernel/acct.c:597)
[ 527.571135] do_exit (kernel/exit.c:932)
[ 527.571143] make_task_dead (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 2) arch/arm64/include/asm/atomic.h:52 (discriminator 2) include/linux/atomic/atomic-arch-fallback.h:564 (discriminator 2) include/linux/atomic/atomic-arch-fallback.h:1020 (discriminator 2) include/linux/atomic/atomic-instrumented.h:454 (discriminator 2) kernel/exit.c:1049 (discriminator 2))
[ 527.571150] die (arch/arm64/kernel/traps.c:231)
[ 527.571157] die_kernel_fault (arch/arm64/mm/fault.c:313)
[ 527.571167] __do_kernel_fault (arch/arm64/mm/fault.c:375 (discriminator 5))
[ 527.571177] do_bad_area (arch/arm64/mm/fault.c:482)
[ 527.571186] do_translation_fault (arch/arm64/mm/fault.c:792)
[ 527.571195] do_mem_abort (arch/arm64/mm/fault.c:929 (discriminator 1))
[ 527.571204] el1_abort (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:482)
[ 527.571210] el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:637)
[ 527.571218] el1h_64_sync (arch/arm64/kernel/entry.S:591)
The deadlock would not have been triggered had the task not crashed in SLUB, so it looks like a side effect of the Oops rather than a separate problem.
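One plausible reading of the second trace, assuming the mprotect() path still held the mmap lock for writing when it faulted: the exit path (do_exit -> acct_collect -> down_read) then tries to take the same lock again from the same task, and the lock code reports a self-deadlock. The toy below models only that owner check. `toy_lock` and its helpers are hypothetical and are not the rtmutex implementation.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that remembers its owner; 0 means unlocked. */
struct toy_lock {
	int owner;
};

/* Returns 0 on success, -EDEADLK if the caller already owns the lock. */
int toy_lock_acquire(struct toy_lock *lock, int task)
{
	assert(task != 0);	/* 0 is reserved for "no owner" */

	if (lock->owner == task)
		return -EDEADLK;	/* waiting on ourselves can never succeed */
	if (lock->owner != 0)
		return -EBUSY;		/* held by someone else; the toy does not block */
	lock->owner = task;
	return 0;
}

/*
 * Models the sequence suggested by the two traces: the syscall takes the
 * lock, faults before releasing it, and the exit path takes it again.
 */
int toy_crash_path(int task)
{
	struct toy_lock mmap_lock = { 0 };
	int ret;

	ret = toy_lock_acquire(&mmap_lock, task);	/* syscall entry */
	if (ret)
		return ret;

	/* ... the fault happens here, so the lock is never released ... */

	return toy_lock_acquire(&mmap_lock, task);	/* exit-time accounting */
}
```

Calling `toy_crash_path()` with any nonzero task id returns `-EDEADLK`: without the crash in the middle, the first acquisition would have been released and the second would succeed.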