Hi,
On a PREEMPT_RT kernel based on v6.16-rc1, I hit the following splat:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20466, name: syz.0.1689
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| Preemption disabled at:
| [<ffff800080241600>] debug_exception_enter arch/arm64/mm/fault.c:978 [inline]
| [<ffff800080241600>] do_debug_exception+0x68/0x2fc arch/arm64/mm/fault.c:997
| CPU: 0 UID: 0 PID: 20466 Comm: syz.0.1689 Not tainted 6.16.0-rc1-rt1-dirty #12 PREEMPT_RT
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
|  show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
|  __dump_stack+0x30/0x40 lib/dump_stack.c:94
|  dump_stack_lvl+0x148/0x1d8 lib/dump_stack.c:120
|  dump_stack+0x1c/0x3c lib/dump_stack.c:129
|  __might_resched+0x2e4/0x52c kernel/sched/core.c:8800
|  __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
|  rt_spin_lock+0xa8/0x1bc kernel/locking/spinlock_rt.c:57
|  spin_lock include/linux/spinlock_rt.h:44 [inline]
|  force_sig_info_to_task+0x6c/0x4a8 kernel/signal.c:1302
|  force_sig_fault_to_task kernel/signal.c:1699 [inline]
|  force_sig_fault+0xc4/0x110 kernel/signal.c:1704
|  arm64_force_sig_fault+0x6c/0x80 arch/arm64/kernel/traps.c:265
|  send_user_sigtrap arch/arm64/kernel/debug-monitors.c:237 [inline]
|  single_step_handler+0x1f4/0x36c arch/arm64/kernel/debug-monitors.c:257
|  do_debug_exception+0x154/0x2fc arch/arm64/mm/fault.c:1002
|  el0_dbg+0x44/0x120 arch/arm64/kernel/entry-common.c:756
|  el0t_64_sync_handler+0x3c/0x108 arch/arm64/kernel/entry-common.c:832
|  el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:600
It seems that commit eaff68b32861 ("arm64: entry: Add entry and exit functions for debug exception") in 6.17-rc1, also present as 6fb44438a5e1 in mainline, removed code that previously avoided sleeping context issues when handling debug exceptions:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/arch/...
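This appears to be triggered when force_sig_fault() is called from debug exception context: debug_exception_enter() has disabled preemption, and under PREEMPT_RT the spin_lock() in force_sig_info_to_task() becomes rt_spin_lock(), which may sleep.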
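The check behind the splat can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every `model_*` name is hypothetical and nothing here is kernel code. It merely mirrors the trace, where preempt_count is 1 on entry to a lock that may sleep on PREEMPT_RT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model; none of this is kernel code. */
static int preempt_count;	/* stands in for the task's preempt_count */
static int splats;		/* number of warnings the model has emitted */

static void model_preempt_disable(void) { preempt_count++; }
static void model_preempt_enable(void)  { preempt_count--; }

/* Stands in for __might_resched(): warn when a call that may sleep is
 * made while preempt_count is non-zero. */
static void model_might_sleep(const char *caller)
{
	if (preempt_count != 0) {
		splats++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context in %s "
			"(preempt_count: %d, expected: 0)\n",
			caller, preempt_count);
	}
}

/* Stands in for spin_lock() on PREEMPT_RT, where the lock may sleep. */
static void model_rt_spin_lock(void)
{
	model_might_sleep("model_rt_spin_lock");
}

/* Stands in for force_sig_fault(), which takes such a lock. */
static void model_force_sig_fault(void)
{
	model_rt_spin_lock();
}

/* Signal sent with preemption disabled, as in the splat.
 * Returns the number of warnings emitted. */
int model_sigtrap_from_atomic(void)
{
	splats = 0;
	model_preempt_disable();
	model_force_sig_fault();
	model_preempt_enable();
	assert(preempt_count == 0);
	return splats;
}

/* The same signal sent from preemptible context.
 * Returns the number of warnings emitted. */
int model_sigtrap_from_preemptible(void)
{
	splats = 0;
	model_force_sig_fault();
	return splats;
}
```

In this model, calling `model_sigtrap_from_atomic()` emits one warning and `model_sigtrap_from_preemptible()` emits none, which is the difference between sending the signal with preemption disabled and sending it from preemptible context.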
I understand that this path is primarily for debugging, but I would like to discuss whether the patch needs some adjustment for PREEMPT_RT.
I also found that whether the issue reproduces depends on the changes introduced by the following commit:
Link: https://github.com/torvalds/linux/commit/d8bb6718c4d
arm64: Make debug exception handlers visible from RCU
Make debug exceptions visible from RCU so that synchronize_rcu() correctly tracks the debug exception handler.
This also introduces sanity checks for user-mode exceptions as same as x86's ist_enter()/ist_exit().
The debug exception can interrupt in idle task. For example, it warns if we put a kprobe on a function called from idle task as below. The warning message showed that the rcu_read_lock() caused this problem. But actually, this means the RCU lost the context which was already in NMI/IRQ.
/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
...
For reference:
- v5.2.10: https://elixir.bootlin.com/linux/v5.2.10/source/arch/arm64/mm/fault.c#L810
- v5.3-rc3: https://elixir.bootlin.com/linux/v5.3-rc3/source/arch/arm64/mm/fault.c#L787
Do we need to restore some form of non-sleeping signal delivery in debug exception context for PREEMPT_RT, or is there another preferred fix?
Thanks, Yunseong
Hi Yunseong,
> It seems that commit eaff68b32861 ("arm64: entry: Add entry and exit functions for debug exception") in 6.17-rc1, also present as 6fb44438a5e1 in mainline, removed code that previously avoided sleeping context issues when handling debug exceptions:
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/arch/...
No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry") solves your splat: by splitting the el0/el1 brk64 exception entries, el0_brk64() no longer calls debug_exception_enter().
Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(), which calls debug_exception_enter() and thereby disables preemption; that is what produces your splat while handling a brk exception from el0.
[...]
Thanks.
-- Sincerely, Yeoreum Yun
Hi Yeoreum,
Thank you for pointing that out!
On 8/13/25 3:56 PM, Yeoreum Yun wrote:
> No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry") solves your splat: by splitting the el0/el1 brk64 exception entries, el0_brk64() no longer calls debug_exception_enter().
> Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(), which calls debug_exception_enter() and thereby disables preemption; that is what produces your splat while handling a brk exception from el0.
Do you think a fix is necessary if this issue also affects the LTS kernel before 6.17-rc1? As far as I know, most production RT kernels are still based on the existing LTS versions.
Thank you, Yunseong
+Ada Couprie Diaz
IMHO, her patch series [0] should be backported.
[0]: https://lore.kernel.org/all/20250707114109.35672-1-ada.coupriediaz@arm.com/
Thanks.
-- Sincerely, Yeoreum Yun
On Wed, Aug 13, 2025 at 09:59:06AM +0100, Yeoreum Yun wrote:
> IMHO, her patch should be backported.
I also strongly suggest backporting Ada's patch series: without it, anything that relies on debug exceptions (ptrace, gdb, ...) on aarch64 with PREEMPT_RT enabled may result in a backtrace or worse.
Luis
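The ptrace case mentioned above can be exercised from userspace. What follows is a minimal sketch, assuming a Linux system where ptrace is permitted; the helper name `count_single_steps()` is hypothetical, and this is not the syzkaller reproducer from the report. Each PTRACE_SINGLESTEP makes the child take a single-step debug exception, which on arm64 goes through single_step_handler() as in the trace, and the tracer sees it as a SIGTRAP stop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, single-step it `steps` times and
 * return how many SIGTRAP stops the tracer saw, or -1 on setup failure.
 */
int count_single_steps(int steps)
{
	int status, i, traps = 0;
	pid_t child;

	assert(steps >= 0);

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) {
		/* Child: ask to be traced, stop, then spin so there is
		 * always another instruction to step over. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
			_exit(1);
		raise(SIGSTOP);
		for (;;)
			;
	}

	/* Parent: wait for the child's initial SIGSTOP. */
	if (waitpid(child, &status, 0) < 0)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child exited early and was already reaped */

	for (i = 0; i < steps; i++) {
		/* Resume the child for exactly one instruction. */
		if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1)
			break;
		if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
			break;
		if (WSTOPSIG(status) == SIGTRAP)
			traps++;
	}

	/* Clean up the spinning child. */
	kill(child, SIGKILL);
	waitpid(child, &status, 0);
	return traps;
}
```

Calling `count_single_steps(5)` from a small `main()` on an affected PREEMPT_RT arm64 kernel would be one way to check whether the warning shows up in dmesg; whether it does depends on the kernel configuration, so treat this as a starting point rather than a guaranteed reproducer.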
Hi all,
On 13/08/2025 11:06, Luis Claudio R. Goncalves wrote:
> On Wed, Aug 13, 2025 at 09:59:06AM +0100, Yeoreum Yun wrote:
>> +Ada Couprie Diaz
Thanks for the ping!
>>>> No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry") solves your splat: by splitting the el0/el1 brk64 exception entries, el0_brk64() no longer calls debug_exception_enter().
>>>> Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(), which calls debug_exception_enter() and thereby disables preemption; that is what produces your splat while handling a brk exception from el0.
That's correct: one of the goals of the series was to let each debug exception handler be adapted to what it needs, which allowed us to keep preemption enabled, or re-enable it much earlier, and so prevent issues like the one above for some exceptions.
>>> Do you think a fix is necessary if this issue also affects the LTS kernel before 6.17-rc1? As far as I know, most production RT kernels are still based on the existing LTS versions.
Luis originally reported the issue on kernels 6.13-rt and 6.14-rc1[1]. After some quick testing, the issue is present on 6.1-rt, 6.6-rt and 6.12-rt as well. 5.15-rt either doesn't have the issue, or doesn't report it.
>> IMHO, her patch should be backported.
> I also strongly suggest backporting Ada's patch series: without it, anything that relies on debug exceptions (ptrace, gdb, ...) on aarch64 with PREEMPT_RT enabled may result in a backtrace or worse.
> Luis
Hopefully it shouldn't be too hard to backport for recent kernels, as I don't think those areas change a lot, but I haven't looked into it.
I'm not sure when I would have time to work on backporting, but I'd be happy to help anyway or do it if I have the time in the future, given there seems to be some interest (and good reasons).
Best, Ada
[1]: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/
linux-stable-mirror@lists.linaro.org