Re: [PATCH] arm64: don't preempt_disable in do_debug_exception

25 Jun 2020


      On Tue, 23 Jun 2020 17:55:57 +0100
Mark Rutland mark.rutland@arm.com wrote:
...
On Tue, Jun 23, 2020 at 04:59:01PM +0100, Will Deacon wrote:
...
On Thu, Jun 18, 2020 at 01:29:29PM -0400, Paul Gortmaker wrote:
...
In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
from RCU") debug_exception_enter and exit were added to deal with better
tracking of RCU state - however, in addition to that, but not mentioned
in the commit log, a preempt_disable/enable pair were also added.
Based on the comment (being removed here) it would seem that the pair
were not added to address a specific problem, but just as a proactive,
preventative measure - as in "seemed like a good idea at the time".
The problem is that do_debug_exception() eventually calls out to
generic kernel code like do_force_sig_info() which takes non-raw locks
and on -rt enabled kernels, results in things looking like the following,
since on -rt kernels, it is noticed that preemption is still disabled.
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
 in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
 Preemption disabled at:
 [<ffff000010081578>] do_debug_exception+0x38/0x1a4
 Call trace:
 dump_backtrace+0x0/0x138
 show_stack+0x24/0x30
 dump_stack+0x94/0xbc
 ___might_sleep+0x13c/0x168
 rt_spin_lock+0x40/0x80
 do_force_sig_info+0x30/0xe0
 force_sig_fault+0x64/0x90
 arm64_force_sig_fault+0x50/0x80
 send_user_sigtrap+0x50/0x80
 brk_handler+0x98/0xc8
 do_debug_exception+0x70/0x1a4
 el0_dbg+0x18/0x20
The reproducer was basically an automated gdb test that set a breakpoint
on a simple "hello world" program and then quit gdb once the breakpoint
was hit - i.e. "(gdb) A debugging session is active.  Quit anyway? "
Hmm, the debug exception handler path was definitely written with the
expectation that preemption is disabled, so this is unfortunate. For
exceptions from kernelspace, we need to keep that guarantee as we implement
things like BUG() using this path. For exceptions from userspace, it's
plausible that we could re-enable preemption, but then we should also
re-enable interrupts and debug exceptions too because we don't
context-switch pstate in switch_to() and we would end up with holes in our
kernel debug coverage (and these might be fatal if e.g. single step doesn't
work in a kprobe OOL buffer). However, that then means that any common code
when handling user and kernel debug exceptions needs to be re-entrant,
which it probably isn't at the moment (I haven't checked).
I'm pretty certain existing code is not reentrant, and regardless it's
going to be a mess to reason about this generally if we have to undo our
strict exception nesting rules.
Sounds like a kprobe post-handler hits another kprobe, which might invoke
the debug handler in debug context. If kprobes find that, it skips the
nested one, but it needs to do single stepping in it to exit.
Is that not possible on arm64?
Thank you,
...
I reckon we need to treat this like an NMI instead -- is that plausible?
Mark.
-- 
Masami Hiramatsu mhiramat@kernel.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] arm64: don't preempt_disable in do_debug_exception