When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, the following bug is triggered in the ksoftirqd context.
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
| preempt_count: 0, expected: 0
| RCU nest depth: 2, expected: 2
| CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
| Tainted: [W]=WARN
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
|  show_stack+0x2c/0x3c (C)
|  __dump_stack+0x30/0x40
|  dump_stack_lvl+0x148/0x1d8
|  dump_stack+0x1c/0x3c
|  __might_resched+0x2e4/0x52c
|  rt_spin_lock+0xa8/0x1bc
|  kcov_remote_start+0xb0/0x490
|  __usb_hcd_giveback_urb+0x2d0/0x5e8
|  usb_giveback_urb_bh+0x234/0x3c4
|  process_scheduled_works+0x678/0xd18
|  bh_worker+0x2f0/0x59c
|  workqueue_softirq_action+0x104/0x14c
|  tasklet_action+0x18/0x8c
|  handle_softirqs+0x208/0x63c
|  run_ksoftirqd+0x64/0x264
|  smpboot_thread_fn+0x4ac/0x908
|  kthread+0x5e8/0x734
|  ret_from_fork+0x10/0x20
To reproduce on a PREEMPT_RT kernel:

 $ git remote add rt-devel git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
 $ git fetch rt-devel
 $ git checkout -b v6.16-rc1-rt1 v6.16-rc1-rt1
I have attached the syzlang and the C source code converted by syz-prog2c:
Link: https://gist.github.com/kzall0c/9455aaa246f4aa1135353a51753adbbe
Then, run with a PREEMPT_RT config.
This issue was introduced by commit f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
That commit disables interrupts with local_irq_save() around the coverage collection section in order to prevent nested sections, but this conflicts with PREEMPT_RT. local_irq_save() establishes an atomic context in which sleeping is forbidden. kcov_remote_start() is then called inside this context, and on PREEMPT_RT it takes sleeping locks (spinlock_t and local_lock_t are mapped to rt_mutex). The result is the "sleeping function called from invalid context" BUG shown above.
On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario is already safely handled by the existing local_lock_t and the global kcov_remote_lock within kcov_remote_start(). Therefore, the outer local_irq_save() is not necessary.
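This patch therefore compiles out the local_irq_save() and local_irq_restore() calls when CONFIG_PREEMPT_RT is enabled. This preserves the intended re-entrancy protection for non-RT kernels while resolving the locking violation on PREEMPT_RT kernels.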
With this change applied, syzkaller fuzzing of the PREEMPT_RT kernel runs without stopping on the latest announced real-time Linux release.
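The conflict and the effect of the change can be illustrated with a small userspace model. This is only a sketch under stated assumptions: model_local_irq_save(), model_local_irq_restore(), model_spin_lock() and run_section() are hypothetical stand-ins that imitate the kernel primitives and are not kernel code. The "patched" flag selects the behaviour of the diff below, which skips the IRQ save when PREEMPT_RT is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these helpers merely imitate the kernel primitives. */
static bool irqs_off;   /* stands in for irqs_disabled() */
static int violations;  /* counts "sleeping function called from invalid context" */

static void model_local_irq_save(bool *flags)
{
	assert(flags != NULL);
	*flags = irqs_off;
	irqs_off = true;
}

static void model_local_irq_restore(bool flags)
{
	irqs_off = flags;
}

/* Models spin_lock(): on PREEMPT_RT it may sleep, so IRQs must be enabled. */
static void model_spin_lock(bool preempt_rt)
{
	if (preempt_rt && irqs_off)
		violations++;
}

/*
 * Models one kcov_remote_start_usb_softirq()/kcov_remote_stop_softirq() pair.
 * "patched" selects the behaviour of the diff: skip the IRQ save on RT.
 */
static int run_section(bool preempt_rt, bool patched)
{
	bool flags = false;
	bool use_irq_save = !(preempt_rt && patched);

	irqs_off = false;
	violations = 0;

	if (use_irq_save)
		model_local_irq_save(&flags);
	model_spin_lock(preempt_rt);	/* kcov_remote_start() taking kcov_remote_lock */
	if (use_irq_save)
		model_local_irq_restore(flags);

	return violations;
}
```

In this model, run_section(true, false) reports one violation (the unpatched RT case), while the patched RT case and both non-RT cases report none. The model says nothing about whether nested sections remain excluded on RT.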
Link: https://lore.kernel.org/linux-rt-devel/20250610080307.LMm1hleC@linutronix.de...
Fixes: f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq")
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Byungchul Park <byungchul@sk.com>
Cc: stable@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Cc: syzkaller@googlegroups.com
Cc: linux-usb@vger.kernel.org
Cc: linux-rt-devel@lists.linux.dev
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
---
 include/linux/kcov.h | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 75a2fb8b16c3..c5e1b2dd0bb7 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -85,7 +85,9 @@ static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
 	unsigned long flags = 0;
 
 	if (in_serving_softirq()) {
+#ifndef CONFIG_PREEMPT_RT
 		local_irq_save(flags);
+#endif
 		kcov_remote_start_usb(id);
 	}
 
@@ -96,7 +98,9 @@ static inline void kcov_remote_stop_softirq(unsigned long flags)
 {
 	if (in_serving_softirq()) {
 		kcov_remote_stop();
+#ifndef CONFIG_PREEMPT_RT
 		local_irq_restore(flags);
+#endif
 	}
 }
On Fri, Jul 25, 2025 at 08:14:01PM +0000, Yunseong Kim wrote:
> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following bug is triggered in the ksoftirqd context.
>
> | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> | in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
> | preempt_count: 0, expected: 0
> | RCU nest depth: 2, expected: 2
> | CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
> | Tainted: [W]=WARN
> | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> | Call trace:
> |  show_stack+0x2c/0x3c (C)
> |  __dump_stack+0x30/0x40
> |  dump_stack_lvl+0x148/0x1d8
> |  dump_stack+0x1c/0x3c
> |  __might_resched+0x2e4/0x52c
> |  rt_spin_lock+0xa8/0x1bc
> |  kcov_remote_start+0xb0/0x490
> |  __usb_hcd_giveback_urb+0x2d0/0x5e8
> |  usb_giveback_urb_bh+0x234/0x3c4
> |  process_scheduled_works+0x678/0xd18
> |  bh_worker+0x2f0/0x59c
> |  workqueue_softirq_action+0x104/0x14c
> |  tasklet_action+0x18/0x8c
> |  handle_softirqs+0x208/0x63c
> |  run_ksoftirqd+0x64/0x264
> |  smpboot_thread_fn+0x4ac/0x908
> |  kthread+0x5e8/0x734
> |  ret_from_fork+0x10/0x20
Why is this only a USB thing? What is unique about it to trigger this issue?
thanks,
greg k-h
On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
> Why is this only a USB thing? What is unique about it to trigger this issue?
I couldn't catch your question. But the answer could be that
__usb_hcd_giveback_urb() is a function which is a USB thing
and
kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
as shown below.
static void __usb_hcd_giveback_urb(struct urb *urb)
{
  (...snipped...)
  kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
    if (in_serving_softirq()) {
      local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
      kcov_remote_start_usb(id) {
        kcov_remote_start(id) {
          kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
            (...snipped...)
            local_lock_irqsave(&kcov_percpu_data.lock, flags) {
              __local_lock_irqsave(lock, flags) {
                #ifndef CONFIG_PREEMPT_RT
                  https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
                #else
                  https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
                #endif
              }
            }
            (...snipped...)
            spin_lock(&kcov_remote_lock) {
              #ifndef CONFIG_PREEMPT_RT
                https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L...
              #else
                https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.... // mapped to rt_mutex which might sleep
              #endif
            }
            (...snipped...)
          }
        }
      }
    }
  }
  (...snipped...)
}
On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
> On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
>> Why is this only a USB thing? What is unique about it to trigger this issue?
> I couldn't catch your question. But the answer could be that
>   __usb_hcd_giveback_urb() is a function which is a USB thing
> and
>   kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
> as shown below.
>
> static void __usb_hcd_giveback_urb(struct urb *urb)
> {
>   (...snipped...)
>   kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>     if (in_serving_softirq()) {
>       local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>       kcov_remote_start_usb(id) {
>         kcov_remote_start(id) {
>           kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>             (...snipped...)
>             local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>               __local_lock_irqsave(lock, flags) {
>                 #ifndef CONFIG_PREEMPT_RT
>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
>                 #else
>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
>                 #endif
>               }
>             }
>             (...snipped...)
>             spin_lock(&kcov_remote_lock) {
>               #ifndef CONFIG_PREEMPT_RT
>                 https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L...
>               #else
>                 https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.... // mapped to rt_mutex which might sleep
>               #endif
>             }
>             (...snipped...)
>           }
>         }
>       }
>     }
>   }
>   (...snipped...)
> }
Ok, but then how does the big comment section for kcov_remote_start_usb_softirq() work, where it explicitly states:
 * 2. Disables interrupts for the duration of the coverage collection section.
 *    This allows avoiding nested remote coverage collection sections in the
 *    softirq context (a softirq might occur during the execution of a work in
 *    the BH workqueue, which runs with in_serving_softirq() > 0).
 *    For example, usb_giveback_urb_bh() runs in the BH workqueue with
 *    interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
 *    the middle of its remote coverage collection section, and the interrupt
 *    handler might invoke __usb_hcd_giveback_urb() again.
You are removing half of this function entirely, which feels very wrong to me as any sort of solution, as you have just said that all of that documentation entry is now not needed.
Are you sure this is ok?
thanks,
greg k-h
On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>> static void __usb_hcd_giveback_urb(struct urb *urb)
>> {
>>   (...snipped...)
>>   kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>>     if (in_serving_softirq()) {
>>       local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>>       kcov_remote_start_usb(id) {
>>         kcov_remote_start(id) {
>>           kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>>             (...snipped...)
>>             local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>>               __local_lock_irqsave(lock, flags) {
>>                 #ifndef CONFIG_PREEMPT_RT
>>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
>>                 #else
>>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
>>                 #endif
Right, it does not invoke local_irq_save(flags), but it takes the underlying lock, which means it prevents reentrance.
> Ok, but then how does the big comment section for kcov_remote_start_usb_softirq() work, where it explicitly states:
>
>  * 2. Disables interrupts for the duration of the coverage collection section.
>  *    This allows avoiding nested remote coverage collection sections in the
>  *    softirq context (a softirq might occur during the execution of a work in
>  *    the BH workqueue, which runs with in_serving_softirq() > 0).
>  *    For example, usb_giveback_urb_bh() runs in the BH workqueue with
>  *    interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
>  *    the middle of its remote coverage collection section, and the interrupt
>  *    handler might invoke __usb_hcd_giveback_urb() again.
>
> You are removing half of this function entirely, which feels very wrong to me as any sort of solution, as you have just said that all of that documentation entry is now not needed.
I'm not so sure because kcov_percpu_data.lock is only held within kcov_remote_start() and kcov_remote_stop(), but the above comment suggests that the whole section needs to be serialized.
Though I'm not a KCOV wizard and might be completely wrong here.
If the whole section is required to be serialized, then this needs another local lock in kcov_percpu_data to work correctly on RT.
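As a rough illustration of the difference between locking only inside kcov_remote_start()/kcov_remote_stop() and serializing the whole section, here is a userspace model. It is a sketch under assumptions, not kernel code: struct model_percpu, its section_locked field, model_start(), model_stop() and nested_max_depth() are all hypothetical names, and a nested caller that would block on RT is modelled simply as a failed attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of one CPU's state; all field names are hypothetical. */
struct model_percpu {
	bool section_locked;	/* hypothetical lock spanning start..stop */
	int depth;		/* currently open coverage sections */
	int max_depth;		/* deepest nesting observed */
};

/* Returns false when the caller would have to wait for the section lock. */
static bool model_start(struct model_percpu *pc, bool use_section_lock)
{
	if (use_section_lock) {
		if (pc->section_locked)
			return false;	/* on RT the nested caller would block here */
		pc->section_locked = true;
	}
	/* The existing per-CPU lock would be taken and dropped around this update. */
	pc->depth++;
	if (pc->depth > pc->max_depth)
		pc->max_depth = pc->depth;
	return true;
}

static void model_stop(struct model_percpu *pc, bool use_section_lock)
{
	assert(pc->depth > 0);
	pc->depth--;
	if (use_section_lock)
		pc->section_locked = false;
}

/* The outer section is preempted by a second giveback between start and stop. */
static int nested_max_depth(bool use_section_lock)
{
	struct model_percpu pc = { false, 0, 0 };

	model_start(&pc, use_section_lock);		/* outer section begins */
	if (model_start(&pc, use_section_lock))		/* nested attempt */
		model_stop(&pc, use_section_lock);
	model_stop(&pc, use_section_lock);		/* outer section ends */

	return pc.max_depth;
}
```

In this model, nested_max_depth(false) returns 2 because the second section opens while the first is still active, whereas nested_max_depth(true) returns 1. Whether real KCOV needs this stronger guarantee on RT is exactly the open question here.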
Thanks,
tglx