When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, the following bug is triggered in the ksoftirqd context.
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
| preempt_count: 0, expected: 0
| RCU nest depth: 2, expected: 2
| CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
| Tainted: [W]=WARN
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
|  show_stack+0x2c/0x3c (C)
|  __dump_stack+0x30/0x40
|  dump_stack_lvl+0x148/0x1d8
|  dump_stack+0x1c/0x3c
|  __might_resched+0x2e4/0x52c
|  rt_spin_lock+0xa8/0x1bc
|  kcov_remote_start+0xb0/0x490
|  __usb_hcd_giveback_urb+0x2d0/0x5e8
|  usb_giveback_urb_bh+0x234/0x3c4
|  process_scheduled_works+0x678/0xd18
|  bh_worker+0x2f0/0x59c
|  workqueue_softirq_action+0x104/0x14c
|  tasklet_action+0x18/0x8c
|  handle_softirqs+0x208/0x63c
|  run_ksoftirqd+0x64/0x264
|  smpboot_thread_fn+0x4ac/0x908
|  kthread+0x5e8/0x734
|  ret_from_fork+0x10/0x20
To reproduce on PREEMPT_RT kernel:
$ git remote add rt-devel git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
$ git fetch rt-devel
$ git checkout -b v6.16-rc1-rt1 v6.16-rc1-rt1
I have attached the syzlang and the C source code converted by syz-prog2c:
Link: https://gist.github.com/kzall0c/9455aaa246f4aa1135353a51753adbbe
Then, run with a PREEMPT_RT config.
This issue was introduced by commit f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save() call establishes an atomic context where sleeping is forbidden. Inside this context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in a sleeping function called from invalid context.
On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario is already safely handled by the existing local_lock_t and the global kcov_remote_lock within kcov_remote_start(). Therefore, the outer local_irq_save() is not necessary.
This preserves the intended re-entrancy protection for non-RT kernels while resolving the locking violation on PREEMPT_RT kernels.
After making this modification and testing it, syzkaller fuzzing of the PREEMPT_RT kernel now runs without stopping on the latest announced Real-time Linux release.
Link: https://lore.kernel.org/linux-rt-devel/20250610080307.LMm1hleC@linutronix.de...
Fixes: f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq")
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Byungchul Park <byungchul@sk.com>
Cc: stable@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Cc: syzkaller@googlegroups.com
Cc: linux-usb@vger.kernel.org
Cc: linux-rt-devel@lists.linux.dev
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
---
 include/linux/kcov.h | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 75a2fb8b16c3..c5e1b2dd0bb7 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -85,7 +85,9 @@ static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
 	unsigned long flags = 0;

 	if (in_serving_softirq()) {
+#ifndef CONFIG_PREEMPT_RT
 		local_irq_save(flags);
+#endif
 		kcov_remote_start_usb(id);
 	}

@@ -96,7 +98,9 @@ static inline void kcov_remote_stop_softirq(unsigned long flags)
 {
 	if (in_serving_softirq()) {
 		kcov_remote_stop();
+#ifndef CONFIG_PREEMPT_RT
 		local_irq_restore(flags);
+#endif
 	}
 }
On Fri, Jul 25, 2025 at 08:14:01PM +0000, Yunseong Kim wrote:
> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, the following bug is triggered in the ksoftirqd context.
>
> | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> | in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
> | preempt_count: 0, expected: 0
> | RCU nest depth: 2, expected: 2
> | CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
> | Tainted: [W]=WARN
> | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> | Call trace:
> |  show_stack+0x2c/0x3c (C)
> |  __dump_stack+0x30/0x40
> |  dump_stack_lvl+0x148/0x1d8
> |  dump_stack+0x1c/0x3c
> |  __might_resched+0x2e4/0x52c
> |  rt_spin_lock+0xa8/0x1bc
> |  kcov_remote_start+0xb0/0x490
> |  __usb_hcd_giveback_urb+0x2d0/0x5e8
> |  usb_giveback_urb_bh+0x234/0x3c4
> |  process_scheduled_works+0x678/0xd18
> |  bh_worker+0x2f0/0x59c
> |  workqueue_softirq_action+0x104/0x14c
> |  tasklet_action+0x18/0x8c
> |  handle_softirqs+0x208/0x63c
> |  run_ksoftirqd+0x64/0x264
> |  smpboot_thread_fn+0x4ac/0x908
> |  kthread+0x5e8/0x734
> |  ret_from_fork+0x10/0x20
Why is this only a USB thing? What is unique about it to trigger this issue?
thanks,
greg k-h
On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
> Why is this only a USB thing? What is unique about it to trigger this issue?
I couldn't catch your question. But the answer could be that
__usb_hcd_giveback_urb() is a function which is a USB thing
and
kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
as shown below.
static void __usb_hcd_giveback_urb(struct urb *urb)
{
  (...snipped...)
  kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
    if (in_serving_softirq()) {
      local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
      kcov_remote_start_usb(id) {
        kcov_remote_start(id) {
          kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
            (...snipped...)
            local_lock_irqsave(&kcov_percpu_data.lock, flags) {
              __local_lock_irqsave(lock, flags) {
                #ifndef CONFIG_PREEMPT_RT
                  https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
                #else
                  https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
                #endif
              }
            }
            (...snipped...)
            spin_lock(&kcov_remote_lock) {
              #ifndef CONFIG_PREEMPT_RT
                https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L...
              #else
                https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.... // mapped to rt_mutex which might sleep
              #endif
            }
            (...snipped...)
          }
        }
      }
    }
  }
  (...snipped...)
}
On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
> On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
>> Why is this only a USB thing? What is unique about it to trigger this issue?
> I couldn't catch your question. But the answer could be that
>   __usb_hcd_giveback_urb() is a function which is a USB thing
> and
>   kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
> as shown below.
> static void __usb_hcd_giveback_urb(struct urb *urb)
> {
>   (...snipped...)
>   kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>     if (in_serving_softirq()) {
>       local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>       kcov_remote_start_usb(id) {
>         kcov_remote_start(id) {
>           kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>             (...snipped...)
>             local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>               __local_lock_irqsave(lock, flags) {
>                 #ifndef CONFIG_PREEMPT_RT
>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
>                 #else
>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
>                 #endif
>               }
>             }
>             (...snipped...)
>             spin_lock(&kcov_remote_lock) {
>               #ifndef CONFIG_PREEMPT_RT
>                 https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L...
>               #else
>                 https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.... // mapped to rt_mutex which might sleep
>               #endif
>             }
>             (...snipped...)
>           }
>         }
>       }
>     }
>   }
>   (...snipped...)
> }
Ok, but then how does the big comment section for kcov_remote_start_usb_softirq() work, where it explicitly states:
 * 2. Disables interrupts for the duration of the coverage collection section.
 *    This allows avoiding nested remote coverage collection sections in the
 *    softirq context (a softirq might occur during the execution of a work in
 *    the BH workqueue, which runs with in_serving_softirq() > 0).
 *    For example, usb_giveback_urb_bh() runs in the BH workqueue with
 *    interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
 *    the middle of its remote coverage collection section, and the interrupt
 *    handler might invoke __usb_hcd_giveback_urb() again.
You are removing half of this function entirely, which feels very wrong to me as any sort of solution, as you have just said that all of that documentation entry is now not needed.
Are you sure this is ok?
thanks,
greg k-h
On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>> static void __usb_hcd_giveback_urb(struct urb *urb)
>> {
>>   (...snipped...)
>>   kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>>     if (in_serving_softirq()) {
>>       local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>>       kcov_remote_start_usb(id) {
>>         kcov_remote_start(id) {
>>           kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>>             (...snipped...)
>>             local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>>               __local_lock_irqsave(lock, flags) {
>>                 #ifndef CONFIG_PREEMPT_RT
>>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i...
>>                 #else
>>                   https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags)
>>                 #endif
Right, it does not invoke local_irq_save(flags), but it takes the underlying lock, which means it prevents reentrance.
> Ok, but then how does the big comment section for kcov_remote_start_usb_softirq() work, where it explicitly states:
>
>  * 2. Disables interrupts for the duration of the coverage collection section.
>  *    This allows avoiding nested remote coverage collection sections in the
>  *    softirq context (a softirq might occur during the execution of a work in
>  *    the BH workqueue, which runs with in_serving_softirq() > 0).
>  *    For example, usb_giveback_urb_bh() runs in the BH workqueue with
>  *    interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
>  *    the middle of its remote coverage collection section, and the interrupt
>  *    handler might invoke __usb_hcd_giveback_urb() again.
>
> You are removing half of this function entirely, which feels very wrong to me as any sort of solution, as you have just said that all of that documentation entry is now not needed.
I'm not so sure because kcov_percpu_data.lock is only held within kcov_remote_start() and kcov_remote_stop(), but the above comment suggests that the whole section needs to be serialized.
Though I'm not a KCOV wizard and might be completely wrong here.
If the whole section is required to be serialized, then this needs another local lock in kcov_percpu_data to work correctly on RT.
Thanks,
tglx
Huge thanks to everyone for the feedback!
While working on earlier patches, running syzkaller on PREEMPT_RT uncovered numerous sleep-in-atomic-context bugs and other synchronization issues unique to that environment. This highlighted the need to address these problems.
After receiving comments from maintainers, I realized that my initial patch set wasn't heading in the right direction.
It seems that the following two patches conflict on PREEMPT_RT kernels:
1. kcov: replace local_irq_save() with a local_lock_t
   Link: https://github.com/torvalds/linux/commit/d5d2c51f1e5f
2. kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
   Link: https://github.com/torvalds/linux/commit/f85d39dd7ed8
My current approach involves:
* Removing the existing 'kcov_percpu_data.lock'
* Converting 'kcov->lock' and 'kcov_remote_lock' to raw spinlocks
* Relocating the kmalloc call for kcov_remote_add() outside kcov_ioctl_locked(), as GFP_ATOMIC allocations can potentially sleep under PREEMPT_RT.
  : As expected from further testing, keeping the GFP_ATOMIC allocation inside kcov_remote_add() still leads to sleep in atomic context.
This approach allows us to keep Andrey’s patch d5d2c51f1e5f while making modifications as Sebastian suggested in his commit f85d39dd7ed8 message, which I found particularly insightful and full of helpful hints.
The work I'm doing on PATCH v2 involves a number of changes, and I would truly appreciate any critical feedback. I'm always happy to hear insights!
Best regards, Yunseong Kim
On 2025-07-25 20:14:01 [+0000], Yunseong Kim wrote:
> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, the following bug is triggered in the ksoftirqd context.
>
…
> This issue was introduced by commit f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
>
> However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save() call establishes an atomic context where sleeping is forbidden. Inside this context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in a sleeping function called from invalid context.
>
> On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario is already safely handled by the existing local_lock_t and the global kcov_remote_lock within kcov_remote_start(). Therefore, the outer local_irq_save() is not necessary.
>
> This preserves the intended re-entrancy protection for non-RT kernels while resolving the locking violation on PREEMPT_RT kernels.
>
> After making this modification and testing it, syzkaller fuzzing of the PREEMPT_RT kernel now runs without stopping on the latest announced Real-time Linux release.
This looks oddly familiar because I removed the irq-disable bits while adding local-locks.
Commit f85d39dd7ed8 looks wrong, and not only because it shouldn't disable interrupts. The statement in the added comment
| + * 2. Disables interrupts for the duration of the coverage collection section.
| + *    This allows avoiding nested remote coverage collection sections in the
| + *    softirq context (a softirq might occur during the execution of a work in
| + *    the BH workqueue, which runs with in_serving_softirq() > 0).
is wrong. Softirqs are never nesting. While the BH workqueue is running another softirq does not occur. The softirq is raised (again) and will be handled _after_ BH workqueue is done. So this is already serialised.
The issue is __usb_hcd_giveback_urb() always invokes kcov_remote_start_usb_softirq(). __usb_hcd_giveback_urb() itself is invoked from BH context (for the majority of HCDs) and from hardirq context for the root-HUB. This gets us to the scenario that we are in the give-back path in softirq context and then invoke the function once again in hardirq context.
I have no idea how kcov works but reverting the original commit and avoiding the false nesting due to hardirq context should do the trick. An untested patch follows.
This isn't any different than the tasklet handling that was used before so I am not sure why it is now a problem.
Could someone maybe test this?
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1636,7 +1636,6 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
 	struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
 	struct usb_anchor *anchor = urb->anchor;
 	int status = urb->unlinked;
-	unsigned long flags;

 	urb->hcpriv = NULL;
 	if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
@@ -1654,14 +1653,13 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
 	/* pass ownership to the completion handler */
 	urb->status = status;
 	/*
-	 * Only collect coverage in the softirq context and disable interrupts
-	 * to avoid scenarios with nested remote coverage collection sections
-	 * that KCOV does not support.
-	 * See the comment next to kcov_remote_start_usb_softirq() for details.
+	 * This function can be called in task context inside another remote
+	 * coverage collection section, but kcov doesn't support that kind of
+	 * recursion yet. Only collect coverage in softirq context for now.
 	 */
-	flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
+	kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
 	urb->complete(urb);
-	kcov_remote_stop_softirq(flags);
+	kcov_remote_stop_softirq();

 	usb_anchor_resume_wakeups(anchor);
 	atomic_dec(&urb->use_count);
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 75a2fb8b16c32..0143358874b07 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -57,47 +57,21 @@ static inline void kcov_remote_start_usb(u64 id)

 /*
  * The softirq flavor of kcov_remote_*() functions is introduced as a temporary
- * workaround for KCOV's lack of nested remote coverage sections support.
- *
- * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
- *
- * kcov_remote_start_usb_softirq():
- *
- * 1. Only collects coverage when called in the softirq context. This allows
- *    avoiding nested remote coverage collection sections in the task context.
- *    For example, USB/IP calls usb_hcd_giveback_urb() in the task context
- *    within an existing remote coverage collection section. Thus, KCOV should
- *    not attempt to start collecting coverage within the coverage collection
- *    section in __usb_hcd_giveback_urb() in this case.
- *
- * 2. Disables interrupts for the duration of the coverage collection section.
- *    This allows avoiding nested remote coverage collection sections in the
- *    softirq context (a softirq might occur during the execution of a work in
- *    the BH workqueue, which runs with in_serving_softirq() > 0).
- *    For example, usb_giveback_urb_bh() runs in the BH workqueue with
- *    interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
- *    the middle of its remote coverage collection section, and the interrupt
- *    handler might invoke __usb_hcd_giveback_urb() again.
+ * work around for kcov's lack of nested remote coverage sections support in
+ * task context. Adding support for nested sections is tracked in:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=210337
  */

-static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
+static inline void kcov_remote_start_usb_softirq(u64 id)
 {
-	unsigned long flags = 0;
-
-	if (in_serving_softirq()) {
-		local_irq_save(flags);
+	if (in_serving_softirq() && !in_hardirq())
 		kcov_remote_start_usb(id);
-	}
-
-	return flags;
 }

-static inline void kcov_remote_stop_softirq(unsigned long flags)
+static inline void kcov_remote_stop_softirq(void)
 {
-	if (in_serving_softirq()) {
+	if (in_serving_softirq() && !in_hardirq())
 		kcov_remote_stop();
-		local_irq_restore(flags);
-	}
 }

 #ifdef CONFIG_64BIT
@@ -131,11 +105,8 @@ static inline u64 kcov_common_handle(void)
 }
 static inline void kcov_remote_start_common(u64 id) {}
 static inline void kcov_remote_start_usb(u64 id) {}
-static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
-{
-	return 0;
-}
-static inline void kcov_remote_stop_softirq(unsigned long flags) {}
+static inline void kcov_remote_start_usb_softirq(u64 id) {}
+static inline void kcov_remote_stop_softirq(void) {}

 #endif /* CONFIG_KCOV */
 #endif /* _LINUX_KCOV_H */
Hi Sebastian,
I was waiting for your review — thanks!
Thank you for the detailed analysis and the patch. Your explanation about the real re-entrancy issue being "softirq vs. hardirq" and the faulty premise in the original commit makes perfect sense.
> Could someone maybe test this?
As you requested, I have tested your patch on my setup.
I can confirm that your patch resolves the issue. I have been running syzkaller for several hours, and the "sleeping function called from invalid context" bug is no longer triggered.
I was really impressed by your "How to Not Break PREEMPT_RT" talk at LPC 22.
Tested-by: Yunseong Kim <ysk@kzalloc.com>
Thanks,
Yunseong Kim
On 2025-08-09 02:35:48 [+0900], Yunseong Kim wrote:
> Hi Sebastian,
Hi Yunseong,
>> Could someone maybe test this?
>
> As you requested, I have tested your patch on my setup.
>
> I can confirm that your patch resolves the issue. I have been running syzkaller for several hours, and the "sleeping function called from invalid context" bug is no longer triggered.
Thank you. I just sent this as a proper patch assuming kcov still does what it should. I just don't understand why this triggers after moving to workqueues and did not with the tasklet setup. Other than that the workqueue code has a bit more overhead, it is the same thing.
> I was really impressed by your "How to Not Break PREEMPT_RT" talk at LPC 22.
Thank you.
> Tested-by: Yunseong Kim <ysk@kzalloc.com>
>
> Thanks,
>
> Yunseong Kim
Sebastian