From: Andrey Konovalov <andreyknvl@gmail.com>
Commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler") switched dummy_hcd to use hrtimer and made the timer's callback be executed in the hardirq context.
With that change, __usb_hcd_giveback_urb now gets executed in the hardirq context, which causes problems for KCOV and KMSAN.
One problem is that KCOV now is unable to collect coverage from the USB code that gets executed from the dummy_hcd's timer callback, as KCOV cannot collect coverage in the hardirq context.
Another problem is that the dummy_hcd hrtimer might get triggered in the middle of a softirq with KCOV remote coverage collection enabled, and that causes a WARNING in KCOV, as reported by syzbot. (I sent a separate patch to shut down this WARNING, but that doesn't fix the other two issues.)
Finally, KMSAN appears to ignore tracking memory copying operations that happen in the hardirq context, which causes false positive kernel-infoleaks, as reported by syzbot.
Change the hrtimer in dummy_hcd to execute the callback in the softirq context.
Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
Reported-by: syzbot+17ca2339e34a1d863aad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=17ca2339e34a1d863aad
Fixes: a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
---
Marcello, would this change be acceptable for your use case?
If we wanted to keep the hardirq hrtimer, we would need to teach KCOV to collect coverage in the hardirq context (or disable it, which would be unfortunate) and also fix whatever is wrong with KMSAN, but all that requires some work.
---
 drivers/usb/gadget/udc/dummy_hcd.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
index f37b0d8386c1a..ff7bee78bcc49 100644
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -1304,7 +1304,8 @@ static int dummy_urb_enqueue(
 
 	/* kick the scheduler, it'll do the rest */
 	if (!hrtimer_active(&dum_hcd->timer))
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
+			      HRTIMER_MODE_REL_SOFT);
 
 done:
 	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
@@ -1325,7 +1326,7 @@ static int dummy_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	rc = usb_hcd_check_unlink_urb(hcd, urb, status);
 	if (!rc && dum_hcd->rh_state != DUMMY_RH_RUNNING &&
 			!list_empty(&dum_hcd->urbp_list))
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);
 
 	spin_unlock_irqrestore(&dum_hcd->dum->lock, flags);
 	return rc;
@@ -1995,7 +1996,8 @@ static enum hrtimer_restart dummy_timer(struct hrtimer *t)
 		dum_hcd->udev = NULL;
 	} else if (dum_hcd->rh_state == DUMMY_RH_RUNNING) {
 		/* want a 1 msec delay here */
-		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS), HRTIMER_MODE_REL);
+		hrtimer_start(&dum_hcd->timer, ns_to_ktime(DUMMY_TIMER_INT_NSECS),
+			      HRTIMER_MODE_REL_SOFT);
 	}
 
 	spin_unlock_irqrestore(&dum->lock, flags);
@@ -2389,7 +2391,7 @@ static int dummy_bus_resume(struct usb_hcd *hcd)
 		dum_hcd->rh_state = DUMMY_RH_RUNNING;
 		set_link_state(dum_hcd);
 		if (!list_empty(&dum_hcd->urbp_list))
-			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL);
+			hrtimer_start(&dum_hcd->timer, ns_to_ktime(0), HRTIMER_MODE_REL_SOFT);
 		hcd->state = HC_STATE_RUNNING;
 	}
 	spin_unlock_irq(&dum_hcd->dum->lock);
@@ -2467,7 +2469,7 @@ static DEVICE_ATTR_RO(urbs);
 
 static int dummy_start_ss(struct dummy_hcd *dum_hcd)
 {
-	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
 	dum_hcd->timer.function = dummy_timer;
 	dum_hcd->rh_state = DUMMY_RH_RUNNING;
 	dum_hcd->stream_en_ep = 0;
@@ -2497,7 +2499,7 @@ static int dummy_start(struct usb_hcd *hcd)
 		return dummy_start_ss(dum_hcd);
 
 	spin_lock_init(&dum_hcd->dum->lock);
-	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_init(&dum_hcd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
 	dum_hcd->timer.function = dummy_timer;
 	dum_hcd->rh_state = DUMMY_RH_RUNNING;
Hi Andrey,
On Mon, 2024-07-29 at 04:23 +0200, andrey.konovalov@linux.dev wrote:
Marcello, would this change be acceptable for your use case?
Thanks for investigating and finding the cause of this problem. I have already submitted an identical patch to change the hrtimer to softirq: https://lkml.org/lkml/2024/6/26/969
However, your commit messages contain more useful information about the problem at hand. So I'm happy to drop my patch in favor of yours.
Btw, the same problem has also been reported by the Intel kernel test robot. So we should add additional tags to mark this patch as the fix.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141323.413a90d2-lkp@intel.com
Acked-by: Marcello Sylvester Bauer <sylv@sylv.io>
Thanks, Marcello
On Mon, Jul 29, 2024 at 10:26 AM Marcello Sylvester Bauer sylv@sylv.io wrote:
Hi Andrey,
Hi Marcello,
Thanks for investigating and finding the cause of this problem. I have already submitted an identical patch to change the hrtimer to softirq: https://lkml.org/lkml/2024/6/26/969
Ah, I missed that, that's great!
However, your commit messages contain more useful information about the problem at hand. So I'm happy to drop my patch in favor of yours.
That's very considerate, thank you. I'll leave this up to Greg - I don't mind using either patch.
Btw, the same problem has also been reported by the Intel kernel test robot. So we should add additional tags to mark this patch as the fix.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141323.413a90d2-lkp@intel.com
Acked-by: Marcello Sylvester Bauer <sylv@sylv.io>
Let's also add the syzbot reports mentioned in your patch:
Reported-by: syzbot+c793a7eca38803212c61@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c793a7eca38803212c61
Reported-by: syzbot+1e6e0b916b211bee1bd6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1e6e0b916b211bee1bd6
And I also found one more:
Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
Thank you!
On Mon, Jul 29, 2024 at 06:14:30PM +0200, Andrey Konovalov wrote:
And I also found one more:
Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
You need to be careful about claiming that this patch will fix those bug reports. At least one of them (the last one above) still fails with the patch applied. See:
https://lore.kernel.org/linux-usb/ade15714-6aa3-4988-8b45-719fc9d74727@rowla...
and the following response.
Alan Stern
On Mon, Jul 29, 2024 at 8:01 PM Alan Stern stern@rowland.harvard.edu wrote:
And I also found one more:
Reported-by: syzbot+edd9fe0d3a65b14588d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
You need to be careful about claiming that this patch will fix those bug reports. At least one of them (the last one above) still fails with the patch applied. See:
https://lore.kernel.org/linux-usb/ade15714-6aa3-4988-8b45-719fc9d74727@rowla...
and the following response.
Ah, right, that one is something else, so let's not add those last Reported-by/Closes.
However, that crash was bisected to the same guilty patch, so the issue is somehow related. Even if we were to mark it as to be fixed with the patch I sent, this wouldn't be critical: syzbot would just rereport it, and with fresher stack traces.
Thank you!
On Mon, Jul 29, 2024 at 4:23 AM andrey.konovalov@linux.dev wrote:
Finally, KMSAN appears to ignore tracking memory copying operations that happen in the hardirq context, which causes false positive kernel-infoleaks, as reported by syzbot.
Hi Andrey,
FWIW this problem is tracked as https://github.com/google/kmsan/issues/92; I'll try to revisit it in September.
On Mon, Jul 29, 2024 at 4:23 AM andrey.konovalov@linux.dev wrote:
Hi Greg,
Could you pick up either this or Marcello's patch (https://lkml.org/lkml/2024/6/26/969)? In case they got lost.
Thank you!
On Tue, Aug 27, 2024 at 02:02:00AM +0200, Andrey Konovalov wrote:
Hi Greg,
Could you pick up either this or Marcello's patch (https://lkml.org/lkml/2024/6/26/969)? In case they got lost.
Both are lost now, (and please use lore.kernel.org, not lkml.org), can you resend the one that you wish to see accepted?
thanks,
greg k-h
On Tue, Sep 3, 2024 at 9:09 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Hi Greg,
Could you pick up either this or Marcello's patch (https://lkml.org/lkml/2024/6/26/969)? In case they got lost.
Both are lost now, (and please use lore.kernel.org, not lkml.org), can you resend the one that you wish to see accepted?
Done: https://lore.kernel.org/linux-usb/20240904013051.4409-1-andrey.konovalov@lin...
Thanks!