xHCI 4.9.1 requires HCs to obey the IOC flag we set on the last TRB even after an error has been reported on an earlier TRB. This typically means that an error mid TD is followed by a success event for the last TRB.
On SuperSpeed (and only SS) isochronous endpoints Etron hosts were found to emit a success event also after an error on the last TRB of a TD.
Reuse the machinery for handling errors mid TD to handle these otherwise unexpected events. Avoid printing "TRB not part of current TD" errors, ensure proper tracking of HC's internal dequeue pointer and distinguish this known quirk from other bogus events caused by ordinary bugs.
This patch was found to eliminate all related warnings and errors while running for 30 minutes with a UVC camera on a flaky cable which produces transaction errors about every second. An altsetting was chosen which causes some TDs to be multi-TRB, dynamic debug was used to confirm that errors both mid TD and on the last TRB are handled as expected:
[ 6028.439776] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint [ 6028.439784] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1 [ 6028.440268] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0 [ 6028.440270] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error [ 6029.123683] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint [ 6029.123694] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=0 [ 6029.123697] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0 [ 6029.123700] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
Handling of Stopped events is unaffected: finish_td() is called but it does nothing and the TD waits until it's unlinked:
[ 7081.705544] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint [ 7081.705546] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1 [ 7081.705630] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2 [ 7081.705633] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error [ 7081.705678] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2 [ 7081.705680] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error [ 7081.705759] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2 [ 7081.705799] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
Reported-by: Kuangyi Chiang ki.chiang65@gmail.com Closes: https://lore.kernel.org/linux-usb/20250205053750.28251-1-ki.chiang65@gmail.c... Signed-off-by: Michal Pecio michal.pecio@gmail.com ---
Hi Mathias,
This is the best I was able to do. It does add a few lines, but I don't think it's too scary and IMO the switch looks even better this way. It accurately predicts those events while not breaking anything else that I can see or think of, save for the risk of firmware bugfix adding one ESIT of latency on errors.
I tried to also test your Etron patch but it has whitespace damage all over the place and would be hard to apply.
Regards, Michal
drivers/usb/host/xhci-ring.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 965bffce301e..7ff5075e5890 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2437,6 +2437,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, bool sum_trbs_for_length = false; u32 remaining, requested, ep_trb_len; int short_framestatus; + bool error_event = false, etron_quirk = false;
trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len)); urb_priv = td->urb->hcpriv; @@ -2473,8 +2474,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, fallthrough; case COMP_ISOCH_BUFFER_OVERRUN: frame->status = -EOVERFLOW; - if (ep_trb != td->end_trb) - td->error_mid_td = true; + error_event = true; break; case COMP_INCOMPATIBLE_DEVICE_ERROR: case COMP_STALL_ERROR: @@ -2483,8 +2483,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, case COMP_USB_TRANSACTION_ERROR: frame->status = -EPROTO; sum_trbs_for_length = true; - if (ep_trb != td->end_trb) - td->error_mid_td = true; + error_event = true; break; case COMP_STOPPED: sum_trbs_for_length = true; @@ -2518,8 +2517,17 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, td->urb->actual_length += frame->actual_length;
finish_td: + /* An error event mid TD will be followed by more events, xHCI 4.9.1 */ + td->error_mid_td |= error_event && (ep_trb != td->end_trb); + + /* Etron treats *all* SuperSpeed isoc errors like errors mid TD */ + if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) { + td->error_mid_td |= error_event; + etron_quirk |= error_event; + } + /* Don't give back TD yet if we encountered an error mid TD */ - if (td->error_mid_td && ep_trb != td->end_trb) { + if (td->error_mid_td && (ep_trb != td->end_trb || etron_quirk)) { xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n"); td->urb_length_set = true; return;