Hub driver warm-resets ports in SS.Inactive or Compliance mode to recover a possible connected device. The port reset code correctly detects if a connection is lost during reset, but hub driver port_event() fails to take this into account in some cases. port_event() ends up using stale values and assumes there is a connected device, and will try all means to recover it, including power-cycling the port.
Details: This case was triggered when xHC host was suspended with DbC (Debug Capability) enabled and connected. DbC turns one xHC port into a simple usb debug device, allowing debugging a system with an A-to-A USB debug cable.
xhci DbC code disables DbC when xHC is system suspended to D3, and enables it back during resume. We essentially end up with two hosts connected to each other during suspend, and, for a short while during resume, until DbC is enabled back. The suspended xHC host notices some activity on the roothub port, but can't train the link due to being suspended, so xHC hardware sets a CAS (Cold Attach Status) flag for this port to inform xhci host driver that the port needs to be warm reset once xHC resumes.
CAS is xHCI specific, and not part of the USB specification, so the xhci driver tells usb core that the port has a connection and the link is in compliance mode. Recovery from compliance mode is similar to CAS recovery.
xhci CAS driver support that fakes a compliance mode connection was added in commit 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Once xHCI resumes and DbC is enabled back, all activity on the xHC roothub host side port disappears. The hub driver nevertheless still thinks the port has a connection and the link is in compliance mode, and will try to recover it.
The port power-cycle during recovery seems to cause issues to the active DbC connection.
Fix this by clearing connect_change flag if hub_port_reset() returns -ENOTCONN, thus avoiding the whole unnecessary port recovery and initialization attempt.
Cc: stable@vger.kernel.org
Fixes: 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Tested-by: Łukasz Bartosik <ukaszb@chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/core/hub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 6bb6e92cb0a4..f981e365be36 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5754,6 +5754,7 @@ static void port_event(struct usb_hub *hub, int port1)
 	struct usb_device *hdev = hub->hdev;
 	u16 portstatus, portchange;
 	int i = 0;
+	int err;
 
 	connect_change = test_bit(port1, hub->change_bits);
 	clear_bit(port1, hub->event_bits);
@@ -5850,8 +5851,11 @@ static void port_event(struct usb_hub *hub, int port1)
 		} else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION)
 				|| udev->state == USB_STATE_NOTATTACHED) {
 			dev_dbg(&port_dev->dev, "do warm reset, port only\n");
-			if (hub_port_reset(hub, port1, NULL,
-					HUB_BH_RESET_TIME, true) < 0)
+			err = hub_port_reset(hub, port1, NULL,
+					     HUB_BH_RESET_TIME, true);
+			if (!udev && err == -ENOTCONN)
+				connect_change = 0;
+			else if (err < 0)
 				hub_port_disable(hub, port1, 1);
 		} else {
 			dev_dbg(&port_dev->dev, "do warm reset, full device\n");
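For readers who want to reason about the changed branch without a kernel tree at hand, here is a minimal userspace sketch of the decision the patch makes. It is not kernel code: struct warm_reset_outcome and model_warm_reset() are hypothetical names, have_udev stands in for "udev != NULL", and reset_err stands in for the value returned by hub_port_reset().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the "port only" warm-reset branch decides. */
struct warm_reset_outcome {
	bool connect_change;	/* is connect_change still set afterwards? */
	bool port_disabled;	/* would hub_port_disable() have been called? */
};

/*
 * Model of the patched branch: a lost connection on a port without a
 * usb_device clears connect_change; any other error disables the port.
 */
static struct warm_reset_outcome
model_warm_reset(bool have_udev, bool connect_change, int reset_err)
{
	struct warm_reset_outcome out = { connect_change, false };

	if (!have_udev && reset_err == -ENOTCONN)
		out.connect_change = false;
	else if (reset_err < 0)
		out.port_disabled = true;

	return out;
}
```

With have_udev false and reset_err equal to -ENOTCONN the model neither disables the port nor leaves connect_change set, which is the case the commit message describes. Every other negative error keeps the old behaviour of disabling the port.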
On Mon, Jun 23, 2025 at 04:39:47PM +0300, Mathias Nyman wrote:
Alan, any objection to this?
thanks,
greg k-h
On Tue, Jul 15, 2025 at 07:48:50PM +0200, Greg KH wrote:
Alan, any objection to this?
No objection, it looks okay to me.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Alan Stern
On 23. 06. 25, 15:39, Mathias Nyman wrote:
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 6bb6e92cb0a4..f981e365be36 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5754,6 +5754,7 @@ static void port_event(struct usb_hub *hub, int port1)
 	struct usb_device *hdev = hub->hdev;
 	u16 portstatus, portchange;
 	int i = 0;
+	int err;
 
 	connect_change = test_bit(port1, hub->change_bits);
 	clear_bit(port1, hub->event_bits);
@@ -5850,8 +5851,11 @@ static void port_event(struct usb_hub *hub, int port1)
 		} else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION)
 				|| udev->state == USB_STATE_NOTATTACHED) {
 			dev_dbg(&port_dev->dev, "do warm reset, port only\n");
-			if (hub_port_reset(hub, port1, NULL,
-					HUB_BH_RESET_TIME, true) < 0)
+			err = hub_port_reset(hub, port1, NULL,
+					     HUB_BH_RESET_TIME, true);
+			if (!udev && err == -ENOTCONN)
+				connect_change = 0;
+			else if (err < 0)
 				hub_port_disable(hub, port1, 1);
This was reported to break the USB on one box:
[Wed Aug 6 16:51:33 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: Device not responding to setup address.
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: Device not responding to setup address.
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: device not accepting address 12, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: WARN: invalid context state for evaluate context command.
[Wed Aug 6 16:51:36 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:36 2025] [ C10] xhci_hcd 0000:0e:00.0: ERROR unknown event type 2
[Wed Aug 6 16:51:36 2025] [ T355745] usb 1-2: Device not responding to setup address.
[Wed Aug 6 16:51:37 2025] [ C10] xhci_hcd 0000:0e:00.0: ERROR unknown event type 2
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: Abort failed to stop command ring: -110
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: xHCI host controller not responding, assume dead
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: HC died; cleaning up
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-1: USB disconnect, device number 13
[Wed Aug 6 16:52:50 2025] [ T355745] xhci_hcd 0000:0e:00.0: Timeout while waiting for setup device command
[Wed Aug 6 16:52:50 2025] [ T362645] usb 2-3: USB disconnect, device number 2
[Wed Aug 6 16:52:50 2025] [ T362839] cdc_acm 1-5:1.5: acm_port_activate - usb_submit_urb(ctrl irq) failed
[Wed Aug 6 16:52:50 2025] [ T355745] usb 1-2: device not accepting address 12, error -62
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-2: USB disconnect, device number 12
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-3: USB disconnect, device number 4
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-3.1: USB disconnect, device number 6
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-4: USB disconnect, device number 16
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-5: USB disconnect, device number 15
[Wed Aug 6 16:52:50 2025] [ T359046] usb 1-7: USB disconnect, device number 8
Using 6.16 with this commit (2521106fc732b0b) reverted makes it work again.
The same happens with 6.15.8 as this was backported there. (6.15.6 is fine).
lsusb --tree

/: Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/12p, 480M
    |__ Port 003: Dev 006, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 001: Dev 008, If 0, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 001: Dev 008, If 1, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 001: Dev 008, If 2, Class=Chip/SmartCard, Driver=usbfs, 12M
    |__ Port 004: Dev 007, If 0, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 004: Dev 007, If 1, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 004: Dev 007, If 2, Class=Application Specific Interface, Driver=[none], 480M
    |__ Port 004: Dev 007, If 3, Class=Communications, Driver=cdc_acm, 480M
    |__ Port 004: Dev 007, If 4, Class=CDC Data, Driver=cdc_acm, 480M
    |__ Port 005: Dev 009, If 0, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 005: Dev 009, If 1, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 005: Dev 009, If 2, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 005: Dev 009, If 3, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 005: Dev 009, If 4, Class=Audio, Driver=snd-usb-audio, 480M
    |__ Port 005: Dev 009, If 5, Class=Communications, Driver=cdc_acm, 480M
    |__ Port 005: Dev 009, If 6, Class=CDC Data, Driver=cdc_acm, 480M
    |__ Port 007: Dev 010, If 0, Class=Vendor Specific Class, Driver=[none], 12M
    |__ Port 007: Dev 010, If 2, Class=Human Interface Device, Driver=usbhid, 12M
/: Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/5p, 20000M/x2
    |__ Port 003: Dev 002, If 0, Class=Hub, Driver=hub/4p, 5000M
/: Bus 003.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/12p, 480M
    |__ Port 003: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 003: Dev 002, If 1, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 003: Dev 002, If 2, Class=Human Interface Device, Driver=usbhid, 12M
/: Bus 004.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/5p, 20000M/x2
/: Bus 005.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 006.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 10000M
/: Bus 007.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 480M
    |__ Port 002: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 480M
    |__ Port 002: Dev 002, If 1, Class=Human Interface Device, Driver=usbhid, 480M
    |__ Port 002: Dev 002, If 2, Class=Human Interface Device, Driver=usbhid, 480M
/: Bus 008.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 10000M
/: Bus 009.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 010.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/0p, 5000M
Any ideas? What would you need to debug this?
thanks,
On 11. 08. 25, 8:16, Jiri Slaby wrote:
FTR this is now tracked downstream as: https://bugzilla.suse.com/show_bug.cgi?id=1247895
thanks,
On Mon, Aug 11, 2025 at 01:06:03PM +0200, Jiri Slaby wrote:
This was reported to break the USB on one box:
[Wed Aug 6 16:51:33 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
What shows up in the kernel log (with usbcore dynamic debugging enabled) if the commit is present and if the commit is reverted?
Alan Stern
On Mon, 11 Aug 2025 08:16:06 +0200, Jiri Slaby wrote:
This was reported to break the USB on one box:
Is the problem that this USB device fails to work, or that it takes down the whole bus while failing to work as usual?
The latter issue looks like some ASMedia xHCI controller being unhappy about something. What does 'lspci' say about this 0e:00.0?
So far I failed to repro this on v6.16.0 with a few of my ASMedias and a dummy device which never responds to any packet.
Can you mount debugfs and get these two files after the HC goes dead?
/sys/kernel/debug/usb/xhci/0000:0e:00.0/command-ring/trbs
/sys/kernel/debug/usb/xhci/0000:0e:00.0/event-ring/trbs
Regards, Michal
Hi
This was reported to break the USB on one box:
[Wed Aug 6 16:51:33 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
Protocol error (EPROTO) reading 64 bytes of device descriptor
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:34 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: device descriptor read/64, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: Device not responding to setup address.
The xhci "address device" command failed with a transaction error. The slot does not reach the "addressed" state.
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: Device not responding to setup address.
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: device not accepting address 12, error -71
[Wed Aug 6 16:51:35 2025] [ T355745] usb 1-2: WARN: invalid context state for evaluate context command.
xhci evaluate context command failed, probably due to slot not in addressed state
[Wed Aug 6 16:51:36 2025] [ T355745] usb 1-2: reset full-speed USB device number 12 using xhci_hcd
[Wed Aug 6 16:51:36 2025] [ C10] xhci_hcd 0000:0e:00.0: ERROR unknown event type 2
This is odd. TRBs of type "2" should not exist on event rings. TRB type id 2 is supposed to be the setup TRB for control transfers, and should only exist on transfer rings.
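To make the point about TRB type ids concrete, here is a small sketch of how a type id can be pulled out of a TRB and checked against the event range. It assumes the xHCI layout in which the TRB Type field sits in bits 15:10 of the fourth dword and the standard event TRB types occupy ids 32 to 39. The helper names are made up for this example and vendor-defined types are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* TRB Type field: bits 15:10 of the fourth (control) dword of a TRB. */
#define TRB_TYPE_SHIFT	10
#define TRB_TYPE_MASK	(0x3fu << TRB_TYPE_SHIFT)

/* Type ids relevant to the log above, per the xHCI TRB type table. */
#define TRB_TYPE_SETUP_STAGE	2	/* transfer ring only */
#define TRB_TYPE_EVENT_FIRST	32	/* Transfer Event */
#define TRB_TYPE_EVENT_LAST	39	/* MFINDEX Wrap Event */

/* Hypothetical helper: extract the type id from the control dword. */
static unsigned int trb_type_from_dword(uint32_t control)
{
	return (control & TRB_TYPE_MASK) >> TRB_TYPE_SHIFT;
}

/* Hypothetical helper: is this a standard event TRB type? */
static bool trb_type_is_event(unsigned int type)
{
	return type >= TRB_TYPE_EVENT_FIRST && type <= TRB_TYPE_EVENT_LAST;
}
```

A control dword carrying type 2 fails trb_type_is_event(), which is the situation the "unknown event type 2" message reports.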
[Wed Aug 6 16:51:36 2025] [ T355745] usb 1-2: Device not responding to setup address.
[Wed Aug 6 16:51:37 2025] [ C10] xhci_hcd 0000:0e:00.0: ERROR unknown event type 2
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: Abort failed to stop command ring: -110
Aborting command due to driver not seeing command completions. The missing command completions are probably those mangled "unknown" events
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: xHCI host controller not responding, assume dead
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: HC died; cleaning up
Tear down xhci.
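As a reading aid for the numeric codes in the log (-71, -62, -110), here is a small sketch that maps them to their symbolic errno names. It assumes the generic Linux errno numbering, where 71 is EPROTO, 62 is ETIME and 110 is ETIMEDOUT. usb_err_name() is a hypothetical helper, not a kernel function.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a negative kernel-style error code to the
 * symbolic name of the errno values seen in the log above.
 */
static const char *usb_err_name(int err)
{
	switch (-err) {
	case EPROTO:
		return "EPROTO";	/* "device descriptor read/64, error -71" */
	case ETIME:
		return "ETIME";		/* "device not accepting address 12, error -62" */
	case ETIMEDOUT:
		return "ETIMEDOUT";	/* "Abort failed to stop command ring: -110" */
	default:
		return "unknown";
	}
}
```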
Any ideas? What would you need to debug this?
Could be that this patch reveals some underlying race in xhci re-enumeration path.
Could also be related to ep0 max packet size setting as this is a full-speed device. (max packet size is unknown until host reads first 8 bytes of descriptor, then adjusts it on the fly with an evaluate context command)
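To illustrate the ep0 max packet size step, here is a rough sketch of picking bMaxPacketSize0 out of the first 8 bytes of a device descriptor and deciding whether the host's initial guess needs correcting. It assumes the standard descriptor layout (bMaxPacketSize0 at byte offset 7) and the full-speed values 8, 16, 32 and 64. The function names are made up for this example and do not correspond to usb core or xhci functions.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_HEAD_LEN		8	/* bytes readable before ep0 size is known */
#define BMAXPACKETSIZE0_OFFSET	7	/* offset of bMaxPacketSize0 */

/*
 * Hypothetical helper: return the ep0 max packet size a full-speed device
 * reports, or -1 if the buffer is too short or the value is not valid.
 */
static int fs_ep0_maxpacket(const uint8_t *desc, size_t len)
{
	if (len < DESC_HEAD_LEN)
		return -1;

	switch (desc[BMAXPACKETSIZE0_OFFSET]) {
	case 8:
	case 16:
	case 32:
	case 64:
		return desc[BMAXPACKETSIZE0_OFFSET];
	default:
		return -1;
	}
}

/*
 * Hypothetical helper: does the host's guess need to be adjusted (the step
 * that is done with an evaluate context command on xHCI)?
 */
static int ep0_needs_update(int guess, int reported)
{
	return reported > 0 && reported != guess;
}
```

If the host starts with a guess of 64 and the device reports 8, ep0_needs_update() returns nonzero. That is the on-the-fly adjustment mentioned above.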
Appreciated if this could be reproduced with as few usb devices as possible, and with xhci tracing and dynamic debug enabled:
mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
< Reproduce issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
Thanks Mathias
On Tue, 2025-08-12 at 13:48 +0300, Mathias Nyman wrote:
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: xHCI host controller not responding, assume dead
[Wed Aug 6 16:52:50 2025] [ T362645] xhci_hcd 0000:0e:00.0: HC died; cleaning up
Tear down xhci.
so usb is not dead completely. I can connect my keyboard to the charging cable of my mouse and it starts working again. but it seems all my devices hanging on that part of the usb tree are dead (DAC/keyboard)
lspci is here
https://bugzilla.opensuse.org/show_bug.cgi?id=1247895#c3
Mainboard is a ASUS ProArt X870E-CREATOR WIFI
Any ideas? What would you need to debug this?
Could be that this patch reveals some underlying race in xhci re-enumeration path.
possible.
Could also be related to ep0 max packet size setting as this is a full-speed device. (max packet size is unknown until host reads first 8 bytes of descriptor, then adjusts it on the fly with an evaluate context command)
Appreciated if this could be reproduced with as few usb devices as possible, and with xhci tracing and dynamic debug enabled:
sadly this is not really reproducible on command. sometimes it happens after only a few hours. sometimes it happens after a day or 2.
mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
Running with this now.
< Reproduce issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
Will do once it happened again.
darix
On Tue, 12 Aug 2025 20:15:13 +0200, Marcus Rückert wrote:
so usb is not dead completely. I can connect my keyboard to the charging cable of my mouse and it starts working again. but it seems all my devices hanging on that part of the usb tree are dead (DAC/keyboard)
You have multiple USB buses on multiple xHCI controllers. Controller responsible for bus 1 goes belly up and its devices are lost, but the rest keeps working.
It would make sense to figure out what was this device on port 2 of bus 1 which triggered the failure. Your lsusb output shows no such device, so it was either disconnected, connected to another port or it malfunctioned and failed to enumerate at the time. Do you know?
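For anyone matching log lines such as "usb 1-2" or "usb 1-3.1" against the lsusb tree, here is a small sketch of a parser for that naming scheme: bus number, a dash, then the port numbers along the path separated by dots. struct usb_path, MAX_PORT_DEPTH and parse_usb_name() are made up for this example and are not part of any kernel or usbutils API.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_PORT_DEPTH 6	/* hypothetical cap on the port path length */

/* Hypothetical parsed form of a name like "1-3.1". */
struct usb_path {
	unsigned int bus;
	unsigned int ports[MAX_PORT_DEPTH];
	int depth;		/* number of valid entries in ports[] */
};

/* Parse "<bus>-<port>[.<port>...]"; return 0 on success, -1 otherwise. */
static int parse_usb_name(const char *name, struct usb_path *out)
{
	char *end;
	unsigned long v;

	if (!isdigit((unsigned char)*name))
		return -1;
	v = strtoul(name, &end, 10);
	if (*end != '-')
		return -1;
	out->bus = (unsigned int)v;
	out->depth = 0;

	do {
		name = end + 1;
		if (!isdigit((unsigned char)*name) ||
		    out->depth == MAX_PORT_DEPTH)
			return -1;
		v = strtoul(name, &end, 10);
		out->ports[out->depth++] = (unsigned int)v;
	} while (*end == '.');

	/* Anything left over (e.g. ":1.5" of an interface name) is rejected. */
	return *end == '\0' ? 0 : -1;
}
```

Parsing "1-2" gives bus 1 with the single port 2, the device in question here. "1-3.1" gives bus 1, root port 3, then port 1 of the hub behind it.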
What's the output of these commands right now?
dmesg |grep 'usb 1-2'
dmesg |grep 'descriptor read'
Do you have logs? Can you look at them to see if it was always "usb 1-2" causing trouble in the past?
lspci is here
https://bugzilla.opensuse.org/show_bug.cgi?id=1247895#c3
Mainboard is a ASUS ProArt X870E-CREATOR WIFI
Thanks. Unfortunately I don't have this exact chipset, but it's an AMD chipset made by ASMedia, as suspected.
The situation is somewhat similar (though different) to this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=220069
Random failures for no clear reason, apparently triggered by some repetitive background activity. Very annoying.
On Wed, 2025-08-13 at 00:02 +0200, Michał Pecio wrote:
It would make sense to figure out what was this device on port 2 of bus 1 which triggered the failure. Your lsusb output shows no such device, so it was either disconnected, connected to another port or it malfunctioned and failed to enumerate at the time. Do you know?
What's the output of these commands right now?
dmesg |grep 'usb 1-2'
dmesg |grep 'descriptor read'
dmesg |grep 'usb 1-2' ; dmesg |grep 'descriptor read'
[ 2.686292] [ T787] usb 1-2: new full-speed USB device number 3 using xhci_hcd
[ 3.054496] [ T787] usb 1-2: New USB device found, idVendor=31e3, idProduct=1322, bcdDevice= 2.30
[ 3.054499] [ T787] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3.054500] [ T787] usb 1-2: Product: Wooting 60HE+
[ 3.054501] [ T787] usb 1-2: Manufacturer: Wooting
the device is running firmware 2.11.0b-beta.3
Do you have logs? Can you look at them to see if it was always "usb 1-2" causing trouble in the past?
looks like it according to journalctl --since 2025-07-01 --grep "reset full-speed USB device number"
Jul 24 15:56:34 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:35 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:36 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 31 19:53:02 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:03 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Aug 06 16:51:34 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:35 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
lspci is here
https://bugzilla.opensuse.org/show_bug.cgi?id=1247895#c3
Mainboard is a ASUS ProArt X870E-CREATOR WIFI
Thanks. Unfortunately I don't have this exact chipset, but it's an AMD chipset made by ASMedia, as suspected.
I will drop wooting a mail so they are in the loop.
darix
On Wed, 13 Aug 2025 03:58:07 +0200, Marcus Rückert wrote:
dmesg |grep 'usb 1-2' ; dmesg |grep 'descriptor read'
[ 2.686292] [ T787] usb 1-2: new full-speed USB device number 3 using xhci_hcd
[ 3.054496] [ T787] usb 1-2: New USB device found, idVendor=31e3, idProduct=1322, bcdDevice= 2.30
[ 3.054499] [ T787] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3.054500] [ T787] usb 1-2: Product: Wooting 60HE+
[ 3.054501] [ T787] usb 1-2: Manufacturer: Wooting
OK, so you had a keyboard in this port during the last boot. Is this keyboard always connected to the same port? There is no bus 1 port 2 device on your earlier lsusb output, so it was either not connected there or not detected due to malfunction.
journalctl --since 2025-07-01 --grep "reset full-speed USB device number"
Jul 24 15:56:34 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:35 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:36 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 31 19:53:02 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:03 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Aug 06 16:51:34 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:35 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
So this port was getting reset in the past. Can you also check:
- how many of those resets were followed by "HC died"
- if all "HC died" events were caused by resets of port usb 1-2 (or some other port)
And for the record, what exactly was the original problem which you reported to Suse and believe to be caused by a kernel upgrade? Was it "HC died" and loss of multiple devices, or just the keyboard failing to work and spamming "reset USB device number x", or something else?
Regards, Michal
On Wed, 2025-08-13 at 08:42 +0200, Michał Pecio wrote:
On Wed, 13 Aug 2025 03:58:07 +0200, Marcus Rückert wrote:
dmesg |grep 'usb 1-2' ; dmesg |grep 'descriptor read'
[ 2.686292] [ T787] usb 1-2: new full-speed USB device number 3 using xhci_hcd
[ 3.054496] [ T787] usb 1-2: New USB device found, idVendor=31e3, idProduct=1322, bcdDevice= 2.30
[ 3.054499] [ T787] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3.054500] [ T787] usb 1-2: Product: Wooting 60HE+
[ 3.054501] [ T787] usb 1-2: Manufacturer: Wooting
OK, so you had a keyboard in this port during the last boot. Is this keyboard always connected to the same port? There is no bus 1 port 2 device on your earlier lsusb output, so it was either not connected there or not detected due to malfunction.
Yes, it is always connected to that port. The setup is quite static.
So this port was getting reset in the past. Can you also check:
- how many of those resets were followed by "HC died"
- if all "HC died" events were caused by resets of port usb 1-2
(or some other port)
Jul 24 15:56:34 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:35 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:36 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:57:56 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Jul 31 19:53:02 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:03 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:55:05 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Aug 06 16:51:34 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:35 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:52:50 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
All "HC died" events were preceded by "reset full-speed" messages on usb 1-2.
And for the record, what exactly was the original problem which you reported to Suse and believe to be caused by a kernel upgrade? Was it "HC died" and loss of multiple devices, or just the keyboard failing to work and spamming "reset USB device number x", or something else?
The spamming I wouldn't have noticed, but the loss of the other devices from the "HC died" I did notice. So I asked Jiri if the recent kernel updates included USB changes and we started debugging :)
darix
On Wed, 13 Aug 2025 11:14:04 +0200, Marcus Rückert wrote:
Jul 24 15:56:34 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:35 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:36 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:57:56 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Jul 31 19:53:02 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:03 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:55:05 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Aug 06 16:51:34 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:35 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:52:50 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
all HC died events were connected to reset full-speed.
OK, three reset loops and three HC died in the last month, both at the same time, about once a week. Possibly not a coincidence ;)
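The pairing of reset loops and "HC died" events can also be checked mechanically over a longer journal export. The following is only a rough sketch, assuming log lines in the same format as quoted in this thread; the function names, the hard-coded year and the 5-minute window are illustrative assumptions, not anything the kernel or journalctl provides.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "Jul 24 15:56:34 kernel: usb 1-2: reset ..."
LINE_RE = re.compile(r"^(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) kernel: (?P<msg>.*)$")
RESET_RE = re.compile(r"^usb (?P<port>[\d.-]+): reset \S+ USB device number \d+")


def parse_events(lines, year=2025):
    """Return (timestamp, kind, port) tuples for reset and 'HC died' lines.

    The journal's short format carries no year, so one is assumed.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        reset = RESET_RE.match(m["msg"])
        if reset:
            events.append((ts, "reset", reset["port"]))
        elif "HC died" in m["msg"]:
            events.append((ts, "died", None))
    return events


def deaths_with_preceding_resets(events, window=timedelta(minutes=5)):
    """For each 'HC died' event, list the ports reset within `window` before it."""
    result = []
    for ts, kind, _ in events:
        if kind != "died":
            continue
        ports = sorted({p for t, k, p in events
                        if k == "reset" and timedelta(0) <= ts - t <= window})
        result.append((ts, ports))
    return result


# A few lines taken from the journal output quoted in this thread.
SAMPLE = [
    "Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd",
    "Jul 24 15:57:56 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up",
    "Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd",
    "Jul 31 19:55:05 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up",
    "Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd",
    "Aug 06 16:52:50 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up",
]

if __name__ == "__main__":
    for when, ports in deaths_with_preceding_resets(parse_events(SAMPLE)):
        print(when, "HC died, resets seen shortly before on:", ports or "no port")
```

An "HC died" entry with an empty port list would point at a controller death that was not preceded by a reset loop, which is the case worth looking at more closely.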
Not sure if we can confidently say that reverting this patch helped, because a week is just passing today. But the same hardware worked fine for weeks/months/years? before a recent kernel upgrade, correct?
Random idea: would anything happen if you run 'usbreset' to manually reset this device? Maybe a few times.
On Wed, 2025-08-13 at 11:48 +0200, Michał Pecio wrote:
OK, three reset loops and three HC died in the last month, both at the same time, about once a week. Possibly not a coincidence ;)
Not sure if we can confidently say that reverting this patch helped, because a week is just passing today. But the same hardware worked fine for weeks/months/years? before a recent kernel upgrade, correct?
From 2024-07 until end of July this year (when I upgraded to kernel 6.15.7) everything was working fine. Also since I run with the kernel where the patch is reverted the issue has not shown up again.
Random idea: would anything happen if you run 'usbreset' to manually reset this device? Maybe a few times.
How do I do that?
darix
On Wed, 13 Aug 2025 12:05:16 +0200, Marcus Rückert wrote:
On Wed, 2025-08-13 at 11:48 +0200, Michał Pecio wrote:
OK, three reset loops and three HC died in the last month, both at the same time, about once a week. Possibly not a coincidence ;)
Not sure if we can confidently say that reverting this patch helped, because a week is just passing today. But the same hardware worked fine for weeks/months/years? before a recent kernel upgrade, correct?
From 2024-07 until end of July this year (when I upgraded to kernel 6.15.7) everything was working fine. Also since I run with the kernel where the patch is reverted the issue has not shown up again.
Considering rarity of those events I think you would need to run for a few weeks to be sure that the problem is gone.
There is also a chance that some hardware change which doesn't involve the "usb 1-2" keyboard caused it. In bug 220069, another AMD chipset was dying every few days if and only if two particular devices were connected to the same USB controller (the chipset had two controllers).
Random idea: would anything happen if you run 'usbreset' to manually reset this device? Maybe a few times.
How do I do that?
Run usbreset without arguments (as root) and it will print a small help text and a list of devices it can reset. If you don't have usbreset, ask Suse. Normally it should be in usbutils package like lsusb.
But I suspect nothing will happen (ie. the device will reset normally). We tried it in bug 220069 as well.
So it will be waiting until it crashes spontaneously again.
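If usbreset is not at hand, the same kind of reset can be requested directly through usbfs. This is a minimal sketch, assuming (to my understanding of the tool) that usbreset boils down to a USBDEVFS_RESET ioctl on the device node; the helper names are made up for illustration, and the ioctl number is computed the way _IO('U', 20) expands on x86, which may differ on other architectures.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>, as encoded on x86 (assumption:
# architectures with a different _IOC_NONE value need another constant).
USBDEVFS_RESET = (ord("U") << 8) | 20


def usb_device_node(bus: int, devnum: int) -> str:
    """Build the usbfs path for a device, e.g. bus 1, device 3."""
    return f"/dev/bus/usb/{bus:03d}/{devnum:03d}"


def reset_usb_device(bus: int, devnum: int) -> None:
    """Ask the kernel to reset one USB device. Needs root."""
    fd = os.open(usb_device_node(bus, devnum), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)


# Example (not executed here): the keyboard enumerated as device number 3
# on bus 1 in the dmesg output above, so that boot it would have been
#   reset_usb_device(1, 3)
# The device number changes on every re-enumeration; check lsusb first.
```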
On 13.8.2025 12.48, Michał Pecio wrote:
On Wed, 13 Aug 2025 11:14:04 +0200, Marcus Rückert wrote:
Jul 24 15:56:34 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:35 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:36 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:56:37 kernel: usb 1-2: reset full-speed USB device number 14 using xhci_hcd
Jul 24 15:57:56 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Jul 31 19:53:02 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:03 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:53:04 kernel: usb 1-2: reset full-speed USB device number 50 using xhci_hcd
Jul 31 19:55:05 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
Aug 06 16:51:34 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:35 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:51:36 kernel: usb 1-2: reset full-speed USB device number 12 using xhci_hcd
Aug 06 16:52:50 kernel: xhci_hcd 0000:0e:00.0: HC died; cleaning up
All "HC died" events were preceded by "reset full-speed" messages on usb 1-2.
OK, three reset loops and three HC died in the last month, both at the same time, about once a week. Possibly not a coincidence ;)
Not sure if we can confidently say that reverting this patch helped, because a week is just passing today. But the same hardware worked fine for weeks/months/years? before a recent kernel upgrade, correct?
This patch also only concerns SuperSpeed and SuperSpeedPlus (USB 3) devices, so it's unlikely to be the real cause.
It is possible that it reveals some existing race between the SuperSpeed bus and the slower High- and Full-speed bus. Both of those buses are handled by the same xHCI controller.
In this setup usb1 is the High- and Full-speed bus, and usb2 is the SuperSpeed bus.
Thanks -Mathias
On Wed, 2025-08-13 at 00:02 +0200, Michał Pecio wrote:
It would make sense to figure out what was this device on port 2 of bus 1 which triggered the failure. Your lsusb output shows no such device, so it was either disconnected, connected to another port or it malfunctioned and failed to enumerate at the time. Do you know?
I forgot to answer this part: I am only using that keyboard for gaming, so it goes into power save mode at some point. Maybe it doesn't properly unregister for that?