[PATCH 2/7] usb: xhci: Check endpoint is valid before dereferencing it


Mathias Nyman

16 Jan 2023, 2:22 p.m.

From: Jimmy Hu <hhhuuu@google.com>

When the host controller is not responding, all URBs queued to all endpoints need to be killed. This can cause a kernel panic if we dereference an invalid endpoint.

Fix this by using xhci_get_virt_ep() helper to find the endpoint and checking if the endpoint is valid before dereferencing it.
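The kind of validating lookup the commit message relies on can be sketched as a small userspace model. Everything below is a hypothetical stand-in: the model_* structures, NUM_SLOTS/NUM_EPS and model_get_virt_ep() are invented for illustration, and the real xhci_get_virt_ep() in drivers/usb/host/xhci-ring.c may check things differently (it also logs warnings). The point is only that the helper returns NULL for an out-of-range index or an empty slot, where the old expression computed an endpoint address through a possibly NULL device pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits for the model: slot 0 is reserved, as in xHCI. */
#define NUM_SLOTS 8
#define NUM_EPS   31

/* Hypothetical stand-ins for xhci_virt_ep / xhci_virt_device / xhci_hcd. */
struct model_virt_ep {
	unsigned int ep_state;
};

struct model_virt_device {
	struct model_virt_ep eps[NUM_EPS];
};

struct model_hcd {
	/* NULL means no device is allocated in that slot. */
	struct model_virt_device *devs[NUM_SLOTS];
};

/*
 * Return the endpoint for (slot_id, ep_index), or NULL if either index is
 * out of range or the slot has no device, instead of blindly computing
 * &devs[slot_id]->eps[ep_index] as the old code did.
 */
static struct model_virt_ep *model_get_virt_ep(struct model_hcd *hcd,
					       unsigned int slot_id,
					       unsigned int ep_index)
{
	if (slot_id == 0 || slot_id >= NUM_SLOTS)
		return NULL;
	if (ep_index >= NUM_EPS)
		return NULL;
	if (!hcd->devs[slot_id])
		return NULL;
	return &hcd->devs[slot_id]->eps[ep_index];
}
```

With a helper like this, the caller only has to test the returned pointer for NULL before reading ep->ep_state, which is the shape of the change in the diff.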

[233311.853271] xhci-hcd xhci-hcd.1.auto: xHCI host controller not responding, assume dead
[233311.853393] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8

[233311.853964] pc : xhci_hc_died+0x10c/0x270
[233311.853971] lr : xhci_hc_died+0x1ac/0x270

[233311.854077] Call trace:
[233311.854085]  xhci_hc_died+0x10c/0x270
[233311.854093]  xhci_stop_endpoint_command_watchdog+0x100/0x1a4
[233311.854105]  call_timer_fn+0x50/0x2d4
[233311.854112]  expire_timers+0xac/0x2e4
[233311.854118]  run_timer_softirq+0x300/0xabc
[233311.854127]  __do_softirq+0x148/0x528
[233311.854135]  irq_exit+0x194/0x1a8
[233311.854143]  __handle_domain_irq+0x164/0x1d0
[233311.854149]  gic_handle_irq.22273+0x10c/0x188
[233311.854156]  el1_irq+0xfc/0x1a8
[233311.854175]  lpm_cpuidle_enter+0x25c/0x418 [msm_pm]
[233311.854185]  cpuidle_enter_state+0x1f0/0x764
[233311.854194]  do_idle+0x594/0x6ac
[233311.854201]  cpu_startup_entry+0x7c/0x80
[233311.854209]  secondary_start_kernel+0x170/0x198

Fixes: 50e8725e7c42 ("xhci: Refactor command watchdog and fix split string.")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Hu <hhhuuu@google.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index ddc30037f9ce..f5b0e1ce22af 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1169,7 +1169,10 @@ static void xhci_kill_endpoint_urbs(struct xhci_hcd *xhci,
 	struct xhci_virt_ep *ep;
 	struct xhci_ring *ring;
 
-	ep = &xhci->devs[slot_id]->eps[ep_index];
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep)
+		return;
+
 	if ((ep->ep_state & EP_HAS_STREAMS) ||
 			(ep->ep_state & EP_GETTING_NO_STREAMS)) {
 		int stream_id;

-- 2.25.1


Ladislav Michl

16 Jan 2023, 4:59 p.m.

Hi Mathias,

On Mon, Jan 16, 2023 at 04:22:11PM +0200, Mathias Nyman wrote:

...

From: Jimmy Hu <hhhuuu@google.com>

When the host controller is not responding, all URBs queued to all endpoints need to be killed. This can cause a kernel panic if we dereference an invalid endpoint.

Fix this by using xhci_get_virt_ep() helper to find the endpoint and checking if the endpoint is valid before dereferencing it.

I'm a bit confused that this goes in, and even to stable. Let me quote your own analysis from Message-ID: <0fe978ed-8269-9774-1c40-f8a98c17e838@linux.intel.com>

On Thu, Dec 22, 2022 at 03:18:53PM +0200, Mathias Nyman wrote:

...

I think the root cause is that freeing xhci->devs[i], including its rings, isn't protected by the lock. This happens in xhci_free_virt_device(), called by xhci_free_dev(), which in turn may be called by usbcore at any time.

So xhci->devs[i] might just suddenly disappear.

The patch just checks more often whether xhci->devs[i] is valid, between every endpoint, so the race between xhci_free_virt_device() and xhci_kill_endpoint_urbs() doesn't trigger the NULL pointer dereference as easily.
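The "doesn't trigger as easily" point can be illustrated with a single-threaded userspace model in which the other CPU's unlocked free is simulated by a hook fired at a chosen point in the loop. All names here (struct vdev, struct hc, maybe_race(), kill_slot_urbs_unlocked()) are hypothetical and this is not the driver code. Re-checking devs[slot] before each endpoint catches a free that lands between endpoints, but a free that lands between the check and the use still slips through, so an unlocked check narrows the race window rather than closing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_EPS 31

/* Hypothetical stand-in for struct xhci_virt_device. */
struct vdev {
	bool freed;  /* set when the simulated "other CPU" frees it */
};

enum race_point { BEFORE_CHECK, AFTER_CHECK };

struct hc {
	struct vdev *devs[2];
	int free_at_ep;                 /* endpoint at which the free fires, or -1 */
	enum race_point free_at_point;  /* where in that iteration it fires */
};

/* Simulates usbcore freeing the device without taking any lock. */
static void maybe_race(struct hc *hc, int slot, int ep, enum race_point p)
{
	if (ep == hc->free_at_ep && p == hc->free_at_point && hc->devs[slot]) {
		hc->devs[slot]->freed = true;
		hc->devs[slot] = NULL;
	}
}

/*
 * Re-read and NULL-check devs[slot] before every endpoint. Returns the
 * number of endpoints handled, or -1 if a freed device would have been
 * used. In this model the memory is never really released, so reading
 * dev->freed is safe; in the driver it would be a use-after-free.
 */
static int kill_slot_urbs_unlocked(struct hc *hc, int slot)
{
	for (int ep = 0; ep < NUM_EPS; ep++) {
		maybe_race(hc, slot, ep, BEFORE_CHECK);
		struct vdev *dev = hc->devs[slot];
		if (!dev)
			return ep;  /* device gone: stop cleanly */
		maybe_race(hc, slot, ep, AFTER_CHECK);
		if (dev->freed)
			return -1;  /* checked, then freed, then used */
	}
	return NUM_EPS;
}
```

Moving the free to AFTER_CHECK makes the per-endpoint check useless, which is the crux of the objection below.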

I believe the above is correct, and even Jimmy was unable to verify your later patch (3rd in this series), which raises the question of how this patch could have been tested. It just buries the bug a bit deeper, and I do not think it is the right approach.

ladis


Mathias Nyman

17 Jan 2023, 10:02 a.m.

On 16.1.2023 18.59, Ladislav Michl wrote:

...


As I said in a direct response to the original patch, I think this is a valid fix for older kernels, where we used to unlock xhci->lock while giving back URBs. Together with PATCH 3/7, the issue should be completely resolved. For later kernels, PATCH 3/7 should be enough by itself, but there is no harm in keeping this.

See Message-ID: <379b395f-b65c-96fe-7ecc-f18e3740b990@linux.intel.com>

Older kernels are all kernels before v5.5, which lack commit 36dc01657b49 ("usb: host: xhci: Support running urb giveback in tasklet context").
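Mathias's argument for older kernels can be sketched as a single-threaded model under two labeled assumptions: that the giveback path drops the lock while URBs are returned, as he describes, and that the device-free path is serialized by the same lock. The thread suggests PATCH 3/7 addresses the free side, but its contents are not shown here, so the second point is an assumption. The structures, the flag-based "lock" and all function names are hypothetical. Because the device is looked up again each time the lock is re-acquired, a free that happened while the lock was dropped is seen as NULL and the loop stops instead of touching a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_EPS 31

/* Hypothetical stand-ins for the driver structures. */
struct ep_model {
	int pending_urbs;
};

struct dev_model {
	struct ep_model eps[NUM_EPS];
};

struct hc_model {
	bool locked;                /* single-threaded stand-in for xhci->lock */
	struct dev_model *devs[2];
	int free_after_giveback;    /* endpoint after whose giveback the device is freed, or -1 */
};

static void model_lock(struct hc_model *hc)
{
	assert(!hc->locked);
	hc->locked = true;
}

static void model_unlock(struct hc_model *hc)
{
	assert(hc->locked);
	hc->locked = false;
}

/* ASSUMPTION: the free path takes the same lock. */
static void free_dev(struct hc_model *hc, int slot)
{
	model_lock(hc);
	hc->devs[slot] = NULL;
	model_unlock(hc);
}

/* Called with the lock held; drops it while URBs are "given back". */
static void giveback_unlocked(struct hc_model *hc, int slot, int ep)
{
	model_unlock(hc);
	/* Another CPU could run here; simulate usbcore freeing the device. */
	if (ep == hc->free_after_giveback)
		free_dev(hc, slot);
	model_lock(hc);
}

/* Returns how many endpoints had their URBs given back. */
static int kill_slot_urbs_locked(struct hc_model *hc, int slot)
{
	int done = 0;

	model_lock(hc);
	for (int ep = 0; ep < NUM_EPS; ep++) {
		/* Look the device up again every time the lock is (re)held. */
		struct dev_model *dev = hc->devs[slot];

		if (!dev)
			break;  /* freed while the lock was dropped */
		dev->eps[ep].pending_urbs = 0;
		giveback_unlocked(hc, slot, ep);
		done++;
		/* dev may be stale from here on, so it is not reused. */
	}
	model_unlock(hc);
	return done;
}
```

If the free path did not take the lock, the lookup would again be only a narrower window, as in Ladislav's objection; the re-lookup is meaningful only together with the locking on the free side.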

I haven't been able to trigger this issue myself, but based on the report and the findings in the code, I still think this is the right approach. The internal testing this has been through could only prove that these patches (2/7 and 3/7) don't cause any additional issues.

If you think the analysis or solution is incorrect, let me know, and we can come up with a better one.

Thanks
Mathias

youling257

23 Feb 2023, 4:26 p.m.

I use a Type-C 20Gbps USB 3.2 Gen 2x2 PCIe expansion card; maybe this patch causes the USB 3.2 Gen 2x2 PCIe expansion card to stop working.

[ 0.285088] xhci_hcd 0000:09:00.0: hcc params 0x0200ef80 hci version 0x110 quirks 0x0000000000800010
[ 0.285334] usb usb7: We don't know the algorithms for LPM for this host, disabling LPM.
[ 0.285347] xhci_hcd 0000:09:00.0: xHCI Host Controller
[ 0.285407] hub 7-0:1.0: USB hub found
[ 0.285415] hub 7-0:1.0: 4 ports detected
[ 0.285783] xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 8
[ 0.285787] xhci_hcd 0000:09:00.0: Host supports USB 3.2 Enhanced SuperSpeed
[ 0.285889] hub 4-0:1.0: USB hub found
[ 0.285901] hub 4-0:1.0: 1 port detected
[ 0.285988] usb usb8: We don't know the algorithms for LPM for this host, disabling LPM.
[ 3277.156054] xhci_hcd 0000:09:00.0: Abort failed to stop command ring: -110
[ 3277.156091] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
[ 3277.156103] xhci_hcd 0000:09:00.0: HC died; cleaning up

Maybe this patch causes the "xhci_hcd 0000:09:00.0: HC died; cleaning up" problem.

Mathias Nyman

24 Feb 2023, 10:29 a.m.

On 23.2.2023 18.26, youling257 wrote:

...


Unlikely; this patch only touches code that is called after the HC has already died.

Does reverting this patch fix the issue?

Thanks
Mathias

youling 257

3:58 p.m.

On February 17, while running Linux 6.2-rc8, I hit "Abort failed to stop command ring: -110". My Google search history shows that on February 17 I searched for "Abort failed to stop command ring: -110" and "Usbreset No such device found".

Date: Fri, 17 Feb 2023 23:59:29 +0800
Subject: [PATCH] Revert "usb: xhci: Check endpoint is valid before dereferencing it"

This reverts commit e8fb5bc76eb86437ab87002d4a36d6da02165654.

For a week I have not seen USB stop working, so maybe reverting it fixed my problem.

2023-02-24 18:29 GMT+08:00, Mathias Nyman <mathias.nyman@linux.intel.com>:

...


youling 257

4:03 p.m.

By the way, I have been using this patch on my Linux kernel for a year: https://lore.kernel.org/all/6908aa69-469b-8f92-8e19-60685f524f9c@synopsys.co...

2023-02-24 23:58 GMT+08:00, youling 257 <youling257@gmail.com>:

...



linux-stable-mirror@lists.linaro.org
