On 16.1.2023 18.59, Ladislav Michl wrote:
Hi Mathias,
On Mon, Jan 16, 2023 at 04:22:11PM +0200, Mathias Nyman wrote:
From: Jimmy Hu hhhuuu@google.com
When the host controller is not responding, all URBs queued to all endpoints need to be killed. This can cause a kernel panic if we dereference an invalid endpoint.
Fix this by using xhci_get_virt_ep() helper to find the endpoint and checking if the endpoint is valid before dereferencing it.
I'm a bit confused this goes in and even to stable. Let me quote your own analysis from Message-ID: 0fe978ed-8269-9774-1c40-f8a98c17e838@linux.intel.com On Thu, Dec 22, 2022 at 03:18:53PM +0200, Mathias Nyman wrote:
I think root cause is that freeing xhci->devs[i] and including rings isn't protected by the lock, this happens in xhci_free_virt_device() called by xhci_free_dev(), which in turn may be called by usbcore at any time
So xhci->devs[i] might just suddenly disappear
Patch just checks more often if xhci->devs[i] is valid, between every endpoint. So the race between xhci_free_virt_device() and xhci_kill_endpoint_urbs() doesn't trigger null pointer deref as easily.>
I believe the above is correct and even Jimmy was unable to verify your later patch (3rd in this serie), which brings a question how could be this patch tested. It just burns a bug a bit deeper and I do not think it is the right approach.
As I said in a direct response to the original patch I think this is a valid fix for older kernels where we used to unlock xhci->lock while giving back URBs. Together with PATCH 3/7 the issue should be completely resolved. For later kernels PATCH 3/7 should be enough by itself, but no harm in keeping this.
See Message-ID: 379b395f-b65c-96fe-7ecc-f18e3740b990@linux.intel.com
Older kernels are all before v5.5 that lack commit 36dc01657b49 usb: host: xhci: Support running urb giveback in tasklet context.
I haven't been able to trigger this issue myself, but based on the report and finding in the code I still think this the right approach. The internal testing this has been through could only prove these patches (2/7 and 3/7) don't cause any additional issues.
If you think the analysis or solution is incorrect let me know, and we can come up with a better one.
Thanks Mathias