In Google internal bug 265639009 we've received an (as yet) unreproducible crash report from an aarch64 GKI 5.10.149-android13 running device.
AFAICT the source code is at: https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13...
The call stack is: ncm_close() -> ncm_notify() -> ncm_do_notify() with the crash at: ncm_do_notify+0x98/0x270 Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)
Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):
// halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification) 0B 0D 00 79 strh w11, [x8, #6]
// word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request) 6C 0A 00 B9 str w12, [x19, #8]
// x10 (NULL) was read here from offset 0 of valid pointer x9 // IMHO we're reading 'cdev->gadget' and getting NULL // gadget is indeed at offset 0 of struct usb_composite_dev 2A 01 40 F9 ldr x10, [x9]
// loading req->buf pointer, which is at offset 0 of struct usb_request 69 02 40 F9 ldr x9, [x19]
// x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed 4B 5D 40 B9 ldr w11, [x10, #0x5c]
which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:
event->wLength = cpu_to_le16(8); req->length = NCM_STATUS_BYTECOUNT;
/* SPEED_CHANGE data is up/down speeds in bits/sec */ data = req->buf + sizeof *event; data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
My analysis of registers and NULL ptr deref crash offset (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c) heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing: data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); which calls: ncm_bitrate(NULL) which then calls: gadget_is_superspeed(NULL) which reads ((struct usb_gadget *)NULL)->max_speed and hits a panic.
AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C. (remember there's a GKI KABI reservation of 16 bytes in struct work_struct)
It's not at all clear to me how this is all supposed to work... but returning 0 seems much better than panic-ing...
Cc: Felipe Balbi balbi@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Lorenzo Colitti lorenzo@google.com Cc: Carlos Llamas cmllamas@google.com Cc: stable@vger.kernel.org Signed-off-by: Maciej Żenczykowski maze@google.com --- drivers/usb/gadget/function/f_ncm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index c36bcfa0e9b4..424bb3b666db 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -83,7 +83,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) /* peak (theoretical) bulk transfer rate in bits-per-second */ static inline unsigned ncm_bitrate(struct usb_gadget *g) { - if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) + if (!g) + return 0; + else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) return 4250000000U; else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) return 3750000000U;
On Tue, Jan 17, 2023 at 05:18:39AM -0800, Maciej Żenczykowski wrote:
In Google internal bug 265639009 we've received an (as yet) unreproducible crash report from an aarch64 GKI 5.10.149-android13 running device.
AFAICT the source code is at: https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13...
The call stack is: ncm_close() -> ncm_notify() -> ncm_do_notify() with the crash at: ncm_do_notify+0x98/0x270 Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)
Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):
// halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification) 0B 0D 00 79 strh w11, [x8, #6]
// word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request) 6C 0A 00 B9 str w12, [x19, #8]
// x10 (NULL) was read here from offset 0 of valid pointer x9 // IMHO we're reading 'cdev->gadget' and getting NULL // gadget is indeed at offset 0 of struct usb_composite_dev 2A 01 40 F9 ldr x10, [x9]
// loading req->buf pointer, which is at offset 0 of struct usb_request 69 02 40 F9 ldr x9, [x19]
// x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed 4B 5D 40 B9 ldr w11, [x10, #0x5c]
which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:
event->wLength = cpu_to_le16(8); req->length = NCM_STATUS_BYTECOUNT;
/* SPEED_CHANGE data is up/down speeds in bits/sec */ data = req->buf + sizeof *event; data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
My analysis of registers and NULL ptr deref crash offset (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c) heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing: data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); which calls: ncm_bitrate(NULL) which then calls: gadget_is_superspeed(NULL) which reads ((struct usb_gadget *)NULL)->max_speed and hits a panic.
AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C. (remember there's a GKI KABI reservation of 16 bytes in struct work_struct)
It's not at all clear to me how this is all supposed to work... but returning 0 seems much better than panic-ing...
Cc: Felipe Balbi balbi@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Lorenzo Colitti lorenzo@google.com Cc: Carlos Llamas cmllamas@google.com Cc: stable@vger.kernel.org Signed-off-by: Maciej Żenczykowski maze@google.com
drivers/usb/gadget/function/f_ncm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index c36bcfa0e9b4..424bb3b666db 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -83,7 +83,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) /* peak (theoretical) bulk transfer rate in bits-per-second */ static inline unsigned ncm_bitrate(struct usb_gadget *g) {
- if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS)
- if (!g)
return 0;
- else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) return 4250000000U; else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) return 3750000000U;
This looks like the wrong place to fix things - if this case is hit, don't we go on to call usb_eq_queue() which can't be valid if the gadget has been destroyed?
I don't see how cdev->gadget can be set to null without cdev being freed so is this actually a use-after-free not a simple null-dereference?
My guess is that somehow the gadget is being destroyed while the network interface is held open (we've seen similar issues in other, non-network, gadget functions), but I don't know enough about the network side of things to know how to cause that from userspace.
John
This looks like the wrong place to fix things - if this case is hit, don't we go on to call usb_eq_queue() which can't be valid if the gadget has been destroyed?
I don't see how cdev->gadget can be set to null without cdev being freed so is this actually a use-after-free not a simple null-dereference?
My guess is that somehow the gadget is being destroyed while the network interface is held open (we've seen similar issues in other, non-network, gadget functions), but I don't know enough about the network side of things to know how to cause that from userspace.
I'm still waiting on confirmation of whether this fixes things. So far we've seen it crash twice without the fix...
I don't know what triggers it - it's being triggered in some huge automated test framework. Whether the issue is lack of bind, or a too early gadget unbind... or something else...
As mentioned in the patch, I'm not entirely sure if this is the right fix... I certainly don't claim to understand the usb/gadget stack.
It does seem like usb_ep_queue() at least has some protections in place... though no idea if they're enough - and whether we'll hit a WARN_ON_ONCE now instead?
Honestly I don't even understand *why* we're sending out this speed notification out of ncm_close... (and if we do send speed notification of out ncm_close()... shouldn't it always just say speed 0?)
linux-stable-mirror@lists.linaro.org