Hi,
John Stultz wrote:
On Fri, Jun 28, 2019 at 3:58 PM Sasha Levin <sashal@kernel.org> wrote:
On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
With recent changes in AOSP, adb is using asynchronous io, which causes the following crash usually on a reboot:
[ 184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[ 184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[ 184.316034] Preemption disabled at:
[ 184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[ 184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S 4.19.43-00669-g8e4970572c43-dirty #356
[ 184.334963] Hardware name: HiKey960 (DT)
[ 184.338892] Call trace:
[ 184.341352] dump_backtrace+0x0/0x158
[ 184.345025] show_stack+0x14/0x20
[ 184.348355] dump_stack+0x80/0xa4
[ 184.351685] __schedule_bug+0x6c/0xc0
[ 184.355363] __schedule+0x64c/0x978
[ 184.358863] schedule+0x2c/0x90
[ 184.362053] dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[ 184.367210] usb_ep_dequeue+0x24/0xf8
[ 184.370884] ffs_aio_cancel+0x3c/0x80
[ 184.374561] free_ioctx_users+0x40/0x148
[ 184.378500] percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[ 184.383830] rcu_process_callbacks+0x24c/0x5d8
[ 184.388283] __do_softirq+0x13c/0x398
[ 184.391959] run_ksoftirqd+0x3c/0x48
[ 184.395549] smpboot_thread_fn+0x220/0x288
[ 184.399660] kthread+0x12c/0x130
[ 184.402901] ret_from_fork+0x10/0x1c
This happens as usb_ep_dequeue can be called in interrupt context, and dwc3_gadget_ep_dequeue() then calls wait_event_lock_irq() which can sleep.
Upstream kernels are not affected due to the change fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer"), which removes the wait_event_lock_irq() code. Unfortunately that change has a number of dependencies, which I'm submitting here.
Also, to match upstream, in this series I've reverted one change that was backported to -stable, to replace it with the cherry-picked upstream commit (as the dependencies are now there).
This issue also affects 4.14, 4.9, and I believe 4.4 kernels; however, I don't know how best to backport this functionality that far back. Help from the maintainers would be very much appreciated!
New in v2:
- Reordered the patchset to put the revert patch first, which avoids any bisection build issues. (Thanks to Jack Pham for the suggestion!)
Feedback and comments would be welcome!
I've queued it up for 4.19.
Is it the case that for older kernels the dependency list is too long?
Yea. It gets ugly and I'm not enough of an expert on the driver to feel comfortable knowing if I'm doing the right thing reworking this stack onto an even older tree.
But I do see crashes on reboot w/ 4.14 and 4.9 (and I suspect 4.4 as well), so I'll need to figure out something eventually.
If you're backporting this series, then you also need to apply these fixes for this series:
This fixes a race issue: c5353b225df9 ("usb: dwc3: gadget: don't enable interrupt when disabling endpoint")
This fixes incorrect TRB skip: c7152763f02e ("usb: dwc3: Reset num_trbs after skipping")
BR, Thinh