5.0-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 54f64d5c983f939901dacc8cfc0983727c5c742e ]
Since the 5.0 merge window opened, I've been seeing frequent crashes on suspend and reboot with the trace:
[ 36.911170] Unable to handle kernel paging request at virtual address ffffff801153d660 [ 36.912769] Unable to handle kernel paging request at virtual address ffffff800004b564 ... [ 36.950666] Call trace: [ 36.950670] queued_spin_lock_slowpath+0x1cc/0x2c8 [ 36.950681] _raw_spin_lock_irqsave+0x64/0x78 [ 36.950692] complete+0x28/0x70 [ 36.950703] ffs_epfile_io_complete+0x3c/0x50 [ 36.950713] usb_gadget_giveback_request+0x34/0x108 [ 36.950721] dwc3_gadget_giveback+0x50/0x68 [ 36.950723] dwc3_thread_interrupt+0x358/0x1488 [ 36.950731] irq_thread_fn+0x30/0x88 [ 36.950734] irq_thread+0x114/0x1b0 [ 36.950739] kthread+0x104/0x130 [ 36.950747] ret_from_fork+0x10/0x1c
I isolated this down to in ffs_epfile_io(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Where the completion done is setup on the stack: DECLARE_COMPLETION_ONSTACK(done);
Then later we setup a request and queue it, and wait for it: if (unlikely(wait_for_completion_interruptible(&done))) { /* * To avoid race condition with ffs_epfile_io_complete, * dequeue the request first then check * status. usb_ep_dequeue API should guarantee no race * condition with req->complete callback. */ usb_ep_dequeue(ep->ep, req); interrupted = ep->status < 0; }
The problem is, that we end up being interrupted, dequeue the request, and exit.
But then the irq triggers and we try calling complete() on the context pointer which points to now random stack space, which results in the panic.
Alan Stern pointed out there is a bug here, in that the snippet above "assumes that usb_ep_dequeue() waits until the request has been completed." And that:
wait_for_completion(&done);
Is needed right after the usb_ep_dequeue().
Thus this patch implements that change. With it I no longer see the crashes on suspend or reboot.
This issue seems to have been uncovered by behavioral changes in the dwc3 driver in commit fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").
Cc: Alan Stern stern@rowland.harvard.edu Cc: Felipe Balbi balbi@kernel.org Cc: Zeng Tao prime.zeng@hisilicon.com Cc: Jack Pham jackp@codeaurora.org Cc: Thinh Nguyen thinh.nguyen@synopsys.com Cc: Chen Yu chenyu56@huawei.com Cc: Jerry Zhang zhangjerry@google.com Cc: Lars-Peter Clausen lars@metafoo.de Cc: Vincent Pelletier plr.vincent@gmail.com Cc: Andrzej Pietrasiewicz andrzej.p@samsung.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Linux USB List linux-usb@vger.kernel.org Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: John Stultz john.stultz@linaro.org Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/f_fs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 1e5430438703..0f8d16de7a37 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -1082,6 +1082,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data) * condition with req->complete callback. */ usb_ep_dequeue(ep->ep, req); + wait_for_completion(&done); interrupted = ep->status < 0; }