chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
============================================
WARNING: possible recursive locking detected
640-rc1-000-devel-00005-gcda3c69ebc14 #1 Not tainted
--------------------------------------------

       CPU0
       ----
  lock(&udc->connect_lock);
  lock(&udc->connect_lock);

 DEADLOCK

stack backtrace:
CPU: 1 PID: 566 Comm: echo Not tainted 640-rc1-000-devel-00005-gcda3c69ebc14 #1
Hardware name: Freescale iMX7 Dual (Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x70/0xb0
 dump_stack_lvl from __lock_acquire+0x924/0x22c4
 __lock_acquire from lock_acquire+0x100/0x370
 lock_acquire from __mutex_lock+0xa8/0xfb4
 __mutex_lock from mutex_lock_nested+0x1c/0x24
 mutex_lock_nested from usb_udc_vbus_handler+0x1c/0x60
 usb_udc_vbus_handler from ci_udc_start+0x74/0x9c
 ci_udc_start from gadget_bind_driver+0x130/0x230
 gadget_bind_driver from really_probe+0xd8/0x3fc
 really_probe from __driver_probe_device+0x94/0x1f0
 __driver_probe_device from driver_probe_device+0x2c/0xc4
 driver_probe_device from __driver_attach+0x114/0x1cc
 __driver_attach from bus_for_each_dev+0x7c/0xcc
 bus_for_each_dev from bus_add_driver+0xd4/0x200
 bus_add_driver from driver_register+0x7c/0x114
 driver_register from usb_gadget_register_driver_owner+0x40/0xe0
 usb_gadget_register_driver_owner from gadget_dev_desc_UDC_store+0xd4/0x110
 gadget_dev_desc_UDC_store from configfs_write_iter+0xac/0x118
 configfs_write_iter from vfs_write+0x1b4/0x40c
 vfs_write from ksys_write+0x70/0xf8
 ksys_write from ret_fast_syscall+0x0/0x1c
Fixes: 0db213ea8eed ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started")
Cc: stable@vger.kernel.org
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Closes: https://lore.kernel.org/all/ZF4bMptC3Lf2Hnee@gerhold.net/
Reported-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Closes: https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/
Reported-by: Alistair <alistair@alistair23.me>
Closes: https://lore.kernel.org/lkml/0cf8c588b701d7cf25ffe1a9217b81716e6a5c51.camel@...
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
---
Changes since v1:
- Address Alan Stern's comment on usb_udc_vbus_handler invocation from
  atomic context:
* vbus_events_lock is now a spinlock and allocations in
* usb_udc_vbus_handler are atomic now.
---
 drivers/usb/gadget/udc/core.c | 63 +++++++++++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 69041cca5d24..ee612387b39c 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -41,6 +41,9 @@ static const struct bus_type gadget_bus_type;
  * functions. usb_gadget_connect_locked, usb_gadget_disconnect_locked,
  * usb_udc_connect_control_locked, usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are
  * called with this lock held.
+ * @vbus_events: list head for processing vbus updates on usb_udc_vbus_handler.
+ * @vbus_events_lock: protects vbus_events list
+ * @vbus_work: work item that invokes usb_udc_connect_control_locked.
  *
  * This represents the internal data structure which is used by the UDC-class
  * to hold information about udc driver and gadget together.
@@ -53,6 +56,19 @@ struct usb_udc {
 	bool vbus;
 	bool started;
 	struct mutex connect_lock;
+	struct list_head vbus_events;
+	spinlock_t vbus_events_lock;
+	struct work_struct vbus_work;
+};
+
+/**
+ * struct vbus_event - used to notify vbus updates posted through usb_udc_vbus_handler.
+ * @vbus_on: true when vbus is on. false other wise.
+ * @node: list node for maintaining a list of pending updates to be processed.
+ */
+struct vbus_event {
+	bool vbus_on;
+	struct list_head node;
 };

 static struct class *udc_class;
@@ -1134,6 +1150,30 @@ static int usb_udc_connect_control_locked(struct usb_udc *udc) __must_hold(&udc-
 	return ret;
 }

+static void vbus_event_work(struct work_struct *work)
+{
+	struct vbus_event *event, *n;
+	struct usb_udc *udc = container_of(work, struct usb_udc, vbus_work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&udc->vbus_events_lock, flags);
+	list_for_each_entry_safe(event, n, &udc->vbus_events, node) {
+		list_del(&event->node);
+		/* OK to drop the lock here as it suffice to syncrhronize udc->vbus_events node
+		 * retrieval and deletion against usb_udc_vbus_handler. usb_udc_vbus_handler does
+		 * list_add_tail so n would be the same even if the lock is dropped.
+		 */
+		spin_unlock_irqrestore(&udc->vbus_events_lock, flags);
+		mutex_lock(&udc->connect_lock);
+		udc->vbus = event->vbus_on;
+		usb_udc_connect_control_locked(udc);
+		kfree(event);
+		mutex_unlock(&udc->connect_lock);
+		spin_lock_irqsave(&udc->vbus_events_lock, flags);
+	}
+	spin_unlock_irqrestore(&udc->vbus_events_lock, flags);
+}
+
 /**
  * usb_udc_vbus_handler - updates the udc core vbus status, and try to
  * connect or disconnect gadget
@@ -1146,13 +1186,21 @@ static int usb_udc_connect_control_locked(struct usb_udc *udc) __must_hold(&udc-
 void usb_udc_vbus_handler(struct usb_gadget *gadget, bool status)
 {
 	struct usb_udc *udc = gadget->udc;
+	struct vbus_event *vbus_event;
+	unsigned long flags;

-	mutex_lock(&udc->connect_lock);
-	if (udc) {
-		udc->vbus = status;
-		usb_udc_connect_control_locked(udc);
-	}
-	mutex_unlock(&udc->connect_lock);
+	if (!udc)
+		return;
+
+	vbus_event = kzalloc(sizeof(*vbus_event), GFP_ATOMIC);
+	if (!vbus_event)
+		return;
+
+	spin_lock_irqsave(&udc->vbus_events_lock, flags);
+	vbus_event->vbus_on = status;
+	list_add_tail(&vbus_event->node, &udc->vbus_events);
+	spin_unlock_irqrestore(&udc->vbus_events_lock, flags);
+	schedule_work(&udc->vbus_work);
 }
 EXPORT_SYMBOL_GPL(usb_udc_vbus_handler);

@@ -1379,6 +1427,9 @@ int usb_add_gadget(struct usb_gadget *gadget)
 	udc->gadget = gadget;
 	gadget->udc = udc;
 	mutex_init(&udc->connect_lock);
+	INIT_LIST_HEAD(&udc->vbus_events);
+	spin_lock_init(&udc->vbus_events_lock);
+	INIT_WORK(&udc->vbus_work, vbus_event_work);

 	udc->started = false;
base-commit: a4422ff221429c600c3dc5d0394fb3738b89d040
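The deferral idea in the patch above (queue the VBUS update under a short lock, then apply it later from a worker that is free to take connect_lock) can be illustrated outside the kernel. What follows is only a minimal userspace sketch under stated assumptions, not the patch itself: pthread mutexes stand in for both the spinlock and the kernel mutex, the worker is called directly instead of through schedule_work(), and every fake_* name is hypothetical. The worker here detaches the head of the queue on each pass, which is one simple way to drain a list while dropping the queue lock between items.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fake_vbus_event {
	bool vbus_on;
	struct fake_vbus_event *next;
};

struct fake_udc {
	bool vbus;
	int processed;			/* events applied by the worker */
	pthread_mutex_t connect_lock;	/* models udc->connect_lock */
	pthread_mutex_t events_lock;	/* models udc->vbus_events_lock */
	struct fake_vbus_event *head;
	struct fake_vbus_event *tail;
};

static void fake_udc_init(struct fake_udc *udc)
{
	int ret;

	udc->vbus = false;
	udc->processed = 0;
	udc->head = NULL;
	udc->tail = NULL;
	ret = pthread_mutex_init(&udc->connect_lock, NULL);
	assert(ret == 0);
	ret = pthread_mutex_init(&udc->events_lock, NULL);
	assert(ret == 0);
	(void)ret;
}

/* Producer: only queues the update and never touches connect_lock. */
static int fake_vbus_handler(struct fake_udc *udc, bool status)
{
	struct fake_vbus_event *ev = calloc(1, sizeof(*ev));

	if (!ev)
		return -1;
	ev->vbus_on = status;

	pthread_mutex_lock(&udc->events_lock);
	if (udc->tail)
		udc->tail->next = ev;
	else
		udc->head = ev;
	udc->tail = ev;
	pthread_mutex_unlock(&udc->events_lock);
	return 0;
}

/* Consumer: detach one event at a time, then apply it under connect_lock. */
static void fake_vbus_work(struct fake_udc *udc)
{
	for (;;) {
		struct fake_vbus_event *ev;

		pthread_mutex_lock(&udc->events_lock);
		ev = udc->head;
		if (ev) {
			udc->head = ev->next;
			if (!udc->head)
				udc->tail = NULL;
		}
		pthread_mutex_unlock(&udc->events_lock);
		if (!ev)
			break;

		pthread_mutex_lock(&udc->connect_lock);
		udc->vbus = ev->vbus_on;
		udc->processed++;
		pthread_mutex_unlock(&udc->connect_lock);
		free(ev);
	}
}
```

Because fake_vbus_handler() never acquires connect_lock, it can be called from a context that already holds that lock without recursing on it, which is the property the patch is after.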
On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
Look, this is way overkill.
usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call usb_udc_connect_control(). Furthermore, it gets called from only two drivers: chipidea and max3420.
Why not have the callers set udc->vbus themselves and then call usb_gadget_{dis}connect() directly? Then we could eliminate usb_udc_vbus_handler() entirely. And the unnecessary calls -- the ones causing deadlocks -- from within udc_start() and udc_stop() handlers can be removed with no further consequence.
This approach simplifies and removes code. Whereas your approach complicates and adds code for no good reason.
Alan Stern
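For readers less familiar with the lockdep report in the original posting, the recursion can be reproduced in userspace. This is a rough sketch under stated assumptions: a POSIX error-checking mutex stands in for udc->connect_lock so that the second acquisition returns EDEADLK instead of hanging, and the fake_* functions are hypothetical stand-ins for gadget_bind_driver(), ci_udc_start() and usb_udc_vbus_handler().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct usb_udc; only the lock is modelled. */
struct fake_udc {
	pthread_mutex_t connect_lock;
};

static int fake_udc_init(struct fake_udc *udc)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	/* ERRORCHECK makes a self-relock fail instead of blocking forever. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&udc->connect_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Plays the role of usb_udc_vbus_handler(): takes connect_lock itself. */
static int fake_vbus_handler(struct fake_udc *udc)
{
	int ret = pthread_mutex_lock(&udc->connect_lock);

	if (ret)
		return ret;	/* EDEADLK when the caller already holds it */
	pthread_mutex_unlock(&udc->connect_lock);
	return 0;
}

/* Plays the role of gadget_bind_driver() calling into ci_udc_start(). */
static int fake_bind_driver(struct fake_udc *udc)
{
	int ret;

	ret = pthread_mutex_lock(&udc->connect_lock);
	assert(ret == 0);	/* first acquisition succeeds */
	ret = fake_vbus_handler(udc);	/* nested acquisition of the same lock */
	pthread_mutex_unlock(&udc->connect_lock);
	return ret;
}
```

With a default (non-checking) mutex the nested call would simply never return, which corresponds to the deadlock that lockdep warns about.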
On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
Look, this is way overkill.
usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call usb_udc_connect_control(). Furthermore, it gets called from only two drivers: chipidea and max3420.
Why not have the callers set udc->vbus themselves and then call usb_gadget_{dis}connect() directly? Then we could eliminate usb_udc_vbus_handler() entirely. And the unnecessary calls -- the ones causing deadlocks -- from within udc_start() and udc_stop() handlers can be removed with no further consequence.
This approach simplifies and removes code. Whereas your approach complicates and adds code for no good reason.
I changed my mind.
After looking more closely, I found the comment in gadget.h about ->disconnect() callbacks happening in interrupt context. This means we cannot use a mutex to protect the associated state, and therefore the connect_lock _must_ be a spinlock, not a mutex.
This also probably means that udc_start and udc_stop callbacks should not be invoked with the lock held. In fact, you might want to avoid using the lock at all with gadget_bind_driver() and gadget_unbind_driver() -- use it only in the functions that these routines call.
So it appears the whole connect_lock thing needs to be redesigned with these ideas in mind. However, it's still true that the UDC drivers shouldn't try to set the connection state from within their udc_start and udc_stop callbacks, because the core takes care of this automatically.
Alan Stern
On Fri, May 19, 2023 at 8:07 AM Alan Stern stern@rowland.harvard.edu wrote:
On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
Look, this is way overkill.
usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call usb_udc_connect_control(). Furthermore, it gets called from only two drivers: chipidea and max3420.
Why not have the callers set udc->vbus themselves and then call usb_gadget_{dis}connect() directly? Then we could eliminate usb_udc_vbus_handler() entirely. And the unnecessary calls -- the ones causing deadlocks -- from within udc_start() and udc_stop() handlers can be removed with no further consequence.
This approach simplifies and removes code. Whereas your approach complicates and adds code for no good reason.
I changed my mind.
After looking more closely, I found the comment in gadget.h about ->disconnect() callbacks happening in interrupt context. This means we cannot use a mutex to protect the associated state, and therefore the connect_lock _must_ be a spinlock, not a mutex.
Quick observation so that I don't misunderstand. I already see gadget->udc->driver->disconnect(gadget) being called with udc_lock being held.
mutex_lock(&udc_lock);
if (gadget->udc->driver)
	gadget->udc->driver->disconnect(gadget);
mutex_unlock(&udc_lock);
The below patch seems to have introduced it: 1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex
Are you referring to some other ->disconnect() callback? If so, can you point me to which one?
This also probably means that udc_start and udc_stop callbacks should not be invoked with the lock held. In fact, you might want to avoid using the lock at all with gadget_bind_driver() and gadget_unbind_driver() -- use it only in the functions that these routines call.
So it appears the whole connect_lock thing needs to be redesigned with these ideas in mind. However, it's still true that the UDC drivers shouldn't try to set the connection state from within their udc_start and udc_stop callbacks, because the core takes care of this automatically.
Alan Stern
Thanks for your input!
Badhri
On Fri, May 19, 2023 at 08:44:57AM -0700, Badhri Jagan Sridharan wrote:
On Fri, May 19, 2023 at 8:07 AM Alan Stern stern@rowland.harvard.edu wrote:
On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
Look, this is way overkill.
usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call usb_udc_connect_control(). Furthermore, it gets called from only two drivers: chipidea and max3420.
Why not have the callers set udc->vbus themselves and then call usb_gadget_{dis}connect() directly? Then we could eliminate usb_udc_vbus_handler() entirely. And the unnecessary calls -- the ones causing deadlocks -- from within udc_start() and udc_stop() handlers can be removed with no further consequence.
This approach simplifies and removes code. Whereas your approach complicates and adds code for no good reason.
I changed my mind.
After looking more closely, I found the comment in gadget.h about ->disconnect() callbacks happening in interrupt context. This means we cannot use a mutex to protect the associated state, and therefore the connect_lock _must_ be a spinlock, not a mutex.
Quick observation so that I don't misunderstand. I already see gadget->udc->driver->disconnect(gadget) being called with udc_lock being held.
mutex_lock(&udc_lock);
if (gadget->udc->driver)
	gadget->udc->driver->disconnect(gadget);
mutex_unlock(&udc_lock);
The below patch seems to have introduced it: 1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex
Hmmm... You're right about this. A big problem with the USB gadget framework is that it does not clearly state which routines have to run in process context and which have to run in interrupt/atomic context. People therefore don't think about it and frequently get it wrong.
So now the problem is that the UDC or transceiver driver may detect (typically in an interrupt handler) that VBUS power has appeared or disappeared, and it wants to tell the core to adjust the D+/D- pullup signals appropriately. The core notifies the UDC driver about this, and then in the case of a disconnection, it has to notify the gadget driver. But notifying the gadget driver requires process context for the udc_lock mutex, the ultimate reason being that disconnect notifications can race with gadget driver binding and unbinding.
If we could prevent those races in some other way then we wouldn't need to hold udc_lock in usb_gadget_disconnect(). This seems like a sensible thing to do in any case; the UDC core should never allow a connection to occur before a gadget driver is bound or after it is unbound.
The first approach that occurs to me is to add a boolean allow_connect flag to struct usb_udc, together with a global spinlock to synchronize access to it. Then usb_gadget_disconnect() could check the flag before calling driver->disconnect(), gadget_bind_driver() could set the flag before calling usb_udc_connect_control(), and gadget_unbind_driver() could clear the flag before calling usb_gadget_disconnect().
(Another possible approach would be to change gadget->deactivated into a counter. It would still need to be synchronized by a spinlock, however.)
This will simplify matters considerably. udc_lock can remain a mutex and the deadlock problem should go away.
Do you want to try adding allow_connect as described here or would you prefer that I do it?
(And in any case, we should prevent the udc_start and udc_stop callbacks in the chipidea and max3420 drivers from trying to update the connection status.)
Alan Stern
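The allow_connect proposal in the message above can be pictured with a small userspace model. This is only a sketch of the idea as described, not an implementation from the thread: a pthread mutex stands in for the global spinlock, the fake_* names and the disconnect_calls counter are hypothetical, and the window between reading the flag and delivering the notification is deliberately left unaddressed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Models the proposed global lock; a mutex stands in for a kernel spinlock. */
static pthread_mutex_t fake_allow_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical model of the relevant usb_udc state. */
struct fake_udc {
	bool allow_connect;	/* true only while a gadget driver is bound */
	bool connected;		/* models the pullup state */
	int disconnect_calls;	/* counts driver->disconnect() notifications */
};

static void fake_set_allow(struct fake_udc *udc, bool allow)
{
	pthread_mutex_lock(&fake_allow_lock);
	udc->allow_connect = allow;
	pthread_mutex_unlock(&fake_allow_lock);
}

/* Models usb_gadget_disconnect(): notify the driver only if allowed. */
static void fake_gadget_disconnect(struct fake_udc *udc)
{
	bool notify;

	pthread_mutex_lock(&fake_allow_lock);
	notify = udc->allow_connect;
	udc->connected = false;
	pthread_mutex_unlock(&fake_allow_lock);

	if (notify)
		udc->disconnect_calls++;	/* driver->disconnect(gadget) */
}

/* Models gadget_bind_driver(): set the flag, then connect. */
static void fake_bind_driver(struct fake_udc *udc)
{
	fake_set_allow(udc, true);
	udc->connected = true;		/* usb_udc_connect_control() */
}

/* Models gadget_unbind_driver(): clear the flag, then disconnect. */
static void fake_unbind_driver(struct fake_udc *udc)
{
	fake_set_allow(udc, false);
	fake_gadget_disconnect(udc);
	assert(!udc->connected);
}
```

In this model a disconnect that arrives while a driver is bound is reported to the driver, while the disconnect issued during unbind is not, because the flag has already been cleared.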
Hi Alan,
Thanks for taking the time to share more details! +1 on your comment: "A big problem with the USB gadget framework is that it does not clearly state which routines have to run in process context and which have to run in interrupt/atomic context."
I started to work on allow_connect and other suggestions that you had made. In one of the previous comments you had mentioned that the connect_lock should be a spinlock and not a mutex. Right now there are four conditions that seem to be deciding whether pullup needs to be enabled or disabled through gadget->ops->pullup().
1. Gadget not deactivated through usb_gadget_deactivate()
2. Gadget has to be started through usb_gadget_udc_start(). soft_connect_store() can start/stop gadget.
3. usb_gadget has been connected through usb_gadget_connect(). This is assuming we are getting rid of usb_udc_vbus_handler.
4. allow_connect is true
I have so far identified two constraints here:
a. gadget->ops->pullup() can sleep in some implementations. For instance:
BUG: scheduling while atomic: init/1/0x00000002
..
[ 26.990631][ T1] Call trace:
[ 26.993759][ T1] dump_backtrace+0x104/0x128
[ 26.998281][ T1] show_stack+0x20/0x30
[ 27.002279][ T1] dump_stack_lvl+0x6c/0x9c
[ 27.006627][ T1] __schedule_bug+0x84/0xb4
[ 27.010973][ T1] __schedule+0x6f0/0xaec
[ 27.015147][ T1] schedule+0xc8/0x134
[ 27.019059][ T1] schedule_timeout+0x98/0x134
[ 27.023666][ T1] msleep+0x34/0x4c
[ 27.027317][ T1] dwc3_core_soft_reset+0xf0/0x354
[ 27.032273][ T1] dwc3_gadget_pullup+0xec/0x1d8
[ 27.037055][ T1] usb_gadget_pullup_update_locked+0xa0/0x1e0
[ 27.042967][ T1] udc_bind_to_driver+0x1e4/0x30c
[ 27.047835][ T1] usb_gadget_probe_driver+0xd0/0x178
[ 27.053051][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c
[ 27.058442][ T1] configfs_write_iter+0x100/0x178
[ 27.063399][ T1] vfs_write+0x278/0x3c4
[ 27.067483][ T1] ksys_write+0x80/0xf4

b. gadget->ops->udc_start can also sleep in some implementations. For example:
[ 28.024255][ T1] BUG: scheduling while atomic: init/1/0x00000002
....
[ 28.324996][ T1] Call trace:
[ 28.328126][ T1] dump_backtrace+0x104/0x128
[ 28.332647][ T1] show_stack+0x20/0x30
[ 28.336645][ T1] dump_stack_lvl+0x6c/0x9c
[ 28.340993][ T1] __schedule_bug+0x84/0xb4
[ 28.345340][ T1] __schedule+0x6f0/0xaec
[ 28.349513][ T1] schedule+0xc8/0x134
[ 28.353425][ T1] schedule_timeout+0x4c/0x134
[ 28.358033][ T1] wait_for_common+0xac/0x13c
[ 28.362554][ T1] wait_for_completion_killable+0x20/0x3c
[ 28.368118][ T1] __kthread_create_on_node+0xe4/0x1ec
[ 28.373422][ T1] kthread_create_on_node+0x54/0x80
[ 28.378464][ T1] setup_irq_thread+0x50/0x108
[ 28.383072][ T1] __setup_irq+0x90/0x87c
[ 28.387245][ T1] request_threaded_irq+0x144/0x180
[ 28.392287][ T1] dwc3_gadget_start+0x50/0xac
[ 28.396866][ T1] udc_bind_to_driver+0x14c/0x31c
[ 28.401763][ T1] usb_gadget_probe_driver+0xd0/0x178
[ 28.406980][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c
[ 28.412370][ T1] configfs_write_iter+0x100/0x178
[ 28.417325][ T1] vfs_write+0x278/0x3c4
[ 28.421411][ T1] ksys_write+0x80/0xf4

static int dwc3_gadget_start(struct usb_gadget *g,
		struct usb_gadget_driver *driver)
{
	struct dwc3 *dwc = gadget_to_dwc(g);
	...
	irq = dwc->irq_gadget;
	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
			IRQF_SHARED, "dwc3", dwc->ev_buf);
Given that "1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex" has been there for a while and no one has reported issues so far, perhaps the ->disconnect() callback is no longer being invoked in atomic context and it is the documentation that needs to be updated?
Thanks, Badhri
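The four conditions listed in the message above amount to a single conjunction. As a small illustration only, assuming a hypothetical state struct (the field and function names below are invented for this sketch and are not taken from core.c), the decision could be written as:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state that feeds the pullup decision. */
struct fake_gadget_state {
	bool deactivated;	/* 1. usb_gadget_deactivate() was called */
	bool started;		/* 2. usb_gadget_udc_start() succeeded */
	bool connect_requested;	/* 3. usb_gadget_connect() was called */
	bool allow_connect;	/* 4. the proposed allow_connect flag */
};

/* The pullup is wanted only when all four conditions hold together. */
static bool fake_pullup_wanted(const struct fake_gadget_state *s)
{
	return !s->deactivated && s->started &&
	       s->connect_requested && s->allow_connect;
}

/* Usage example: flipping any single condition drops the pullup. */
static void fake_pullup_example(void)
{
	struct fake_gadget_state s = {
		.deactivated = false,
		.started = true,
		.connect_requested = true,
		.allow_connect = true,
	};

	assert(fake_pullup_wanted(&s));
	s.started = false;	/* e.g. soft_connect_store() stopped the gadget */
	assert(!fake_pullup_wanted(&s));
}
```

Whatever lock ends up protecting these fields, the two constraints reported above still apply: the code that acts on the result calls ->pullup() or ->udc_start(), both of which can sleep in some UDC drivers.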
On Fri, May 19, 2023 at 10:27 AM Alan Stern stern@rowland.harvard.edu wrote:
On Fri, May 19, 2023 at 08:44:57AM -0700, Badhri Jagan Sridharan wrote:
On Fri, May 19, 2023 at 8:07 AM Alan Stern stern@rowland.harvard.edu wrote:
On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
chipidea udc calls usb_udc_vbus_handler from udc_start gadget ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler processing.
Look, this is way overkill.
usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call usb_udc_connect_control(). Furthermore, it gets called from only two drivers: chipidea and max3420.
Why not have the callers set udc->vbus themselves and then call usb_gadget_{dis}connect() directly? Then we could eliminate usb_udc_vbus_handler() entirely. And the unnecessary calls -- the ones causing deadlocks -- from within udc_start() and udc_stop() handlers can be removed with no further consequence.
This approach simplifies and removes code. Whereas your approach complicates and adds code for no good reason.
I changed my mind.
After looking more closely, I found the comment in gadget.h about ->disconnect() callbacks happening in interrupt context. This means we cannot use a mutex to protect the associated state, and therefore the connect_lock _must_ be a spinlock, not a mutex.
Quick observation so that I don't misunderstand. I already see gadget->udc->driver->disconnect(gadget) being called with udc_lock being held.
mutex_lock(&udc_lock);
if (gadget->udc->driver)
	gadget->udc->driver->disconnect(gadget);
mutex_unlock(&udc_lock);
The below patch seems to have introduced it: 1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex
Hmmm... You're right about this. A big problem with the USB gadget framework is that it does not clearly state which routines have to run in process context and which have to run in interrupt/atomic context. People therefore don't think about it and frequently get it wrong.
So now the problem is that the UDC or transceiver driver may detect (typically in an interrupt handler) that VBUS power has appeared or disappeared, and it wants to tell the core to adjust the D+/D- pullup signals appropriately. The core notifies the UDC driver about this, and then in the case of a disconnection, it has to notify the gadget driver. But notifying the gadget driver requires process context for the udc_lock mutex, the ultimate reason being that disconnect notifications can race with gadget driver binding and unbinding.
If we could prevent those races in some other way then we wouldn't need to hold udc_lock in usb_gadget_disconnect(). This seems like a sensible thing to do in any case; the UDC core should never allow a connection to occur before a gadget driver is bound or after it is unbound.
The first approach that occurs to me is to add a boolean allow_connect flag to struct usb_udc, together with a global spinlock to synchronize access to it. Then usb_gadget_disconnect() could check the flag before calling driver->disconnect(), gadget_bind_driver() could set the flag before calling usb_udc_connect_control(), and gadget_unbind_driver() could clear the flag before calling usb_gadget_disconnect().
(Another possible approach would be to change gadget->deactivated into a counter. It would still need to be synchronized by a spinlock, however.)
This will simplify matters considerably. udc_lock can remain a mutex and the deadlock problem should go away.
Do you want to try adding allow_connect as described here or would you prefer that I do it?
(And in any case, we should prevent the udc_start and udc_stop callbacks in the chipidea and max3420 drivers from trying to update the connection status.)
Alan Stern
On Mon, May 22, 2023 at 12:48 AM Badhri Jagan Sridharan badhri@google.com wrote:
Hi Alan,
Thanks for taking the time to share more details! +1 on your comment: "A big problem with the USB gadget framework is that it does not clearly state which routines have to run in process context and which have to run in interrupt/atomic context."
I started to work on allow_connect and other suggestions that you had made. In one of the previous comments you had mentioned that the connect_lock should be a spinlock and not a mutex. Right now there are four conditions that seem to be deciding whether pullup needs to be enabled or disabled through gadget->ops->pullup().
1. Gadget not deactivated through usb_gadget_deactivate()
2. Gadget has to be started through usb_gadget_udc_start(). soft_connect_store() can start/stop gadget.
3. usb_gadget has been connected through usb_gadget_connect(). This is assuming we are getting rid of usb_udc_vbus_handler.
4. allow_connect is true
I have so far identified two constraints here:
a. gadget->ops->pullup() can sleep in some implementations. For instance:
BUG: scheduling while atomic: init/1/0x00000002
..
[ 26.990631][ T1] Call trace:
[ 26.993759][ T1] dump_backtrace+0x104/0x128
[ 26.998281][ T1] show_stack+0x20/0x30
[ 27.002279][ T1] dump_stack_lvl+0x6c/0x9c
[ 27.006627][ T1] __schedule_bug+0x84/0xb4
[ 27.010973][ T1] __schedule+0x6f0/0xaec
[ 27.015147][ T1] schedule+0xc8/0x134
[ 27.019059][ T1] schedule_timeout+0x98/0x134
[ 27.023666][ T1] msleep+0x34/0x4c
Adding more context to make sure that I am being clear. I am aware that alternatives such as mdelay can be used as a workaround in this specific instance. However, my concern is more about whether gadget->ops->pullup() in other implementations was designed to be atomic. I only have dwc3-based hardware, so I can't test other udc implementations. Hence the concern.
Thanks, Badhri
[ 27.027317][ T1] dwc3_core_soft_reset+0xf0/0x354
[ 27.032273][ T1] dwc3_gadget_pullup+0xec/0x1d8
[ 27.037055][ T1] usb_gadget_pullup_update_locked+0xa0/0x1e0
[ 27.042967][ T1] udc_bind_to_driver+0x1e4/0x30c
[ 27.047835][ T1] usb_gadget_probe_driver+0xd0/0x178
[ 27.053051][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c
[ 27.058442][ T1] configfs_write_iter+0x100/0x178
[ 27.063399][ T1] vfs_write+0x278/0x3c4
[ 27.067483][ T1] ksys_write+0x80/0xf4

b. gadget->ops->udc_start can also sleep in some implementations. For example:
[ 28.024255][ T1] BUG: scheduling while atomic: init/1/0x00000002
....
[ 28.324996][ T1] Call trace:
[ 28.328126][ T1] dump_backtrace+0x104/0x128
[ 28.332647][ T1] show_stack+0x20/0x30
[ 28.336645][ T1] dump_stack_lvl+0x6c/0x9c
[ 28.340993][ T1] __schedule_bug+0x84/0xb4
[ 28.345340][ T1] __schedule+0x6f0/0xaec
[ 28.349513][ T1] schedule+0xc8/0x134
[ 28.353425][ T1] schedule_timeout+0x4c/0x134
[ 28.358033][ T1] wait_for_common+0xac/0x13c
[ 28.362554][ T1] wait_for_completion_killable+0x20/0x3c
[ 28.368118][ T1] __kthread_create_on_node+0xe4/0x1ec
[ 28.373422][ T1] kthread_create_on_node+0x54/0x80
[ 28.378464][ T1] setup_irq_thread+0x50/0x108
[ 28.383072][ T1] __setup_irq+0x90/0x87c
[ 28.387245][ T1] request_threaded_irq+0x144/0x180
[ 28.392287][ T1] dwc3_gadget_start+0x50/0xac
[ 28.396866][ T1] udc_bind_to_driver+0x14c/0x31c
[ 28.401763][ T1] usb_gadget_probe_driver+0xd0/0x178
[ 28.406980][ T1] gadget_dev_desc_UDC_store+0xf0/0x13c
[ 28.412370][ T1] configfs_write_iter+0x100/0x178
[ 28.417325][ T1] vfs_write+0x278/0x3c4
[ 28.421411][ T1] ksys_write+0x80/0xf4

static int dwc3_gadget_start(struct usb_gadget *g,
		struct usb_gadget_driver *driver)
{
	struct dwc3 *dwc = gadget_to_dwc(g);
	...
	irq = dwc->irq_gadget;
	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
			IRQF_SHARED, "dwc3", dwc->ev_buf);
Given that "1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex" has been there for a while and no one has reported issues so far, perhaps the ->disconnect() callback is no longer being invoked in atomic context and it is the documentation that needs to be updated?
Thanks, Badhri
linux-stable-mirror@lists.linaro.org