Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver: - .ndo_open/.ndo_stop control the phylink start/stop lifecycle. - ethtool/phylib entry points run in process context, not PM. - phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de --- drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 792ddda1ad49..1e8f7089f5e8 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax88772_netdev_ops = {
static void ax88772_suspend(struct usbnet *dev) { - struct asix_common_private *priv = dev->driver_priv; u16 medium;
- if (netif_running(dev->net)) { - rtnl_lock(); - phylink_suspend(priv->phylink, false); - rtnl_unlock(); - } - /* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE; @@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet *dev) for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break; - - if (netif_running(dev->net)) { - rtnl_lock(); - phylink_resume(priv->phylink); - rtnl_unlock(); - } }
static int asix_resume(struct usb_interface *intf) -- 2.47.3
On Mon Sep 8, 2025 at 1:26 PM CEST, Oleksij Rempel wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Thanks for the patch! Applied to v6.17-rc5, it fixes the problem for me.
Tested-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
It does, but v5.15 (including v5.15.191 LTS) is affected as well, from 4a2c7217cd5a ("net: usb: asix: ax88772: manage PHY PM from MAC"). I think it could also use a patch, but I won't insist.
Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de
drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 792ddda1ad49..1e8f7089f5e8 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax88772_netdev_ops = {
static void ax88772_suspend(struct usbnet *dev) {
struct asix_common_private *priv = dev->driver_priv; u16 medium;
if (netif_running(dev->net)) {
rtnl_lock();
phylink_suspend(priv->phylink, false);
rtnl_unlock();
}
/* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE;
@@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet *dev) for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break;
- if (netif_running(dev->net)) {
rtnl_lock();
phylink_resume(priv->phylink);
rtnl_unlock();
- }
}
static int asix_resume(struct usb_interface *intf)
2.47.3
On Mon, Sep 08, 2025 at 07:00:09PM +0200, Hubert Wiśniewski wrote:
On Mon Sep 8, 2025 at 1:26 PM CEST, Oleksij Rempel wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Thanks for the patch! Applied to v6.17-rc5, it fixes the problem for me.
Tested-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com
Thank you for testing!
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
It does, but v5.15 (including v5.15.191 LTS) is affected as well, from 4a2c7217cd5a ("net: usb: asix: ax88772: manage PHY PM from MAC"). I think it could also use a patch, but I won't insist.
Ack, I'll try do address it later.
Best Regards, Oleksij
On Tue, 9 Sep 2025 09:17:17 +0200 Oleksij Rempel wrote:
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
It does, but v5.15 (including v5.15.191 LTS) is affected as well, from 4a2c7217cd5a ("net: usb: asix: ax88772: manage PHY PM from MAC"). I think it could also use a patch, but I won't insist.
Ack, I'll try do address it later.
Any idea what the problem is there? Deadlocking on a different lock?
On Tue, Sep 09, 2025 at 04:56:45PM -0700, Jakub Kicinski wrote:
On Tue, 9 Sep 2025 09:17:17 +0200 Oleksij Rempel wrote:
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
It does, but v5.15 (including v5.15.191 LTS) is affected as well, from 4a2c7217cd5a ("net: usb: asix: ax88772: manage PHY PM from MAC"). I think it could also use a patch, but I won't insist.
Ack, I'll try do address it later.
Any idea what the problem is there? Deadlocking on a different lock?
Yes, it is PM lock taken in MDIO access inside of PM routine. We can't reliably detect a PM context from MDIO bus read/write calls. Taking local counters or setting flags would partially work, but they all are not safe against race conditions.
On Mon, Sep 08, 2025 at 01:26:19PM +0200, Oleksij Rempel wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
I just meet the same issue. It fix the issue for me!
Tested-by: Xu Yang xu.yang_2@nxp.com
Thanks, Xu Yang
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de
drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 792ddda1ad49..1e8f7089f5e8 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax88772_netdev_ops = {
static void ax88772_suspend(struct usbnet *dev) {
struct asix_common_private *priv = dev->driver_priv; u16 medium;
if (netif_running(dev->net)) {
rtnl_lock();
phylink_suspend(priv->phylink, false);
rtnl_unlock();
}
/* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE;
@@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet *dev) for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break;
- if (netif_running(dev->net)) {
rtnl_lock();
phylink_resume(priv->phylink);
rtnl_unlock();
- }
}
static int asix_resume(struct usb_interface *intf)
2.47.3
On Tue, Sep 09, 2025 at 04:58:03PM -0700, Jakub Kicinski wrote:
On Mon, 8 Sep 2025 13:26:19 +0200 Oleksij Rempel wrote:
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
Meaning the interface is never suspended when open?
Ack.
On Wed, 10 Sep 2025 06:11:07 +0200 Oleksij Rempel wrote:
On Tue, Sep 09, 2025 at 04:58:03PM -0700, Jakub Kicinski wrote:
On Mon, 8 Sep 2025 13:26:19 +0200 Oleksij Rempel wrote:
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
Meaning the interface is never suspended when open?
Ack.
Alright, last time we touched usbnet we broke Linus's setup. So let's say a quick prayer and.. :D
Hello:
This patch was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Mon, 8 Sep 2025 13:26:19 +0200 you wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
[...]
Here is the summary with links: - [net,v1,1/1] net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups https://git.kernel.org/netdev/net/c/5537a4679403
You are awesome, thank you!
On 08.09.2025 13:26, Oleksij Rempel wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de
This patch landed in today's linux-next as commit 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"). In my tests I found that it breaks operation of asix ethernet usb dongle after system suspend-resume cycle. The ethernet device is still present in the system, but it is completely dysfunctional. Here is the log:
root@target:~# time rtcwake -s10 -mmem rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Sep 11 13:02:23 2025 PM: suspend entry (deep) Filesystems sync: 0.002 seconds Freezing user space processes Freezing user space processes completed (elapsed 0.003 seconds) OOM killer disabled. Freezing remaining freezable tasks Freezing remaining freezable tasks completed (elapsed 0.024 seconds) ... usb usb1: root hub lost power or was reset ... usb usb2: root hub lost power or was reset xhci-hcd xhci-hcd.7.auto: xHC error in resume, USBSTS 0x401, Reinit usb usb3: root hub lost power or was reset usb usb4: root hub lost power or was reset asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access asix 2-1:1.0 eth0: Failed to read reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access ... (the above error repeated many times) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 9 at drivers/net/phy/phy.c:1346 _phy_state_machine+0x158/0x2d0 phy_check_link_status+0x0/0x140: returned: -110 Modules linked in: cmac bnep mwifiex_sdio mwifiex btmrvl_sdio btmrvl sha256 bluetooth cfg80211 s5p_mfc exynos_gsc v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic ecc mc s5p_cec CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.17.0-rc4-00221-g5537a4679403 #11106 PREEMPT Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_power_efficient phy_state_machine Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x80/0x1d0 __warn from warn_slowpath_fmt+0x124/0x1bc warn_slowpath_fmt from _phy_state_machine+0x158/0x2d0 _phy_state_machine from phy_state_machine+0x24/0x44 phy_state_machine from process_one_work+0x24c/0x70c process_one_work from worker_thread+0x1b8/0x3bc worker_thread from kthread+0x13c/0x264 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf0879fb0 to 0xf0879ff8) ... irq event stamp: 221553 hardirqs last enabled at (221559): [<c01bae94>] __up_console_sem+0x50/0x60 hardirqs last disabled at (221564): [<c01bae80>] __up_console_sem+0x3c/0x60 softirqs last enabled at (219346): [<c013b93c>] handle_softirqs+0x328/0x520 softirqs last disabled at (219327): [<c013bce0>] __irq_exit_rcu+0x144/0x1f0 ---[ end trace 0000000000000000 ]--- asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to write Medium Mode mode to 0x0000: ffffff8f asix 2-1:1.0 eth0: Link is Down asix 2-1:1.0 eth0: Failed to read reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access asix 2-1:1.0 eth0: Failed to read reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access ... (the above error repeated many times) usb 2-1: reset high-speed USB device number 2 using exynos-ehci OOM killer enabled. Restarting tasks: Starting Restarting tasks: Done PM: suspend exit
real 0m14.105s user 0m0.002s sys 0m2.025s root@target:~# root@target:~# ifconfig -a eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 192.168.100.17 netmask 255.255.255.0 broadcast 192.168.100.255 inet6 fe80::250:b6ff:fe18:92ee prefixlen 64 scopeid 0x20<link> ether 00:50:b6:18:92:ee txqueuelen 1000 (Ethernet) RX packets 242 bytes 18250 (17.8 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 258 bytes 22474 (21.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10<host> loop txqueuelen 1000 (Local Loopback) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@target:~# ping host PING host (192.168.100.1) 56(84) bytes of data. ^C --- host ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1053ms
Reverting $subject on top of today's linux-next restores ethernet operation after system suspend-resume cycle.
drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 792ddda1ad49..1e8f7089f5e8 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax88772_netdev_ops = {
static void ax88772_suspend(struct usbnet *dev) {
struct asix_common_private *priv = dev->driver_priv; u16 medium;
if (netif_running(dev->net)) {
rtnl_lock();
phylink_suspend(priv->phylink, false);
rtnl_unlock();
}
/* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE;
@@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet *dev) for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break;
if (netif_running(dev->net)) {
rtnl_lock();
phylink_resume(priv->phylink);
rtnl_unlock();
} }
static int asix_resume(struct usb_interface *intf)
-- 2.47.3
Best regards
On Thu, Sep 11, 2025 at 03:58:50PM +0200, Marek Szyprowski wrote:
On 08.09.2025 13:26, Oleksij Rempel wrote:
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de
This patch landed in today's linux-next as commit 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"). In my tests I found that it breaks operation of asix ethernet usb dongle after system suspend-resume cycle. The ethernet device is still present in the system, but it is completely dysfunctional. Here is the log:
root@target:~# time rtcwake -s10 -mmem rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Sep 11 13:02:23 2025 PM: suspend entry (deep) Filesystems sync: 0.002 seconds Freezing user space processes Freezing user space processes completed (elapsed 0.003 seconds) OOM killer disabled. Freezing remaining freezable tasks Freezing remaining freezable tasks completed (elapsed 0.024 seconds) ... usb usb1: root hub lost power or was reset ... usb usb2: root hub lost power or was reset xhci-hcd xhci-hcd.7.auto: xHC error in resume, USBSTS 0x401, Reinit usb usb3: root hub lost power or was reset usb usb4: root hub lost power or was reset asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access asix 2-1:1.0 eth0: Failed to read reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to write reg index 0x0000: -113 asix 2-1:1.0 eth0: Failed to enable software MII access ... (the above error repeated many times) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 9 at drivers/net/phy/phy.c:1346 _phy_state_machine+0x158/0x2d0 phy_check_link_status+0x0/0x140: returned: -110
I'm not surprised. I'm guessing phylib is using polled mode, and removing the suspend/resume handling likely means that it's at the mercy of the timings of the phylib state machine running (which is what is complaining here) vs the MDIO bus being available for use.
Given that this happens, I'm convinced that the original patch is the wrong approach. The driver needs the phylink suspend/resume calls to shutdown and restart phylib polling, and the resume call needs to be placed in such a location that the MDIO bus is already accessible.
On Thu, 11 Sep 2025 15:39:20 +0100 Russell King (Oracle) wrote:
I'm not surprised. I'm guessing phylib is using polled mode, and removing the suspend/resume handling likely means that it's at the mercy of the timings of the phylib state machine running (which is what is complaining here) vs the MDIO bus being available for use.
Given that this happens, I'm convinced that the original patch is the wrong approach. The driver needs the phylink suspend/resume calls to shutdown and restart phylib polling, and the resume call needs to be placed in such a location that the MDIO bus is already accessible.
We keep having issues with rtnl_lock taken from resume. Honestly, I'm not sure anyone has found a good solution, yet. Mostly people just don't implement runtime PM.
If we were able to pass optional context to suspend/resume we could implement conditional locking. We'd lose a lot of self-respect but it'd make fixing such bugs easier..
On Thu, Sep 11, 2025 at 07:55:13AM -0700, Jakub Kicinski wrote:
On Thu, 11 Sep 2025 15:39:20 +0100 Russell King (Oracle) wrote:
I'm not surprised. I'm guessing phylib is using polled mode, and removing the suspend/resume handling likely means that it's at the mercy of the timings of the phylib state machine running (which is what is complaining here) vs the MDIO bus being available for use.
Given that this happens, I'm convinced that the original patch is the wrong approach. The driver needs the phylink suspend/resume calls to shutdown and restart phylib polling, and the resume call needs to be placed in such a location that the MDIO bus is already accessible.
We keep having issues with rtnl_lock taken from resume. Honestly, I'm not sure anyone has found a good solution, yet. Mostly people just don't implement runtime PM.
If we were able to pass optional context to suspend/resume we could implement conditional locking. We'd lose a lot of self-respect but it'd make fixing such bugs easier..
Normal drivers have the option of separate callbacks for runtime PM vs system suspend/resume states. It seems USB doesn't, just munging everything into one pair of suspend and resume ops without any way of telling them apart. I suggest that is part of the problem here.
However, I'm not a USB expert, so...
On Thu, Sep 11, 2025 at 09:46:35PM +0100, Russell King (Oracle) wrote:
On Thu, Sep 11, 2025 at 07:55:13AM -0700, Jakub Kicinski wrote:
We keep having issues with rtnl_lock taken from resume. Honestly, I'm not sure anyone has found a good solution, yet. Mostly people just don't implement runtime PM.
If we were able to pass optional context to suspend/resume we could implement conditional locking. We'd lose a lot of self-respect but it'd make fixing such bugs easier..
Normal drivers have the option of separate callbacks for runtime PM vs system suspend/resume states. It seems USB doesn't, just munging everything into one pair of suspend and resume ops without any way of telling them apart. I suggest that is part of the problem here.
However, I'm not a USB expert, so...
The USB subsystem uses only one pair of callbacks for suspend and resume because USB hardware has only one suspend state. However, the callbacks do get an extra pm_message_t parameter which they can use to distinguish between system sleep transitions and runtime PM transitions.
Alan Stern
On Thu, Sep 11, 2025 at 10:30:09PM -0400, Alan Stern wrote:
On Thu, Sep 11, 2025 at 09:46:35PM +0100, Russell King (Oracle) wrote:
On Thu, Sep 11, 2025 at 07:55:13AM -0700, Jakub Kicinski wrote:
We keep having issues with rtnl_lock taken from resume. Honestly, I'm not sure anyone has found a good solution, yet. Mostly people just don't implement runtime PM.
If we were able to pass optional context to suspend/resume we could implement conditional locking. We'd lose a lot of self-respect but it'd make fixing such bugs easier..
Normal drivers have the option of separate callbacks for runtime PM vs system suspend/resume states. It seems USB doesn't, just munging everything into one pair of suspend and resume ops without any way of telling them apart. I suggest that is part of the problem here.
However, I'm not a USB expert, so...
The USB subsystem uses only one pair of callbacks for suspend and resume because USB hardware has only one suspend state. However, the callbacks do get an extra pm_message_t parameter which they can use to distinguish between system sleep transitions and runtime PM transitions.
Unfortunately, this isn't the case. While a struct usb_device_driver's suspend()/resume() methods get the pm_message_t, a struct usb_driver's suspend()/resume() methods do not:
static int usb_resume_interface(struct usb_device *udev, struct usb_interface *intf, pm_message_t msg, int reset_resume) { struct usb_driver *driver; ... if (reset_resume) { if (driver->reset_resume) { status = driver->reset_resume(intf); ... } else { status = driver->resume(intf);
vs
static int usb_resume_device(struct usb_device *udev, pm_message_t msg) { struct usb_device_driver *udriver; ... if (status == 0 && udriver->resume) status = udriver->resume(udev, msg);
and in drivers/net/usb/asix_devices.c:
static struct usb_driver asix_driver = { ... .suspend = asix_suspend, .resume = asix_resume, .reset_resume = asix_resume,
where asix_resume() only takes one argument:
static int asix_resume(struct usb_interface *intf) {
Thanks.
On Fri, Sep 12, 2025 at 09:33:05AM +0100, Russell King (Oracle) wrote:
On Thu, Sep 11, 2025 at 10:30:09PM -0400, Alan Stern wrote:
The USB subsystem uses only one pair of callbacks for suspend and resume because USB hardware has only one suspend state. However, the callbacks do get an extra pm_message_t parameter which they can use to distinguish between system sleep transitions and runtime PM transitions.
Unfortunately, this isn't the case. While a struct usb_device_driver's suspend()/resume() methods get the pm_message_t, a struct usb_driver's suspend()/resume() methods do not:
static int usb_resume_interface(struct usb_device *udev, struct usb_interface *intf, pm_message_t msg, int reset_resume) { struct usb_driver *driver; ... if (reset_resume) { if (driver->reset_resume) { status = driver->reset_resume(intf); ... } else { status = driver->resume(intf);
vs
static int usb_resume_device(struct usb_device *udev, pm_message_t msg) { struct usb_device_driver *udriver; ... if (status == 0 && udriver->resume) status = udriver->resume(udev, msg);
and in drivers/net/usb/asix_devices.c:
static struct usb_driver asix_driver = { ... .suspend = asix_suspend, .resume = asix_resume, .reset_resume = asix_resume,
where asix_resume() only takes one argument:
static int asix_resume(struct usb_interface *intf) {
Your email made me go back and check the code more carefully, and it turns out that we were both half-right. :-)
The pm_message_t argument is passed to the usb_driver's ->suspend callback in usb_suspend_interface(), but not to the ->resume callback in usb_resume_interface(). Yes, it's inconsistent.
I suppose the API could be changed, at the cost of updating a lot of drivers. But it would be easier if this wasn't necessary, if there was some way to work around the problem. Unfortunately, I don't know anything about how the network stack handles suspend and resume, or what sort of locking it requires, so I can't offer any suggestions.
Alan Stern
On Fri, Sep 12, 2025 at 10:29:47AM -0400, Alan Stern wrote:
On Fri, Sep 12, 2025 at 09:33:05AM +0100, Russell King (Oracle) wrote:
On Thu, Sep 11, 2025 at 10:30:09PM -0400, Alan Stern wrote:
The USB subsystem uses only one pair of callbacks for suspend and resume because USB hardware has only one suspend state. However, the callbacks do get an extra pm_message_t parameter which they can use to distinguish between system sleep transitions and runtime PM transitions.
Unfortunately, this isn't the case. While a struct usb_device_driver's suspend()/resume() methods get the pm_message_t, a struct usb_driver's suspend()/resume() methods do not:
static int usb_resume_interface(struct usb_device *udev, struct usb_interface *intf, pm_message_t msg, int reset_resume) { struct usb_driver *driver; ... if (reset_resume) { if (driver->reset_resume) { status = driver->reset_resume(intf); ... } else { status = driver->resume(intf);
vs
static int usb_resume_device(struct usb_device *udev, pm_message_t msg) { struct usb_device_driver *udriver; ... if (status == 0 && udriver->resume) status = udriver->resume(udev, msg);
and in drivers/net/usb/asix_devices.c:
static struct usb_driver asix_driver = { ... .suspend = asix_suspend, .resume = asix_resume, .reset_resume = asix_resume,
where asix_resume() only takes one argument:
static int asix_resume(struct usb_interface *intf) {
Your email made me go back and check the code more carefully, and it turns out that we were both half-right. :-)
The pm_message_t argument is passed to the usb_driver's ->suspend callback in usb_suspend_interface(), but not to the ->resume callback in usb_resume_interface(). Yes, it's inconsistent.
I suppose the API could be changed, at the cost of updating a lot of drivers. But it would be easier if this wasn't necessary, if there was some way to work around the problem. Unfortunately, I don't know anything about how the network stack handles suspend and resume, or what sort of locking it requires, so I can't offer any suggestions.
I, too, am unable to help further as I have no bandwidth available to deal with this. Sorry.
On Fri, Sep 12, 2025 at 03:37:52PM +0100, Russell King (Oracle) wrote:
On Fri, Sep 12, 2025 at 10:29:47AM -0400, Alan Stern wrote:
On Fri, Sep 12, 2025 at 09:33:05AM +0100, Russell King (Oracle) wrote:
On Thu, Sep 11, 2025 at 10:30:09PM -0400, Alan Stern wrote:
The USB subsystem uses only one pair of callbacks for suspend and resume because USB hardware has only one suspend state. However, the callbacks do get an extra pm_message_t parameter which they can use to distinguish between system sleep transitions and runtime PM transitions.
Unfortunately, this isn't the case. While a struct usb_device_driver's suspend()/resume() methods get the pm_message_t, a struct usb_driver's suspend()/resume() methods do not:
static int usb_resume_interface(struct usb_device *udev, struct usb_interface *intf, pm_message_t msg, int reset_resume) { struct usb_driver *driver; ... if (reset_resume) { if (driver->reset_resume) { status = driver->reset_resume(intf); ... } else { status = driver->resume(intf);
vs
static int usb_resume_device(struct usb_device *udev, pm_message_t msg) { struct usb_device_driver *udriver; ... if (status == 0 && udriver->resume) status = udriver->resume(udev, msg);
and in drivers/net/usb/asix_devices.c:
static struct usb_driver asix_driver = { ... .suspend = asix_suspend, .resume = asix_resume, .reset_resume = asix_resume,
where asix_resume() only takes one argument:
static int asix_resume(struct usb_interface *intf) {
Your email made me go back and check the code more carefully, and it turns out that we were both half-right. :-)
The pm_message_t argument is passed to the usb_driver's ->suspend callback in usb_suspend_interface(), but not to the ->resume callback in usb_resume_interface(). Yes, it's inconsistent.
I suppose the API could be changed, at the cost of updating a lot of drivers. But it would be easier if this wasn't necessary, if there was some way to work around the problem. Unfortunately, I don't know anything about how the network stack handles suspend and resume, or what sort of locking it requires, so I can't offer any suggestions.
I, too, am unable to help further as I have no bandwidth available to deal with this. Sorry.
Thanks for all the valuable input.
I’ll process the feedback and investigate possible ways to proceed. As a first step I’ll measure the actual power savings from USB auto-suspend on AX88772 to see if runtime PM is worth the added complexity.
Best Regards, Oleksij
linux-stable-mirror@lists.linaro.org