On Thu, Sep 11, 2025 at 07:55:13AM -0700, Jakub Kicinski wrote:
On Thu, 11 Sep 2025 15:39:20 +0100 Russell King (Oracle) wrote:
I'm not surprised. I'm guessing phylib is using polled mode, and removing the suspend/resume handling likely means that it's at the mercy of the timings of the phylib state machine running (which is what is complaining here) vs the MDIO bus being available for use.
Given that this happens, I'm convinced that the original patch is the wrong approach. The driver needs the phylink suspend/resume calls to shutdown and restart phylib polling, and the resume call needs to be placed in such a location that the MDIO bus is already accessible.
We keep having issues with rtnl_lock taken from resume. Honestly, I'm not sure anyone has found a good solution, yet. Mostly people just don't implement runtime PM.
If we were able to pass optional context to suspend/resume we could implement conditional locking. We'd lose a lot of self-respect but it'd make fixing such bugs easier..
Normal drivers have the option of separate callbacks for runtime PM vs system suspend/resume states. It seems USB doesn't, just munging everything into one pair of suspend and resume ops without any way of telling them apart. I suggest that is part of the problem here.
However, I'm not a USB expert, so...