On Mon, 1 Jun 2020 at 03:28, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
On Mon, Jun 01, 2020 at 12:00:16AM +0300, Vladimir Oltean wrote:
This is all relevant because our options for the stable trees boil down to 2 choices:
- Revert f62265b53ef34a372b657c99e23d32e95b464316, fix an API misuse and a bug, but lose an (admittedly ad-hoc, but still) useful way of troubleshooting a system misconfiguration (hide the problem that Zefir Kurtisi was seeing).
Or maybe just allow at803x_aneg_done() to return non-zero but still print the warning (preferably identifying the affected PHY) so your hard-to-debug problem still gets a useful kernel message pointing out what the problem is?
Maybe.
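As a rough userspace illustration of the suggestion above (keep the warning, but stop forcing the return value to zero), here is a minimal sketch. It is not the driver code: struct fake_phy and both function names are hypothetical stand-ins, and the "current" variant only models the behaviour as described in this thread. The only real kernel definition reused is the BMSR_ANEGCOMPLETE bit value.

```c
#include <stdbool.h>
#include <stdio.h>

/* Same value as BMSR_ANEGCOMPLETE in linux/mii.h. */
#define BMSR_ANEGCOMPLETE 0x0020

/* Hypothetical stand-in for a PHY device; not the kernel's struct phy_device. */
struct fake_phy {
	const char *name;          /* identifies the affected PHY in the warning */
	bool copper_aneg_complete; /* copper-side autoneg result */
	bool sgmii_mode;           /* PHY is strapped for SGMII on the MAC side */
	bool sgmii_aneg_complete;  /* SGMII-side in-band autoneg result */
	int warnings;              /* number of warnings printed so far */
};

/* Prints the warning, naming the PHY, and counts it. */
static void warn_sgmii(struct fake_phy *phy)
{
	fprintf(stderr, "%s: SGMII link is not ok\n", phy->name);
	phy->warnings++;
}

/* Model of the behaviour under discussion: an SGMII-side failure
 * overrides a successful copper-side result with 0. */
static int aneg_done_current(struct fake_phy *phy)
{
	int aneg_done = phy->copper_aneg_complete ? BMSR_ANEGCOMPLETE : 0;

	if (!aneg_done || !phy->sgmii_mode)
		return aneg_done;

	if (!phy->sgmii_aneg_complete) {
		warn_sgmii(phy);
		aneg_done = 0;
	}
	return aneg_done;
}

/* Model of the suggested alternative: still warn, but report the
 * copper-side result unchanged. */
static int aneg_done_warn_only(struct fake_phy *phy)
{
	int aneg_done = phy->copper_aneg_complete ? BMSR_ANEGCOMPLETE : 0;

	if (aneg_done && phy->sgmii_mode && !phy->sgmii_aneg_complete)
		warn_sgmii(phy);

	return aneg_done;
}
```

With copper autoneg complete and the SGMII side down, the first variant returns 0 and the second returns BMSR_ANEGCOMPLETE; both print the warning.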
- Apply this patch, which makes the PHY state machine work even with this bent interpretation of the API. It's not as if all phylib users could migrate to phylink in stable trees, and even phylink doesn't catch all possible configuration cases currently.
I wasn't even proposing that as a solution.
And yes, I do have some copper SFP modules that have an (inaccessible) AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly which variant it is, and no, I haven't seen any of this "SGMII fails to come up" - in fact, the in-band SGMII status is the only way to know what the PHY negotiated with its link partner... and this SFP module works with phylink with no issues.
See, you should also specify what kernel you're testing on. Since Heiner did the PHY_AN cleanup, phy_aneg_done is basically dead code from the state machine's perspective; only a few random drivers call it:
https://elixir.bootlin.com/linux/latest/A/ident/phy_aneg_done
So I would not be at all surprised that you're not hitting it simply because at803x_aneg_done is never in your call path.
-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up
Thanks, -Vladimir