If the auto-negotiation fails to establish a gigabit link, the phy can try to 'down-shift': it resets the bits in MII_CTRL1000 to stop advertising 1Gbps and retries the negotiation at 100Mbps.
From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") the content of MII_CTRL1000 is not checked anymore at the end of the negotiation, preventing the detection of phy 'down-shift'. In case of 'down-shift' phydev->advertising gets out-of-sync wrt MII_CTRL1000 and still includes modes that the phy has already dropped. The link partner could still advertise higher speeds, while the link is established at one of the common lower speeds. The logical 'and' in phy_resolve_aneg_linkmode() between phydev->advertising and phydev->lp_advertising will report an incorrect mode.
Issue detected with a local phy rtl8211f connected with a gigabit capable router through a two-pairs network cable.
After auto-negotiation, read back MII_CTRL1000 and mask-out from phydev->advertising the modes that have been eventually discarded due to the 'down-shift'.
Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Antonio Borneo antonio.borneo@st.com
Link: https://lore.kernel.org/r/478f871a-583d-01f1-9cc5-2eea56d8c2a7@huawei.com
---
To: Andrew Lunn andrew@lunn.ch
To: Heiner Kallweit hkallweit1@gmail.com
To: Russell King linux@armlinux.org.uk
To: "David S. Miller" davem@davemloft.net
To: Jakub Kicinski kuba@kernel.org
To: netdev@vger.kernel.org
To: Yonglong Liu liuyonglong@huawei.com
Cc: linuxarm@huawei.com
Cc: Salil Mehta salil.mehta@huawei.com
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-kernel@vger.kernel.org
Cc: Antonio Borneo antonio.borneo@st.com
 drivers/net/phy/phy_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5dab6be6fc38..5d1060aa1b25 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2331,7 +2331,7 @@ EXPORT_SYMBOL(genphy_read_status_fixed);
  */
 int genphy_read_status(struct phy_device *phydev)
 {
-	int err, old_link = phydev->link;
+	int adv, err, old_link = phydev->link;
 
 	/* Update the link, but return if there was an error */
 	err = genphy_update_link(phydev);
@@ -2356,6 +2356,14 @@ int genphy_read_status(struct phy_device *phydev)
 		return err;
 
 	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+		if (phydev->is_gigabit_capable) {
+			adv = phy_read(phydev, MII_CTRL1000);
+			if (adv < 0)
+				return adv;
+			/* update advertising in case of 'down-shift' */
+			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
+							adv);
+		}
 		phy_resolve_aneg_linkmode(phydev);
 	} else if (phydev->autoneg == AUTONEG_DISABLE) {
 		err = genphy_read_status_fixed(phydev);
base-commit: d549699048b4b5c22dd710455bcdb76966e55aa3
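The mis-resolution described in the commit message can be reproduced with plain bitmasks. The following is only a userspace sketch: the MODE_* bits and resolve_common_mode() are hypothetical stand-ins, not the kernel's link-mode bitmaps or phy_resolve_aneg_linkmode() itself. It assumes the resolver picks the fastest mode present in both masks.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mode bits (assumption: NOT the kernel's ethtool link-mode bits). */
enum {
	MODE_10_HALF   = 1u << 0,
	MODE_10_FULL   = 1u << 1,
	MODE_100_HALF  = 1u << 2,
	MODE_100_FULL  = 1u << 3,
	MODE_1000_FULL = 1u << 4,
};

/*
 * Hypothetical resolver: AND the local and link-partner masks and
 * return the highest-priority common mode, or 0 if there is none.
 */
static uint32_t resolve_common_mode(uint32_t adv, uint32_t lp_adv)
{
	static const uint32_t priority[] = {
		MODE_1000_FULL, MODE_100_FULL, MODE_100_HALF,
		MODE_10_FULL, MODE_10_HALF,
	};
	uint32_t common = adv & lp_adv;
	size_t i;

	for (i = 0; i < sizeof(priority) / sizeof(priority[0]); i++)
		if (common & priority[i])
			return priority[i];
	return 0;
}
```

With a stale local mask that still contains MODE_1000_FULL and a gigabit-capable partner, the sketch resolves to 1000/Full even though the PHY has already stopped advertising it; with the mask the PHY is really using (100 Mbps only) it resolves to 100/Full.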
On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
If the auto-negotiation fails to establish a gigabit link, the phy can try to 'down-shift': it resets the bits in MII_CTRL1000 to stop advertising 1Gbps and retries the negotiation at 100Mbps.
From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") the content of MII_CTRL1000 is not checked anymore at the end of the negotiation, preventing the detection of phy 'down-shift'. In case of 'down-shift' phydev->advertising gets out-of-sync wrt MII_CTRL1000 and still includes modes that the phy has already dropped. The link partner could still advertise higher speeds, while the link is established at one of the common lower speeds. The logical 'and' in phy_resolve_aneg_linkmode() between phydev->advertising and phydev->lp_advertising will report an incorrect mode.
Issue detected with a local phy rtl8211f connected with a gigabit capable router through a two-pairs network cable.
After auto-negotiation, read back MII_CTRL1000 and mask-out from phydev->advertising the modes that have been eventually discarded due to the 'down-shift'.
Sorry, but no. While your solution will appear to work, it introduces unexpected changes to the user-visible APIs.
if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
if (phydev->is_gigabit_capable) {
adv = phy_read(phydev, MII_CTRL1000);
if (adv < 0)
return adv;
/* update advertising in case of 'down-shift' */
mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
adv);
If a down-shift occurs, this will cause the configured advertising mask to lose the 1G speed, which will be visible to userspace. Userspace doesn't expect the advertising mask to change beneath it. Since updates from userspace are done using a read-modify-write of the ksettings, this can have the undesired effect of removing 1G from the configured advertising mask.
We've had other PHYs have this behaviour; the correct solution is for the PHY driver to implement reading the resolution from the PHY rather than relying on the generic implementation if it can down-shift.
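The read-modify-write hazard can be illustrated with a small userspace model. Everything here is hypothetical (struct phy_model, the ADV_* bits and the helper names are invented for the sketch, not kernel or ethtool APIs); it only assumes that userspace reads the configured mask, changes one bit, and writes the whole mask back.

```c
#include <stdint.h>

/* Illustrative advertising bits (assumption: not real register values). */
#define ADV_100_FULL  (1u << 0)
#define ADV_1000_FULL (1u << 1)
#define ADV_PAUSE     (1u << 2)

/* Hypothetical model of the configured advertising mask. */
struct phy_model {
	uint32_t advertising;
};

/* Models the proposed patch: copy the PHY's 1G bit into the configured mask. */
static void sync_1g_from_hw(struct phy_model *phy, uint32_t hw_adv)
{
	phy->advertising = (phy->advertising & ~ADV_1000_FULL) |
			   (hw_adv & ADV_1000_FULL);
}

/* Models a userspace tool enabling pause via read-modify-write. */
static void user_enable_pause(struct phy_model *phy)
{
	uint32_t adv = phy->advertising;	/* read */

	adv |= ADV_PAUSE;			/* modify */
	phy->advertising = adv;			/* write back the whole mask */
}
```

If sync_1g_from_hw() runs after a down-shift (hardware 1G bit cleared) and userspace then calls user_enable_pause(), the mask written back no longer contains ADV_1000_FULL, so 1G has been dropped from the configuration without the user ever asking for that.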
On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
If a down-shift occurs, this will cause the configured advertising mask to lose the 1G speed, which will be visible to userspace.
You are right, it gets propagated to the user that 1Gbps is not advertised.
Userspace doesn't expect the advertising mask to change beneath it. Since updates from userspace are done using a read-modify-write of the ksettings, this can have the undesired effect of removing 1G from the configured advertising mask.
We've had other PHYs have this behaviour; the correct solution is for the PHY driver to implement reading the resolution from the PHY rather than relying on the generic implementation if it can down-shift.
If it's already upstream, could you please point to one of the PHY drivers that already implements this properly?
Thanks Antonio
On 24.11.2020 at 16:17, Antonio Borneo wrote:
If it's already upstream, could you please point to one of the PHY drivers that already implements this properly?
See e.g. aqr107_read_rate(), used by aqr107_read_status().
Thanks Antonio
On Tue, Nov 24, 2020 at 04:17:42PM +0100, Antonio Borneo wrote:
If it's already upstream, could you please point to one of the PHY drivers that already implements this properly?
Reading the resolved information is PHY specific as it isn't standardised.
Marvell PHYs have read the resolved information for a very long time. I added support for it to at803x.c:
06d5f3441b2e net: phy: at803x: use operating parameters from PHY-specific status
after it broke for exactly the reason you're reporting for your PHY.
On Tue, 2020-11-24 at 15:37 +0000, Russell King - ARM Linux admin wrote:
Reading the resolved information is PHY specific as it isn't standardised.
Digging into the info you have provided, I realized that another Realtek PHY already has some specific code upstream to deal with downshift. The PHY-specific code was added by Heiner in d445dff2df60 ("net: phy: realtek: read actual speed to detect downshift"). This code reads the actual speed from page 0xa43 address 0x12, which is not documented in the datasheet of rtl8211f. But I checked the register content on rtl8211f and it works the same way there too!
I have added Willy in copy; maybe he can confirm that we can use page 0xa43 address 0x12 on rtl8211f to read the actual speed after negotiation.
In that case the fix for rtl8211f just requires adding the same custom read_status().
Antonio
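Reading the resolved speed from a PHY-specific status register, as discussed above, boils down to decoding a few bit-fields. The sketch below is a userspace illustration only: the EXAMPLE_PHYSR_* field layout is a placeholder assumption (the real layout for page 0xa43 address 0x12 has to be taken from the Realtek driver or the vendor), and struct resolved_link and decode_phy_status() are hypothetical names.

```c
#include <stdint.h>

/* Placeholder layout (assumption): speed in bits 5:4, duplex in bit 3. */
#define EXAMPLE_PHYSR_SPEED_MASK  0x0030u
#define EXAMPLE_PHYSR_SPEED_10    0x0000u
#define EXAMPLE_PHYSR_SPEED_100   0x0010u
#define EXAMPLE_PHYSR_SPEED_1000  0x0020u
#define EXAMPLE_PHYSR_FULL_DUPLEX 0x0008u

/* Hypothetical container for the link parameters reported by the PHY. */
struct resolved_link {
	int speed_mbps;
	int full_duplex;
};

/* Decode a raw status value; returns 0 on success, -1 on an unknown speed code. */
static int decode_phy_status(uint16_t physr, struct resolved_link *out)
{
	switch (physr & EXAMPLE_PHYSR_SPEED_MASK) {
	case EXAMPLE_PHYSR_SPEED_10:
		out->speed_mbps = 10;
		break;
	case EXAMPLE_PHYSR_SPEED_100:
		out->speed_mbps = 100;
		break;
	case EXAMPLE_PHYSR_SPEED_1000:
		out->speed_mbps = 1000;
		break;
	default:
		return -1;
	}
	out->full_duplex = (physr & EXAMPLE_PHYSR_FULL_DUPLEX) != 0;
	return 0;
}
```

Because the speed comes from what the PHY actually negotiated, this approach does not need to touch the configured advertising mask at all.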
On 24.11.2020 at 15:38, Antonio Borneo wrote:
If the auto-negotiation fails to establish a gigabit link, the phy can try to 'down-shift': it resets the bits in MII_CTRL1000 to stop advertising 1Gbps and retries the negotiation at 100Mbps.
I see that Russell answered already. My 2cts:
Are you sure all PHYs supporting downshift adjust the advertisement bits? IIRC an Aquantia PHY I dealt with does not. And if a PHY does so I'd consider this problematic: Let's say you have a broken cable and the PHY downshifts to 100Mbps. If you change the cable then the PHY would still negotiate 100Mbps only.
Also I think phydev->advertising reflects what the user wants to advertise, as mentioned by Russell before.
On Tue, Nov 24, 2020 at 04:03:40PM +0100, Heiner Kallweit wrote:
I see that Russell answered already. My 2cts:
Are you sure all PHYs supporting downshift adjust the advertisement bits? IIRC an Aquantia PHY I dealt with does not. And if a PHY does so I'd consider this problematic: Let's say you have a broken cable and the PHY downshifts to 100Mbps. If you change the cable then the PHY would still negotiate 100Mbps only.
From what I've seen, that is not how downshift works, at least on the PHYs I've seen.
When the PHY downshifts, it modifies the advertisement registers, but it also remembers the original value. When the cable is unplugged, it restores the setting to what was previously set.
It is _far_ from nice, but the fact is that your patch that Antonio identified has broken previously working support, something that I brought up when I patched one of the PHY drivers that was broken by this very same problem by your patch.
That said, _if_ the PHY has a way to read the resolved state rather than reading the advertisement registers, that is what should be used (as I said previously) rather than trying to decode the advertisement registers ourselves. That is normally more reliable for speed and duplex.
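The save-and-restore behaviour described above can be modelled in a few lines. This is a toy userspace model, not driver code: struct phy_sim and the sim_* helpers are hypothetical, and the two SIM_ADV_* values are just assumed to stand for the 1000BASE-T half/full advertisement bits in the 1000BASE-T control register.

```c
#include <stdint.h>

/* Assumed 1000BASE-T advertisement bits in the control register. */
#define SIM_ADV_1000HALF 0x0100u
#define SIM_ADV_1000FULL 0x0200u

/* Hypothetical PHY model: the live register plus the value saved on down-shift. */
struct phy_sim {
	uint16_t ctrl1000;
	uint16_t saved_ctrl1000;
	int downshifted;
};

/* PHY gives up on gigabit: remember the register, then drop the 1G bits. */
static void sim_downshift(struct phy_sim *phy)
{
	phy->saved_ctrl1000 = phy->ctrl1000;
	phy->ctrl1000 &= (uint16_t)~(SIM_ADV_1000HALF | SIM_ADV_1000FULL);
	phy->downshifted = 1;
}

/* Cable unplugged: restore what was advertised before the down-shift. */
static void sim_cable_unplugged(struct phy_sim *phy)
{
	if (phy->downshifted) {
		phy->ctrl1000 = phy->saved_ctrl1000;
		phy->downshifted = 0;
	}
}
```

In this model the register only misrepresents the configured advertisement while the down-shifted link is up, which is why a good cable plugged in afterwards can negotiate 1Gbps again.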
On Tue, 2020-11-24 at 15:17 +0000, Russell King - ARM Linux admin wrote:
From what I've seen, that is not how downshift works, at least on the PHYs I've seen.
When the PHY downshifts, it modifies the advertisement registers, but it also remembers the original value. When the cable is unplugged, it restores the setting to what was previously set.
In fact, at least rtl8211f is able to recover the original settings and returns to 1Gbps once a decent cable gets plugged-in.
It is _far_ from nice, but the fact is that your patch that Antonio identified has broken previously working support, something that I brought up when I patched one of the PHY drivers that was broken by this very same problem by your patch.
The idea to fix it for a general case was indeed triggered by the fact that before commit 5502b218e001 this was the norm. I considered it as a regression.
That said, _if_ the PHY has a way to read the resolved state rather than reading the advertisement registers, that is what should be used (as I said previously) rather than trying to decode the advertisement registers ourselves. That is normally more reliable for speed and duplex.
Wrt rtl8211f I don't have info other than the public datasheet, and there I didn't find any way other than reading the advertisement register.
I have read the latest comment from Heiner. I will check aqr107!
Thanks Antonio
From: Russell King
Sent: 24 November 2020 15:17
...
That said, _if_ the PHY has a way to read the resolved state rather than reading the advertisement registers, that is what should be used (as I said previously) rather than trying to decode the advertisement registers ourselves. That is normally more reliable for speed and duplex.
Determining the speed and duplex from the ANAR and ANRR (I can't remember the name of the response register) has always been completely broken.
The problems arise when you connect to either a 10M hub or a 10/100M autodetecting hub (these are a 10M hub and a 100M hub connected by a bridge). The PHY will either see single link test pulses (10M hub) or a simple burst of link test pulses (10/100 hub) and fall back to 10M HDX or 100M HDX. Both the 10M hub and 10/100 hub are happy with the link test pulse stream that contains the ANAR. However the ANRR register will (typically) contain the value from the last system that sent it one. So if you unplug from something that does 100M FDX and plug into a hub the MAC unit is likely to be misconfigured and do FDX.
Of course, there is no generic way to get the actual mode. I'm not sure the PHY I was using (a long time ago) even had any private register that could tell you.
For one system (which was never going to do anything fast) I just removed the FDX modes from the ANAR. The MAC didn't care whether it was 10M or 100M.
David