From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com --- drivers/net/dsa/lan9303-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 268949939636..ecd507355f51 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
void lan9303_shutdown(struct lan9303 *chip) { - dsa_switch_shutdown(chip->ds); + lan9303_remove(chip); } EXPORT_SYMBOL(lan9303_shutdown);
Hi Alexander,
On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
drivers/net/dsa/lan9303-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 268949939636..ecd507355f51 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); void lan9303_shutdown(struct lan9303 *chip) {
- dsa_switch_shutdown(chip->ds);
- lan9303_remove(chip);
} EXPORT_SYMBOL(lan9303_shutdown); -- 2.46.0
You've said here that a similar change still does not protect against packets received after shutdown: https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.came...
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
Hello Vladimir!
On Wed, 2024-09-11 at 19:28 +0300, Vladimir Oltean wrote:
On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
drivers/net/dsa/lan9303-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 268949939636..ecd507355f51 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); void lan9303_shutdown(struct lan9303 *chip) {
- dsa_switch_shutdown(chip->ds);
- lan9303_remove(chip);
} EXPORT_SYMBOL(lan9303_shutdown); -- 2.46.0
You've said here that a similar change still does not protect against packets received after shutdown: https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.came...
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
This patch addresses the race of zeroing drvdata in
static void lan9303_mdio_shutdown(struct mdio_device *mdiodev) { struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
if (!sw_dev) return;
lan9303_shutdown(&sw_dev->chip);
dev_set_drvdata(&mdiodev->dev, NULL); }
versus
static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy, int reg) { struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
what you refer to is another race, zeroing of dsa_ptr in struct net_device versus the whole network stack, which I addressed in https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@si...
Hello Vladimir,
On Wed, 2024-09-11 at 18:55 +0200, Alexander Sverdlin wrote:
You've said here that a similar change still does not protect against packets received after shutdown: https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.came...
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
besides from the below, I've expected this question... In the meanwhile I've tested mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read pattern therefore I've posted this tested patch.
If you'd prefer to solve this centrally for all drivers, I can test your patch from the MDIO-drvdata PoV.
This patch addresses the race of zeroing drvdata in
static void lan9303_mdio_shutdown(struct mdio_device *mdiodev) { struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev); if (!sw_dev) return; lan9303_shutdown(&sw_dev->chip); dev_set_drvdata(&mdiodev->dev, NULL); }
versus
static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy, int reg) { struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
what you refer to is another race, zeroing of dsa_ptr in struct net_device versus the whole network stack, which I addressed in https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@si...
On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
besides from the below, I've expected this question... In the meanwhile I've tested mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read pattern therefore I've posted this tested patch.
If you'd prefer to solve this centrally for all drivers, I can test your patch from the MDIO-drvdata PoV.
But this would mean throwing away the whole "net: dsa: be compatible with masters which unregister on shutdown" work?
On Wed, Sep 11, 2024 at 05:09:08PM +0000, Sverdlin, Alexander wrote:
On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
besides from the below, I've expected this question... In the meanwhile I've tested mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read pattern therefore I've posted this tested patch.
If you'd prefer to solve this centrally for all drivers, I can test your patch from the MDIO-drvdata PoV.
But this would mean throwing away the whole "net: dsa: be compatible with masters which unregister on shutdown" work?
I did not propose anything (yet). I'm still trying to form a mental model of what is broken and what works. Hence the request to test that change.
OTOH, this patch is equally throwing away the whole "net: dsa: be compatible with masters which unregister on shutdown" work (for lan9303).
Hi Alexander,
On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
Could you please test this alternative solution (patch attached) for both reported problems?
Thanks.
Hi Vladimir!
Thank you for the quick fix!
On Thu, 2024-09-12 at 13:15 +0300, Vladimir Oltean wrote:
From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Do you think it would make sense to add the same Fixes: tag as above? (That's the earlier one of the two)
Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
Could you please test this alternative solution (patch attached) for both reported problems?
We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without, and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW. All of the above for a backport to v6.1, but this part should be OK, I believe.
Overall looks very good, you can add my Reviewed-by: Alexander Sverdlin alexander.sverdlin@siemens.com Tested-by: Alexander Sverdlin alexander.sverdlin@siemens.com when you officially publish the patch.
Hi Alexander,
On Fri, Sep 13, 2024 at 04:16:57PM +0000, Sverdlin, Alexander wrote:
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Do you think it would make sense to add the same Fixes: tag as above? (That's the earlier one of the two)
Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
Could you please test this alternative solution (patch attached) for both reported problems?
We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without, and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW. All of the above for a backport to v6.1, but this part should be OK, I believe.
Overall looks very good, you can add my Reviewed-by: Alexander Sverdlin alexander.sverdlin@siemens.com Tested-by: Alexander Sverdlin alexander.sverdlin@siemens.com when you officially publish the patch.
Thanks for testing, I really appreciate it.
I will add all the required tags for the formal patch submission.
linux-stable-mirror@lists.linaro.org