Hello Vladimir!
On Wed, 2024-09-11 at 19:28 +0300, Vladimir Oltean wrote:
On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
From: Alexander Sverdlin alexander.sverdlin@siemens.com
dsa_switch_shutdown() doesn't bring down any ports, but only disconnects slaves from master. Packets still come afterwards into master port and the ports are being polled for link status. This leads to crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 Workqueue: events_power_efficient phy_state_machine pc : lan9303_mdio_phy_read lr : lan9303_phy_read Call trace: lan9303_mdio_phy_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read genphy_update_link genphy_read_status phy_check_link_status phy_state_machine process_one_work worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com
drivers/net/dsa/lan9303-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 268949939636..ecd507355f51 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); void lan9303_shutdown(struct lan9303 *chip) {
- dsa_switch_shutdown(chip->ds);
- lan9303_remove(chip);
} EXPORT_SYMBOL(lan9303_shutdown); -- 2.46.0
You've said here that a similar change still does not protect against packets received after shutdown: https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.came...
The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
This patch addresses the race of zeroing drvdata in
static void lan9303_mdio_shutdown(struct mdio_device *mdiodev) { struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
if (!sw_dev) return;
lan9303_shutdown(&sw_dev->chip);
dev_set_drvdata(&mdiodev->dev, NULL); }
versus
static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy, int reg) { struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
what you refer to is another race, zeroing of dsa_ptr in struct net_device versus the whole network stack, which I addressed in https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@si...