On Wed, 2025-05-07 at 14:20 -0700, Stanislav Fomichev wrote:
On 05/07, Cosmin Ratiu wrote:
On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
In any case, please hold off with picking this patch up, it seems there's a possibility of a real deadlock. Here's the scenario:
Hmm, are you sure you're calling __netdev_update_features on the upper? I don't see how the lower would be locked in that case. From my POW, this is what happens:
- your dev (lower) has a vlan on it (upper)
- you call lro=off on the _lower_
- this triggers FEAT_CHANGE notifier and vlan_device_event catches
it 4. since the lower has a vlan device (dev->vlan_info != NULL), it goes over every other vlan in the group and triggers netdev_update_features for the upper (netdev_update_features vlandev) 5. the upper tries to sync the features into the lower (including the one that triggered FEAT_CHANGE) and that's where the deadlock happens
But I think (5) should be largely a no-op for the device triggering the notification, because the features have been already applied in ethnl_set_features. I'd move the lock into netdev_sync_lower_features, and only after checking the features (and making sure that we are going to change them). The feature check might be racy, but I think it should still work?
You are right, if I restrict the lower dev critical section to only the call to __netdev_update_features for the lower dev there's no deadlock any more, because the device with the lock held already had its features updated.
I will send a new version of this patch soon after the full regression suite finishes and I convince myself there are no more issues related to this that we can encounter.
Can you also share the bonding stacktrace as well to confirm it's the same issue?
Sure, here it is, it's the same scenario. bond_netdev_event gets called on a slave dev, it recomputes features and updates all slaves (bond_compute_features), and then the same lock is reacquired.
But this is also fixed with your suggestion above.
============================================ WARNING: possible recursive locking detected
devlink/14341 is trying to acquire lock: ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at: __netdev_update_features+0x31e/0xe20
but task is already holding lock: ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at: mlx5e_attach_netdev+0x31f/0x360 [mlx5_core] and the lock comparison function returns 0:
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&dev_instance_lock_key#9); lock(&dev_instance_lock_key#9);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by devlink/14341: #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40 #1: ffff888164a5c250 (&devlink->lock_key){+.+.}-{4:4}, at: devlink_get_from_attrs_lock+0xbc/0x180 #2: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at: mlx5e_attach_netdev+0x30d/0x360 [mlx5_core] #3: ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at: mlx5e_attach_netdev+0x31f/0x360 [mlx5_core]
Call Trace: <TASK> dump_stack_lvl+0x69/0xa0 print_deadlock_bug.cold+0xbd/0xca __lock_acquire+0x163c/0x2f00 lock_acquire+0xd3/0x2e0 __mutex_lock+0x98/0xf10 __netdev_update_features+0x31e/0xe20 netdev_change_features+0x1f/0x60 bond_compute_features+0x24e/0x300 [bonding] bond_netdev_event+0x2e0/0x400 [bonding] notifier_call_chain+0x3d/0x100 netdev_update_features+0x52/0x60 mlx5e_attach_netdev+0x32f/0x360 [mlx5_core] mlx5e_netdev_attach_profile+0x48/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x90/0xf0 [mlx5_core] mlx5e_vport_rep_load+0x414/0x490 [mlx5_core] __esw_offloads_load_rep+0x87/0xd0 [mlx5_core] mlx5_esw_offloads_rep_load+0x45/0xe0 [mlx5_core] esw_offloads_enable+0xb7b/0xca0 [mlx5_core] mlx5_eswitch_enable_locked+0x293/0x430 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x229/0x620 [mlx5_core] devlink_nl_eswitch_set_doit+0x60/0xd0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x188/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1e1/0x2c0 netlink_sendmsg+0x210/0x450
Cosmin.