On 07/22, Simon Horman wrote:
On Mon, Jul 21, 2025 at 09:54:22AM -0700, Stanislav Fomichev wrote:
Cosmin reports the following locking issue:
# BUG: sleeping function called from invalid context at kernel/locking/mutex.c:275
# dump_stack_lvl+0x4f/0x60
# __might_resched+0xeb/0x140
# mutex_lock+0x1a/0x40
# dev_set_promiscuity+0x26/0x90
# __dev_set_promiscuity+0x85/0x170
# __dev_set_rx_mode+0x69/0xa0
# dev_uc_add+0x6d/0x80
# vlan_dev_open+0x5f/0x120 [8021q]
# __dev_open+0x10c/0x2a0
# __dev_change_flags+0x1a4/0x210
# netif_change_flags+0x22/0x60
# do_setlink.isra.0+0xdb0/0x10f0
# rtnl_newlink+0x797/0xb00
# rtnetlink_rcv_msg+0x1cb/0x3f0
# netlink_rcv_skb+0x53/0x100
# netlink_unicast+0x273/0x3b0
# netlink_sendmsg+0x1f2/0x430
This is similar to the recent syzkaller reports in [0] and [1]. It triggers because macsec does not advertise IFF_UNICAST_FLT, although it has a proper ndo_set_rx_mode callback that takes care of pushing uc/mc addresses down to the real device.
In general, the dev_uc_add call path is problematic for stacking non-IFF_UNICAST_FLT devices because we might grab the netdev instance lock under the addr_list_lock spinlock, so this is not a systemic fix.
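The locking problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical and not a kernel API, the spinlock is reduced to an in_atomic flag, and the promiscuity change is reduced to a counter that records when a sleeping lock would have been taken in atomic context. It is not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	bool unicast_flt;     /* models the IFF_UNICAST_FLT priv flag */
	bool uc_promisc;      /* models "promiscuous because of uc addresses" */
	int uc_count;         /* number of secondary unicast addresses */
	bool in_atomic;       /* true while the modelled addr_list_lock is held */
	int sleep_in_atomic;  /* counts modelled "sleeping in atomic" violations */
	int rx_mode_calls;    /* counts modelled ndo_set_rx_mode invocations */
};

/* Models a promiscuity change that needs a sleeping lock (a mutex). */
static void model_set_promiscuity(struct model_dev *dev)
{
	if (dev->in_atomic)
		dev->sleep_in_atomic++;
}

/* Models the rx-mode update: without unicast filtering, fall back to
 * promiscuous mode as soon as a secondary unicast address is present. */
static void model_set_rx_mode(struct model_dev *dev)
{
	if (!dev->unicast_flt && dev->uc_count > 0 && !dev->uc_promisc) {
		model_set_promiscuity(dev);
		dev->uc_promisc = true;
	}
	dev->rx_mode_calls++;
}

/* Models adding a unicast address under the address-list spinlock.
 * Returns the number of violations recorded so far. */
static int model_uc_add(struct model_dev *dev)
{
	dev->in_atomic = true;   /* spinlock taken */
	dev->uc_count++;
	model_set_rx_mode(dev);
	dev->in_atomic = false;  /* spinlock released */
	return dev->sleep_in_atomic;
}
```

In this model, a device with unicast_flt set to false records one violation on the first model_uc_add call, while a device with unicast_flt set to true records none and only has its rx-mode callback invoked.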
0: https://lore.kernel.org/netdev/686d55b4.050a0220.1ffab7.0014.GAE@google.com
1: https://lore.kernel.org/netdev/68712acf.a00a0220.26a83e.0051.GAE@google.com/
Link: 2aff4342b0f5b1539c02ffd8df4c7e58dd9746e7.camel@nvidia.com
I think that Link: should be followed by a URL
Link: https://lore.kernel.org/netdev/2aff4342b0f5b1539c02ffd8df4c7e58dd9746e7.came...
Whoops, sorry, forgot to prefix the message id with a URL :-( If this gets a CR, I'll repost with a fix. (presumably should be easy to fix during git am)
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Reported-by: Cosmin Ratiu <cratiu@nvidia.com>
Tested-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Hi Stan,
I ran the test provided by patch 2/2 with a debug kernel using VNG.
It reliably passes with patch 1/2 applied and fails without it, where "fails" means the kernel panics along the lines of the stack trace in the commit message.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Thank you for testing!