In the non-RT kernel, local_bh_disable() merely disables preemption, whereas it maps to an actual spin lock in the RT kernel. Consequently, when attempting to refill RX buffers via netdev_alloc_skb() in macb_mac_link_up(), a deadlock scenario arises as follows:
WARNING: possible circular locking dependency detected 6.18.0-08691-g2061f18ad76e #39 Not tainted ------------------------------------------------------ kworker/0:0/8 is trying to acquire lock: ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c
but task is already holding lock: ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit +0x148/0xb7c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&queue->tx_ptr_lock){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x148/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20
-> #2 (_xmit_ETHER#2){+...}-{3:3}: rt_spin_lock+0x50/0x1f0 sch_direct_xmit+0x11c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20
-> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}: lock_release+0x250/0x348 __local_bh_enable_ip+0x7c/0x240 __netdev_alloc_skb+0x1b4/0x1d8 gem_rx_refill+0xdc/0x240 gem_init_rings+0xb4/0x108 macb_mac_link_up+0x9c/0x2b4 phylink_resolve+0x170/0x614 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20
-> #0 (&bp->lock){+.+.}-{3:3}: __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of: &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&queue->tx_ptr_lock); lock(_xmit_ETHER#2); lock(&queue->tx_ptr_lock); lock(&bp->lock);
*** DEADLOCK ***
Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0xa0/0xf0 dump_stack+0x18/0x24 print_circular_bug+0x28c/0x370 check_noncircular+0x198/0x1ac __lock_acquire+0x15a8/0x2084 lock_acquire+0x1cc/0x350 rt_spin_lock+0x50/0x1f0 macb_start_xmit+0x808/0xb7c dev_hard_start_xmit+0x94/0x284 sch_direct_xmit+0x8c/0x37c __dev_queue_xmit+0x708/0x1120 neigh_resolve_output+0x148/0x28c ip6_finish_output2+0x2c0/0xb2c __ip6_finish_output+0x114/0x308 ip6_output+0xc4/0x4a4 mld_sendpack+0x220/0x68c mld_ifc_work+0x2a8/0x4f4 process_one_work+0x20c/0x5f8 worker_thread+0x1b0/0x35c kthread+0x144/0x200 ret_from_fork+0x10/0x20
Notably, invoking the mog_init_rings() callback upon link establishment is unnecessary. Instead, we can exclusively call mog_init_rings() within the ndo_open() callback. This adjustment resolves the deadlock issue. Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings() when opening the network interface via at91ether_open(), moving mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC check.
Fixes: 633e98a711ac ("net: macb: use resolved link config in mac_link_up()") Cc: stable@vger.kernel.org Suggested-by: Kevin Hao kexin.hao@windriver.com Signed-off-by: Xiaolei Wang xiaolei.wang@windriver.com ---
V1: https://patchwork.kernel.org/project/netdevbpf/patch/20251128103647.351259-1... V2: Update the correct lock dependency chain and add the Fix tag. V3: update commit log, Add full deadlock log added explanations: because MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings(), we don't need the MACB_CAPS_MACB_IS_EMAC check when moving mog_init_rings() to macb_open().
drivers/net/ethernet/cadence/macb_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index ca2386b83473..064fccdcf699 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -744,7 +744,6 @@ static void macb_mac_link_up(struct phylink_config *config, /* Initialize rings & buffers as clearing MACB_BIT(TE) in link down * cleared the pipeline and control registers. */ - bp->macbgem_ops.mog_init_rings(bp); macb_init_buffers(bp);
for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) @@ -2991,6 +2990,8 @@ static int macb_open(struct net_device *dev) goto pm_exit; }
+ bp->macbgem_ops.mog_init_rings(bp); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { napi_enable(&queue->napi_rx); napi_enable(&queue->napi_tx);