On Tue, Oct 08, 2024 at 06:30:49AM +0300, Wei Fang wrote:
I think the reason is that Rx BDRs are disabled when enetc_stop() is called, but there are still many unprocessed frames on Rx BDR. These frames will be processed by XDP program and put into Tx BDR. So enetc_wait_txbdr() will timeout and cause xdp_tx_in_flight will not be cleared.
So based on this patch, we should add a separate patch, similar to the patch 2 ("net: enetc: fix the issues of XDP_REDIRECT feature "), which prevents the XDP_TX frames from being put into Tx BDRs when the ENETC_TX_DOWN flag is set. The new patch is shown below. After adding this new patch, I followed your test steps and tested for more than 30 minutes, and the issue cannot be reproduced anymore (without this patch, this problem would be reproduced within seconds).
--- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -1606,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring, break; case XDP_TX: tx_ring = priv->xdp_tx_ring[rx_ring->index];
if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags))) {
enetc_xdp_drop(rx_ring, orig_i, i);
tx_ring->stats.xdp_tx_drops++;
break;
}
xdp_tx_bd_cnt = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr, rx_ring, orig_i, i);
Yeah, it works on my side as well. Thanks for following up.
I would argue that the above snippet should be a fixup for the "net: enetc: fix the issues of XDP_REDIRECT feature" change, and a rewrite of the commit message is in order. Currently, as a reader, I get the impression that only XDP_REDIRECT needs to check the ENETC_TX_DOWN flag, only for the next patch to come and say "remember what was said about the TX ring not being allowed to actively transmit frames while disabling it? well, that patch wasn't sufficient to ensure this condition, because XDP_TX needs to respect that flag as well!"