We found several bugs while testing the XDP functionality of the enetc driver, and they are easy to reproduce. They not only prevent XDP from working, but also leave the network unable to recover after the XDP program exits. This patch set fixes these bugs. For details, please see the commit message of each patch.
---
v1 link: https://lore.kernel.org/bpf/20240919084104.661180-1-wei.fang@nxp.com/T/
v2 link: https://lore.kernel.org/netdev/20241008224806.2onzkt3gbslw5jxb@skbuf/T/
---

Wei Fang (3):
  net: enetc: remove xdp_drops statistic from enetc_xdp_drop()
  net: enetc: fix the issues of XDP_REDIRECT feature
  net: enetc: disable IRQ after Rx and Tx BD rings are disabled

 drivers/net/ethernet/freescale/enetc/enetc.c | 56 +++++++++++++++-----
 drivers/net/ethernet/freescale/enetc/enetc.h |  1 +
 2 files changed, 44 insertions(+), 13 deletions(-)
The xdp_drops statistic indicates the number of XDP frames dropped in the Rx direction. However, enetc_xdp_drop() is also used by the XDP_TX and XDP_REDIRECT actions. Frames lost in those two actions should not be counted in xdp_drops, because xdp_tx_drops and xdp_redirect_failures already count them. So remove the xdp_drops increment from enetc_xdp_drop() and increment xdp_drops in the XDP_DROP action instead.
Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
v2: no changes
v3: no changes
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 032d8eadd003..56e59721ec7d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,7 +1521,6 @@ static void enetc_xdp_drop(struct enetc_bdr *rx_ring, int rx_ring_first,
 				  &rx_ring->rx_swbd[rx_ring_first]);
 		enetc_bdr_idx_inc(rx_ring, &rx_ring_first);
 	}
-	rx_ring->stats.xdp_drops++;
 }

 static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
@@ -1586,6 +1585,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
 			fallthrough;
 		case XDP_DROP:
 			enetc_xdp_drop(rx_ring, orig_i, i);
+			rx_ring->stats.xdp_drops++;
 			break;
 		case XDP_PASS:
 			rxbd = orig_rxbd;
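The accounting rule this patch describes can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct ring_stats`, `model_xdp_drop()` and `model_handle_frame()` are hypothetical names, not driver code, and the `action_failed` parameter stands in for whatever makes an XDP_TX or XDP_REDIRECT action fail in the real driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's per-ring counters. */
struct ring_stats {
	uint64_t xdp_drops;             /* frames dropped by an XDP_DROP verdict */
	uint64_t xdp_tx_drops;          /* XDP_TX frames that could not be queued */
	uint64_t xdp_redirect_failures; /* XDP_REDIRECT frames that failed */
};

enum verdict { V_DROP, V_TX, V_REDIRECT };

/* Buffer recycling only; deliberately touches no statistic. */
static void model_xdp_drop(unsigned int *recycled_bufs)
{
	(*recycled_bufs)++;
}

/*
 * Handle one frame. 'action_failed' models a failed XDP_TX or XDP_REDIRECT
 * action; it is ignored for XDP_DROP.
 */
static void model_handle_frame(struct ring_stats *st, enum verdict v,
			       int action_failed, unsigned int *recycled_bufs)
{
	assert(st && recycled_bufs);

	switch (v) {
	case V_DROP:
		model_xdp_drop(recycled_bufs);
		st->xdp_drops++;	/* counted at the verdict, not in the helper */
		break;
	case V_TX:
		if (action_failed) {
			model_xdp_drop(recycled_bufs);
			st->xdp_tx_drops++;
		}
		break;
	case V_REDIRECT:
		if (action_failed) {
			model_xdp_drop(recycled_bufs);
			st->xdp_redirect_failures++;
		}
		break;
	}
}
```

Because the helper no longer increments anything, a failed XDP_TX or XDP_REDIRECT frame shows up in exactly one counter instead of two.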
On Wed, Oct 09, 2024 at 05:03:25PM +0800, Wei Fang wrote:
> The xdp_drops statistic indicates the number of XDP frames dropped in the Rx direction. However, enetc_xdp_drop() is also used in XDP_TX and XDP_REDIRECT actions. If frame loss occurs in these two actions, the frames loss count should not be included in xdp_drops, because there are already xdp_tx_drops and xdp_redirect_failures to count the frame loss of these two actions, so it's better to remove xdp_drops statistic from enetc_xdp_drop() and increase xdp_drops in XDP_DROP action.
>
> Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX")
> Cc: stable@vger.kernel.org
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
When testing the XDP_REDIRECT function on the LS1028A platform, we found a highly reproducible issue: Tx frames can no longer be sent out, even after XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic in the Rx direction when XDP_REDIRECT is turned on, the console may display warnings like "timeout for tx ring #6 clear", and all redirected frames are dropped. The detailed log is as follows.
root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
[203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
[204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
[204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
eno0->eno2        1420505 rx/s   1420590 err,drop/s   0 xmit/s
  xmit eno0->eno2       0 xmit/s 1420590 drop/s       0 drv_err/s   15.71 bulk-avg
eno0->eno2        1420484 rx/s   1420485 err,drop/s   0 xmit/s
  xmit eno0->eno2       0 xmit/s 1420485 drop/s       0 drv_err/s   15.71 bulk-avg
By analyzing the XDP_REDIRECT implementation of the enetc driver, we found two problems. First, the enetc driver reconfigures the Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanism to block redirected frames while the driver reconfigures the BD rings. So introduce the ENETC_TX_DOWN flag to prevent redirected frames from being attached to the Tx BD rings. The flag is used to block not only XDP_REDIRECT frames, but also XDP_TX frames.
Second, enetc_stop() disables the Tx BD rings first and only then waits for them to become empty. This is not safe while a Tx BD ring is actively transmitting frames: it will leave the ring non-empty and cause a hardware exception. As described in the block guide of LS1028A NETC, software should only disable an active ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. So the correct behavior is that the software stops putting frames on the Tx BD rings (this is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be empty, and finally disables the Tx BD rings.
Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
v2 changes: Remove a blank line from the end of enetc_disable_tx_bdrs().
v3 changes: Block the XDP_TX frames when ENETC_TX_DOWN flag is set.
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 50 ++++++++++++++++----
 drivers/net/ethernet/freescale/enetc/enetc.h |  1 +
 2 files changed, 41 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 56e59721ec7d..52da10f62430 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -902,6 +902,7 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)

 	if (unlikely(tx_frm_cnt && netif_carrier_ok(ndev) &&
 		     __netif_subqueue_stopped(ndev, tx_ring->index) &&
+		     !test_bit(ENETC_TX_DOWN, &priv->flags) &&
 		     (enetc_bd_unused(tx_ring) >= ENETC_TXBDS_MAX_NEEDED))) {
 		netif_wake_subqueue(ndev, tx_ring->index);
 	}
@@ -1377,6 +1378,9 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
 	int xdp_tx_bd_cnt, i, k;
 	int xdp_tx_frm_cnt = 0;

+	if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags)))
+		return -ENETDOWN;
+
 	enetc_lock_mdio();

 	tx_ring = priv->xdp_tx_ring[smp_processor_id()];
@@ -1602,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
 			break;
 		case XDP_TX:
 			tx_ring = priv->xdp_tx_ring[rx_ring->index];
+			if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags))) {
+				enetc_xdp_drop(rx_ring, orig_i, i);
+				tx_ring->stats.xdp_tx_drops++;
+				break;
+			}
+
 			xdp_tx_bd_cnt = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr,
 								     rx_ring,
 								     orig_i, i);
@@ -2223,18 +2233,24 @@ static void enetc_enable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	enetc_rxbdr_wr(hw, idx, ENETC_RBMR, rbmr);
 }

-static void enetc_enable_bdrs(struct enetc_ndev_priv *priv)
+static void enetc_enable_rx_bdrs(struct enetc_ndev_priv *priv)
 {
 	struct enetc_hw *hw = &priv->si->hw;
 	int i;

-	for (i = 0; i < priv->num_tx_rings; i++)
-		enetc_enable_txbdr(hw, priv->tx_ring[i]);
-
 	for (i = 0; i < priv->num_rx_rings; i++)
 		enetc_enable_rxbdr(hw, priv->rx_ring[i]);
 }

+static void enetc_enable_tx_bdrs(struct enetc_ndev_priv *priv)
+{
+	struct enetc_hw *hw = &priv->si->hw;
+	int i;
+
+	for (i = 0; i < priv->num_tx_rings; i++)
+		enetc_enable_txbdr(hw, priv->tx_ring[i]);
+}
+
 static void enetc_disable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 {
 	int idx = rx_ring->index;
@@ -2251,18 +2267,24 @@ static void enetc_disable_txbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	enetc_txbdr_wr(hw, idx, ENETC_TBMR, 0);
 }

-static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
+static void enetc_disable_rx_bdrs(struct enetc_ndev_priv *priv)
 {
 	struct enetc_hw *hw = &priv->si->hw;
 	int i;

-	for (i = 0; i < priv->num_tx_rings; i++)
-		enetc_disable_txbdr(hw, priv->tx_ring[i]);
-
 	for (i = 0; i < priv->num_rx_rings; i++)
 		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
 }

+static void enetc_disable_tx_bdrs(struct enetc_ndev_priv *priv)
+{
+	struct enetc_hw *hw = &priv->si->hw;
+	int i;
+
+	for (i = 0; i < priv->num_tx_rings; i++)
+		enetc_disable_txbdr(hw, priv->tx_ring[i]);
+}
+
 static void enetc_wait_txbdr(struct enetc_hw *hw, struct enetc_bdr *tx_ring)
 {
 	int delay = 8, timeout = 100;
@@ -2452,6 +2474,8 @@ void enetc_start(struct net_device *ndev)

 	enetc_setup_interrupts(priv);

+	enetc_enable_tx_bdrs(priv);
+
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2460,9 +2484,11 @@ void enetc_start(struct net_device *ndev)
 		enable_irq(irq);
 	}

-	enetc_enable_bdrs(priv);
+	enetc_enable_rx_bdrs(priv);

 	netif_tx_start_all_queues(ndev);
+
+	clear_bit(ENETC_TX_DOWN, &priv->flags);
 }
 EXPORT_SYMBOL_GPL(enetc_start);

@@ -2520,9 +2546,11 @@ void enetc_stop(struct net_device *ndev)
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
 	int i;

+	set_bit(ENETC_TX_DOWN, &priv->flags);
+
 	netif_tx_stop_all_queues(ndev);

-	enetc_disable_bdrs(priv);
+	enetc_disable_rx_bdrs(priv);

 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
@@ -2535,6 +2563,8 @@ void enetc_stop(struct net_device *ndev)

 	enetc_wait_bdrs(priv);

+	enetc_disable_tx_bdrs(priv);
+
 	enetc_clear_interrupts(priv);
 }
 EXPORT_SYMBOL_GPL(enetc_stop);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 97524dfa234c..fb7d98d57783 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -325,6 +325,7 @@ enum enetc_active_offloads {

 enum enetc_flags_bit {
 	ENETC_TX_ONESTEP_TSTAMP_IN_PROGRESS = 0,
+	ENETC_TX_DOWN,
 };

 /* interrupt coalescing modes */
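The shutdown order argued for in the commit message (stop producers, wait for PI = CI, then disable the ring) can be sketched as a userspace model. Everything here is an assumption made for illustration: `struct model_tx_ring`, the `model_*` functions and the 8-entry ring size are hypothetical, and the "hardware" is just a function that consumes one BD per call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 8u

/* Hypothetical userspace model of one Tx BD ring; not the driver's struct. */
struct model_tx_ring {
	unsigned int pi;	/* producer index, advanced by software */
	unsigned int ci;	/* consumer index, advanced by "hardware" */
	bool enabled;		/* models the ring enable bit */
	bool tx_down;		/* models the ENETC_TX_DOWN flag */
};

/* Enqueue one frame; refused once tx_down is set or the ring is full. */
static int model_xmit(struct model_tx_ring *r)
{
	if (r->tx_down)
		return -ENETDOWN;
	if ((r->pi + 1) % MODEL_RING_SIZE == r->ci)
		return -EBUSY;
	r->pi = (r->pi + 1) % MODEL_RING_SIZE;
	return 0;
}

/* "Hardware" consumes one pending BD if the ring is enabled and non-empty. */
static bool model_hw_step(struct model_tx_ring *r)
{
	if (!r->enabled || r->pi == r->ci)
		return false;
	r->ci = (r->ci + 1) % MODEL_RING_SIZE;
	return true;
}

/*
 * Ordered shutdown: block producers, drain, then disable.
 * Returns the number of BDs still pending when the ring is disabled.
 */
static unsigned int model_stop(struct model_tx_ring *r)
{
	r->tx_down = true;		/* 1. stop queuing new frames */
	while (model_hw_step(r))	/* 2. wait until PI == CI */
		;
	assert(r->pi == r->ci);
	r->enabled = false;		/* 3. only now disable the ring */
	return (r->pi + MODEL_RING_SIZE - r->ci) % MODEL_RING_SIZE;
}
```

Swapping steps 2 and 3 in this model would make `model_hw_step()` return false immediately and leave entries stranded between CI and PI, which corresponds to the "timeout for tx ring" warnings in the log above.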
Commit title still mentions only XDP_REDIRECT, whereas implementation also touches XDP_TX (and only makes a very minor mention of it).
Wouldn't it be better to have "net: enetc: block concurrent XDP transmissions during ring reconfiguration" for a commit title?
On Wed, Oct 09, 2024 at 05:03:26PM +0800, Wei Fang wrote:
> When testing the XDP_REDIRECT function on the LS1028A platform, we found a very reproducible issue that the Tx frames can no longer be sent out even if XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, the console may display some warnings like "timeout for tx ring #6 clear", and all redirected frames will be dropped, the detaild log

detailed

> is as follows.
>
> root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
> Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
> [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
> [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
> [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
> eno0->eno2        1420505 rx/s   1420590 err,drop/s   0 xmit/s
>   xmit eno0->eno2       0 xmit/s 1420590 drop/s       0 drv_err/s   15.71 bulk-avg
> eno0->eno2        1420484 rx/s   1420485 err,drop/s   0 xmit/s
>   xmit eno0->eno2       0 xmit/s 1420485 drop/s       0 drv_err/s   15.71 bulk-avg
>
> By analyzing the XDP_REDIRECT implementation of enetc driver, we found two problems. First, enetc driver will reconfigure Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanisms to block the redirected frames when enetc driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to

(.. driver reconfigures BD rings.) Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the TX rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, we also have to introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames.

> prevent the redirected frames to be attached to Tx BD rings. This is not only used to block XDP_REDIRECT frames, but also to block XDP_TX frames.
>
> Second, Tx BD rings are disabled first in enetc_stop() and then wait for empty. This operation is not safe while the Tx BD ring

the driver waits for them to become empty.

> is actively transmitting frames, and will cause the ring to not be empty and hardware exception. As described in the block guide of LS1028A NETC, software should only disable an active ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. So the correct behavior is that the software stops putting frames on the Tx BD rings (this is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be empty, and finally disables the Tx BD rings.

It feels like this separate part (refactoring of enetc_start() and enetc_stop() operation ordering) should be its own patch? It is logically different than the introduction and checking of the ENETC_TX_DOWN condition.
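The point that XDP transmissions bypass the queue state can be sketched with a small userspace model. This is an assumption-laden illustration, not driver code: `struct model_dev` and the `model_*` functions are hypothetical, `queue_stopped` stands in for the state the stack consults before calling the regular transmit path, and `honor_tx_down` only exists to compare behavior with and without the new check.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-device state relevant to the discussion. */
struct model_dev {
	bool queue_stopped;	/* what stopping all Tx queues would set */
	atomic_bool tx_down;	/* models the ENETC_TX_DOWN bit */
	unsigned int enqueued;	/* frames placed on a Tx ring */
};

/* Regular stack transmit: the queue state is checked before enqueuing. */
static int model_stack_xmit(struct model_dev *d)
{
	if (d->queue_stopped)
		return -EBUSY;
	d->enqueued++;
	return 0;
}

/*
 * XDP transmit (XDP_TX or a redirect into this device): nothing consults
 * queue_stopped here, so the driver has to test its own flag.
 */
static int model_xdp_xmit(struct model_dev *d, bool honor_tx_down)
{
	if (honor_tx_down && atomic_load(&d->tx_down))
		return -ENETDOWN;
	d->enqueued++;
	return 0;
}

/* Start of a stop or reconfigure sequence. */
static void model_begin_stop(struct model_dev *d)
{
	assert(d);
	atomic_store(&d->tx_down, true);
	d->queue_stopped = true;
}
```

In the model, a stopped queue blocks only `model_stack_xmit()`; the XDP path keeps enqueuing until it is told to honor the flag.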
-----Original Message-----
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Sent: October 9, 2024 19:35
To: Wei Fang <wei.fang@nxp.com>
Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>; ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org; imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com; sbhatta@marvell.com
Subject: Re: [PATCH v3 net 2/3] net: enetc: fix the issues of XDP_REDIRECT feature
Okay, I will split this patch into two: one for ENETC_TX_DOWN, and the other for disabling the Tx BDRs only after the rings are empty. Thanks.
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC on LS1028A, we found that after re-running the command multiple times, Rx could no longer receive frames, and xdp-bench reported an rx rate of 0.
root@ls1028ardb:~# ./xdp-bench tx eno0
Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
Summary                      2046 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
By observing the Rx PIR and CIR registers, we found that CIR is always equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring is full and can no longer accommodate other Rx frames. Therefore, we can conclude that the problem is caused by the Rx BD ring not being cleaned up.
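The index arithmetic behind "the Rx ring is full" can be sketched as follows. This assumes the common convention that a ring keeps one slot unused, so the producer must stop when the slot after PIR is the one CIR points at; that convention is an assumption made for this sketch and is not taken from the hardware manual. The function names and `MODEL_RX_RING_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RX_RING_SIZE 2048u

/*
 * Assumed convention (not taken from the hardware manual): the producer may
 * advance only while the slot after PIR is not the one CIR points at, so one
 * slot always stays unused.
 */
static bool model_ring_full(unsigned int pir, unsigned int cir)
{
	assert(pir < MODEL_RX_RING_SIZE && cir < MODEL_RX_RING_SIZE);
	return (pir + 1) % MODEL_RX_RING_SIZE == cir;
}

/* Number of slots the producer can still fill under the same convention. */
static unsigned int model_ring_space(unsigned int pir, unsigned int cir)
{
	return (cir + MODEL_RX_RING_SIZE - pir - 1) % MODEL_RX_RING_SIZE;
}
```

Under that assumption, the observed pair PIR = 0x7FE and CIR = 0x7FF leaves zero slots for the producer, which matches the conclusion that no further Rx frames can be accepted until the ring is cleaned.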
Further analysis of the code revealed that the Rx BD ring will only be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. Therefore, some debug logs were added to the driver and the current values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx BD ring was full. The logs are as follows.
[ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
[ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
[ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
From the results, we can see that xdp_tx_in_flight reached a maximum of 2140, yet the Rx BD ring holds only 2048 entries. This should not be possible, so we checked the code again and found that xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled, and it was not reset when the bpf program was installed again. The root cause is that the IRQ is disabled too early in enetc_stop(), so enetc_recycle_xdp_tx_buff() is not called and xdp_tx_in_flight is not cleared.
Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
v2 changes:
1. Modify the title and rephrase the commit message.
2. Use the new solution as described in the title.
v3 changes: no changes.
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 52da10f62430..c09370eab319 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2474,8 +2474,6 @@ void enetc_start(struct net_device *ndev)

 	enetc_setup_interrupts(priv);

-	enetc_enable_tx_bdrs(priv);
-
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2484,6 +2482,8 @@ void enetc_start(struct net_device *ndev)
 		enable_irq(irq);
 	}

+	enetc_enable_tx_bdrs(priv);
+
 	enetc_enable_rx_bdrs(priv);

 	netif_tx_start_all_queues(ndev);
@@ -2552,6 +2552,10 @@ void enetc_stop(struct net_device *ndev)

 	enetc_disable_rx_bdrs(priv);

+	enetc_wait_bdrs(priv);
+
+	enetc_disable_tx_bdrs(priv);
+
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2561,10 +2565,6 @@ void enetc_stop(struct net_device *ndev)
 		napi_disable(&priv->int_vector[i]->napi);
 	}

-	enetc_wait_bdrs(priv);
-
-	enetc_disable_tx_bdrs(priv);
-
 	enetc_clear_interrupts(priv);
 }
 EXPORT_SYMBOL_GPL(enetc_stop);
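How the counter can outgrow the ring across repeated stop/start cycles can be sketched in a userspace model. This is a simplification under stated assumptions: `struct model_state` and the `model_*` functions are hypothetical, a single poll is assumed to complete every pending frame, and the free step is assumed to discard leftover frames without adjusting the counter, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names echo the commit message, not the driver. */
struct model_state {
	unsigned int tx_pending;	/* XDP_TX frames queued, not yet completed */
	unsigned int xdp_tx_in_flight;	/* Rx buffers lent to those frames */
	bool napi_enabled;
};

/* Queue n XDP_TX frames: each one borrows an Rx buffer. */
static void model_xdp_tx(struct model_state *s, unsigned int n)
{
	s->tx_pending += n;
	s->xdp_tx_in_flight += n;
}

/* One Tx-completion poll: recycles every completed buffer, if NAPI may run. */
static void model_napi_poll(struct model_state *s)
{
	if (!s->napi_enabled)
		return;
	assert(s->xdp_tx_in_flight >= s->tx_pending);
	s->xdp_tx_in_flight -= s->tx_pending;
	s->tx_pending = 0;
}

/* Frees leftover Tx frames without touching xdp_tx_in_flight. */
static void model_free_tx_ring(struct model_state *s)
{
	s->tx_pending = 0;
}

/* Stop with NAPI disabled before (early) or after (late) the drain. */
static void model_stop(struct model_state *s, bool disable_napi_early)
{
	if (disable_napi_early)
		s->napi_enabled = false;
	model_napi_poll(s);		/* the "wait for the rings to drain" step */
	s->napi_enabled = false;
	model_free_tx_ring(s);
}

/* Restart; as in the scenario described, the counter is not reset here. */
static void model_start(struct model_state *s)
{
	s->napi_enabled = true;
}

/* Refill gate quoted in the commit message. */
static bool model_can_refill(const struct model_state *s, unsigned int cleaned_cnt)
{
	return cleaned_cnt > s->xdp_tx_in_flight;
}
```

With the early-disable order the stale count accumulates on every cycle, so the refill gate eventually can never pass; with the late-disable order the counter returns to zero before the ring is freed.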
On Wed, Oct 09, 2024 at 05:03:27PM +0800, Wei Fang wrote:
> When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC on LS1028A, it was found that if the command was re-run multiple times, Rx could not receive the frames, and the result of xdo-bench showed

xdp-bench

> that the rx rate was 0.
>
> root@ls1028ardb:~# ./xdp-bench tx eno0
> Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> Summary                      2046 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
>
> By observing the Rx PIR and CIR registers, we found that CIR is always equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring is full and can no longer accommodate other Rx frames. Therefore, we can conclude that the problem is caused by the Rx BD ring not being cleaned up.
>
> Further analysis of the code revealed that the Rx BD ring will only be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. Therefore, some debug logs were added to the driver and the current values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx BD ring was full. The logs are as follows.
>
> [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
>
> From the results, we can see that the max value of xdp_tx_in_flight has reached 2140. However, the size of the Rx BD ring is only 2048. This is incredible, so we checked the code again and found that xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled and it was not reset when the bfp program was installed again.

Please make it clear that this is more general and it happens whenever enetc_stop() is called.

> The root cause is that the IRQ is disabled too early in enetc_stop(), resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, xdp_tx_in_flight is not cleared.
I feel that the problem is not so much the IRQ, as the NAPI (softirq), really. Under heavy traffic we don't even get that many hardirqs (if any), but NAPI just reschedules itself because of the budget which constantly gets exceeded. Please make this also clear in the commit title, something like "net: enetc: disable NAPI only after TX rings are empty".
I would restate the problem as: "The root cause is that we disable NAPI too aggressively, without having waited for the pending XDP_TX frames to be transmitted, and their buffers recycled, so that the xdp_tx_in_flight counter can naturally drop to zero. Later, enetc_free_tx_ring() does free those stale, untransmitted XDP_TX packets, but it is not coded up to also reset the xdp_tx_in_flight counter, hence the manifestation of the bug."
And then we should have a paragraph that describes the solution as well. "One option would be to cover this extra condition in enetc_free_tx_ring(), but now that the ENETC_TX_DOWN exists, we have created a window at the beginning of enetc_stop() where NAPI can still be scheduled, but any concurrent enqueue will be blocked. Therefore, we can call enetc_wait_bdrs() and enetc_disable_tx_bdrs() with NAPI still scheduled, and it is guaranteed that this will not wait indefinitely, but instead give us an indication that the pending TX frames have orderly dropped to zero. Only then should we call napi_disable().
This way, enetc_free_tx_ring() becomes entirely redundant and can be dropped as part of subsequent cleanup.
The change also refactors enetc_start() so that it looks like the mirror opposite procedure of enetc_stop()."
I think describing the problem and solution in these terms gives the reviewers more versed in NAPI a better chance of understanding what is going on and what we are trying to achieve.
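The remark that NAPI keeps rescheduling itself under load, with few or no hardirqs, follows from the usual NAPI poll contract and can be sketched in a userspace model. The names `struct model_napi`, `model_hardirq()` and `model_poll()` are hypothetical; the model only assumes that a poll which uses its whole budget stays scheduled with the interrupt masked, while a poll that finishes early completes and unmasks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI context servicing a backlog of frames. */
struct model_napi {
	unsigned int backlog;	/* frames waiting to be processed */
	unsigned int hardirqs;	/* interrupts taken so far */
	bool irq_enabled;	/* device interrupt unmasked? */
	bool scheduled;		/* poll pending? */
};

/* Interrupt handler: mask the interrupt and schedule polling. */
static void model_hardirq(struct model_napi *n)
{
	if (!n->irq_enabled)
		return;
	n->hardirqs++;
	n->irq_enabled = false;
	n->scheduled = true;
}

/*
 * One poll pass. Consuming the whole budget keeps the context scheduled with
 * the interrupt still masked; finishing early completes NAPI and unmasks it.
 */
static unsigned int model_poll(struct model_napi *n, unsigned int budget)
{
	unsigned int done;

	assert(budget > 0);
	done = n->backlog < budget ? n->backlog : budget;
	n->backlog -= done;
	if (done < budget) {
		n->scheduled = false;
		n->irq_enabled = true;
	}
	return done;
}
```

In this model a backlog of 200 frames with a budget of 64 is drained by four polls after a single interrupt, which is why disabling NAPI, rather than the IRQ line, is what stops completions from being processed.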
-----Original Message-----
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Sent: October 9, 2024 20:10
To: Wei Fang <wei.fang@nxp.com>
Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>; ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org; imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com; sbhatta@marvell.com
Subject: Re: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled
Thanks for helping rephrase the commit message. I will apply it in the next version.