RE: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled

1 Oct 2024

      ...
-----Original Message-----
From: Vladimir Oltean vladimir.oltean@nxp.com
Sent: 2024年10月1日 6:03
To: Wei Fang wei.fang@nxp.com
Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
pabeni@redhat.com; Claudiu Manoil claudiu.manoil@nxp.com;
ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org;
john.fastabend@gmail.com; linux-kernel@vger.kernel.org;
netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org;
imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com;
sbhatta@marvell.com
Subject: Re: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings
are disabled
On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote:
...
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
on LS1028A, it was found that if the command was re-run multiple times,
Rx could not receive the frames, and the result of xdo-bench showed
that the rx rate was 0.
root@ls1028ardb:~# ./xdp-bench tx eno0
Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
Summary                      2046 rx/s                  0
err,drop/s
...
Summary                         0 rx/s                  0
err,drop/s
...
Summary                         0 rx/s                  0
err,drop/s
...
Summary                         0 rx/s                  0
err,drop/s
...
By observing the Rx PIR and CIR registers, we found that CIR is always
equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
is full and can no longer accommodate other Rx frames. Therefore, we
can conclude that the problem is caused by the Rx BD ring not being
cleaned up.
Further analysis of the code revealed that the Rx BD ring will only
be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
Therefore, some debug logs were added to the driver and the current
values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
BD ring was full. The logs are as follows.
[  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
[  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
[  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
From the results, we can see that the max value of xdp_tx_in_flight
has reached 2140. However, the size of the Rx BD ring is only 2048.
This is incredible, so we checked the code again and found that
xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
and it was not reset when the bfp program was installed again. The
root cause is that the IRQ is disabled too early in enetc_stop(),
resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
xdp_tx_in_flight is not cleared.
Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet
processing")
...
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang wei.fang@nxp.com

v2 changes:

Modify the titile and rephrase the commit meesage.
Use the new solution as described in the title

I gave this another test under a bit different set of circumstances this time,
and I'm confident that there are still problems, which I haven't identified
though (yet).
With 64 byte frames at 2.5 Gbps, I see this going on:
$ xdp-bench tx eno0 &
$ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 && taskset
$((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 0
rx_filter 1
Summary                 1,556,952 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
current settings:
tx_type 0
rx_filter 1
Summary                         0 rx/s                  0 err,drop/s
[  883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its
RX ring has 2072 XDP_TX frames in flight)
new settings:
tx_type 0
rx_filter 0
Summary                     1,027 rx/s                  0 err,drop/s
current settings:
tx_type 0
rx_filter 0
Summary                         0 rx/s                  0 err,drop/s
which looks like the symptoms that the patch tries to solve.
My previous testing was with 390 byte frames, and this did not happen.
Please do not merge this.
Oh, it looks like there are still some issues we don't know about. I did
test using 64 bytes but not at that high of a rate. Also I didn't turn on
timestamp. Anyway, I will try to reproduce the issue when I'm back to
office next Tuesday. It would be nice if you can help find the root cause
before next Tuesday, thanks!

2025

2024

2023

2022

2021

2020

2019

2018

2017

RE: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled