-----Original Message----- From: Vladimir Oltean vladimir.oltean@nxp.com Sent: 2024年10月9日 19:35 To: Wei Fang wei.fang@nxp.com Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Claudiu Manoil claudiu.manoil@nxp.com; ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org; imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com; sbhatta@marvell.com Subject: Re: [PATCH v3 net 2/3] net: enetc: fix the issues of XDP_REDIRECT feature
Commit title still mentions only XDP_REDIRECT, whereas implementation also touches XDP_TX (and only makes a very minor mention of it).
Wouldn't it be better to have "net: enetc: block concurrent XDP transmissions during ring reconfiguration" for a commit title?
On Wed, Oct 09, 2024 at 05:03:26PM +0800, Wei Fang wrote:
When testing the XDP_REDIRECT function on the LS1028A platform, we found a very reproducible issue that the Tx frames can no longer be sent out even if XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, the console may display some warnings like "timeout for tx ring #6 clear", and all redirected frames will be dropped, the detaild log
detailed
is as follows.
root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2 Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc) [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s
15.71 bulk-avg
eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s
15.71 bulk-avg
By analyzing the XDP_REDIRECT implementation of enetc driver, we found two problems. First, enetc driver will reconfigure Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanisms to block the redirected frames when enetc driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to
(.. driver reconfigures BD rings.) Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the TX rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, we also have to introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames.
prevent the redirected frames to be attached to Tx BD rings. This is not only used to block XDP_REDIRECT frames, but also to block XDP_TX frames.
Second, Tx BD rings are disabled first in enetc_stop() and then wait for empty. This operation is not safe while the Tx BD ring
the driver waits for them to become empty.
is actively transmitting frames, and will cause the ring to not be empty and hardware exception. As described in the block guide of LS1028A NETC, software should only disable an active ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. So the correct behavior is that the software stops putting frames on the Tx BD rings (this is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be empty, and finally disables the Tx BD rings.
It feels like this separate part (refactoring of enetc_start() and enetc_stop() operation ordering) should be its own patch? It is logically different than the introduction and checking of the ENETC_TX_DOWN condition.
Okay, I will separate this patch into two patches, one is for ENETC_TX_DOWN, the other is for disabling Tx BDRs after the rings are empty. Thanks.