From: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com
Ensure TX descriptor type fields are written in a safe order so the DMA engine does not begin processing a chain before all descriptors are fully initialised.
For multi-descriptor transmissions the driver writes DT_FEND into the last descriptor and DT_FSTART into the first. The DMA engine starts processing when it sees DT_FSTART. If the compiler or CPU reorders the writes and publishes DT_FSTART before DT_FEND, the DMA can start early and process an incomplete chain, leading to corrupted transmissions or DMA errors.
Fix this by writing DT_FEND before the dma_wmb() barrier, executing dma_wmb() immediately before DT_FSTART (or DT_FSINGLE in the single descriptor case), and then adding a wmb() after the type updates to ensure CPU-side ordering before ringing the hardware doorbell.
On an RZ/G2L platform running an RT kernel, this reordering hazard was observed as TX stalls and timeouts:
[ 372.968431] NETDEV WATCHDOG: end0 (ravb): transmit queue 0 timed out
[ 372.968494] WARNING: CPU: 0 PID: 10 at net/sched/sch_generic.c:467 dev_watchdog+0x4a4/0x4ac
[ 373.969291] ravb 11c20000.ethernet end0: transmit timed out, status 00000000, resetting...
This change enforces the required ordering and prevents the DMA engine from observing DT_FSTART before the rest of the descriptor chain is valid.
Fixes: 2f45d1902acf ("ravb: minimize TX data copying")
Cc: stable@vger.kernel.org
Co-developed-by: Fabrizio Castro fabrizio.castro.jz@renesas.com
Signed-off-by: Fabrizio Castro fabrizio.castro.jz@renesas.com
Signed-off-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com
---
 drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a200e205825a..2a995fa9bfff 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2211,15 +2211,19 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		skb_tx_timestamp(skb);
 	}
-	/* Descriptor type must be set after all the above writes */
-	dma_wmb();
+
+	/* For multi-descriptors set DT_FEND before calling dma_wmb() */
 	if (num_tx_desc > 1) {
 		desc->die_dt = DT_FEND;
 		desc--;
-		desc->die_dt = DT_FSTART;
-	} else {
-		desc->die_dt = DT_FSINGLE;
 	}
+
+	/* Descriptor type must be set after all the above writes */
+	dma_wmb();
+	desc->die_dt = (num_tx_desc > 1) ? DT_FSTART : DT_FSINGLE;
+
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
 	ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);
 	priv->cur_tx[q] += num_tx_desc;
Hi Prabhakar,
Thanks for your work.
On 2025-10-15 16:00:26 +0100, Prabhakar wrote:
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a200e205825a..2a995fa9bfff 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2211,15 +2211,19 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		skb_tx_timestamp(skb);
 	}
-	/* Descriptor type must be set after all the above writes */
-	dma_wmb();
+	/* For multi-descriptors set DT_FEND before calling dma_wmb() */
 	if (num_tx_desc > 1) {
 		desc->die_dt = DT_FEND;
 		desc--;
-		desc->die_dt = DT_FSTART;
-	} else {
-		desc->die_dt = DT_FSINGLE;
 	}
+	/* Descriptor type must be set after all the above writes */
+	dma_wmb();
+	desc->die_dt = (num_tx_desc > 1) ? DT_FSTART : DT_FSINGLE;
IMHO it's ugly to evaluate num_tx_desc twice. I would rather just open code the full steps in each branch of the if above. It would make it easier to read and understand.
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
All of this looks a bit odd, why not just do a single dma_wmb() or wmb() before ringing the doorbell? Maybe I'm missing something obvious?
 	ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);
 	priv->cur_tx[q] += num_tx_desc;

--
2.43.0
Hi Niklas,
Thank you for the review.
On Wed, Oct 15, 2025 at 4:56 PM Niklas Söderlund niklas.soderlund@ragnatech.se wrote:
Hi Prabhakar,
Thanks for your work.
On 2025-10-15 16:00:26 +0100, Prabhakar wrote:
-	/* Descriptor type must be set after all the above writes */
-	dma_wmb();
+	/* For multi-descriptors set DT_FEND before calling dma_wmb() */
 	if (num_tx_desc > 1) {
 		desc->die_dt = DT_FEND;
 		desc--;
-		desc->die_dt = DT_FSTART;
-	} else {
-		desc->die_dt = DT_FSINGLE;
 	}
+	/* Descriptor type must be set after all the above writes */
+	dma_wmb();
+	desc->die_dt = (num_tx_desc > 1) ? DT_FSTART : DT_FSINGLE;
IMHO it's ugly to evaluate num_tx_desc twice. I would rather just open code the full steps in each branch of the if above. It would make it easier to read and understand.
I did this just to avoid compiler optimizations. With similar code on the 5.10 CIP RT kernel it was observed that the compiler reordered the writes so that DT_FSTART was written before DT_FEND while the DMA was active, which caused DMA issues leading to QEF errors.
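To illustrate (a rough sketch only, not the actual driver code; "first" and "last" are generic descriptor pointers):

	/* Sketch: two plain stores to different descriptors carry no
	 * ordering guarantee, so DT_FSTART may become visible first.
	 */
	last->die_dt  = DT_FEND;
	first->die_dt = DT_FSTART;

	/* With a barrier between them the device is guaranteed to
	 * observe DT_FEND before DT_FSTART.
	 */
	last->die_dt = DT_FEND;
	dma_wmb();
	first->die_dt = DT_FSTART;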
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
All of this looks a bit odd, why not just do a single dma_wmb() or wmb() before ringing the doorbell? Maybe I'm missing something obvious?
This wmb() was mainly added to ensure all the descriptor data is in RAM. The HW manual for RZ/G1/2, R-Car Gen1/2 and RZ/G2L family mentions that we need to read back the last written descriptor before triggering the DMA. Please let me know if you think this can be handled differently.
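For reference, the readback the manual describes would look roughly like the sketch below (hypothetical only; the patch uses wmb() instead of a readback):

	/* Hypothetical readback variant following the manual's wording:
	 * read back the last descriptor written before triggering DMA.
	 */
	desc->die_dt = DT_FSTART;
	(void)READ_ONCE(desc->die_dt);	/* readback acts as the delay the manual asks for */
	ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);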
Cheers, Prabhakar
Hello,
On 2025-10-15 18:01:13 +0100, Lad, Prabhakar wrote:
Hi Niklas,
Thank you for the review.
On Wed, Oct 15, 2025 at 4:56 PM Niklas Söderlund niklas.soderlund@ragnatech.se wrote:
Hi Prabhakar,
Thanks for your work.
On 2025-10-15 16:00:26 +0100, Prabhakar wrote:
+	/* For multi-descriptors set DT_FEND before calling dma_wmb() */
 	if (num_tx_desc > 1) {
 		desc->die_dt = DT_FEND;
 		desc--;
-		desc->die_dt = DT_FSTART;
-	} else {
-		desc->die_dt = DT_FSINGLE;
 	}
+	/* Descriptor type must be set after all the above writes */
+	dma_wmb();
+	desc->die_dt = (num_tx_desc > 1) ? DT_FSTART : DT_FSINGLE;
IMHO it's ugly to evaluate num_tx_desc twice. I would rather just open code the full steps in each branch of the if above. It would make it easier to read and understand.
I did this just to avoid compiler optimizations. With similar code on the 5.10 CIP RT kernel it was observed that the compiler reordered the writes so that DT_FSTART was written before DT_FEND while the DMA was active, which caused DMA issues leading to QEF errors.
I was thinking of something like
	/* Descriptor type must be set after all the above writes */
	dma_wmb();

	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;

		/* For multi-descriptors ensure DT_FEND before start
		 * TODO: Add explanation about compiler optimised code etc.
		 */
		dma_wmb();
		desc->die_dt = DT_FSTART;
	} else {
		desc->die_dt = DT_FSINGLE;
	}
Would make the new special case clearer to understand. And if we figure out a different way of doing it, it's very clear why the second dma_wmb() is needed. But after writing it out I'm not so sure anymore; maybe adding a temporary variable instead would make it a clearer read.
	u8 die_dt = DT_FSINGLE;

	/* For multi-descriptors ensure DT_FEND before DT_FSTART
	 * TODO: Add explanation about compiler optimised code etc.
	 */
	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;
		die_dt = DT_FSTART;
	}

	/* Descriptor type must be set after all the above writes */
	dma_wmb();
	desc->die_dt = die_dt;
I think my main complaint is that evaluating num_tx_desc > 1 multiple times makes the code read as if stuff was just thrown into the driver until a test case passed, without understanding the root cause.
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
All of this looks a bit odd, why not just do a single dma_wmb() or wmb() before ringing the doorbell? Maybe I'm missing something obvious?
This wmb() was mainly added to ensure all the descriptor data is in RAM. The HW manual for RZ/G1/2, R-Car Gen1/2 and RZ/G2L family mentions that we need to read back the last written descriptor before triggering the DMA. Please let me know if you think this can be handled differently.
Have you observed any issues without the wmb() here?
What I'm trying to understand is why a new barrier is needed here when it has worked without one before. I'm likely just slow, but what I'm trying to grasp is the need for the intermediate dma_wmb() above in relation to this one.
Should it not be: set up the descriptors, barrier to ensure they're in RAM, ring the doorbell? The staggered method of set up the descriptors, barrier, set up more descriptors, barrier, ring the doorbell is what confuses me, I think :-)
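Something along these lines is what I have in mind (just a sketch, not a concrete proposal):

	/* Sketch: fill in every descriptor field, including all die_dt
	 * values, then a single barrier, then ring the doorbell.
	 */
	/* ... set up data pointers, lengths and die_dt for the whole chain ... */
	dma_wmb();	/* make the complete chain visible to the device */
	ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);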
Cheers, Prabhakar
-- Kind Regards, Niklas Söderlund
Hi Niklas,
Thanks for your feedback!
From: Niklas Söderlund niklas.soderlund@ragnatech.se
Sent: 15 October 2025 18:28
To: Lad, Prabhakar prabhakar.csengg@gmail.com
Cc: Paul Barker paul@pbarker.dev; Andrew Lunn andrew+netdev@lunn.ch; David S. Miller davem@davemloft.net; Eric Dumazet edumazet@google.com; Jakub Kicinski kuba@kernel.org; Paolo Abeni pabeni@redhat.com; Sergei Shtylyov sergei.shtylyov@cogentembedded.com; netdev@vger.kernel.org; linux-renesas-soc@vger.kernel.org; linux-kernel@vger.kernel.org; Biju Das biju.das.jz@bp.renesas.com; Fabrizio Castro fabrizio.castro.jz@renesas.com; Prabhakar Mahadev Lad prabhakar.mahadev-lad.rj@bp.renesas.com; stable@vger.kernel.org
Subject: Re: [PATCH 3/3] net: ravb: Enforce descriptor type ordering to prevent early DMA start
Hello,
On 2025-10-15 18:01:13 +0100, Lad, Prabhakar wrote:
Hi Niklas,
Thank you for the review.
On Wed, Oct 15, 2025 at 4:56 PM Niklas Söderlund niklas.soderlund@ragnatech.se wrote:
IMHO it's ugly to evaluate num_tx_desc twice. I would rather just open code the full steps in each branch of the if above. It would make it easier to read and understand.
I did this just to avoid compiler optimizations. With similar code on the 5.10 CIP RT kernel it was observed that the compiler reordered the writes so that DT_FSTART was written before DT_FEND while the DMA was active, which caused DMA issues leading to QEF errors.
I was thinking of something like
	/* Descriptor type must be set after all the above writes */
	dma_wmb();

	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;

		/* For multi-descriptors ensure DT_FEND before start
		 * TODO: Add explanation about compiler optimised code etc.
		 */
		dma_wmb();
		desc->die_dt = DT_FSTART;
	} else {
		desc->die_dt = DT_FSINGLE;
	}
Would make the new special case clearer to understand. And if we figure out a different way of doing it, it's very clear why the second dma_wmb() is needed. But after writing it out I'm not so sure anymore; maybe adding a temporary variable instead would make it a clearer read.
	u8 die_dt = DT_FSINGLE;

	/* For multi-descriptors ensure DT_FEND before DT_FSTART
	 * TODO: Add explanation about compiler optimised code etc.
	 */
	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;
		die_dt = DT_FSTART;
	}

	/* Descriptor type must be set after all the above writes */
	dma_wmb();
	desc->die_dt = die_dt;
I think my main complaint is that evaluating num_tx_desc > 1 multiple times makes the code read as if stuff was just thrown into the driver until a test case passed, without understanding the root cause.
What about the below instead?
	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;

		/* When using multi-descriptors, DT_FEND needs to get written
		 * before DT_FSTART, but the compiler may reorder the memory
		 * writes in an attempt to optimize the code.
		 * Use a dma_wmb() barrier to make sure DT_FEND and DT_FSTART
		 * are written exactly in the order shown in the code.
		 */
		dma_wmb();
		desc->die_dt = DT_FSTART;
	} else {
		/* The descriptor type must be set after all the previous writes */
		dma_wmb();
		desc->die_dt = DT_FSINGLE;
	}
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
All of this looks a bit odd, why not just do a single dma_wmb() or wmb() before ringing the doorbell? Maybe I'm missing something obvious?
This wmb() was mainly added to ensure all the descriptor data is in RAM. The HW manual for RZ/G1/2, R-Car Gen1/2 and RZ/G2L family mentions that we need to read back the last written descriptor before triggering the DMA. Please let me know if you think this can be handled differently.
Have you observed any issues without the wmb() here?
No, we haven't. We have added it because after further discussions with the design team, it became clear that the best thing to do is to wait for all the previous memory writes to have completed fully, alongside cache, branch prediction, and other operations, so that when we ring the DMA bell everything is in the right place.
As Prabhakar pointed out, the manual refers to a `read` operation to be used as a delay, but a wmb() barrier is more accurate and its purpose is clearer than a random memory read.
Having said that, we haven't noticed any issues without it in practice, but having it might prevent issues going forward.
The dma_wmb() barrier is still needed for cases where the DMA engine is already active, where writing the descriptor(s) at the wrong time can lead to error conditions (which we have managed to see, and which this patch addresses for multi-descriptor cases).
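To spell out how the two concerns map onto the two barriers, a rough sketch along the lines of the patch (generic descriptor pointers, not the driver's exact variables):

	/* Sketch only:
	 * 1) engine already running: it polls die_dt in the ring, so the
	 *    DT_FEND store must be visible before the DT_FSTART store;
	 * 2) engine idle: the whole chain must be visible before the
	 *    doorbell write starts it.
	 */
	last->die_dt = DT_FEND;
	dma_wmb();		/* case 1: ordering within the descriptor ring */
	first->die_dt = DT_FSTART;
	wmb();			/* case 2: everything visible before the doorbell */
	ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);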
Perhaps this patch should be split into two patches:
1. to address the error we have seen, for use cases where the DMA engine is already running (this is regarding dma_wmb())
2. to address the use cases where the DMA engine is not running yet, and we want to avoid the possibility of ringing the bell when things are not 100% ready.
What do you think?
Cheers, Fab
Hi Fabrizio,
On 2025-10-16 12:00:29 +0000, Fabrizio Castro wrote:
Hi Niklas,
Thanks for your feedback!
What about the below instead?
	if (num_tx_desc > 1) {
		desc->die_dt = DT_FEND;
		desc--;

		/* When using multi-descriptors, DT_FEND needs to get written
		 * before DT_FSTART, but the compiler may reorder the memory
		 * writes in an attempt to optimize the code.
		 * Use a dma_wmb() barrier to make sure DT_FEND and DT_FSTART
		 * are written exactly in the order shown in the code.
		 */
		dma_wmb();
		desc->die_dt = DT_FSTART;
	} else {
		/* The descriptor type must be set after all the previous writes */
		dma_wmb();
		desc->die_dt = DT_FSINGLE;
	}
I like this. It makes the two conditions very clear. Maybe extend the first comment a bit with the information you add below: that the important thing is that this protects the cases where the DMA engine is already running, and it's very important that it does not see a DT_FSTART before a DT_FEND is already committed, else it can run amok. That would make the intent super clear.
+	/* Ensure data is written to RAM before initiating DMA transfer */
+	wmb();
All of this looks a bit odd, why not just do a single dma_wmb() or wmb() before ringing the doorbell? Maybe I'm missing something obvious?
This wmb() was mainly added to ensure all the descriptor data is in RAM. The HW manual for RZ/G1/2, R-Car Gen1/2 and RZ/G2L family mentions that we need to read back the last written descriptor before triggering the DMA. Please let me know if you think this can be handled differently.
Have you observed any issues without the wmb() here?
No, we haven't. We have added it because after further discussions with the design team, it became clear that the best thing to do is to wait for all the previous memory writes to have completed fully, alongside cache, branch prediction, and other operations, so that when we ring the DMA bell everything is in the right place.
As Prabhakar pointed out, the manual refers to a `read` operation to be used as a delay, but a wmb() barrier is more accurate and its purpose is clearer than a random memory read.
Having said that, we haven't noticed any issues without it in practice, but having it might prevent issues going forward.
The dma_wmb() barrier is still needed for cases where the DMA engine is already active, where writing the descriptor(s) at the wrong time can lead to error conditions (which we have managed to see, and which this patch addresses for multi-descriptor cases).
Perhaps this patch should be split into two patches:
- to address the error we have seen, for use cases where the DMA engine is already running (this is regarding dma_wmb())
- to address the use cases where the DMA engine is not running yet, and we want to avoid the possibility of ringing the bell when things are not 100% ready.
What do you think?
More and smaller patches are always a good idea IMHO, makes bisecting stuff so much easier.
I still have my doubts about the usefulness of adding a wmb() here. Maybe because I'm also confused about why it's a wmb() and not a dma_wmb(); are we not only trying to make sure the DMA is not started before we know the last desc->die_dt = {DT_FSTART,DT_FSINGLE} write would be visible?
The usefulness of that I can imagine, as if the DMA engine is not running and we start it while that write is not yet visible, we could delay sending until the doorbell is rung again.
Cheers, Fab
-- Kind Regards, Niklas Söderlund