On Thu, May 29, 2025 at 03:03:38PM +0800, Miaoqing Pan wrote:
On 5/26/2025 7:48 PM, Johan Hovold wrote:
Add the missing memory barriers to make sure that destination ring descriptors are read after the head pointers to avoid using stale data on weakly ordered architectures like aarch64.
@@ -3851,6 +3851,9 @@ int ath11k_dp_process_rx_err(struct ath11k_base *ab, struct napi_struct *napi,
 	ath11k_hal_srng_access_begin(ab, srng);
+	/* Make sure descriptor is read after the head pointer. */
+	dma_rmb();
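To make the ordering requirement concrete, here is a minimal userspace sketch of the pattern the patch implements: read the head pointer once, issue a single read barrier, then read the descriptors up to that head pointer. It assumes C11 atomics as a stand-in for the kernel primitives (an acquire fence in place of dma_rmb(), and a release store in place of the device publishing the head pointer via DMA); the struct and function names are hypothetical and do not correspond to ath11k code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical userspace model of a destination ring (not the ath11k structs). */
struct ring {
	uint32_t desc_len[RING_SIZE];   /* descriptor payload written by the producer */
	_Atomic uint32_t hp;            /* head pointer, advanced by the producer */
	uint32_t tp;                    /* tail pointer, owned by the consumer */
};

/* Producer (the device, in real hardware): fill the descriptor, then
 * publish it by advancing hp with release semantics. Ring-full handling
 * is omitted from this sketch. */
static void produce(struct ring *r, uint32_t len)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	r->desc_len[hp % RING_SIZE] = len;
	atomic_store_explicit(&r->hp, hp + 1, memory_order_release);
}

/* Consumer: snapshot hp once, issue one read barrier, then walk the
 * descriptors up to the snapshot. Returns the sum of their lengths. */
static uint32_t consume_all(struct ring *r)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);
	uint32_t total = 0;

	/* Userspace stand-in for dma_rmb(): the descriptor reads below may
	 * not be satisfied before the hp read above. */
	atomic_thread_fence(memory_order_acquire);

	while (r->tp != hp) {
		total += r->desc_len[r->tp % RING_SIZE];
		r->tp++;
	}
	return total;
}
```

Without the fence, nothing orders the desc_len loads after the hp load, so on a weakly ordered CPU such as aarch64 they could return data older than the head pointer implies.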
Thanks, Johan, for continuing to follow up on this issue. I have a somewhat different view.
This change deviates somewhat from the fix approach described in https://lore.kernel.org/all/20250321095219.19369-1-johan+linaro@kernel.org/. In this case, the descriptor might be accessed before it has been updated, or while it is still being updated. Therefore, a dma_rmb() should be added after the call to ath11k_hal_srng_dst_get_next_entry() and before the call to ath11k_hal_ce_dst_status_get_length(), to ensure that the DMA has completed before the descriptor is read.
However, in this patch the memory barrier is used to protect the head pointer (HP). I don't think a memory barrier is necessary for the HP, because even if a stale HP is read, ath11k_hal_srng_dst_get_next_entry() will return NULL and the loop will exit safely.
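A simplified model of the NULL-return behavior described above, assuming the helper compares the consumer's tail pointer against a head pointer snapshot cached when ring access begins (the names are hypothetical, not the ath11k structures). A stale snapshot only means that fewer entries are handed out in this pass; the check itself does not order the later reads of the descriptor contents after the head pointer read.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical simplified model of a destination ring (not struct hal_srng). */
struct dst_ring {
	uint32_t entries[RING_SIZE];
	uint32_t tp;          /* consumer tail pointer (entry index) */
	uint32_t cached_hp;   /* head pointer snapshot taken at access begin */
};

/* Model of a "get next entry" helper: returns NULL once tp catches up with
 * the cached head pointer, so a stale (too old) snapshot only means that
 * fewer entries are returned in this pass. */
static uint32_t *dst_get_next_entry(struct dst_ring *ring)
{
	uint32_t *desc;

	if (ring->tp == ring->cached_hp)
		return NULL;

	desc = &ring->entries[ring->tp];
	ring->tp = (ring->tp + 1) % RING_SIZE;   /* wrap around to ring start */
	return desc;
}
```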
No, the barrier is needed between reading the head pointer and accessing the descriptor fields; that's what matters.
You can still end up reading stale descriptor data, due to speculation, even when ath11k_hal_srng_dst_get_next_entry() returns non-NULL (that's what happens on the X13s).
Whether to place it before, after, or inside ath11k_hal_srng_dst_get_next_entry() is a trade-off between readability, maintainability, and avoiding unnecessary barriers in cases like the above, where we strictly need only one barrier before the loop (or whether we want to avoid the barrier altogether when the ring is empty).
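The option of skipping the barrier when the ring is empty can be sketched as follows, again as a hypothetical userspace model: model_dma_rmb() is an assumed, instrumented stand-in for dma_rmb(), and the ring struct is simplified (neither is ath11k code). The head pointer is snapshotted, an empty ring returns early without a barrier, and otherwise a single barrier covers the whole batch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u

static unsigned int nr_barriers;   /* how many barriers were issued */

/* Hypothetical userspace stand-in for dma_rmb(), instrumented for counting. */
static void model_dma_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
	nr_barriers++;
}

/* Hypothetical simplified ring model (not struct hal_srng). */
struct dst_ring {
	uint32_t entries[RING_SIZE];
	_Atomic uint32_t hp;   /* head pointer (entry index) written by the producer */
	uint32_t cached_hp;    /* snapshot taken at access begin */
	uint32_t tp;           /* consumer tail pointer (entry index) */
};

static uint32_t *get_next_entry(struct dst_ring *r)
{
	uint32_t *desc;

	if (r->tp == r->cached_hp)
		return NULL;
	desc = &r->entries[r->tp];
	r->tp = (r->tp + 1) % RING_SIZE;
	return desc;
}

/* Snapshot hp; issue the barrier only if there is something to read, so
 * polling an empty ring costs no barrier. */
static uint32_t process(struct dst_ring *r)
{
	uint32_t *desc, total = 0;

	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);
	if (r->tp == r->cached_hp)
		return 0;                    /* empty: skip the barrier */

	model_dma_rmb();                     /* one barrier for the whole batch */
	while ((desc = get_next_entry(r)) != NULL)
		total += *desc;
	return total;
}
```

The extra emptiness check duplicates the one inside the helper, which is part of the readability cost mentioned above.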
So, placing the memory barrier inside ath11k_hal_srng_dst_get_next_entry() would be more appropriate.
@@ -678,6 +678,8 @@ u32 *ath11k_hal_srng_dst_get_next_entry(struct ath11k_base *ab,
	if (srng->flags & HAL_SRNG_FLAGS_CACHED)
		ath11k_hal_srng_prefetch_desc(ab, srng);
	dma_rmb();
	}
	return desc;
So this will add a barrier in each iteration of the loop, but we only need a single one after reading the head pointer.
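To illustrate the cost difference, the following hypothetical userspace model counts how many barriers each placement issues for a batch of three entries: one barrier right after the head pointer snapshot versus one barrier per entry returned by the helper. model_dma_rmb() is an assumed stand-in for dma_rmb(), and the ring structure is simplified; neither is ath11k code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u

static unsigned int nr_barriers;   /* how many barriers were issued */

/* Hypothetical userspace stand-in for dma_rmb(), instrumented for counting. */
static void model_dma_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
	nr_barriers++;
}

/* Hypothetical simplified ring model (not struct hal_srng). */
struct dst_ring {
	uint32_t entries[RING_SIZE];
	_Atomic uint32_t hp;   /* head pointer (entry index) written by the producer */
	uint32_t cached_hp;    /* snapshot taken at access begin */
	uint32_t tp;           /* consumer tail pointer (entry index) */
};

/* rmb_inside selects the placement suggested in review: barrier in the helper. */
static uint32_t *get_next_entry(struct dst_ring *r, bool rmb_inside)
{
	uint32_t *desc;

	if (r->tp == r->cached_hp)
		return NULL;
	desc = &r->entries[r->tp];
	r->tp = (r->tp + 1) % RING_SIZE;
	if (rmb_inside)
		model_dma_rmb();   /* runs once per returned entry */
	return desc;
}

static uint32_t process(struct dst_ring *r, bool rmb_inside)
{
	uint32_t *desc, total = 0;

	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);
	if (!rmb_inside)
		model_dma_rmb();   /* single barrier right after the hp snapshot */

	while ((desc = get_next_entry(r, rmb_inside)) != NULL)
		total += *desc;
	return total;
}
```

Both placements return the same entries; only the number of barriers differs, and in the helper variant it grows with the batch size.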
[ Also note that ath11k_hal_srng_dst_peek() would similarly need a barrier if we were to move them into those helpers. ]
Johan