On 6/2/2025 4:03 PM, Johan Hovold wrote:
On Thu, May 29, 2025 at 03:03:38PM +0800, Miaoqing Pan wrote:
On 5/26/2025 7:48 PM, Johan Hovold wrote:
Add the missing memory barriers to make sure that destination ring descriptors are read after the head pointers to avoid using stale data on weakly ordered architectures like aarch64.
@@ -3851,6 +3851,9 @@ int ath11k_dp_process_rx_err(struct ath11k_base *ab, struct napi_struct *napi,
 	ath11k_hal_srng_access_begin(ab, srng);
+	/* Make sure descriptor is read after the head pointer. */
+	dma_rmb();
Thanks Johan, for continuing to follow up on this issue. I have some different opinions.
This change somewhat deviates from the fix approach described in https://lore.kernel.org/all/20250321095219.19369-1-johan+linaro@kernel.org/. In this case, the descriptor might be accessed before it is updated or while it is still being updated. Therefore, a dma_rmb() should be added after the call to ath11k_hal_srng_dst_get_next_entry() and before accessing ath11k_hal_ce_dst_status_get_length(), to ensure that the DMA has completed before reading the descriptor.
However, in this patch, the memory barrier is used to protect the head pointer (HP). I don't think a memory barrier is necessary for HP, because even if an outdated HP is fetched, ath11k_hal_srng_dst_get_next_entry() will return NULL and exit safely.
No, the barrier is needed between reading the head pointer and accessing descriptor fields, that's what matters.
You can still end up with reading stale descriptor data even when ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to speculation (that's what happens on the X13s).
The fact is that a dma_rmb() does not even prevent speculation, no matter where it is placed, right? If so, the whole point of dma_rmb() is to prevent compiler or CPU reordering, but is such reordering really possible here?
The sequence is
1# reading HP
	srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
2# validate HP
	if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
		return NULL;
3# get desc
	desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;
4# accessing desc
	ath11k_hal_desc_reo_parse_err(... desc, ...)
Clearly each step depends on the results of previous steps. In this case the compiler/CPU is expected to be smart enough to not do any reordering, isn't it?
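One way to examine this question is a userspace model of the four steps. The sketch below is only an analogue under stated assumptions, not ath11k code: struct ring, ring_produce(), ring_access_begin() and ring_get_next() are hypothetical names, and the C11 atomic_thread_fence(memory_order_acquire) stands in for dma_rmb(). Note that the descriptor address in step 3 is computed from the tail pointer, so the only link from the head-pointer load to the descriptor load is the branch in step 2, which is a control dependency. The Linux kernel memory model does not treat a control dependency as ordering a load against a later load, which is why the sketch places an explicit fence after step 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 4

/* Hypothetical stand-in for a destination ring; not the ath11k layout. */
struct ring {
	uint32_t desc[RING_ENTRIES];	/* descriptors, written by the producer */
	_Atomic uint32_t hp;		/* head pointer, written by the producer */
	uint32_t cached_hp;		/* consumer's snapshot of hp */
	uint32_t tp;			/* tail pointer, owned by the consumer */
};

/* Producer: fill the descriptor first, then publish it via the head pointer. */
static void ring_produce(struct ring *r, uint32_t val)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	r->desc[hp] = val;
	atomic_store_explicit(&r->hp, (hp + 1) % RING_ENTRIES,
			      memory_order_release);
}

/* 1# reading HP, followed by the fence that models dma_rmb(). */
static void ring_access_begin(struct ring *r)
{
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	/*
	 * Assumption: acquire fence as the userspace analogue of dma_rmb().
	 * Loads after this fence are not reordered before the head-pointer
	 * load above.
	 */
	atomic_thread_fence(memory_order_acquire);
}

static const uint32_t *ring_get_next(struct ring *r)
{
	const uint32_t *d;

	assert(r->tp < RING_ENTRIES);

	/* 2# validate HP: a branch, so only a control dependency on hp. */
	if (r->tp == r->cached_hp)
		return NULL;

	/* 3# get desc: the address depends on tp, not on the loaded hp. */
	d = &r->desc[r->tp];
	r->tp = (r->tp + 1) % RING_ENTRIES;

	/* 4# the caller dereferences d to access the descriptor. */
	return d;
}
```

In this model, removing the fence leaves nothing but the branch between the head-pointer load and the descriptor load.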
Whether to place it before or after (or inside) ath11k_hal_srng_dst_get_next_entry() is a trade-off between readability, maintainability, and whether we want to avoid unnecessary barriers in cases like the above where we strictly only need one barrier before the loop (or if we want to avoid the barrier in case the ring is ever empty).
So, placing the memory barrier inside ath11k_hal_srng_dst_get_next_entry() would be more appropriate.
@@ -678,6 +678,8 @@ u32 *ath11k_hal_srng_dst_get_next_entry(struct ath11k_base *ab,
 	if (srng->flags & HAL_SRNG_FLAGS_CACHED)
 		ath11k_hal_srng_prefetch_desc(ab, srng);
+	dma_rmb();
 	return desc;
 }
So this will add a barrier in each iteration of the loop, but we only need a single one after reading the head pointer.
[ Also note that ath11k_hal_srng_dst_peek() would similarly need a barrier if we were to move them into those helpers. ]
Johan
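The barrier-count trade-off discussed above can be illustrated with a small userspace model. This is a sketch under assumptions, not driver code: struct ring, model_rmb(), access_begin(), get_next() and drain() are hypothetical names, and model_rmb() only increments a counter instead of issuing a real barrier. It compares a single barrier after the head-pointer read with a barrier inside the get-next helper, placed after the empty-ring check as in the proposed hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8

/* Hypothetical ring model; the counter replaces a real read barrier. */
struct ring {
	uint32_t desc[RING_ENTRIES];
	uint32_t hp;		/* head pointer as left by the producer */
	uint32_t cached_hp;	/* consumer's snapshot of hp */
	uint32_t tp;		/* tail pointer, owned by the consumer */
	unsigned int barriers;	/* number of modelled read barriers issued */
};

static void model_rmb(struct ring *r)
{
	r->barriers++;
}

static void access_begin(struct ring *r)
{
	r->cached_hp = r->hp;
}

static uint32_t *get_next(struct ring *r, int barrier_in_helper)
{
	uint32_t *d;

	assert(r->tp < RING_ENTRIES);

	if (r->tp == r->cached_hp)
		return NULL;	/* empty ring: no barrier in the helper variant */

	d = &r->desc[r->tp];
	r->tp = (r->tp + 1) % RING_ENTRIES;

	if (barrier_in_helper)
		model_rmb(r);	/* one barrier per returned descriptor */

	return d;
}

/* Drain the ring and return how many barriers were issued. */
static unsigned int drain(struct ring *r, int barrier_in_helper)
{
	access_begin(r);

	if (!barrier_in_helper)
		model_rmb(r);	/* single barrier after reading the head pointer */

	while (get_next(r, barrier_in_helper))
		;

	return r->barriers;
}
```

With five pending entries, the model issues one barrier in the before-the-loop variant and five in the in-helper variant. With an empty ring, the in-helper variant issues none while the before-the-loop variant still issues one.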