On 6/3/2025 7:51 PM, Johan Hovold wrote:
On Tue, Jun 03, 2025 at 06:52:37PM +0800, Baochen Qiang wrote:
On 6/2/2025 4:03 PM, Johan Hovold wrote:
No, the barrier is needed between reading the head pointer and accessing the descriptor fields; that's what matters.
You can still end up reading stale descriptor data due to speculation even when ath11k_hal_srng_dst_get_next_entry() returns non-NULL (that's what happens on the X13s).
The fact is that a dma_rmb() does not even prevent speculation, no matter where it is placed, right?
It prevents the speculated load from being used.
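To make the placement concrete, here is a minimal sketch of where such a barrier would sit in ath11k_hal_srng_dst_get_next_entry(), paraphrased from the identifiers quoted below in this thread. Fields like entry_size and ring_size, and the exact function body, are assumptions for illustration; this is not the verbatim upstream function and not necessarily the final fix:

    /* Sketch only: illustrates the barrier placement discussed in this
     * thread, not the verbatim upstream code.
     */
    static u32 *srng_dst_get_next_entry_sketch(struct hal_srng *srng)
    {
            u32 *desc;

            /* Ring looks empty: TP has caught up with the cached HP
             * (cached_hp was loaded earlier, when the HP was read).
             */
            if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
                    return NULL;

            desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;

            srng->u.dst_ring.tp += srng->entry_size;
            if (srng->u.dst_ring.tp == srng->ring_size)
                    srng->u.dst_ring.tp = 0;

            /* Order the earlier HP read against the caller's subsequent
             * loads from *desc, so a descriptor load that was speculated
             * before the HP load completed cannot have its stale result
             * used once the ring is seen as non-empty.
             */
            dma_rmb();

            return desc;
    }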
If so, the whole point of dma_rmb() is to prevent compiler or CPU reordering, but can such reordering really happen here?
The sequence is:

    1# read HP:      srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
    2# validate HP:  if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp) return NULL;
    3# get desc:     desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;
    4# access desc:  ath11k_hal_desc_reo_parse_err(... desc, ...)
Clearly each step depends on the results of the previous steps. In this case the compiler/CPU should be smart enough not to do any reordering, shouldn't it?
Steps 3 and 4 can be done speculatively before the load in step 1 is complete as long as the result is discarded if it turns out not to be needed.
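In other words, mapping that onto the four steps (the comments below are illustrative annotations of the quoted code, with the call arguments elided as in the original):

    /* 1# HP load issued; its result may not be back yet */
    srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);

    /* 2# only a control dependency on the HP value: the branch can be
     *    predicted, so execution does not have to wait for step 1
     */
    if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
            return NULL;

    /* 3# the descriptor address is computed from TP alone, not from HP,
     *    so this load (and the field loads in step 4) may be issued
     *    speculatively before step 1 completes
     */
    desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;

    /* 4# without a barrier between steps 1 and 4, the speculatively
     *    loaded (possibly stale) descriptor contents are kept once the
     *    branch resolves as "not empty"
     */
    ath11k_hal_desc_reo_parse_err(... desc, ...)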
If the condition in step 2 is true and step 3 speculatively loads the descriptor at TP before step 1 completes, could this cause issues?
We previously had extensive discussions on this topic in the https://lore.kernel.org/linux-wireless/ecfe850c-b263-4bee-b888-c34178e690fc@... thread. On my platform, dma_rmb() did not work as expected; the issue only disappeared after disabling PCIe endpoint relaxed ordering on the firmware side. So it seems that the HP update (a memory write) reached the host before the descriptor write (also a memory write), which led to the problem.
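For reference, relaxed ordering can also be turned off from the host side by clearing the Enable Relaxed Ordering bit in the endpoint's PCIe Device Control register, which forbids the endpoint from setting the RO attribute in the TLPs it generates. This is only a sketch of a related knob, not what was actually done here (the change described above was made in firmware), and 'pdev' is assumed to be the WLAN endpoint's struct pci_dev:

    /* Sketch: clear Enable Relaxed Ordering in the endpoint's Device
     * Control register so it may no longer mark its memory writes
     * (descriptor and HP updates) with the Relaxed Ordering attribute.
     */
    pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);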