On Wed, Jun 04, 2025 at 03:57:57PM +0800, Miaoqing Pan wrote:
On 6/4/2025 3:06 PM, Johan Hovold wrote:
On Wed, Jun 04, 2025 at 01:32:08PM +0800, Miaoqing Pan wrote:
On 6/4/2025 10:34 AM, Miaoqing Pan wrote:
On 6/3/2025 7:51 PM, Johan Hovold wrote:
On Tue, Jun 03, 2025 at 06:52:37PM +0800, Baochen Qiang wrote:
The sequence is
1# reading HP
	srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
2# validate HP
	if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp) return NULL;
3# get desc
	desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;
4# accessing desc
	ath11k_hal_desc_reo_parse_err(... desc, ...)
Clearly each step depends on the results of previous steps. In this case the compiler/CPU is expected to be smart enough to not do any reordering, isn't it?
Steps 3 and 4 can be done speculatively before the load in step 1 is complete as long as the result is discarded if it turns out not to be needed.
If the condition in step 2 is true and step 3 speculatively loads the descriptor from TP before step 1, could this cause issues?
Sorry for the typo: if the condition in step 2 is false and step 3 speculatively loads the descriptor from TP before step 1, could this cause issues?
Almost correct; the descriptor can be loaded (from TP) before the head pointer is loaded and thus before the condition in step 2 has been evaluated. And if the condition in step 2 later turns out to be false, step 4 may use stale data from before the head pointer was updated.
Actually, there's a missing step between step 3 and step 4: TP+1.
TP+1: srng->u.dst_ring.tp += srng->entry_size
Sure, but that is not relevant for the issue at hand.
TP is managed by the CPU and points to the current first unprocessed descriptor, while HP and the descriptor are asynchronously updated by DMA. So are you saying that the descriptor obtained through speculative loading has not yet been updated, or is in the process of being updated?
Exactly.
Johan
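
For illustration only, the sequence discussed above can be modelled in userspace C11. This is a rough sketch, not the ath11k code: struct model_ring, model_device_post(), model_ring_peek() and model_ring_advance() are hypothetical names, the "device" is just a function call, and the acquire load stands in for READ_ONCE() followed by a read barrier (in the kernel that would presumably be something like dma_rmb(), which this thread does not confirm as the chosen fix). The point is only that the descriptor load must be ordered after the head pointer load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 4u

/* Hypothetical model of a destination ring; not the ath11k structures. */
struct model_ring {
	_Atomic uint32_t hp;         /* head pointer, written by the "device" */
	uint32_t tp;                 /* tail pointer, owned by the consumer */
	uint32_t cached_hp;          /* consumer's last snapshot of hp */
	uint32_t desc[RING_ENTRIES]; /* descriptors, written by the "device" */
};

/* Producer model: write the descriptor first, then publish the new HP. */
static void model_device_post(struct model_ring *r, uint32_t value)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	assert(hp < RING_ENTRIES);
	r->desc[hp] = value;
	atomic_store_explicit(&r->hp, (hp + 1) % RING_ENTRIES,
			      memory_order_release);
}

/* Consumer model of steps 1-3; the caller performs step 4. */
static const uint32_t *model_ring_peek(struct model_ring *r)
{
	/*
	 * Step 1: read HP. The acquire ordering keeps the later descriptor
	 * load from being satisfied before this load completes.
	 */
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_acquire);

	/* Step 2: nothing new if TP has caught up with HP. */
	if (r->tp == r->cached_hp)
		return NULL;

	/* Step 3: descriptor at TP; safe to read only after step 1. */
	return &r->desc[r->tp];
}

/* The "TP+1" step: move past the descriptor that was just processed. */
static void model_ring_advance(struct model_ring *r)
{
	r->tp = (r->tp + 1) % RING_ENTRIES;
}
```

With a relaxed load in step 1 and no barrier, nothing in this model would stop the descriptor read from being satisfied early, which is the stale-data case described above.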