On Tue, 2026-06-23 at 15:33 +0100, André Draszik wrote:
However, if my issue were to be solved with barriers, the test_and_set_bit() in dma_fence_signal_timestamp_locked() would have to be replaced with the more weakly ordered test_bit() and set_bit(), maybe creating other pitfalls.
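To make the ordering difference concrete, here is a small userspace model. It uses C11 atomics, not the kernel helpers. The kernel's atomic bitops documentation classifies test_and_set_bit() as a value-returning atomic RMW, which is fully ordered. test_bit() and set_bit() imply no ordering. The model_* names are hypothetical, and seq_cst is only an approximation of "fully ordered". The split variant shows where the new pitfalls come from. Test and set are no longer one atomic step, so two callers that nothing else serializes could both see the bit clear. Any extra stores placed between the two steps get no ordering from the bitops themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: C11 stand-ins for the documented guarantees of
 * the kernel bitops, not the kernel implementation. */

/* Like test_and_set_bit(): one atomic read-modify-write that returns the
 * old bit; value-returning, so (approximately) fully ordered. */
static bool model_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or_explicit(addr, mask, memory_order_seq_cst) & mask) != 0;
}

/* Like test_bit(): a plain atomic load, no ordering implied. */
static bool model_test_bit(unsigned nr, atomic_ulong *addr)
{
	return (atomic_load_explicit(addr, memory_order_relaxed) >> nr) & 1UL;
}

/* Like set_bit(): atomic, but not value-returning, hence unordered. */
static void model_set_bit(unsigned nr, atomic_ulong *addr)
{
	atomic_fetch_or_explicit(addr, 1UL << nr, memory_order_relaxed);
}

/* Split variant: the gap between test and set is where other stores
 * (e.g. clearing an ops pointer) would go, and where a second,
 * unserialized caller could also observe the bit as clear. */
static bool model_split_test_then_set(unsigned nr, atomic_ulong *addr)
{
	if (model_test_bit(nr, addr))
		return true;
	/* ...other stores would be placed here, with explicit barriers... */
	model_set_bit(nr, addr);
	return false;
}
```

Single-threaded, both variants return the old bit value. The differences only show up under concurrency. That is exactly why the swap is easy to get subtly wrong.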
For the avoidance of doubt, I'm not saying that all the issues you raised can be solved by barriers instead of appropriate locks (I don't know enough about the code and issues in general here).
I'm not saying that you're saying that. I'm just cautioning you that this change could be tricky.
I do think however that appropriate locks will fix the ordering issue highlighted by sashiko (i.e. +1 for your argument). Barriers would fix this specific issue, too, but that is not a statement about any wider issues.
The ordering issue in the get_*_name() functions plays into that. Setting the bit would then happen after setting the ops pointer to NULL, so one would have to move the NULL assignment as well.
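A minimal sketch of the ordering such a barrier-based fix would need. It is a hypothetical userspace model with C11 atomics, not the dma_fence code. The signaler publishes "signaled" first and only then clears the ops pointer with a release store, roughly what smp_store_release() gives in the kernel. The reader loads the pointer once with acquire semantics. If it sees NULL, the pairing guarantees it also sees the flag set. Testing the flag first and loading ops afterwards would still race, no matter which barriers are used. The lifetime of *ops (RCU in the kernel) is deliberately not modeled. All names, and the "(detached)" string, are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model types, not struct dma_fence / dma_fence_ops. */
struct model_ops {
	const char *(*get_driver_name)(void);
};

struct model_fence {
	atomic_bool signaled;
	_Atomic(const struct model_ops *) ops;
};

static const char *model_example_name(void) { return "example"; }
static const struct model_ops model_example_ops = { model_example_name };

/* Signaler: set the flag first, then detach ops. The release store of
 * NULL orders the earlier flag store before it. */
static void model_signal(struct model_fence *f)
{
	atomic_store_explicit(&f->signaled, true, memory_order_relaxed);
	atomic_store_explicit(&f->ops, NULL, memory_order_release);
}

/* Reader: load ops exactly once. A NULL observed via this acquire load
 * pairs with the release store above, so the flag must read as set.
 * (Keeping *ops alive while it is used is not modeled here.) */
static const char *model_driver_name(struct model_fence *f)
{
	const struct model_ops *ops =
		atomic_load_explicit(&f->ops, memory_order_acquire);

	if (ops)
		return ops->get_driver_name();
	assert(atomic_load_explicit(&f->signaled, memory_order_relaxed));
	return "(detached)";
}
```

This is the "move the NULL set, too" point in miniature. If the NULL store came before the flag store, a reader could see NULL while the fence still looks unsignaled.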
Long story short, this is painful and subtle.
But I think what we are realizing over and over again is that dma_fence has many subtleties to its API contract, and the implementation's sparing use of spinlocks leads to workarounds where people take locks manually or have to do an RCU dance.
Note that Christian is strongly opposed to guarding everything with locks, in part because of deadlocks that supposedly occur in the fence callbacks when the driver needs to take its own locks.
ww_mutex could help against deadlocks, but might affect performance if these are all critical code paths (IDK).
You can't use sleepable locks in fences. They fire in interrupt context left and right ;)
Besides, that wouldn't even solve the reported problem.
The tl;dr is:
There is fence_ops->enable_signaling(), which is currently called with the fence lock held. So the driver, in that callback, cannot take a driver-specific lock IF there is another driver path (like an IRQ handler) that takes the driver lock first and then the fence lock.
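The inversion can be sketched with a toy lock-order recorder in the spirit of lockdep. It is hypothetical userspace code, not the kernel's lockdep. One path models the core calling enable_signaling() with the fence lock held while the callback takes the driver lock. The other models an IRQ handler that holds the driver lock and then signals a fence, which takes the fence lock. Like lockdep, the recorder flags the conflicting order even if the two paths never actually overlap. A real deadlock needs them to run concurrently on different CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy lock-order recorder (hypothetical), only tracking ordering. */
enum model_lock_id { FENCE_LOCK, DRIVER_LOCK, NLOCKS };

/* order_seen[a][b] becomes true once lock b was taken while a was held. */
static bool order_seen[NLOCKS][NLOCKS];
static bool held[NLOCKS];

static void model_reset(void)
{
	memset(order_seen, 0, sizeof(order_seen));
	memset(held, 0, sizeof(held));
}

/* Returns false if taking l now inverts an order recorded earlier. */
static bool model_lock(enum model_lock_id l)
{
	bool ok = true;

	for (int h = 0; h < NLOCKS; h++) {
		if (!held[h] || h == (int)l)
			continue;
		if (order_seen[l][h])
			ok = false;	/* l -> h was seen before, now h -> l */
		order_seen[h][l] = true;
	}
	held[l] = true;
	return ok;
}

static void model_unlock(enum model_lock_id l)
{
	held[l] = false;
}

/* enable_signaling() under the fence lock, callback takes driver lock. */
static bool model_enable_signaling_path(void)
{
	bool ok = model_lock(FENCE_LOCK);

	ok = model_lock(DRIVER_LOCK) && ok;
	model_unlock(DRIVER_LOCK);
	model_unlock(FENCE_LOCK);
	return ok;
}

/* IRQ handler holds the driver lock, then signals: takes fence lock. */
static bool model_irq_path(void)
{
	bool ok = model_lock(DRIVER_LOCK);

	ok = model_lock(FENCE_LOCK) && ok;
	model_unlock(FENCE_LOCK);
	model_unlock(DRIVER_LOCK);
	return ok;
}
```

Whichever path runs second is reported as the inversion. Dropping the fence lock around enable_signaling() removes the FENCE_LOCK -> DRIVER_LOCK edge, which is what the proposal mentioned next is after.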
Which is why Christian König wants to remove the fence lock being held in enable_signaling().
One reason why this supposedly isn't a problem today is that, without fence->inline_lock, you can protect the fctx with the same lock as its fences and do fctx list manipulations in enable_signaling() under that lock.
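A rough model of that shared-lock arrangement follows. dma_fence_init() takes a spinlock_t pointer, so a driver can hand every fence of a context the context's own lock. The code below is hypothetical userspace C: a toy spinlock built from atomic_flag and invented model_* types. It only shows why the callback can touch the pending list without taking a second lock. With a per-fence inline lock, that shortcut is gone and the list would need its own lock, nested inside the fence lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock; 'held' stands in for lockdep_assert_held(). */
typedef struct {
	atomic_flag f;
	bool held;
} model_spinlock;

static void model_spin_lock(model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;	/* spin */
	l->held = true;
}

static void model_spin_unlock(model_spinlock *l)
{
	l->held = false;
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct model_fence;

/* Hypothetical fence context: one lock protects the pending list and
 * also serves as the lock of every fence in this context. */
struct model_fctx {
	model_spinlock lock;
	struct model_fence *pending;
};

struct model_fence {
	model_spinlock *lock;		/* points at fctx->lock, not inline */
	struct model_fctx *fctx;
	struct model_fence *next;
};

/* Driver callback, invoked with *fence->lock held. Because that is the
 * fctx lock, the list can be modified without taking another lock. */
static bool model_enable_signaling(struct model_fence *fence)
{
	assert(fence->lock->held);
	fence->next = fence->fctx->pending;
	fence->fctx->pending = fence;
	return true;
}

/* "Core" side: hold the fence's lock around the callback. */
static bool model_core_enable_signaling(struct model_fence *fence)
{
	model_spin_lock(fence->lock);
	bool ret = model_enable_signaling(fence);
	model_spin_unlock(fence->lock);
	return ret;
}
```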
If you have a big bowl of popcorn available, you could check out this thread:
https://lore.kernel.org/dri-devel/20260608142436.265820-2-phasta@kernel.org/
;p
My own thinking is: if everyone used inline_lock, and if we could rely on everyone being able to do the necessary work in enable_signaling() without said lock inversion, then we could perfectly synchronize all actions related to dma_fence, including driver (and thus fence_ops) unload.
The only real blocker might be enable_signaling() (the other callbacks already take the lock). The more difficult question would be how to implement that in a backwards-compatible manner, i.e., for those who don't use inline_lock.
Another idea for the distant future might be to question the existence of those callbacks. Userspace often is sort of decoupled from the hardware fences through intermediate fences already.
The community discussion regarding that problem is currently in some sort of dead end, where none of us seems to know what the correct path forward is.
Please ignore this if it doesn't make sense, I'm just a bystander :-) How about at least adding the required barriers and related changes, and taking it from there? That would solve some immediate, easy-to-hit issues on Arm64, wouldn't it? If they turn out to be insufficient, the code can still be changed.
I am in support of that, which is why I posted that RFC for feedback about the appropriate memory barriers.
BTW, thanks Philipp for all these details, much appreciated.
You're welcome. If you find a clever solution, everyone would probably be happy.
P.
Cheers, A.
linaro-mm-sig@lists.linaro.org