Hi Philipp,
On Tue, 2026-06-23 at 13:58 +0200, Philipp Stanner wrote:
On Tue, 2026-06-23 at 12:37 +0100, André Draszik wrote:
On Thu, 2026-06-18 at 17:56 +0200, Philipp Stanner wrote:
I continue to believe that, because of bugs like this and the ones I have quoted in the threads above, the robustness of the kernel could be greatly improved if we could get dma_fence fully synchronized with its lock.
On top of that, sashiko highlighted (via my other patch) that the existing code is missing some memory barriers:
https://sashiko.dev/#/patchset/20260618-linux-drm_crtc_fix-v1-1-801f29c9853d...
I believe lock synchronization would resolve that (as would adding explicit memory barriers).
That is being discussed in the thread I linked, where Gary lists which barriers you would need for (presumably correct) lockless magic.
Having read Gary's suggestion, that aligns with what I had in mind.
However, if my issue were to be solved with barriers, the test_and_set_bit() in dma_fence_signal_timestamp_locked() would have to be replaced with the more weakly ordered test_bit() and set_bit(), maybe creating other pitfalls.
For the avoidance of doubt, I'm not saying that all the issues you raised can be solved by barriers instead of appropriate locks (I don't know enough about the code and issues in general here).
I do think however that appropriate locks will fix the ordering issue highlighted by sashiko (i.e. +1 for your argument). Barriers would fix this specific issue, too, but that is not a statement about any wider issues.
The ordering issue in the get_*_name() functions plays into that. Setting the bit would then be done after setting the ops-pointer to NULL. So one would have to try to move the NULL set, too.
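To make the ordering question concrete, here is a minimal userspace sketch using C11 atomics. It is only an analogy and not the dma_fence code: struct demo_fence, struct demo_ops and all demo_* functions are made-up names, and the release/acquire pairing is just one way such a flag/pointer pair could be ordered. The writer sets the flag first and only then clears the ops pointer with release semantics; the reader snapshots the pointer once with acquire semantics and checks it before use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's struct dma_fence or its ops. */
struct demo_ops {
	const char *(*get_name)(void);
};

struct demo_fence {
	_Atomic(const struct demo_ops *) ops;
	atomic_bool signaled;
};

static const char *demo_driver_name(void)
{
	return "demo-driver";
}

static const struct demo_ops demo_ops = { .get_name = demo_driver_name };

static void demo_fence_init(struct demo_fence *f)
{
	atomic_init(&f->ops, &demo_ops);
	atomic_init(&f->signaled, false);
}

/*
 * Writer: set the flag first, then clear the ops pointer with release
 * semantics. The release store orders the earlier flag store before it.
 */
static void demo_fence_signal(struct demo_fence *f)
{
	atomic_store_explicit(&f->signaled, true, memory_order_relaxed);
	atomic_store_explicit(&f->ops, NULL, memory_order_release);
}

/*
 * Reader: take ONE snapshot of the ops pointer with acquire semantics.
 * If the snapshot is NULL, the flag store is guaranteed to be visible too.
 * If it is non-NULL, it is only dereferenced through the local copy, so a
 * concurrent NULL store cannot turn this call into a NULL dereference.
 */
static const char *demo_fence_name(struct demo_fence *f)
{
	const struct demo_ops *ops =
		atomic_load_explicit(&f->ops, memory_order_acquire);

	if (ops && !atomic_load_explicit(&f->signaled, memory_order_relaxed))
		return ops->get_name();

	return "detached";
}

/* Single-threaded usage example; it does not exercise any race. */
static void demo_usage(void)
{
	struct demo_fence f;

	demo_fence_init(&f);
	assert(strcmp(demo_fence_name(&f), "demo-driver") == 0);

	demo_fence_signal(&f);
	assert(strcmp(demo_fence_name(&f), "detached") == 0);
}
```

Note that this only covers the ordering of the two stores as seen by a reader. It says nothing about how long whatever the ops pointer refers to stays valid, which is a separate question.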
Long story short, this is painful and subtle.
But I think what we are realizing over and over again is that dma_fence has many subtleties to its API contract, and the implementation's sparing use of spinlocks leads to workarounds where people take locks manually or have to do an RCU dance.
Note that Christian is strongly opposed to guarding everything with locks, in part because of deadlocks that supposedly occur in the fence callbacks when the driver needs to take its own locks.
ww_mutex could help against deadlocks, but might affect performance if these are all critical code paths (IDK).
The community discussion regarding that problem is currently in some sort of dead end, where none of us seems to know what the correct path forward is.
Please ignore if the following doesn't make sense, I'm just a bystander :-) How about at least adding the required barriers and related changes, and taking it from there? This would solve some immediate and easy-to-hit issues on Arm64? If they turn out to be insufficient, the code can still be changed.
[...] My understanding of the current situation is that, as an issuer of dma_fences, you should in general wait for a grace period before you perform operations like driver unload or, more generally, before fence-related resources and such that are accessed through callbacks go away.
If I understand correctly, simply waiting for a grace period in the driver's unbind should be the way to go.
Danilo ... Maybe he's got the time to share some details with you that are relevant to your work.
Will wait a little :-)
BTW, thanks Philipp for all these details, much appreciated.
Cheers, A.