On 09.04.25 at 14:56, Philipp Stanner wrote:
On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
Hi Philipp,
On Wed, 9 Apr 2025 14:06:37 +0200 Philipp Stanner <phasta@kernel.org> wrote:
dma_fence_is_signaled()'s name strongly reads as if this function were intended for checking whether a fence is already signaled. Also the boolean it returns hints at that.
The function's behavior, however, is more complex: it can check with a driver callback whether the hardware's sequence number indicates that the fence can already be treated as signaled, although the hardware's / driver's interrupt handler has not signaled it yet. If that's the case, the function also signals the fence.
(Presumably) this has caused a bug in Nouveau (unknown commit), where nouveau_fence_done() uses the function to check a fence, which causes a race.
Give the function a more obvious name.
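The behavior the commit message describes can be sketched as a small user-space model. This is only an illustration, not the kernel source: every `model_*` name is hypothetical, `signaled_flag` stands in for DMA_FENCE_FLAG_SIGNALED_BIT, `callbacks_run` stands in for the side effects of dma_fence_signal(), and the sequence-number comparison ignores wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; all model_* names are hypothetical. */
struct model_fence;

struct model_fence_ops {
	/* Optional: ask the "driver" whether the hardware already passed the fence. */
	bool (*signaled)(struct model_fence *f);
};

struct model_fence {
	const struct model_fence_ops *ops;
	bool signaled_flag;     /* stands in for DMA_FENCE_FLAG_SIGNALED_BIT */
	unsigned int seqno;     /* sequence number this fence waits for */
	unsigned int hw_seqno;  /* last sequence number the "hardware" reported */
	int callbacks_run;      /* counts how often the signaling side effects ran */
};

/* Stand-in for dma_fence_signal(): sets the flag and runs callbacks once. */
static void model_fence_signal(struct model_fence *f)
{
	if (f->signaled_flag)
		return;
	f->signaled_flag = true;
	f->callbacks_run++;
}

/* Stand-in for a driver's ->signaled callback (no wraparound handling). */
static bool model_hw_signaled(struct model_fence *f)
{
	return f->hw_seqno >= f->seqno;
}

static const struct model_fence_ops model_ops = {
	.signaled = model_hw_signaled,
};

/*
 * Models the "check" that may also signal:
 * 1. fast path if the fence is already flagged as signaled;
 * 2. otherwise ask the driver callback, and if it says yes,
 *    signal the fence as a side effect before returning true.
 */
static bool model_fence_is_signaled(struct model_fence *f)
{
	assert(f && f->ops);

	if (f->signaled_flag)
		return true;

	if (f->ops->signaled && f->ops->signaled(f)) {
		model_fence_signal(f);
		return true;
	}

	return false;
}
```

In this model, a caller that only wants to read the state still triggers `model_fence_signal()` the first time the hardware sequence number has caught up, which is the side effect the rename is meant to advertise.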
This is just my personal view on this, but I find the new name just as confusing as the old one. It sounds like something is checked, but it's not clear what, and then the fence is forcibly signaled like it would be if you call dma_fence_signal(). Of course, this is clarified by the doc, but given the goal was to make the function name clearly reflect what it does, I'm not convinced it's significantly better.
Maybe dma_fence_check_hw_state_and_propagate(), though it might be too long of a name. Oh well, feel free to ignore this comment if a majority is fine with the new name.
Yoa, the name isn't perfect (the perfect name describing the whole behavior would be dma_fence_check_if_already_signaled_then_check_hardware_state_and_propagate()) ^^'
My intention here is to have the reader realize "watch out, the fence might get signaled here!", which is probably the most important event regarding fences, which can race, invoke the callbacks and so on.
For details readers will then check the documentation.
But I'm of course open to see if there's a majority for this or that name.
how about:
dma_fence_check_hw_and_signal() ?
I don't think that renaming the function is a good idea in the first place.
What the function does internally is an implementation detail of the framework.
For the code using this function it's completely irrelevant whether the function might also signal the fence; what matters for the caller is the returned status of the fence. I think this also applies to the dma_fence_is_signaled() documentation.
What we should improve is the documentation of the dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.
See especially the comment about reference counts on enable_signaling, which is missing on the signaled callback. That is most likely the root cause of why nouveau implemented enable_signaling correctly but not the other one.
But putting that aside, I think we should do this properly and let the framework guarantee that the fences stay alive until they are signaled (one way or another). This completely removes the burden of keeping a reference on unsignaled fences from the drivers / implementations and makes things more defensive overall.
Regards, Christian.
P.
P.
Regards,
Boris