[SNIP]It could be, however, that at the same moment nouveau_fence_signal() is removing that entry, holding the appropriate lock. So we have a race. Again.
You see, fixing things in Nouveau is difficult :) It gets more difficult if you want to clean it up "properly", so it conforms to rules such as those from dma_fence. I have now provided two fixes that both work, but you are not satisfied with from the dma_fence-maintainer's perspective. I understand that, but please also understand that it's actually not my primary task to work on Nouveau. I just have to fix this bug to move on with my scheduler work.
So if you have another idea, feel free to share it. But I'd like to know how we can go on here.
I'm running out of ideas. What I'm wondering if we couldn't just remove performance hacky fastpath functions such as nouveau_fence_is_signaled() completely. It seems redundant to me.
Or we might add locking to it, but IDK what was achieved with RCU here. In any case it's definitely bad that Nouveau has so many redundant and half-redundant mechanisms.
P.P.Regards, Christian.P.Regards, Christian.Replace the call to dma_fence_is_signaled() with nouveau_fence_base_is_signaled(). Cc: <stable@vger.kernel.org> # 4.10+, precise commit not to be determined Signed-off-by: Philipp Stanner <phasta@kernel.org> --- drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 7cc84472cece..33535987d8ed 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence) nvif_event_block(&fctx->event); spin_unlock_irqrestore(&fctx->lock, flags); } - return dma_fence_is_signaled(&fence->base); + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence-base.flags);} static long