On Wed, 2022-04-20 at 20:56 +0200, Christian König wrote:
On 20.04.22 at 20:49, Christian König wrote:
On 20.04.22 at 20:41, Zack Rusin wrote:
On Wed, 2022-04-20 at 19:40 +0200, Christian König wrote:
On 20.04.22 at 19:38, Zack Rusin wrote:
On Wed, 2022-04-20 at 09:37 +0200, Christian König wrote:
Hi Zack,
On 20.04.22 at 05:56, Zack Rusin wrote:
On Thu, 2022-04-07 at 10:59 +0200, Christian König wrote:

Rework the internals of the dma_resv object to allow adding more than one write fence and remember for each fence what purpose it had.

This allows removing the workaround from amdgpu which used a container for this instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org

afaict this change broke vmwgfx, which now oopses right after boot. I haven't had the time to look into it yet, so I'm not sure what the problem is. I'll look at this tomorrow, but just in case you have some clues, the backtrace follows:

That's a known issue and should already be fixed with:
commit d72dcbe9fce505228dae43bef9da8f2b707d1b3d
Author: Christian König <christian.koenig@amd.com>
Date:   Mon Apr 11 15:21:59 2022 +0200
Unfortunately that doesn't seem to be it. The backtrace is from the current drm-misc-next (as of the time of sending this email), which already has that change, so it's something else.
Ok, that's strange. In this case I need to investigate further.
Maybe VMWGFX is adding more than one fence and we actually need to reserve multiple slots.
This might be a helper code issue with CONFIG_DEBUG_MUTEXES set. With that config, dma_resv_reset_max_fences() does:

    fences->max_fences = fences->num_fences;

For some objects num_fences is 0, so afterwards max_fences and num_fences are both 0, and then the BUG_ON(num_fences >= max_fences) triggers.
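To make the failure mode concrete, here's a minimal userspace model of that slot accounting (my own sketch, paraphrasing the dma-resv behavior; the real functions of course operate on a struct dma_resv and take more arguments):

#include <assert.h>

/* Toy model of the dma-resv fence-slot bookkeeping. */
struct toy_resv {
	unsigned int num_fences; /* slots in use */
	unsigned int max_fences; /* slots reserved */
};

/* models dma_resv_reserve_fences(): make room for @num more fences */
static void toy_reserve_fences(struct toy_resv *r, unsigned int num)
{
	if (r->num_fences + num > r->max_fences)
		r->max_fences = r->num_fences + num;
}

/* models the CONFIG_DEBUG_MUTEXES path of dma_resv_reset_max_fences():
 * throw unused reservations away so a missing reserve is caught later */
static void toy_reset_max_fences(struct toy_resv *r)
{
	r->max_fences = r->num_fences;
}

/* models dma_resv_add_fence(); the assert is the BUG_ON we're hitting */
static void toy_add_fence(struct toy_resv *r)
{
	assert(r->num_fences < r->max_fences);
	r->num_fences++;
}

int main(void)
{
	struct toy_resv r = { 0, 0 };

	/* correct pattern: reserve first, then add */
	toy_reserve_fences(&r, 1);
	toy_add_fence(&r);        /* fine: one slot was reserved */

	/* the debug path drops the reservation again on unlock */
	toy_reset_max_fences(&r); /* num_fences == max_fences == 1 */

	/* adding without reserving: the assert (BUG_ON) fires here */
	toy_add_fence(&r);
	return 0;
}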
Yeah, but that's expected behavior.
What's not expected is that max_fences is still 0 (or equal to the old num_fences) when VMWGFX tries to add a new fence. The function ttm_eu_reserve_buffers() should have reserved at least one fence slot beforehand.
So the underlying problem is that either ttm_eu_reserve_buffers() was never called or VMWGFX tried to add more than one fence.
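For reference, the pattern the rework expects from every driver (or from ttm_eu_reserve_buffers() on its behalf) looks roughly like the sketch below. This is only an illustration against the post-rework API (dma_resv_reserve_fences() plus the usage argument to dma_resv_add_fence()), not code taken from any driver:

#include <linux/dma-resv.h>
#include <drm/ttm/ttm_bo_api.h>

/* Illustrative helper only: publish a write fence on a BO, reserving
 * the fence slot first as the debug check demands. */
static int example_publish_write_fence(struct ttm_buffer_object *bo,
				       struct dma_fence *fence)
{
	int ret;

	ret = dma_resv_lock(bo->base.resv, NULL);
	if (ret)
		return ret;

	/* Without this reservation a CONFIG_DEBUG_MUTEXES build resets
	 * max_fences on unlock and the next add hits the BUG_ON. */
	ret = dma_resv_reserve_fences(bo->base.resv, 1);
	if (!ret)
		dma_resv_add_fence(bo->base.resv, fence,
				   DMA_RESV_USAGE_WRITE);

	dma_resv_unlock(bo->base.resv);
	return ret;
}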
To figure out which it is, could you try the following code fragment:
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
index f46891012be3..a36f89d3f36d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
@@ -288,7 +288,7 @@ int vmw_validation_add_bo(struct vmw_validation_context *ctx,
 	val_buf->bo = ttm_bo_get_unless_zero(&vbo->base);
 	if (!val_buf->bo)
 		return -ESRCH;
-	val_buf->num_shared = 0;
+	val_buf->num_shared = 16;
 	list_add_tail(&val_buf->head, &ctx->bo_list);
 	bo_node->as_mob = as_mob;
 	bo_node->cpu_blit = cpu_blit;
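If the oops disappears with that, vmwgfx is adding more fences than it reserves slots for; num_shared is the per-BO count which, if I'm reading the helper right, ttm_eu_reserve_buffers() hands on to dma_resv_reserve_fences(). If it still triggers, then most likely ttm_eu_reserve_buffers() isn't called at all for the object in question.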
Fails the same BUG_ON with num_fences and max_fences == 0.
z