On Wed, 2022-04-20 at 20:56 +0200, Christian König wrote:
On 20.04.22 at 20:49, Christian König wrote:
On 20.04.22 at 20:41, Zack Rusin wrote:
On Wed, 2022-04-20 at 19:40 +0200, Christian König wrote:
On 20.04.22 at 19:38, Zack Rusin wrote:
On Wed, 2022-04-20 at 09:37 +0200, Christian König wrote:
Hi Zack,
On 20.04.22 at 05:56, Zack Rusin wrote:
On Thu, 2022-04-07 at 10:59 +0200, Christian König wrote:

Rework the internals of the dma_resv object to allow adding more than one write fence and remember for each fence what purpose it had.

This allows removing the workaround from amdgpu which used a container for this instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org

afaict this change broke vmwgfx, which now oopses right after boot. I haven't had the time to look into it yet, so I'm not sure what the problem is. I'll look at this tomorrow, but just in case you have some clues, the backtrace follows:

That's a known issue and should already be fixed with:
commit d72dcbe9fce505228dae43bef9da8f2b707d1b3d
Author: Christian König <christian.koenig@amd.com>
Date:   Mon Apr 11 15:21:59 2022 +0200
Unfortunately that doesn't seem to be it. The backtrace is from the current drm-misc-next (as of the time of sending this email), which already has that change, so it's something else.
Ok, that's strange. In this case I need to investigate further.
Maybe VMWGFX is adding more than one fence and we actually need to reserve multiple slots.
This might be a helper code issue with CONFIG_DEBUG_MUTEXES set. With that config, dma_resv_reset_max_fences() does:

    fences->max_fences = fences->num_fences;

For some objects num_fences is 0, so afterwards max_fences and num_fences are both 0, and then the BUG_ON(num_fences >= max_fences) triggers.
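To make the failure mode concrete, here's a minimal userspace model of that slot accounting (my own sketch, paraphrasing the dma-resv behavior; the real functions of course operate on a struct dma_resv and take more arguments):

#include <assert.h>

/* Toy model of the dma-resv fence-slot bookkeeping. */
struct toy_resv {
	unsigned int num_fences; /* slots in use */
	unsigned int max_fences; /* slots reserved */
};

/* models dma_resv_reserve_fences(): make room for @num more fences */
static void toy_reserve_fences(struct toy_resv *r, unsigned int num)
{
	if (r->num_fences + num > r->max_fences)
		r->max_fences = r->num_fences + num;
}

/* models the CONFIG_DEBUG_MUTEXES path of dma_resv_reset_max_fences():
 * throw unused reservations away so a missing reserve is caught later */
static void toy_reset_max_fences(struct toy_resv *r)
{
	r->max_fences = r->num_fences;
}

/* models dma_resv_add_fence(); the assert is the BUG_ON we're hitting */
static void toy_add_fence(struct toy_resv *r)
{
	assert(r->num_fences < r->max_fences);
	r->num_fences++;
}

int main(void)
{
	struct toy_resv r = { 0, 0 };

	/* correct pattern: reserve first, then add */
	toy_reserve_fences(&r, 1);
	toy_add_fence(&r);        /* fine: one slot was reserved */

	/* the debug path drops the reservation again on unlock */
	toy_reset_max_fences(&r); /* num_fences == max_fences == 1 */

	/* adding without reserving: the assert (BUG_ON) fires here */
	toy_add_fence(&r);
	return 0;
}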
Yeah, but that's expected behavior.
What's not expected is that max_fences is still 0 (or equal to the old num_fences) when VMWGFX tries to add a new fence. The function ttm_eu_reserve_buffers() should have reserved at least one fence slot beforehand.
So the underlying problem is that either ttm_eu_reserve_buffers() was never called or VMWGFX tried to add more than one fence.
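For reference, the pattern the rework expects from every driver (or from ttm_eu_reserve_buffers() on its behalf) looks roughly like the sketch below. This is only an illustration against the post-rework API (dma_resv_reserve_fences() plus the usage argument to dma_resv_add_fence()), not code taken from any driver:

#include <linux/dma-resv.h>
#include <drm/ttm/ttm_bo_api.h>

/* Illustrative helper only: publish a write fence on a BO, reserving
 * the fence slot first as the debug check demands. */
static int example_publish_write_fence(struct ttm_buffer_object *bo,
				       struct dma_fence *fence)
{
	int ret;

	ret = dma_resv_lock(bo->base.resv, NULL);
	if (ret)
		return ret;

	/* Without this reservation a CONFIG_DEBUG_MUTEXES build resets
	 * max_fences on unlock and the next add hits the BUG_ON. */
	ret = dma_resv_reserve_fences(bo->base.resv, 1);
	if (!ret)
		dma_resv_add_fence(bo->base.resv, fence,
				   DMA_RESV_USAGE_WRITE);

	dma_resv_unlock(bo->base.resv);
	return ret;
}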
To figure out which it is, could you try the following code fragment:
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
index f46891012be3..a36f89d3f36d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
@@ -288,7 +288,7 @@ int vmw_validation_add_bo(struct vmw_validation_context *ctx,
 	val_buf->bo = ttm_bo_get_unless_zero(&vbo->base);
 	if (!val_buf->bo)
 		return -ESRCH;
-	val_buf->num_shared = 0;
+	val_buf->num_shared = 16;
 	list_add_tail(&val_buf->head, &ctx->bo_list);
 	bo_node->as_mob = as_mob;
 	bo_node->cpu_blit = cpu_blit;
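If the oops disappears with that, vmwgfx is adding more fences than it reserves slots for; num_shared is the per-BO count which, if I'm reading the helper right, ttm_eu_reserve_buffers() hands on to dma_resv_reserve_fences(). If it still triggers, then most likely ttm_eu_reserve_buffers() isn't called at all for the object in question.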
Fails the same BUG_ON with num_fences and max_fences == 0.
z