Re: [Linaro-mm-sig] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify v2

18 Feb 2020

On Tue, Feb 18, 2020 at 9:17 PM Thomas Hellström (VMware)
<thomas_os@shipmail.org> wrote:
...
On 2/17/20 6:55 PM, Daniel Vetter wrote:
...
On Mon, Feb 17, 2020 at 04:45:09PM +0100, Christian König wrote:
...
Implement the importer side of unpinned DMA-buf handling.
v2: update page tables immediately
Signed-off-by: Christian König <christian.koenig@amd.com>
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 66 ++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  6 ++
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 770baba621b3..48de7624d49c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -453,7 +453,71 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
     return ERR_PTR(ret);
  }
+/**
+ * amdgpu_dma_buf_move_notify - &attach.move_notify implementation
+ *
+ * @attach: the DMA-buf attachment
+ *
+ * Invalidate the DMA-buf attachment, making sure that we re-create the
+ * mapping before the next use.
+ */
+static void
+amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
+{
+	struct drm_gem_object *obj = attach->importer_priv;
+	struct ww_acquire_ctx *ticket = dma_resv_locking_ctx(obj->resv);
+	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+	struct ttm_operation_ctx ctx = { false, false };
+	struct ttm_placement placement = {};
+	struct amdgpu_vm_bo_base *bo_base;
+	int r;
+
+	if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
+		return;
+
+	r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
+	if (r) {
+		DRM_ERROR("Failed to invalidate DMA-buf import (%d)\n", r);
+		return;
+	}
+
+	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
+		struct amdgpu_vm *vm = bo_base->vm;
+		struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
+
+		if (ticket) {

Yeah, so this is kinda why I've been a total pain about the exact semantics
of the move_notify hook. I think we should flat-out require that importers
_always_ have a ticket attached when they call this, and that they can cope
with additional locks being taken (i.e. full EDEADLK handling).
The simplest way to force that contract is to add a dummy 2nd ww_mutex lock
to the dma_resv object, which we can then take under #ifdef
CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Plus maybe a WARN_ON(!ticket).
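The contract described above (always pass a ticket, and always be ready for -EDEADLK because a debug build may take an extra lock) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every name below (model_ticket, model_resv, model_ww_lock, model_move_notify) is hypothetical, the runtime inject_every field stands in for the -EDEADLK injection that the ww_mutex slowpath debug option performs, and returning -EINVAL stands in for WARN_ON(!ticket).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an acquire context; not the kernel's ww_acquire_ctx. */
struct model_ticket {
	unsigned int attempts;     /* lock attempts made with this ticket */
	unsigned int inject_every; /* 0 = no injection, N = fail every Nth attempt */
};

/* Hypothetical model of a dma_resv carrying the proposed debug-only dummy lock. */
struct model_resv {
	bool dummy_locked;
};

/* Models a ww_mutex acquisition that may see an injected -EDEADLK. */
static int model_ww_lock(struct model_ticket *t)
{
	t->attempts++;
	if (t->inject_every && t->attempts % t->inject_every == 0)
		return -EDEADLK;
	return 0;
}

/* Models a move_notify implementation that enforces the proposed contract. */
static int model_move_notify(struct model_resv *resv, struct model_ticket *ticket)
{
	int r;

	if (!ticket)
		return -EINVAL; /* stands in for WARN_ON(!ticket) */

	/* Debug build: take the dummy 2nd lock so every caller must handle -EDEADLK. */
	r = model_ww_lock(ticket);
	if (r)
		return r;
	resv->dummy_locked = true;
	return 0;
}
```

A caller that gets -EDEADLK back is expected to unwind and restart; the point of the injection is that this path gets exercised routinely rather than only under real contention.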
Now the real disaster is how we handle deadlocks. Two issues:

Ideally we'd keep any lock we've taken locked until the end; it helps
avoid needless backoffs. I've played around a bit with that, but it's not
even at PoC level, just an idea:

https://cgit.freedesktop.org/~danvet/drm/commit/?id=b1799c5a0f02df9e1bb08d27...

The idea is essentially to track a list of objects we had to lock as part
of the ttm_bo_validate of the main object.
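A minimal sketch of that tracking idea: every object locked on behalf of the transaction is remembered and only released at the very end. The names (tracked_lock, tracked_ticket, ticket_lock, ticket_unlock_all, MAX_TRACKED) are hypothetical and the locks are plain owner fields rather than real ww_mutexes; returning -EALREADY for a lock the ticket already holds mirrors what ww_mutex_lock() does for the same context.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TRACKED 16

/* Hypothetical lock: owner 0 means unlocked, otherwise the holding ticket's id. */
struct tracked_lock {
	int owner;
};

/* Hypothetical ticket that remembers every lock taken on its behalf. */
struct tracked_ticket {
	int id;
	struct tracked_lock *held[MAX_TRACKED];
	size_t n;
};

/* Lock and remember @lock; -EALREADY if this ticket already holds it. */
static int ticket_lock(struct tracked_ticket *t, struct tracked_lock *lock)
{
	if (lock->owner == t->id)
		return -EALREADY;
	if (lock->owner != 0)
		return -EBUSY; /* contention handling is out of scope for this sketch */
	if (t->n == MAX_TRACKED)
		return -ENOSPC;
	lock->owner = t->id;
	t->held[t->n++] = lock;
	return 0;
}

/* Release everything in reverse acquisition order once the transaction ends. */
static void ticket_unlock_all(struct tracked_ticket *t)
{
	while (t->n > 0)
		t->held[--t->n]->owner = 0;
}
```

Because sublocks stay in held[] until ticket_unlock_all(), a later step that needs the same object sees -EALREADY instead of dropping and re-taking the lock.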

The second one is what happens if we get an EDEADLK on one of these
sublocks (like the one here). We need to pass that up the entire call
chain, including a temporary reference (we have to drop locks to do the
ww_mutex_lock_slow call), and we need a custom callback to drop that
temporary reference (since that's all driver-specific; it might even be
an internal ww_mutex and not anything remotely resembling a normal
dma_buf). This probably needs the exec util helpers from ttm, but at the
dma_resv level, so that we can do something like this:

struct dma_resv_ticket {
	struct ww_acquire_ctx base;

	/* can be set by anyone (including other drivers) that got hold of
	 * this ticket and had to acquire some new lock. This lock might
	 * protect anything, including driver-internal stuff, and isn't
	 * required to be a dma_buf or even just a dma_resv. */
	struct ww_mutex *contended_lock;

	/* callback which the driver (which might be a dma-buf exporter
	 * and not matching the driver that started this locking ticket)
	 * sets together with @contended_lock, for the main driver to call
	 * to drop the temporary reference when it calls dma_resv_unlock on
	 * the contended_lock. */
	void (*drop_ref)(struct ww_mutex *contended_lock);
};
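A single-threaded userspace sketch of how a contended_lock/drop_ref handoff like the one proposed could drive a backoff loop. Assumptions: every name is hypothetical; struct model_resv_ticket mirrors the proposal with the ww_acquire_ctx reduced to an integer id; the other transaction releasing the lock is simulated inline; and one marked assignment stands in for ww_mutex_lock_slow().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lock: owner 0 means unlocked; refcount models the object's references. */
struct model_lock {
	int owner;
	int refcount;
};

/* Mirrors the proposed dma_resv_ticket; the acquire context is reduced to an id. */
struct model_resv_ticket {
	int id;
	struct model_lock *contended_lock;
	void (*drop_ref)(struct model_lock *contended_lock);
};

/* Hypothetical exporter-side callback releasing the temporary reference. */
static void exporter_drop_ref(struct model_lock *lock)
{
	lock->refcount--;
}

/* Try to lock; on contention, publish the lock plus a temporary reference. */
static int ticket_trylock(struct model_resv_ticket *t, struct model_lock *lock)
{
	if (lock->owner == t->id)
		return -EALREADY;
	if (lock->owner != 0) {
		lock->refcount++; /* keeps the object alive after we unwind */
		t->contended_lock = lock;
		t->drop_ref = exporter_drop_ref;
		return -EDEADLK;
	}
	lock->owner = t->id;
	return 0;
}

/* Main driver: lock its own object, then (via the exporter) a foreign one. */
static int importer_transaction(struct model_resv_ticket *t, struct model_lock *own,
				struct model_lock *foreign, int *restarts)
{
	*restarts = 0;
	for (;;) {
		struct model_lock *c;
		int r = ticket_trylock(t, own);

		if (r == 0 || r == -EALREADY)
			r = ticket_trylock(t, foreign);
		if (r == 0 || r == -EALREADY)
			return 0;
		if (r != -EDEADLK)
			return r;

		/* Back off: drop every lock this ticket holds. */
		if (own->owner == t->id)
			own->owner = 0;
		if (foreign->owner == t->id)
			foreign->owner = 0;

		/* Simulated: the holder released it and ww_mutex_lock_slow() succeeded. */
		c = t->contended_lock;
		c->owner = t->id;
		t->drop_ref(c); /* the main driver drops the exporter's temporary reference */
		t->contended_lock = NULL;
		t->drop_ref = NULL;
		(*restarts)++;
	}
}
```

The main driver never needs to know what kind of object the contended lock protects; it only calls back through drop_ref, which is what lets the lock live behind a different driver.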
This is all supremely nasty (also ttm_bo_validate would need to be
improved to handle these sublocks and random new objects that could force
a ww_mutex_lock_slow).
Just a short comment on this:
Neither the currently used wait-die nor the wound-wait algorithm
*strictly* requires a slow lock on the contended lock. For wait-die it's
just very convenient, since it makes us sleep instead of spinning with
-EDEADLK on the contended lock. For wound-wait, IIRC, one could just
immediately restart the whole locking transaction after an -EDEADLK, and
the transaction would automatically end up waiting on the contended
lock, provided mutex lock stealing is not allowed. There is, however,
a possibility that the transaction will be wounded again on another
lock taken before the contended lock, but I think there are ways to
improve the wound-wait algorithm to reduce that probability.
So in short, choosing the wound-wait algorithm instead of wait-die, and
perhaps modifying the ww mutex code somewhat, would probably help with
passing an -EDEADLK up the call chain without requiring passing the
contended lock, as long as each locker releases its own locks when
receiving an -EDEADLK.
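The two ordering rules being compared can be written as small decision functions. This is a textbook sketch, not the kernel's ww_mutex code: it assumes a smaller stamp means an older transaction, and the enum and function names are hypothetical. It shows the property the argument relies on: under wound-wait, a younger requester that finds the lock held by an older transaction is told to wait rather than to back off.

```c
#include <assert.h>

/* Hypothetical outcome of a contended acquisition. */
enum ww_action {
	WW_WAIT,           /* block until the holder releases the lock */
	WW_DIE,            /* return -EDEADLK: back off and restart */
	WW_WOUND_AND_WAIT, /* mark the holder as wounded, then block */
};

/* Stamps: smaller value = older transaction (assumption of this sketch). */
static enum ww_action wait_die(unsigned long requester, unsigned long holder)
{
	/* Older requesters wait; younger ones die. */
	return requester < holder ? WW_WAIT : WW_DIE;
}

static enum ww_action wound_wait(unsigned long requester, unsigned long holder)
{
	/* Older requesters wound the holder; younger ones wait. */
	return requester < holder ? WW_WOUND_AND_WAIT : WW_WAIT;
}
```

A wounded transaction keeps its (younger) stamp when it restarts, so when it reaches the older transaction's lock again, wound_wait() returns WW_WAIT: it ends up sleeping on the contended lock without anyone having passed that lock up the call chain, assuming no lock stealing.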
Hm, this is kinda tempting, since rolling out the full backoff trickery
across driver boundaries is going to be really painful.
What I'm kinda worried about are the debug/validation checks we'd be
losing with this. The required backoff has this nice property that the
ww_mutex debug code can check that we've fully unwound everything when
we should, that we've blocked on the right lock, and that we're
restarting everything without keeling over. Without that, I think we
could end up with situations where a driver in the middle feels like
handling the EDEADLK itself, which might go well most of the time (the
deadlock will probably mostly be within a given driver, not across
drivers). Right up to the point where someone creates a deadlock across
drivers, and the lack of full rollback will be felt.
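The two checks mentioned above (fully unwound, blocked on the right lock) can be modelled as a validation step run before a slow-path acquisition is allowed to block. This is a hypothetical sketch with made-up names (dbg_ticket, dbg_verdict, dbg_check_slow_lock); it assumes the context tracks how many locks it holds and which lock produced the last -EDEADLK, and does not reproduce the kernel's actual debug code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug state carried by an acquire context. */
struct dbg_ticket {
	unsigned int acquired;       /* locks currently held */
	const void *contending_lock; /* lock that produced the last -EDEADLK */
};

enum dbg_verdict {
	DBG_OK,
	DBG_NOT_UNWOUND, /* still holding locks when blocking on the contended one */
	DBG_WRONG_LOCK,  /* blocking on a different lock than the one that failed */
};

/* Validate a slow-path acquisition before it is allowed to block. */
static enum dbg_verdict dbg_check_slow_lock(const struct dbg_ticket *t, const void *lock)
{
	if (t->acquired != 0)
		return DBG_NOT_UNWOUND;
	if (t->contending_lock != lock)
		return DBG_WRONG_LOCK;
	return DBG_OK;
}
```

A middle-layer driver that swallows the -EDEADLK and retries on its own would typically reach the slow path with locks still held, which is the kind of mistake such a check flags. If restarting no longer goes through a mandatory slow-lock call, there is presumably no natural point at which to run it.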
So I'm not sure whether we can still keep all these debug/validation
checks, or whether this is a step too far towards clever tricks.
But definitely a neat idea ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch