On Mon, Feb 17, 2020 at 04:45:09PM +0100, Christian König wrote:
Implement the importer side of unpinned DMA-buf handling.
v2: update page tables immediately
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 66 ++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 2 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 770baba621b3..48de7624d49c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -453,7 +453,71 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf) return ERR_PTR(ret); } +/**
- amdgpu_dma_buf_move_notify - &attach.move_notify implementation
- @attach: the DMA-buf attachment
- Invalidate the DMA-buf attachment, making sure that the we re-create the
- mapping before the next use.
- */
+static void +amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach) +{
- struct drm_gem_object *obj = attach->importer_priv;
- struct ww_acquire_ctx *ticket = dma_resv_locking_ctx(obj->resv);
- struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
- struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
- struct ttm_operation_ctx ctx = { false, false };
- struct ttm_placement placement = {};
- struct amdgpu_vm_bo_base *bo_base;
- int r;
- if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
return;
- r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
- if (r) {
DRM_ERROR("Failed to invalidate DMA-buf import (%d))\n", r);
return;
- }
- for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
struct amdgpu_vm *vm = bo_base->vm;
struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
if (ticket) {
Yeah so this is kinda why I've been a total pain about the exact semantics of the move_notify hook. I think we should flat-out require that importers _always_ have a ticket attach when they call this, and that they can cope with additional locks being taken (i.e. full EDEADLCK) handling.
Simplest way to force that contract is to add a dummy 2nd ww_mutex lock to the dma_resv object, which we then can take #ifdef CONFIG_WW_MUTEX_SLOWPATH_DEBUG. Plus mabye a WARN_ON(!ticket).
Now the real disaster is how we handle deadlocks. Two issues:
- Ideally we'd keep any lock we've taken locked until the end, it helps needless backoffs. I've played around a bit with that but not even poc level, just an idea:
https://cgit.freedesktop.org/~danvet/drm/commit/?id=b1799c5a0f02df9e1bb08d27...
Idea is essentially to track a list of objects we had to lock as part of the ttm_bo_validate of the main object.
- Second one is if we get a EDEADLCK on one of these sublocks (like the one here). We need to pass that up the entire callchain, including a temporary reference (we have to drop locks to do the ww_mutex_lock_slow call), and need a custom callback to drop that temporary reference (since that's all driver specific, might even be internal ww_mutex and not anything remotely looking like a normal dma_buf). This probably needs the exec util helpers from ttm, but at the dma_resv level, so that we can do something like this:
struct dma_resv_ticket { struct ww_acquire_ctx base;
/* can be set by anyone (including other drivers) that got hold of * this ticket and had to acquire some new lock. This lock might * protect anything, including driver-internal stuff, and isn't * required to be a dma_buf or even just a dma_resv. */ struct ww_mutex *contended_lock;
/* callback which the driver (which might be a dma-buf exporter * and not matching the driver that started this locking ticket) * sets together with @contended_lock, for the main driver to drop * when it calls dma_resv_unlock on the contended_lock. */ void (drop_ref*)(struct ww_mutex *contended_lock); };
This is all supremely nasty (also ttm_bo_validate would need to be improved to handle these sublocks and random new objects that could force a ww_mutex_lock_slow).
Plan B would be to throw our hands into and declare that "move_notify is best effort only and can fail for any reason". Exactly like ttm eviction currently does, even with all your hacks to do at least some dma_resv_lock (but not the full slowpath).
Given how much "fun" you have with all the low memory handling and ttm fundamentally being best-effort only (despite that dma_resv would allow us to do this right, with some work) I'm not sure that's a good idea to extend to a cross-driver interface. Personally I'd lean towards fixing this first fully (in ttm/amdgpu), and then using that to implement move_notify correctly.
Or just add an int return value here and mandate that importers must handle eviction failures. Exactly like ttm_mem_evict_first can currently still fail for various reasons.
(Sorry this isn't exactly the mail you hoped for)
Cheers, Daniel
/* When we get an error here it means that somebody
* else is holding the VM lock and updating page tables
* So we can just continue here.
*/
r = dma_resv_lock(resv, ticket);
if (r)
continue;
} else {
/* TODO: This is more problematic and we actually need
* to allow page tables updates without holding the
* lock.
*/
if (!dma_resv_trylock(resv))
continue;
}
r = amdgpu_vm_clear_freed(adev, vm, NULL);
if (!r)
r = amdgpu_vm_handle_moved(adev, vm);
if (r && r != -EBUSY)
DRM_ERROR("Failed to invalidate VM page tables (%d))\n",
r);
dma_resv_unlock(resv);
- }
+}
static const struct dma_buf_attach_ops amdgpu_dma_buf_attach_ops = {
- .move_notify = amdgpu_dma_buf_move_notify
}; /** @@ -489,7 +553,7 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev, return obj; attach = dma_buf_dynamic_attach(dma_buf, dev->dev,
&amdgpu_dma_buf_attach_ops, NULL);
if (IS_ERR(attach)) { drm_gem_object_put(obj); return ERR_CAST(attach);&amdgpu_dma_buf_attach_ops, obj);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8ae260822908..8c480c898b0d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -926,6 +926,9 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, return 0; }
- if (bo->tbo.base.import_attach)
dma_buf_pin(bo->tbo.base.import_attach);
- bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; /* force to pin into visible video ram */ if (!(bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS))
@@ -1009,6 +1012,9 @@ int amdgpu_bo_unpin(struct amdgpu_bo *bo) amdgpu_bo_subtract_pin_size(bo);
- if (bo->tbo.base.import_attach)
dma_buf_unpin(bo->tbo.base.import_attach);
- for (i = 0; i < bo->placement.num_placement; i++) { bo->placements[i].lpfn = 0; bo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
-- 2.17.1