i915 does tons of allocations from this worker, which lockdep catches.
Also generic infrastructure like this with big potential for how
dma_fence or other cross driver contracts work, really should be
reviewed on dri-devel. Implementing custom wheels for everything
within the driver is a classic case of "platform problem" [1]. Which in
upstream we really shouldn't have.
Since there's no quick way to solve these splats (dma_fence_work is
used a bunch in basic buffer management and command submission) like
for amdgpu, I'm giving up at this point here. Annotating i915
scheduler and gpu reset could would be interesting, but since lockdep
is one-shot we can't see what surprises would lurk there.
1: https://lwn.net/Articles/443531/
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..5b74acadaef5 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -17,12 +17,15 @@ static void fence_work(struct work_struct *work)
{
struct dma_fence_work *f = container_of(work, typeof(*f), work);
int err;
+ bool fence_cookie;
+ fence_cookie = dma_fence_begin_signalling();
err = f->ops->work(f);
if (err)
dma_fence_set_error(&f->dma, err);
fence_complete(f);
+ dma_fence_end_signalling(fence_cookie);
dma_fence_put(&f->dma);
}
--
2.26.2
In the face of unpriviledged userspace being able to submit bogus gpu
workloads the kernel needs gpu timeout and reset (tdr) to guarantee
that dma_fences actually complete. Annotate this worker to make sure
we don't have any accidental locking inversions or other problems
lurking.
Originally this was part of the overall scheduler annotation patch.
But amdgpu has some glorious inversions here:
- grabs console_lock
- does a full modeset, which grabs all kinds of locks
(drm_modeset_lock, dma_resv_lock) which can deadlock with
dma_fence_wait held inside them.
- almost minor at that point, but the modeset code also allocates
memory
These all look like they'll be very hard to fix properly, the hardware
seems to require a full display reset with any gpu recovery.
Hence split out as a seperate patch.
Since amdgpu isn't the only hardware driver that needs to reset the
display (at least gen2/3 on intel have the same problem) we need a
generic solution for this. There's two tricks we could still from
drm/i915 and lift to dma-fence:
- The big whack, aka force-complete all fences. i915 does this for all
pending jobs if the reset is somehow stuck. Trouble is we'd need to
do this for all fences in the entire system, and just the
book-keeping for that will be fun. Plus lots of drivers use fences
for all kinds of internal stuff like memory management, so
unconditionally resetting all of them doesn't work.
I'm also hoping that with these fence annotations we could enlist
lockdep in finding the last offenders causing deadlocks, and we
could remove this get-out-of-jail trick.
- The more feasible approach (across drivers at least as part of the
dma_fence contract) is what drm/i915 does for gen2/3: When we need
to reset the display we wake up all dma_fence_wait_interruptible
calls, or well at least the equivalent of those in i915 internally.
Relying on ioctl restart we force all other threads to release their
locks, which means the tdr thread is guaranteed to be able to get
them. I think we could implement this at the dma_fence level,
including proper lockdep annotations.
dma_fence_begin_tdr():
- must be nested within a dma_fence_begin/end_signalling section
- will wake up all interruptible (but not the non-interruptible)
dma_fence_wait() calls and force them to complete with a
-ERESTARTSYS errno code. All new interrupitble calls to
dma_fence_wait() will immeidately fail with the same error code.
dma_fence_end_trdr():
- this will convert dma_fence_wait() calls back to normal.
Of course interrupting dma_fence_wait is only ok if the caller
specified that, which means we need to split the annotations into
interruptible and non-interruptible version. If we then make sure
that we only use interruptible dma_fence_wait() calls while holding
drm_modeset_lock we can grab them in tdr code, and allow display
resets. Doing the same for dma_resv_lock might be a lot harder, so
buffer updates must be avoided.
What's worse, we're not going to be able to make the dma_fence_wait
calls in mmu-notifiers interruptible, that doesn't work. So
allocating memory still wont' be allowed, even in tdr sections. Plus
obviously we can use this trick only in tdr, it is rather intrusive.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 06a736e506ad..e34a44376e87 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -279,9 +279,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
{
struct drm_gpu_scheduler *sched;
struct drm_sched_job *job;
+ bool fence_cookie;
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+ fence_cookie = dma_fence_begin_signalling();
+
/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->ring_mirror_list,
@@ -313,6 +316,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
spin_lock(&sched->job_list_lock);
drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
+
+ dma_fence_end_signalling(fence_cookie);
}
/**
--
2.26.2
Trying to grab dma_resv_lock while in commit_tail before we've done
all the code that leads to the eventual signalling of the vblank event
(which can be a dma_fence) is deadlock-y. Don't do that.
Here the solution is easy because just grabbing locks to read
something races anyway. We don't need to bother, READ_ONCE is
equivalent. And avoids the locking issue.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c575e7394d03..04c11443b9ca 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6910,7 +6910,11 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
* explicitly on fences instead
* and in general should be called for
* blocking commit to as per framework helpers
+ *
+ * Yes, this deadlocks, since you're calling dma_resv_lock in a
+ * path that leads to a dma_fence_signal(). Don't do that.
*/
+#if 0
r = amdgpu_bo_reserve(abo, true);
if (unlikely(r != 0))
DRM_ERROR("failed to reserve buffer before flip\n");
@@ -6920,6 +6924,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
tmz_surface = amdgpu_bo_encrypted(abo);
amdgpu_bo_unreserve(abo);
+#endif
+ /*
+ * this races anyway, so READ_ONCE isn't any better or worse
+ * than the stuff above. Except the stuff above can deadlock.
+ */
+ tiling_flags = READ_ONCE(abo->tiling_flags);
fill_dc_plane_info_and_addr(
dm->adev, new_plane_state, tiling_flags,
--
2.26.2
My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_fence_emit+0x30/0x330 [amdgpu]
amdgpu_ib_schedule+0x306/0x550 [amdgpu]
amdgpu_job_run+0x10f/0x260 [amdgpu]
drm_sched_main+0x1b9/0x490 [gpu_sched]
kthread+0x12e/0x150
Trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually
need !GFP_ATOMIC, since they can be called from any page laundering,
even if GFP_NOFS or GFP_NOIO are set.
I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.
Ofc real fix is to properly preallocate this fence and stuff it into
the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
the way.
v2: Two more allocations in scheduler paths.
Frist one:
__kmalloc+0x58/0x720
amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Second one:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_sync_fence+0x7e/0x110 [amdgpu]
amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d878fe7fee51..055b47241bb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
uint32_t seq;
int r;
- fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+ fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
if (fence == NULL)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index fe92dcd94d4a..fdcd6659f5ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
return amdgpu_sync_fence(sync, ring->vmid_wait, false);
- fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+ fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
if (!fences)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index b87ca171986a..330476cc0c86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
if (amdgpu_sync_add_later(sync, f, explicit))
return 0;
- e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+ e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
if (!e)
return -ENOMEM;
--
2.26.2
This is a bit tricky, since ->notifier_lock is held while calling
dma_fence_wait we must ensure that also the read side (i.e.
dma_fence_begin_signalling) is on the same side. If we mix this up
lockdep complaints, and that's again why we want to have these
annotations.
A nice side effect of this is that because of the fs_reclaim priming
for dma_fence_enable lockdep now automatically checks for us that
nothing in here allocates memory, without even running any userptr
workloads.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index a25fb59c127c..e109666aec14 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1212,6 +1212,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_job *job;
uint64_t seq;
int r;
+ bool fence_cookie;
job = p->job;
p->job = NULL;
@@ -1226,6 +1227,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
*/
mutex_lock(&p->adev->notifier_lock);
+ fence_cookie = dma_fence_begin_signalling();
+
/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
* -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
*/
@@ -1262,12 +1265,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
return 0;
error_abort:
drm_sched_job_cleanup(&job->base);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
error_unlock:
--
2.26.2
If the scheduler rt thread gets stuck on a mutex that we're holding
while waiting for gpu workloads to complete, we have a problem.
Add dma-fence annotations so that lockdep can check this for us.
I've tried to quite carefully review this, and I think it's at the
right spot. But obviosly no expert on drm scheduler.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2f319102ae9f..06a736e506ad 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -763,9 +763,12 @@ static int drm_sched_main(void *param)
struct sched_param sparam = {.sched_priority = 1};
struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
int r;
+ bool fence_cookie;
sched_setscheduler(current, SCHED_FIFO, &sparam);
+ fence_cookie = dma_fence_begin_signalling();
+
while (!kthread_should_stop()) {
struct drm_sched_entity *entity = NULL;
struct drm_sched_fence *s_fence;
@@ -823,6 +826,9 @@ static int drm_sched_main(void *param)
wake_up(&sched->job_scheduled);
}
+
+ dma_fence_end_signalling(fence_cookie);
+
return 0;
}
--
2.26.2