My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_fence_emit+0x30/0x330 [amdgpu]
amdgpu_ib_schedule+0x306/0x550 [amdgpu]
amdgpu_job_run+0x10f/0x260 [amdgpu]
drm_sched_main+0x1b9/0x490 [gpu_sched]
kthread+0x12e/0x150
Trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually
need !GFP_ATOMIC, since they can be called from any page laundering,
even if GFP_NOFS or GFP_NOIO are set.
I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.
Ofc real fix is to properly preallocate this fence and stuff it into
the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
the way.
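
A minimal userspace sketch of the preallocation idea, not amdgpu code.
Every name here (struct model_job, model_job_create(), model_job_run(),
the in_signalling_section flag) is hypothetical and only models the
shape of the fix: allocate where blocking is allowed, and let the run
path consume what is already there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in amdgpu. */
struct model_fence {
	unsigned int seq;
};

struct model_job {
	struct model_fence *prealloc_fence;	/* allocated at submit time */
};

/* Models "are we inside the fence signalling critical section?". */
static bool in_signalling_section;

/* Allocation helper that refuses to run where blocking is forbidden. */
static void *model_blocking_alloc(size_t size)
{
	assert(!in_signalling_section);
	return calloc(1, size);
}

/* Submit path: blocking allocations are fine here. */
static struct model_job *model_job_create(void)
{
	struct model_job *job = model_blocking_alloc(sizeof(*job));

	if (!job)
		return NULL;

	job->prealloc_fence = model_blocking_alloc(sizeof(*job->prealloc_fence));
	if (!job->prealloc_fence) {
		free(job);
		return NULL;
	}
	return job;
}

/* Run path: only hands out the fence that was preallocated. */
static struct model_fence *model_job_run(struct model_job *job,
					 unsigned int seq)
{
	struct model_fence *f;

	in_signalling_section = true;
	f = job->prealloc_fence;	/* no allocation in here */
	job->prealloc_fence = NULL;
	f->seq = seq;
	in_signalling_section = false;

	return f;
}

/* Returns the sequence number stamped on the fence, or -1 on failure. */
static int model_prealloc_demo(void)
{
	struct model_job *job = model_job_create();
	struct model_fence *f;
	int seq;

	if (!job)
		return -1;

	f = model_job_run(job, 42);
	seq = (int)f->seq;
	free(f);
	free(job);
	return seq;
}
```

With this shape the run path has no failure mode from memory pressure,
which is what the GFP_ATOMIC stopgap cannot guarantee.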
v2: Two more allocations in scheduler paths.
First one:
__kmalloc+0x58/0x720
amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Second one:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_sync_fence+0x7e/0x110 [amdgpu]
amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index fe2d495d08ab..09614b325b5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
uint32_t seq;
int r;
- fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+ fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
if (fence == NULL)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 7521f4ab55de..2a4cde7cd746 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
return amdgpu_sync_fence(sync, ring->vmid_wait);
- fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+ fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
if (!fences)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 8ea6c49529e7..af22b526cec9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
if (amdgpu_sync_add_later(sync, f))
return 0;
- e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+ e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
if (!e)
return -ENOMEM;
--
2.28.0
This is a bit tricky: since ->notifier_lock is held while calling
dma_fence_wait, we must ensure that the read side (i.e.
dma_fence_begin_signalling) is also taken inside that lock. If we mix
this up, lockdep complains, and that's again why we want to have these
annotations.
A nice side effect of this is that because of the fs_reclaim priming
for dma_fence_enable lockdep now automatically checks for us that
nothing in here allocates memory, without even running any userptr
workloads.
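
A rough userspace model of the pairing rule, assuming the cookie works
the way it is used in this patch: begin returns a cookie, and every
exit path hands that cookie back before dropping the lock. The cs_*
names and the depth counter are hypothetical; this is not the kernel
implementation, only a sketch of the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state, standing in for lockdep and the mutex. */
static int cs_depth;			/* 1 while the section is open */
static bool cs_notifier_locked;

/* Returns true when already inside a section, so the end is a no-op. */
static bool cs_begin_signalling(void)
{
	if (cs_depth > 0)
		return true;
	cs_depth++;
	return false;
}

static void cs_end_signalling(bool cookie)
{
	if (cookie)
		return;
	cs_depth--;
}

/* Same ordering as the patch: lock, begin, ..., end, unlock. */
static int cs_submit(bool userptr_invalidated)
{
	bool fence_cookie;
	int r;

	cs_notifier_locked = true;	/* models mutex_lock() */
	fence_cookie = cs_begin_signalling();

	if (userptr_invalidated) {
		r = -EAGAIN;
		goto error_abort;
	}

	/* ... publish the fence to the buffer objects ... */

	cs_end_signalling(fence_cookie);
	cs_notifier_locked = false;	/* models mutex_unlock() */
	return 0;

error_abort:
	cs_end_signalling(fence_cookie);
	cs_notifier_locked = false;
	return r;
}
```

Both the success path and the error path leave the model with the
section closed and the lock dropped, in that order.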
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d50b63a93d37..3b3999225e31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1212,6 +1212,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_job *job;
uint64_t seq;
int r;
+ bool fence_cookie;
job = p->job;
p->job = NULL;
@@ -1226,6 +1227,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
*/
mutex_lock(&p->adev->notifier_lock);
+ fence_cookie = dma_fence_begin_signalling();
+
/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
* -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
*/
@@ -1262,12 +1265,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
return 0;
error_abort:
drm_sched_job_cleanup(&job->base);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
error_unlock:
--
2.28.0
If the scheduler rt thread gets stuck on a mutex that we're holding
while waiting for gpu workloads to complete, we have a problem.
Add dma-fence annotations so that lockdep can check this for us.
I've tried to quite carefully review this, and I think it's at the
right spot. But obviously I'm no expert on the drm scheduler.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0d77a68018..f69abc4e70d3 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -764,9 +764,12 @@ static int drm_sched_main(void *param)
{
struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
int r;
+ bool fence_cookie;
sched_set_fifo_low(current);
+ fence_cookie = dma_fence_begin_signalling();
+
while (!kthread_should_stop()) {
struct drm_sched_entity *entity = NULL;
struct drm_sched_fence *s_fence;
@@ -824,6 +827,9 @@ static int drm_sched_main(void *param)
wake_up(&sched->job_scheduled);
}
+
+ dma_fence_end_signalling(fence_cookie);
+
return 0;
}
--
2.28.0
This is rather overkill since currently all drivers call this from
hardirq (or at least timers). But maybe in the future we're going to
have threaded irq handlers and whatnot, so it doesn't hurt to be prepared.
Plus this is an easy start for sprinkling these fence annotations into
shared code.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_vblank.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index f135b79593dd..ba7e741764aa 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -24,6 +24,7 @@
* OTHER DEALINGS IN THE SOFTWARE.
*/
+#include <linux/dma-fence.h>
#include <linux/export.h>
#include <linux/kthread.h>
#include <linux/moduleparam.h>
@@ -1913,7 +1914,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
{
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
unsigned long irqflags;
- bool disable_irq;
+ bool disable_irq, fence_cookie;
if (drm_WARN_ON_ONCE(dev, !drm_dev_has_vblank(dev)))
return false;
@@ -1921,6 +1922,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (drm_WARN_ON(dev, pipe >= dev->num_crtcs))
return false;
+ fence_cookie = dma_fence_begin_signalling();
+
spin_lock_irqsave(&dev->event_lock, irqflags);
/* Need timestamp lock to prevent concurrent execution with
@@ -1933,6 +1936,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (!vblank->enabled) {
spin_unlock(&dev->vblank_time_lock);
spin_unlock_irqrestore(&dev->event_lock, irqflags);
+ dma_fence_end_signalling(fence_cookie);
return false;
}
@@ -1959,6 +1963,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (disable_irq)
vblank_disable_fn(&vblank->disable_timer);
+ dma_fence_end_signalling(fence_cookie);
+
return true;
}
EXPORT_SYMBOL(drm_handle_vblank);
--
2.28.0
This is a bit disappointing since we need to split the annotations
over all the different parts.
I was considering just leaking the critical section into the
->atomic_commit_tail callback of each driver. But that would mean we
need to pass the fence_cookie into each driver (there's a total of 13
implementations of this hook right now), so bad flag day. And also a
bit leaky abstraction.
Hence just do it function-by-function.
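
A small userspace model of the function-by-function approach, under
the assumption that a nested begin returns a cookie which turns the
matching end into a no-op. All nest_* names are hypothetical. The
caller closes its own section before invoking the hook, so no cookie
crosses a function boundary, and a helper that happens to run inside
an already open section is harmless.

```c
#include <assert.h>
#include <stdbool.h>

static int nest_depth;	/* 1 while the modelled section is open */

static bool nest_begin(void)
{
	if (nest_depth > 0)
		return true;	/* already inside: outer caller owns it */
	nest_depth++;
	return false;
}

static void nest_end(bool cookie)
{
	if (cookie)
		return;
	nest_depth--;
}

/* Helper annotated on its own, with no cookie passed in. */
static void nest_helper(void)
{
	bool cookie = nest_begin();

	/* ... program hardware, signal completion ... */
	assert(nest_depth == 1);

	nest_end(cookie);
}

/* Caller ends its section first, then calls the separately annotated helper. */
static int nest_split_caller(void)
{
	bool cookie = nest_begin();

	/* ... caller's own critical work ... */
	nest_end(cookie);

	nest_helper();
	return nest_depth;
}

/* Calling the helper with the section still open also works. */
static int nest_nested_caller(void)
{
	bool cookie = nest_begin();
	int depth_after_helper;

	nest_helper();			/* inner begin/end are no-ops */
	depth_after_helper = nest_depth;
	nest_end(cookie);

	return depth_after_helper * 10 + nest_depth;
}
```

In the model, nest_split_caller() ends with the section closed, and
nest_nested_caller() shows the outer section surviving the helper call
and closing only at the outer end.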
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 549a31e6042c..23013209d4bf 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1567,6 +1567,7 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);
void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
{
struct drm_device *dev = old_state->dev;
+ bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1578,6 +1579,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie);
+
drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1597,6 +1600,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_tail);
void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
{
struct drm_device *dev = old_state->dev;
+ bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1609,6 +1613,8 @@ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie);
+
drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1624,6 +1630,9 @@ static void commit_tail(struct drm_atomic_state *old_state)
ktime_t start;
s64 commit_time_ms;
unsigned int i, new_self_refresh_mask = 0;
+ bool fence_cookie;
+
+ fence_cookie = dma_fence_begin_signalling();
funcs = dev->mode_config.helper_private;
@@ -1652,6 +1661,8 @@ static void commit_tail(struct drm_atomic_state *old_state)
if (new_crtc_state->self_refresh_active)
new_self_refresh_mask |= BIT(i);
+ dma_fence_end_signalling(fence_cookie);
+
if (funcs && funcs->atomic_commit_tail)
funcs->atomic_commit_tail(old_state);
else
@@ -1810,6 +1821,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
bool nonblock)
{
int ret;
+ bool fence_cookie;
if (state->async_update) {
ret = drm_atomic_helper_prepare_planes(dev, state);
@@ -1832,6 +1844,8 @@ int drm_atomic_helper_commit(struct drm_device *dev,
if (ret)
return ret;
+ fence_cookie = dma_fence_begin_signalling();
+
if (!nonblock) {
ret = drm_atomic_helper_wait_for_fences(dev, state, true);
if (ret)
@@ -1869,6 +1883,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
*/
drm_atomic_state_get(state);
+ dma_fence_end_signalling(fence_cookie);
if (nonblock)
queue_work(system_unbound_wq, &state->commit_work);
else
@@ -1877,6 +1892,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
return 0;
err:
+ dma_fence_end_signalling(fence_cookie);
drm_atomic_helper_cleanup_planes(dev, state);
return ret;
}
--
2.28.0
From: Rob Clark <robdclark(a)chromium.org>
This doesn't remove *all* the struct_mutex, but it covers the worst
of it, ie. shrinker/madvise/free/retire. The submit path still uses
struct_mutex, but it still needs *something* to serialize a portion of
the submit path, and lock_stat mostly just shows the lock contention
there being with other submits. And there are a few other bits of
struct_mutex usage in less critical paths (debugfs, etc). But this
seems like a reasonable step in the right direction.
v2: teach lockdep about shrinker locking patterns (danvet) and
convert to obj->resv locking (danvet)
v3: fix get_vaddr locking for legacy userspace (relocs), devcoredump,
and rd/hangrd
Rob Clark (23):
drm/msm: Fix a couple incorrect usages of get_vaddr_active()
drm/msm/gem: Add obj->lock wrappers
drm/msm/gem: Rename internal get_iova_locked helper
drm/msm/gem: Move prototypes to msm_gem.h
drm/msm/gem: Add some _locked() helpers
drm/msm/gem: Move locking in shrinker path
drm/msm/submit: Move copy_from_user ahead of locking bos
drm/msm: Do rpm get sooner in the submit path
drm/msm/gem: Switch over to obj->resv for locking
drm/msm: Use correct drm_gem_object_put() in fail case
drm/msm: Drop chatty trace
drm/msm: Move update_fences()
drm/msm: Add priv->mm_lock to protect active/inactive lists
drm/msm: Document and rename preempt_lock
drm/msm: Protect ring->submits with it's own lock
drm/msm: Refcount submits
drm/msm: Remove obj->gpu
drm/msm: Drop struct_mutex from the retire path
drm/msm: Drop struct_mutex in free_object() path
drm/msm: Remove msm_gem_free_work
drm/msm: Drop struct_mutex in madvise path
drm/msm: Drop struct_mutex in shrinker path
drm/msm: Don't implicit-sync if only a single ring
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 6 +-
drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 12 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 1 +
drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 1 +
drivers/gpu/drm/msm/dsi/dsi_host.c | 1 +
drivers/gpu/drm/msm/msm_debugfs.c | 7 +
drivers/gpu/drm/msm/msm_drv.c | 21 +-
drivers/gpu/drm/msm/msm_drv.h | 73 +-----
drivers/gpu/drm/msm/msm_fbdev.c | 1 +
drivers/gpu/drm/msm/msm_gem.c | 266 +++++++++++-----------
drivers/gpu/drm/msm/msm_gem.h | 133 +++++++++--
drivers/gpu/drm/msm/msm_gem_shrinker.c | 81 ++-----
drivers/gpu/drm/msm/msm_gem_submit.c | 158 ++++++++-----
drivers/gpu/drm/msm/msm_gpu.c | 110 +++++----
drivers/gpu/drm/msm/msm_gpu.h | 5 +-
drivers/gpu/drm/msm/msm_rd.c | 2 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 3 +-
drivers/gpu/drm/msm/msm_ringbuffer.h | 13 +-
19 files changed, 495 insertions(+), 405 deletions(-)
--
2.26.2
On Thu, Aug 27, 2020 at 09:31:27AM -0400, Laura Abbott wrote:
> On 8/27/20 8:36 AM, Greg Kroah-Hartman wrote:
> > The ION android code has long been marked to be removed, now that we
> > dma-buf support merged into the real part of the kernel.
> >
> > It was thought that we could wait to remove the ion kernel at a later
> > time, but as the out-of-tree Android fork of the ion code has diverged
> > quite a bit, and any Android device using the ion interface uses that
> > forked version and not this in-tree version, the in-tree copy of the
> > code is abandonded and not used by anyone.
> >
> > Combine this abandoned codebase with the need to make changes to it in
> > order to keep the kernel building properly, which then causes merge
> > issues when merging those changes into the out-of-tree Android code, and
> > you end up with two different groups of people (the in-kernel-tree
> > developers, and the Android kernel developers) who are both annoyed at
> > the current situation. Because of this problem, just drop the in-kernel
> > copy of the ion code now, as it's not used, and is only causing problems
> > for everyone involved.
> >
> > Cc: "Arve Hjønnevåg" <arve(a)android.com>
> > Cc: "Christian König" <christian.koenig(a)amd.com>
> > Cc: Christian Brauner <christian(a)brauner.io>
> > Cc: Christoph Hellwig <hch(a)infradead.org>
> > Cc: Hridya Valsaraju <hridya(a)google.com>
> > Cc: Joel Fernandes <joel(a)joelfernandes.org>
> > Cc: John Stultz <john.stultz(a)linaro.org>
> > Cc: Laura Abbott <laura(a)labbott.name>
> > Cc: Martijn Coenen <maco(a)android.com>
> > Cc: Shuah Khan <shuah(a)kernel.org>
> > Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Cc: Suren Baghdasaryan <surenb(a)google.com>
> > Cc: Todd Kjos <tkjos(a)android.com>
> > Cc: devel(a)driverdev.osuosl.org
> > Cc: dri-devel(a)lists.freedesktop.org
> > Cc: linaro-mm-sig(a)lists.linaro.org
> > Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>
> We discussed this at the Android MC on Monday and the plan was to
> remove it after the next LTS release.
As 5.10 will be the next LTS release, I have now merged it to my
"testing" branch to go into 5.11-rc1.
thanks,
greg k-h