On 5/19/26 15:05, Mikhail Gavrilov wrote:
On Wed, Apr 29, 2026 at 7:37 PM Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote:
When dumping IB contents from a hung job, amdgpu_devcoredump_format() acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid() and then, for each IB referenced by the job, calls amdgpu_bo_reserve() on the BO that backs the IB. Both reservations are taken on reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx, which trips lockdep:
WARNING: possible recursive locking detected

kworker/u128:0 is trying to acquire lock:
ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4}, at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

but task is already holding lock:
ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4}, at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

Possible unsafe locking scenario:

       CPU0
       ----
  lock(reservation_ww_class_mutex);
  lock(reservation_ww_class_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
Call Trace:
 __ww_mutex_lock.constprop.0
 ww_mutex_lock
 amdgpu_bo_reserve
 amdgpu_devcoredump_format+0x1594 [amdgpu]
 amdgpu_devcoredump_deferred_work+0xea [amdgpu]
 process_one_work
 worker_thread
 kthread
Friendly ping. Pierre-Eric, Christian, Alex — any thoughts on this fix?
Happy to spin a v2 with any review feedback. One thing I'm aware of: the `Cc: stable@vger.kernel.org # 7.1` tag is probably unnecessary since the regression only landed in 7.1-rc1 and the fix will reach 7.1 final naturally via drm-fixes; I can drop it in v2 if preferred.
Good catch, but the fix is complete overkill.
You can lock multiple BOs at the same time; something like this should do it:
	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 2);
	drm_exec_until_all_locked(&exec) {
		/* Lock the VM root PD first */
		ret = amdgpu_vm_lock_pd(vm, &exec, 1);
		drm_exec_retry_on_contention(&exec);
		if (unlikely(ret))
			goto fail_lock;

		mapping = amdgpu_vm_bo_lookup_mapping(vm, ib_addr >> PAGE_SHIFT);
		if (!mapping) {
			ret = -EINVAL;
			goto fail_lock;
		}

		/* Then lock the BO backing the IB in the same drm_exec context */
		obj = mapping->bo_va->base.bo;
		ret = drm_exec_lock_obj(&exec, &obj->tbo.base);
		drm_exec_retry_on_contention(&exec);
		if (unlikely(ret))
			goto fail_lock;
	}
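
For anyone not familiar with the pattern: when a lock is contended, the drm_exec loop drops every lock it has already taken and runs the whole block again. The following is only a rough userspace analogy of that "all or nothing, then retry" idea, using pthread mutexes and trylock. It is not kernel code and does not model ww_mutex ticket ordering. The names lock_set, lock_set_try_add, lock_set_release and lock_both are hypothetical and exist only for this sketch.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 2

/* Hypothetical stand-in for the set of locks one pass has acquired. */
struct lock_set {
	pthread_mutex_t *held[MAX_HELD];
	size_t count;
};

/* Drop everything acquired so far, in reverse order of acquisition. */
static void lock_set_release(struct lock_set *set)
{
	while (set->count > 0)
		pthread_mutex_unlock(set->held[--set->count]);
}

/* Try to add one lock without blocking; false means contention (or a full set). */
static bool lock_set_try_add(struct lock_set *set, pthread_mutex_t *m)
{
	if (set->count == MAX_HELD || pthread_mutex_trylock(m) != 0)
		return false;
	set->held[set->count++] = m;
	return true;
}

/*
 * Acquire both locks or neither. Returns the number of passes needed,
 * or -1 if max_passes passes all hit contention.
 */
static int lock_both(struct lock_set *set, pthread_mutex_t *a,
		     pthread_mutex_t *b, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		if (lock_set_try_add(set, a) && lock_set_try_add(set, b))
			return pass;

		/* Contention: back off completely before the next pass. */
		lock_set_release(set);
		sched_yield();
	}
	return -1;
}
```

The point of the analogy is that no pass ever sleeps while holding one lock and waiting for the other, which is the situation lockdep complains about in the splat above.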
@Pierre-Eric can you take a look at that as well?
Thanks in advance, Christian.