On Wed, Apr 29, 2026 at 7:37 PM Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote:
When dumping IB contents from a hung job, amdgpu_devcoredump_format() acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid() and then, for each IB referenced by the job, calls amdgpu_bo_reserve() on the BO that backs the IB. Both reservations are taken on reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx, which trips lockdep:
WARNING: possible recursive locking detected
kworker/u128:0 is trying to acquire lock: ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4}, at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
but task is already holding lock: ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4}, at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
Possible unsafe locking scenario:
       CPU0
       ----
  lock(reservation_ww_class_mutex);
  lock(reservation_ww_class_mutex);
 *** DEADLOCK ***
May be due to missing lock nesting notation
Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
Call Trace:
 __ww_mutex_lock.constprop.0
 ww_mutex_lock
 amdgpu_bo_reserve
 amdgpu_devcoredump_format+0x1594 [amdgpu]
 amdgpu_devcoredump_deferred_work+0xea [amdgpu]
 process_one_work
 worker_thread
 kthread
Friendly ping. Pierre-Eric, Christian, Alex — any thoughts on this fix?
Happy to spin a v2 with any review feedback. One thing I'm aware of: the `Cc: stable@vger.kernel.org # 7.1` tag is probably unnecessary since the regression only landed in 7.1-rc1 and the fix will reach 7.1 final naturally via drm-fixes; I can drop it in v2 if preferred.