[Public]
-----Original Message-----
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
Sent: Monday, February 28, 2022 12:23 PM
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org; stable@vger.kernel.org; Paul Menzel pmenzel@molgen.mpg.de; Koenig, Christian Christian.Koenig@amd.com; Yu, Qiang Qiang.Yu@amd.com; Deucher, Alexander Alexander.Deucher@amd.com
Subject: [PATCH 5.15 020/139] drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
From: Qiang Yu qiang.yu@amd.com
commit c1a66c3bc425ff93774fb2f6eefa67b83170dd7e upstream.
The workstation application ANSA/META v21.1.4 triggers this error in dmesg when running the CI test suite provided by ANSA/META:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
This is caused by:
1. Create a 256MB buffer in invisible VRAM.
2. The CPU maps the buffer and accesses it, which causes a vm_fault and an attempt to move it to visible VRAM.
3. Visible VRAM space is forced, and all VRAM BOs are traversed to check whether evicting each BO is worthwhile.
4. When checking a VM BO (in invisible VRAM), amdgpu_vm_evictable() sets amdgpu_vm->evicting, but later, because the BO is not in visible VRAM, it is not really evicted and so is not added to amdgpu_vm->evicted.
5. Before the next CS clears amdgpu_vm->evicting, a user VM ops ioctl passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted) but fails in amdgpu_vm_bo_update_mapping() (which checks amdgpu_vm->evicting) and produces this error log.
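Steps 4 and 5 can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver code: `struct toy_vm`, `toy_vm_ready_old()`, `toy_vm_ready_new()`, `toy_update_mapping()` and `toy_va_ioctl()` are hypothetical stand-ins, the evicted list is reduced to a counter, and the eviction lock is omitted.

```c
/* Simplified userspace model of the mismatch; NOT amdgpu driver code. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_vm, reduced to the two fields discussed. */
struct toy_vm {
	int evicted_count; /* models the length of the vm->evicted list */
	bool evicting;     /* models the vm->evicting flag */
};

/* Old readiness check: true if the evicted list is empty. */
static bool toy_vm_ready_old(const struct toy_vm *vm)
{
	return vm->evicted_count == 0;
}

/* New readiness check: true if the VM is not evicting (locking omitted here). */
static bool toy_vm_ready_new(const struct toy_vm *vm)
{
	return !vm->evicting;
}

/* Models the check in the mapping update path: refuse while evicting. */
static int toy_update_mapping(const struct toy_vm *vm)
{
	return vm->evicting ? -EBUSY : 0;
}

/* Models a VM ops ioctl: skip the update when the VM is not ready. */
static int toy_va_ioctl(const struct toy_vm *vm,
			bool (*ready)(const struct toy_vm *))
{
	int r;

	if (!ready(vm))
		return 0; /* update is left for the next CS, nothing is logged */

	r = toy_update_mapping(vm);
	if (r)
		fprintf(stderr, "Couldn't update BO_VA (%d)\n", r);
	return r;
}

/* State after step 4: flag set, but nothing was added to the evicted list. */
static struct toy_vm toy_vm_after_step4(void)
{
	struct toy_vm vm = { .evicted_count = 0, .evicting = true };

	return vm;
}
```

With the state returned by `toy_vm_after_step4()`, calling `toy_va_ioctl()` with the old check reaches the update and fails with `-EBUSY` (16 on Linux, matching the -16 in the log), while the new check returns early without an error.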
This error won't affect functionality, as the next CS will finish the waiting VM ops. Still, it is better to avoid the error log by checking the amdgpu_vm->evicting flag in amdgpu_vm_ready(), so that amdgpu_vm_bo_update_mapping() is not called afterwards.
Another reason is that the amdgpu_vm->evicted list holds all BOs (both user buffers and page tables), but only the eviction of page table BOs prevents VM ops. The amdgpu_vm->evicting flag is set only for page table BOs, so we should use the evicting flag instead of the evicted list in amdgpu_vm_ready().
The side effect of this change is that a previously blocked VM op (user buffer in the "evicted" list but no page table in it) now gets done immediately.
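The difference between the two checks, and the side effect described above, can be illustrated with another toy model. All names here (`struct toy_pt_vm`, `toy_evict()`, `toy_ready_by_list()`, `toy_ready_by_flag()`) are hypothetical, and the assumption that the flag is raised only for page table BOs is taken from the commit message rather than from the driver source.

```c
/* Toy model of "evicted list" versus "evicting flag"; NOT amdgpu driver code. */
#include <stdbool.h>
#include <stddef.h>

enum toy_bo_kind { TOY_BO_USER, TOY_BO_PAGE_TABLE };

#define TOY_MAX_EVICTED 8

/* Hypothetical VM state: a list of evicted BOs plus the evicting flag. */
struct toy_pt_vm {
	enum toy_bo_kind evicted[TOY_MAX_EVICTED];
	size_t n_evicted;
	bool evicting;
};

/* Every evicted BO lands on the list; only page tables raise the flag. */
static void toy_evict(struct toy_pt_vm *vm, enum toy_bo_kind kind)
{
	if (vm->n_evicted < TOY_MAX_EVICTED)
		vm->evicted[vm->n_evicted++] = kind;
	if (kind == TOY_BO_PAGE_TABLE)
		vm->evicting = true;
}

/* Old check: any evicted BO, even a user buffer, blocks VM ops. */
static bool toy_ready_by_list(const struct toy_pt_vm *vm)
{
	return vm->n_evicted == 0;
}

/* New check: only an evicting page table blocks VM ops. */
static bool toy_ready_by_flag(const struct toy_pt_vm *vm)
{
	return !vm->evicting;
}
```

After `toy_evict(&vm, TOY_BO_USER)` the list-based check reports not ready while the flag-based check reports ready, which is the "previously blocked VM op" case; once a page table BO is evicted, both checks report not ready.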
v2: update commit comments.
Acked-by: Paul Menzel pmenzel@molgen.mpg.de
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Qiang Yu qiang.yu@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
A regression was reported against this patch in 5.17. Please drop for now.
Alex
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -768,11 +768,16 @@ int amdgpu_vm_validate_pt_bos(struct amd
  * Check if all VM PDs/PTs are ready for updates
  *
  * Returns:
- * True if eviction list is empty.
+ * True if VM is not evicting.
  */
 bool amdgpu_vm_ready(struct amdgpu_vm *vm)
 {
-	return list_empty(&vm->evicted);
+	bool ret;
+
+	amdgpu_vm_eviction_lock(vm);
+	ret = !vm->evicting;
+	amdgpu_vm_eviction_unlock(vm);
+	return ret;
 }
 
 /**