From: Philip Yang Philip.Yang@amd.com
[ Upstream commit a359288ccb4dd8edb086e7de8fdf6e36f544c922 ]
To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm".
Change to use uninterruptible lock fix the mapping leaking and no dmesg error log.
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org ---
Based on my analysis, I can now provide a definitive answer:
**YES**, this commit should be backported to stable kernel trees.
## Extensive Explanation:
### 1. **This fixes a real memory leak bug** The commit addresses a memory leak that occurs when a process is terminating and receives a signal during the seq64 memory cleanup phase. The interruptible lock acquisition (`DRM_EXEC_INTERRUPTIBLE_WAIT`) can be interrupted by signals, causing the cleanup to fail and leaving seq64 VA mappings active. This results in: - Memory leaks visible as "still active bo inside vm" error messages - GPU memory resource exhaustion over time - Potential system instability
### 2. **The fix is minimal and focused** The change is a single-line modification: ```diff - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); + drm_exec_init(&exec, 0, 0); ``` This changes the lock acquisition from interruptible to uninterruptible, ensuring cleanup always completes.
### 3. **It follows an established pattern** Similar Commit #1 (with Backport Status: YES) shows the exact same fix pattern was already applied to CSA cleanup: ```diff - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); + drm_exec_init(&exec, 0, 0); ``` This proves the approach is tested and validated.
### 4. **The fix is appropriate for cleanup paths** Using uninterruptible locks during cleanup/teardown is the correct approach because: - The process is already exiting, so a brief wait won't impact user experience - Cleanup MUST complete to prevent resource leaks - Signal handling during cleanup can cause more harm than good
### 5. **Low risk of regression** - The change only affects the cleanup path during process termination - It doesn't introduce new features or change normal operation behavior - The same pattern has been successfully used elsewhere in the driver
### 6. **Clear user impact** Without this fix, users will experience: - GPU memory leaks during normal usage (process exits) - Accumulating error messages in dmesg - Eventual GPU memory exhaustion requiring system restart
This meets all the criteria for stable backporting: it's a focused bug fix for a real issue affecting users, with minimal risk and no architectural changes.
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c index e22cb2b5cd926..dba8051b8c14b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c @@ -133,7 +133,7 @@ void amdgpu_seq64_unmap(struct amdgpu_device *adev, struct amdgpu_fpriv *fpriv)
vm = &fpriv->vm;
- drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); + drm_exec_init(&exec, 0, 0); drm_exec_until_all_locked(&exec) { r = amdgpu_vm_lock_pd(vm, &exec, 0); if (likely(!r))