From: Ben Skeggs bskeggs@nvidia.com
[ Upstream commit 7904bcdcf6b56602a049ed2b47282db63671fa99 ]
Though the initial upstreamed GSP-RM version in nouveau was 535.113.01, the code was developed against earlier versions.
535.42.02 modified the mailbox value used by GSP-RM to signal shutdown has completed, which was missed at the time.
I'm not aware of any issues caused by this, but noticed the bug while working on GB20x support.
Signed-off-by: Ben Skeggs bskeggs@nvidia.com Reviewed-by: Dave Airlie airlied@redhat.com Reviewed-by: Timur Tabi ttabi@nvidia.com Tested-by: Timur Tabi ttabi@nvidia.com Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org ---
**YES** This commit should be backported to stable kernel trees. **Analysis:** **1. Nature of the Bug Fix:** The commit fixes a bug in the GSP-RM shutdown wait condition in the r535_gsp_fini() function at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2612. The change is from: ```c if (nvkm_falcon_rd32(&gsp->falcon, 0x040) & 0x80000000) ``` to: ```c if (nvkm_falcon_rd32(&gsp->falcon, 0x040) == 0x80000000) ``` **2. Technical Impact:** - The original code used a bitwise AND operation (`&`) to check if bit 31 is set in the mailbox value - The fixed code uses an equality comparison (`==`) to check if the mailbox value is exactly 0x80000000 - This fixes a version compatibility issue where GSP- RM 535.42.02 changed the shutdown completion signal value - The bug could potentially cause incorrect shutdown behavior or timing issues **3. Backport Criteria Assessment:** - **Fixes important bug**: YES - This fixes a GSP-RM shutdown synchronization bug that could affect system stability - **Small and contained**: YES - Single line change, very minimal and focused - **No architectural changes**: YES - Simple conditional logic fix - **Minimal regression risk**: YES - The change is specific to GSP-RM shutdown sequence and well-understood - **Clear side effects**: NO - The fix has no unexpected side effects, only corrects the intended behavior **4. Critical Subsystem Impact:** The change affects the Nouveau DRM GSP (Graphics System Processor) subsystem, specifically the shutdown sequence. GSP is critical for modern NVIDIA GPU operation, and incorrect shutdown behavior could lead to system instability, resource leaks, or improper hardware state transitions. **5. Historical Context:** Looking at the similar commits provided as reference, all were marked "NO" for backporting because they were architectural preparation changes for GSP-RM support (adding `if (nvkm_gsp_rm(device->gsp)) return -ENODEV;` guards). This commit is fundamentally different - it's a targeted bug fix rather than a feature preparation change. **6. Stability Tree Rules Compliance:** - This is an important bugfix that corrects faulty synchronization logic - It has minimal risk of introducing regressions - The change is confined to a specific subsystem (nouveau GSP) - No new features are introduced The commit message explicitly states this was developed against earlier GSP- RM versions and the bug was "missed at the time" when 535.42.02 modified the mailbox behavior. While the author mentions being unaware of specific issues, incorrect shutdown synchronization in GPU drivers can lead to subtle but serious problems including system hangs, resource leaks, or corrupted GPU state.
drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c index db2602e880062..6a964b54f69c2 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -2838,7 +2838,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend) return ret;
nvkm_msec(gsp->subdev.device, 2000, - if (nvkm_falcon_rd32(&gsp->falcon, 0x040) & 0x80000000) + if (nvkm_falcon_rd32(&gsp->falcon, 0x040) == 0x80000000) break; );