6.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesus Narvaez jesus.narvaez@intel.com
[ Upstream commit 0323a5127e7c534cfc88efe0f850a0cb777e938b ]
There is a rare race condition when preparing for a reset where guc_lrc_desc_unpin() could be in the process of deregistering a context while a different thread is scrubbing outstanding contexts and it alters the context state and does a wakeref put. Then, if there is a failure with deregister_context(), a second wakeref put could occur. As a result the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON() check.
Therefore if there is a failure with deregister_context(), undo the context state changes and do a wakeref put only if the context was set to be destroyed earlier.
v2: Expand comment to better explain change. (Daniele) v3: Removed addition to the original comment. (Daniele)
Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss") Signed-off-by: Jesus Narvaez jesus.narvaez@intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Alan Previn alan.previn.teres.alexis@intel.com Cc: Anshuman Gupta anshuman.gupta@intel.com Cc: Mousumi Jana mousumi.jana@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Matt Roper matthew.d.roper@intel.com Reviewed-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: John Harrison John.C.Harrison@Intel.com Link: https://lore.kernel.org/r/20250528230551.1855177-1-jesus.narvaez@intel.com (cherry picked from commit f36a75aba1c3176d177964bca76f86a075d2943a) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 108331a699958..127316d2c8aa9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -3443,18 +3443,29 @@ static inline int guc_lrc_desc_unpin(struct intel_context *ce) * GuC is active, lets destroy this context, but at this point we can still be racing * with suspend, so we undo everything if the H2G fails in deregister_context so * that GuC reset will find this context during clean up. + * + * There is a race condition where the reset code could have altered + * this context's state and done a wakeref put before we try to + * deregister it here. So check if the context is still set to be + * destroyed before undoing earlier changes, to avoid two wakeref puts + * on the same context. */ ret = deregister_context(ce, ce->guc_id.id); if (ret) { + bool pending_destroyed; spin_lock_irqsave(&ce->guc_state.lock, flags); - set_context_registered(ce); - clr_context_destroyed(ce); + pending_destroyed = context_destroyed(ce); + if (pending_destroyed) { + set_context_registered(ce); + clr_context_destroyed(ce); + } spin_unlock_irqrestore(&ce->guc_state.lock, flags); /* * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements * the wakeref immediately but per function spec usage call this after unlock. */ - intel_wakeref_put_async(>->wakeref); + if (pending_destroyed) + intel_wakeref_put_async(>->wakeref); }
return ret;