A past bug fix overlooked that entity access must be protected by the entity lock. That is a data race and potentially UB.
Move the spin_unlock() to the appropriate position.
Cc: stable@vger.kernel.org # v5.13+
Fixes: ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 5a4697f636f2..aa222166de58 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -552,10 +552,11 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 		drm_sched_rq_remove_entity(entity->rq, entity);
 		entity->rq = rq;
 	}
-	spin_unlock(&entity->lock);
 
 	if (entity->num_sched_list == 1)
 		entity->sched_list = NULL;
+
+	spin_unlock(&entity->lock);
 }
 
 /**
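To make the before/after ordering easier to see outside the diff context, here is a minimal userspace analogue of the pattern (a sketch using pthreads, not the actual kernel code; the variable names merely mirror the entity fields):

/* Minimal userspace analogue of the bug and the fix (sketch, not kernel
 * code): a field written after the lock protecting it has been dropped. */
#include <pthread.h>

static pthread_spinlock_t lock;		/* stands in for entity->lock */
static unsigned int num_sched_list = 1;	/* entity->num_sched_list */
static int dummy;
static void *sched_list = &dummy;	/* entity->sched_list, non-NULL */

/* Before the patch: the store to sched_list happens after the unlock,
 * so it races with any other thread accessing the field under the lock. */
static void select_rq_buggy(void)
{
	pthread_spin_lock(&lock);
	/* ... pick scheduler, update rq ... */
	pthread_spin_unlock(&lock);

	if (num_sched_list == 1)
		sched_list = NULL;	/* unprotected write: data race */
}

/* After the patch: the unlock moves below the store, so sched_list is
 * only ever written with the lock held. */
static void select_rq_fixed(void)
{
	pthread_spin_lock(&lock);
	/* ... pick scheduler, update rq ... */
	if (num_sched_list == 1)
		sched_list = NULL;	/* write now covered by the lock */
	pthread_spin_unlock(&lock);
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	select_rq_buggy();	/* the racy ordering */
	sched_list = &dummy;
	select_rq_fixed();	/* the corrected ordering */
	pthread_spin_destroy(&lock);
	return 0;
}

Compile with -lpthread; the point is only the position of the unlock relative to the store, which is exactly what the hunk above changes.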
On 22/10/2025 07:34, Philipp Stanner wrote:
> A past bug fix overlooked that entity access must be protected by the
> entity lock. That is a data race and potentially UB.
>
> Move the spin_unlock() to the appropriate position.
>
> Cc: stable@vger.kernel.org # v5.13+
> Fixes: ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3")
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
>
>  drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 5a4697f636f2..aa222166de58 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -552,10 +552,11 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>  		drm_sched_rq_remove_entity(entity->rq, entity);
>  		entity->rq = rq;
>  	}
> -	spin_unlock(&entity->lock);
> 
>  	if (entity->num_sched_list == 1)
>  		entity->sched_list = NULL;
> +
> +	spin_unlock(&entity->lock);
>  }
> 
>  /**
Agreed, unlock at the end is correct.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Luckily, only amdgpu calls drm_sched_entity_modify_sched(), and I am not sure whether a hypothetical multi-threaded submit could actually hit this or not.

One thread would have to clear the sched_list while another had already passed the !sched_list check at the beginning of the function, as well as the job_queue count and last_scheduled-completed guards. That would mean the job has to complete between the unlock and the clearing of the pointer.
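Spelling the window out (a sketch of the ordering only, with both threads in the pre-patch drm_sched_entity_select_rq(); the guard names abbreviate the function's early returns):

  Thread A                                  Thread B
  --------                                  --------
  passes !entity->sched_list check
  passes job_queue / last_scheduled guards
  spin_lock(&entity->lock);
  ... selects rq ...
  spin_unlock(&entity->lock);
                                            passes !entity->sched_list check
                                            (the job completes here, so the
                                             job_queue and last_scheduled
                                             guards pass as well)
                                            spin_lock(&entity->lock);
                                            reads entity->sched_list
  entity->sched_list = NULL;   <-- unlocked store racing with B's locked read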
Also, please take a look at this when you can:
https://lore.kernel.org/dri-devel/20240719094730.55301-1-tursulin@igalia.com...
That series would also have fixed this issue by removing the clearing of entity->sched_list altogether, the latter being just a hack to work around confused container ownership rules.
Regards,
Tvrtko