Both Tvrtko [1] and I [2] have recently proposed some improvals for drm_sched.
While taking Tvrtko's feedback into account for my patch, I realized that both his and my patch can be fully replaced with a bigger and far more beautiful series.
If I am not mistaken, it turns out that the entire entity->entity_idle completion is also nothing but a workaround around the grave mistake of not using the greatest helper with parallel programming that exists in computer science: Locking.
This series adds locking to the last_scheduled field and all checks related to detect the idleness of the entity. As before, the job_scheduled event queue causes the periodic checks.
This way, we can get rid of memory barriers, RCU, a few lines of code, make things more readable, understandable...
Tested with drm-sched-unit tests. I'm a bit busy right now, but wanted to show you guys the idea. Before merging I'd test it more exhaustively with Nouveau.
Greetings, Philipp
[1] https://lore.kernel.org/dri-devel/20260611123423.39819-1-tvrtko.ursulin@igal... [2] https://lore.kernel.org/dri-devel/20260626081942.2122144-2-phasta@kernel.org...
Philipp Stanner (5): drm/sched: Protect entity->last_scheduled with spinlock drm/sched: Lock spsc_queue in drm_sched_entity_pop_job() drm/sched: Avoid lock cycle for sched_entity drm/sched: Lock drm_sched_entity_is_idle() drm/sched: Remove entity->entity_idle
drivers/gpu/drm/scheduler/sched_entity.c | 75 +++++++++++------------- drivers/gpu/drm/scheduler/sched_main.c | 2 - drivers/gpu/drm/scheduler/sched_rq.c | 5 +- include/drm/gpu_scheduler.h | 16 ++--- 4 files changed, 41 insertions(+), 57 deletions(-)
base-commit: be4f10d44757211fd656fa57f37034657f26c883
The entity->last_scheduled field has always been set and read with special RCU functions in addition to memory barriers.
This was added in
commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled")
however, no proper justification for that mechanism was provided. There seems to be no obvious reason, since the entity lock is available and taken at all places that evaluate the last_scheduled field. The only exception is drm_sched_entity_error(), which is not performance critical in any way.
Improve robustness, readability and maintainability by replacing RCU and barriers with the lock.
Signed-off-by: Philipp Stanner phasta@kernel.org --- drivers/gpu/drm/scheduler/sched_entity.c | 50 ++++++++++-------------- include/drm/gpu_scheduler.h | 9 ++--- 2 files changed, 25 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index c51101ec70c1..91aec20611ad 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -135,7 +135,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, entity->num_sched_list = num_sched_list; entity->sched_list = num_sched_list > 1 ? sched_list : NULL; entity->rq = &sched_list[0]->rq; - RCU_INIT_POINTER(entity->last_scheduled, NULL); RB_CLEAR_NODE(&entity->rb_tree_node); init_completion(&entity->entity_idle);
@@ -201,10 +200,10 @@ int drm_sched_entity_error(struct drm_sched_entity *entity) struct dma_fence *fence; int r;
- rcu_read_lock(); - fence = rcu_dereference(entity->last_scheduled); + spin_lock(&entity->lock); + fence = entity->last_scheduled; r = fence ? fence->error : 0; - rcu_read_unlock(); + spin_unlock(&entity->lock);
return r; } @@ -287,9 +286,10 @@ void drm_sched_entity_kill(struct drm_sched_entity *entity) /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle);
- /* The entity is guaranteed to not be used by the scheduler */ - prev = rcu_dereference_check(entity->last_scheduled, true); + spin_lock(&entity->lock); + prev = entity->last_scheduled; dma_fence_get(prev); + spin_unlock(&entity->lock); while ((job = drm_sched_entity_queue_pop(entity))) { struct drm_sched_fence *s_fence = job->s_fence;
@@ -381,8 +381,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity) entity->dependency = NULL; }
- dma_fence_put(rcu_dereference_check(entity->last_scheduled, true)); - RCU_INIT_POINTER(entity->last_scheduled, NULL); + dma_fence_put(entity->last_scheduled); drm_sched_entity_stats_put(entity->stats); } EXPORT_SYMBOL(drm_sched_entity_fini); @@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) { + /* Helper to avoid dropping the reference while the entity lock is held, + * just to have some more robustness. + */ + struct dma_fence *prev_last_scheduled; struct drm_sched_job *sched_job;
sched_job = drm_sched_entity_queue_peek(entity); @@ -523,19 +526,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) if (entity->guilty && atomic_read(entity->guilty)) dma_fence_set_error(&sched_job->s_fence->finished, -ECANCELED);
- dma_fence_put(rcu_dereference_check(entity->last_scheduled, true)); - rcu_assign_pointer(entity->last_scheduled, - dma_fence_get(&sched_job->s_fence->finished)); - - /* - * If the queue is empty we allow drm_sched_entity_select_rq() to - * locklessly access ->last_scheduled. This only works if we set the - * pointer before we dequeue and if we a write barrier here. - */ - smp_wmb(); + spin_lock(&entity->lock); + prev_last_scheduled = entity->last_scheduled; + entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished); + spin_unlock(&entity->lock);
spsc_queue_pop(&entity->job_queue);
+ dma_fence_put(prev_last_scheduled); drm_sched_rq_pop_entity(entity);
/* Jobs and entities might have different lifecycles. Since we're @@ -561,21 +559,15 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (spsc_queue_count(&entity->job_queue)) return;
- /* - * Only when the queue is empty are we guaranteed that - * drm_sched_run_job_work() cannot change entity->last_scheduled. To - * enforce ordering we need a read barrier here. See - * drm_sched_entity_pop_job() for the other side. - */ - smp_rmb(); - - fence = rcu_dereference_check(entity->last_scheduled, true); + spin_lock(&entity->lock); + fence = entity->last_scheduled;
/* stay on the same engine if the previous job hasn't finished */ - if (fence && !dma_fence_is_signaled(fence)) + if (fence && !dma_fence_is_signaled(fence)) { + spin_unlock(&entity->lock); return; + }
- spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = sched ? &sched->rq : NULL; if (rq != entity->rq) { diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index d61c19e78182..176ff1f936cd 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -100,7 +100,8 @@ struct drm_sched_entity { * @lock: * * Lock protecting the run-queue (@rq) to which this entity belongs, - * @priority and the list of schedulers (@sched_list, @num_sched_list). + * @priority, @last_scheduled and the list of schedulers (@sched_list, + * @num_sched_list). */ spinlock_t lock;
@@ -202,11 +203,9 @@ struct drm_sched_entity { /** * @last_scheduled: * - * Points to the finished fence of the last scheduled job. Only written - * by drm_sched_entity_pop_job(). Can be accessed locklessly from - * drm_sched_job_arm() if the queue is empty. + * Points to the finished fence of the last scheduled job. */ - struct dma_fence __rcu *last_scheduled; + struct dma_fence *last_scheduled;
/** * @last_user: last group leader pushing a job into the entity.
Cleanup work in the preceding commit added locking to drm_sched_entity_pop_job(). This cleanup causes a slightly sub-optimal lock cycle with drm_sched_rq_pop_entity().
sched_entity also utilizes the lockless spsc_queue (partially already used simultaneously with locks), which was marked for removal in
commit 6e7eb171ac96 ("Documentation: drm: Add entry for removing spsc_queue to TODO list")
To remove the lock-cycle mentioned above, the unlock must be moved downwards, also locking the lockless queue.
Guard spsc_queue_pop() in drm_sched_entity_pop_job() with the lock and document why that is being done.
Signed-off-by: Philipp Stanner phasta@kernel.org --- drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 91aec20611ad..5cf0af91faf2 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -529,9 +529,17 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) spin_lock(&entity->lock); prev_last_scheduled = entity->last_scheduled; entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished); - spin_unlock(&entity->lock);
+ /* Preceding cleanup work made it necessary to add the spinlock + * to this function. spsc_queue, a lockless queue, is now + * counterintuitively guarded by the lock as well. spsc_queue is queued + * for removal (see DRM TODO list), so this somewhat serves as a + * preparational step. + * + * TODO: Replace spsc_queue completely with a locked (h)list. + */ spsc_queue_pop(&entity->job_queue); + spin_unlock(&entity->lock);
dma_fence_put(prev_last_scheduled); drm_sched_rq_pop_entity(entity);
Previous cleanup commits created a slightly sub-optimal lock-cycle between the two functions drm_sched_entity_pop_job() and drm_sched_rq_pop_entity().
Avoid the lock-cycle by moving the locking from drm_sched_rq_pop_entity() to drm_sched_entity_pop_job(). Add the appropriate lockdep check.
Signed-off-by: Philipp Stanner phasta@kernel.org --- drivers/gpu/drm/scheduler/sched_entity.c | 2 +- drivers/gpu/drm/scheduler/sched_rq.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 5cf0af91faf2..0fc1213a0d3f 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -539,10 +539,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) * TODO: Replace spsc_queue completely with a locked (h)list. */ spsc_queue_pop(&entity->job_queue); + drm_sched_rq_pop_entity(entity); spin_unlock(&entity->lock);
dma_fence_put(prev_last_scheduled); - drm_sched_rq_pop_entity(entity);
/* Jobs and entities might have different lifecycles. Since we're * removing the job from the entities queue, set the jobs entity pointer diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c index 044546bcb5f8..97363f9ef8bc 100644 --- a/drivers/gpu/drm/scheduler/sched_rq.c +++ b/drivers/gpu/drm/scheduler/sched_rq.c @@ -319,11 +319,12 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity) struct drm_sched_job *next_job; struct drm_sched_rq *rq;
+ lockdep_assert_held(&entity->lock); + /* * Update the entity's location in the min heap according to * the timestamp of the next job, if any. */ - spin_lock(&entity->lock); rq = entity->rq; spin_lock(&rq->lock); next_job = drm_sched_entity_queue_peek(entity); @@ -340,7 +341,6 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity) drm_sched_entity_save_vruntime(entity, min_vruntime); } spin_unlock(&rq->lock); - spin_unlock(&entity->lock); }
/**
drm_sched_entity_is_idle() contains a badly documented memory barrier and an invalid lockless access to entity->stopped.
This function is in no way performance critical, so it is safer, more readable and more maintainable to take the spinlock. This also enables future cleanup work where the entity can be fully synchronized via its spinlock.
Add locking to drm_sched_entity_is_idle().
Signed-off-by: Philipp Stanner phasta@kernel.org --- drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 0fc1213a0d3f..cb03d6a36578 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -178,14 +178,18 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity) { - rmb(); /* for list_empty to work without lock */ + bool idle = false; + + spin_lock(&entity->lock);
if (list_empty(&entity->list) || spsc_queue_count(&entity->job_queue) == 0 || entity->stopped) - return true; + idle = true;
- return false; + spin_unlock(&entity->lock); + + return idle; }
/**
The completion entity->entity_idle only existed because the entity was not properly locked through it's spinlock. The completion served to inform waiters about whether the entity is actually idle, which is something locking (previously added to drm_sched_entity_is_idle()) can fully achieve.
Remove the surplus completion.
Signed-off-by: Philipp Stanner phasta@kernel.org --- drivers/gpu/drm/scheduler/sched_entity.c | 9 --------- drivers/gpu/drm/scheduler/sched_main.c | 2 -- drivers/gpu/drm/scheduler/sched_rq.c | 1 - include/drm/gpu_scheduler.h | 7 ------- 4 files changed, 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index cb03d6a36578..23536dcfa96a 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -136,10 +136,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, entity->sched_list = num_sched_list > 1 ? sched_list : NULL; entity->rq = &sched_list[0]->rq; RB_CLEAR_NODE(&entity->rb_tree_node); - init_completion(&entity->entity_idle); - - /* We start in an idle state. */ - complete_all(&entity->entity_idle);
spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); @@ -285,12 +281,7 @@ void drm_sched_entity_kill(struct drm_sched_entity *entity) spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->lock);
- /* Make sure this entity is not used by the scheduler at the moment */ - wait_for_completion(&entity->entity_idle); - - spin_lock(&entity->lock); prev = entity->last_scheduled; dma_fence_get(prev); spin_unlock(&entity->lock); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index d2ca01b31ee4..b90220794a14 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -997,7 +997,6 @@ static void drm_sched_run_job_work(struct work_struct *w)
sched_job = drm_sched_entity_pop_job(entity); if (!sched_job) { - complete_all(&entity->entity_idle); drm_sched_run_job_queue(sched); return; } @@ -1013,7 +1012,6 @@ static void drm_sched_run_job_work(struct work_struct *w) * refcount has been incremented for the scheduler already. */ fence = sched->ops->run_job(sched_job); - complete_all(&entity->entity_idle); drm_sched_fence_scheduled(s_fence, fence);
if (!IS_ERR_OR_NULL(fence)) { diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c index 97363f9ef8bc..54aba1ef0d7a 100644 --- a/drivers/gpu/drm/scheduler/sched_rq.c +++ b/drivers/gpu/drm/scheduler/sched_rq.c @@ -373,7 +373,6 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) return ERR_PTR(-ENOSPC); }
- reinit_completion(&entity->entity_idle); break; } } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 176ff1f936cd..55260cbe880a 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -221,13 +221,6 @@ struct drm_sched_entity { */ bool stopped;
- /** - * @entity_idle: - * - * Signals when entity is not in use, used to sequence entity cleanup in - * drm_sched_entity_fini(). - */ - struct completion entity_idle;
/** * @oldest_job_waiting:
On Wed, 2026-07-01 at 10:59 +0200, Philipp Stanner wrote:
The completion entity->entity_idle only existed because the entity was not properly locked through it's spinlock. The completion served to inform waiters about whether the entity is actually idle, which is something locking (previously added to drm_sched_entity_is_idle()) can fully achieve.
Remove the surplus completion.
[…]
- /* Make sure this entity is not used by the scheduler at the moment */
- wait_for_completion(&entity->entity_idle);
Alright, my bad, turns out I had a bit too much steam on the kettle and we cannot remove it because of the drm_sched_entity_flush() being able to perform an asynchronous kill while the scheduler work item is still running.
But I think we could probably put Tvrtko's flush_work() [1] here to get the same result.
Opinions?
P.
[1] https://lore.kernel.org/dri-devel/20260611123423.39819-1-tvrtko.ursulin@igal...
linaro-mm-sig@lists.linaro.org