[PATCH 0/5] drm/sched: Introduce the miracle of locking to entity


Philipp Stanner

1 Jul 2026

8:59 a.m.

Both Tvrtko [1] and I [2] have recently proposed some improvements for drm_sched.

While taking Tvrtko's feedback into account for my patch, I realized that both his and my patch can be fully replaced with a bigger and far more beautiful series.

If I am not mistaken, it turns out that the entire entity->entity_idle completion is also nothing but a workaround for the grave mistake of not using the greatest helper for parallel programming that exists in computer science: Locking.

This series adds locking to the last_scheduled field and to all checks related to detecting the idleness of the entity. As before, the job_scheduled event queue causes the periodic checks.

This way, we can get rid of memory barriers, RCU, a few lines of code, make things more readable, understandable...

Tested with drm-sched-unit tests. I'm a bit busy right now, but wanted to show you guys the idea. Before merging I'd test it more exhaustively with Nouveau.

Greetings, Philipp

[1] https://lore.kernel.org/dri-devel/20260611123423.39819-1-tvrtko.ursulin@igal...
[2] https://lore.kernel.org/dri-devel/20260626081942.2122144-2-phasta@kernel.org...

Philipp Stanner (5):
  drm/sched: Protect entity->last_scheduled with spinlock
  drm/sched: Lock spsc_queue in drm_sched_entity_pop_job()
  drm/sched: Avoid lock cycle for sched_entity
  drm/sched: Lock drm_sched_entity_is_idle()
  drm/sched: Remove entity->entity_idle

 drivers/gpu/drm/scheduler/sched_entity.c | 75 +++++++++++-------------
 drivers/gpu/drm/scheduler/sched_main.c   |  2 -
 drivers/gpu/drm/scheduler/sched_rq.c     |  5 +-
 include/drm/gpu_scheduler.h              | 16 ++---
 4 files changed, 41 insertions(+), 57 deletions(-)

base-commit: be4f10d44757211fd656fa57f37034657f26c883

-- 2.54.0


Philipp Stanner

1 Jul

8:59 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

The entity->last_scheduled field has always been set and read with special RCU functions in addition to memory barriers.

This was added in

commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled")

however, no proper justification for that mechanism was provided. There seems to be no obvious reason, since the entity lock is available and taken at all places that evaluate the last_scheduled field. The only exception is drm_sched_entity_error(), which is not performance critical in any way.

Improve robustness, readability and maintainability by replacing RCU and barriers with the lock.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 50 ++++++++++--------------
 include/drm/gpu_scheduler.h              |  9 ++---
 2 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index c51101ec70c1..91aec20611ad 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -135,7 +135,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->num_sched_list = num_sched_list;
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
 	entity->rq = &sched_list[0]->rq;
-	RCU_INIT_POINTER(entity->last_scheduled, NULL);
 	RB_CLEAR_NODE(&entity->rb_tree_node);
 	init_completion(&entity->entity_idle);
 
@@ -201,10 +200,10 @@ int drm_sched_entity_error(struct drm_sched_entity *entity)
 	struct dma_fence *fence;
 	int r;
 
-	rcu_read_lock();
-	fence = rcu_dereference(entity->last_scheduled);
+	spin_lock(&entity->lock);
+	fence = entity->last_scheduled;
 	r = fence ? fence->error : 0;
-	rcu_read_unlock();
+	spin_unlock(&entity->lock);
 
 	return r;
 }
@@ -287,9 +286,10 @@ void drm_sched_entity_kill(struct drm_sched_entity *entity)
 	/* Make sure this entity is not used by the scheduler at the moment */
 	wait_for_completion(&entity->entity_idle);
 
-	/* The entity is guaranteed to not be used by the scheduler */
-	prev = rcu_dereference_check(entity->last_scheduled, true);
+	spin_lock(&entity->lock);
+	prev = entity->last_scheduled;
 	dma_fence_get(prev);
+	spin_unlock(&entity->lock);
 	while ((job = drm_sched_entity_queue_pop(entity))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
@@ -381,8 +381,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
 		entity->dependency = NULL;
 	}
 
-	dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
-	RCU_INIT_POINTER(entity->last_scheduled, NULL);
+	dma_fence_put(entity->last_scheduled);
 	drm_sched_entity_stats_put(entity->stats);
 }
 EXPORT_SYMBOL(drm_sched_entity_fini);
@@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
 
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
+	/* Helper to avoid dropping the reference while the entity lock is held,
+	 * just to have some more robustness.
+	 */
+	struct dma_fence *prev_last_scheduled;
 	struct drm_sched_job *sched_job;
 
 	sched_job = drm_sched_entity_queue_peek(entity);
@@ -523,19 +526,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	if (entity->guilty && atomic_read(entity->guilty))
 		dma_fence_set_error(&sched_job->s_fence->finished, -ECANCELED);
 
-	dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
-	rcu_assign_pointer(entity->last_scheduled,
-			   dma_fence_get(&sched_job->s_fence->finished));
-
-	/*
-	 * If the queue is empty we allow drm_sched_entity_select_rq() to
-	 * locklessly access ->last_scheduled. This only works if we set the
-	 * pointer before we dequeue and if we a write barrier here.
-	 */
-	smp_wmb();
+	spin_lock(&entity->lock);
+	prev_last_scheduled = entity->last_scheduled;
+	entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
+	spin_unlock(&entity->lock);
 
 	spsc_queue_pop(&entity->job_queue);
 
+	dma_fence_put(prev_last_scheduled);
 	drm_sched_rq_pop_entity(entity);
 
 	/* Jobs and entities might have different lifecycles. Since we're
@@ -561,21 +559,15 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 	if (spsc_queue_count(&entity->job_queue))
 		return;
 
-	/*
-	 * Only when the queue is empty are we guaranteed that
-	 * drm_sched_run_job_work() cannot change entity->last_scheduled. To
-	 * enforce ordering we need a read barrier here. See
-	 * drm_sched_entity_pop_job() for the other side.
-	 */
-	smp_rmb();
-
-	fence = rcu_dereference_check(entity->last_scheduled, true);
+	spin_lock(&entity->lock);
+	fence = entity->last_scheduled;
 
 	/* stay on the same engine if the previous job hasn't finished */
-	if (fence && !dma_fence_is_signaled(fence))
+	if (fence && !dma_fence_is_signaled(fence)) {
+		spin_unlock(&entity->lock);
 		return;
+	}
 
-	spin_lock(&entity->lock);
 	sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
 	rq = sched ? &sched->rq : NULL;
 	if (rq != entity->rq) {
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d61c19e78182..176ff1f936cd 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -100,7 +100,8 @@ struct drm_sched_entity {
 	 * @lock:
 	 *
 	 * Lock protecting the run-queue (@rq) to which this entity belongs,
-	 * @priority and the list of schedulers (@sched_list, @num_sched_list).
+	 * @priority, @last_scheduled and the list of schedulers (@sched_list,
+	 * @num_sched_list).
 	 */
 	spinlock_t lock;
 
@@ -202,11 +203,9 @@ struct drm_sched_entity {
 	/**
 	 * @last_scheduled:
 	 *
-	 * Points to the finished fence of the last scheduled job. Only written
-	 * by drm_sched_entity_pop_job(). Can be accessed locklessly from
-	 * drm_sched_job_arm() if the queue is empty.
+	 * Points to the finished fence of the last scheduled job.
 	 */
-	struct dma_fence __rcu *last_scheduled;
+	struct dma_fence *last_scheduled;
 
 	/**
 	 * @last_user: last group leader pushing a job into the entity.

-- 2.54.0
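
For readers following along outside the kernel tree, the reader/writer shape this patch moves to can be modeled in plain C11. This is only a userspace sketch, not kernel code: toy_entity, toy_fence, toy_lock() and toy_unlock() are hypothetical stand-ins for struct drm_sched_entity, struct dma_fence and spin_lock()/spin_unlock() on entity->lock, and the spinlock is approximated with an atomic_int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_fence {
	int error;
};

struct toy_entity {
	atomic_int lock;                  /* models entity->lock */
	struct toy_fence *last_scheduled; /* only touched with lock held */
};

static void toy_entity_init(struct toy_entity *e)
{
	atomic_init(&e->lock, 0);
	e->last_scheduled = NULL;
}

/* Minimal test-and-set spinlock standing in for spin_lock(). */
static void toy_lock(struct toy_entity *e)
{
	while (atomic_exchange_explicit(&e->lock, 1, memory_order_acquire))
		; /* spin until the previous holder releases */
}

static void toy_unlock(struct toy_entity *e)
{
	atomic_store_explicit(&e->lock, 0, memory_order_release);
}

/* Writer: publish a new fence pointer under the lock. */
static void toy_set_last_scheduled(struct toy_entity *e, struct toy_fence *f)
{
	toy_lock(e);
	e->last_scheduled = f;
	toy_unlock(e);
}

/* Reader: same shape as the locked drm_sched_entity_error() in the diff. */
static int toy_entity_error(struct toy_entity *e)
{
	struct toy_fence *fence;
	int r;

	assert(e != NULL);

	toy_lock(e);
	fence = e->last_scheduled;
	r = fence ? fence->error : 0;
	toy_unlock(e);

	return r;
}
```

Because both sides take the same lock, the sketch needs neither an explicit barrier pair nor RCU annotations; the acquire/release ordering of the lock operations covers the pointer and the fields read through it.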

Tvrtko Ursulin

3 Jul

11:27 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On 01/07/2026 09:59, Philipp Stanner wrote:
> The entity->last_scheduled field has always been set and read with special RCU functions in addition to memory barriers.
>
> This was added in
>
> commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled")
>
> however, no proper justification for that mechanism was provided. There seems to be no obvious reason, since the entity lock is available and taken at all places that evaluate the last_scheduled field. The only exception is drm_sched_entity_error(), which is not performance critical in any way.
>
> Improve robustness, readability and maintainability by replacing RCU and barriers with the lock.

First thing, and regardless of other strands of discussion, I think it should be squashed with 3/5 instead of that one undoing the introduction of lock-unlock-lock-unlock.

As far as the main topic is concerned, I really like the removal of all the rcu_dereference_check(, true) lines and the memory barriers.

But I also think the commit message should explain better what code paths are now taking an extra lock - under which circumstances is the lock now taken for all scheduler users, and which amdgpu paths use drm_sched_entity_error() a lot so could be affected. I doubt it creates a measurable performance impact but it needs to be explained.

I am also happy to give it a spin on the Steam Deck to see if I can observe anything.

> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ...
> @@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
>  
>  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>  {
> +	/* Helper to avoid dropping the reference while the entity lock is held,
> +	 * just to have some more robustness.
> +	 */

I don't get this comment. Neither the placement nor the content.

Regards,

Tvrtko


Philipp Stanner

2:47 p.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On Fri, 2026-07-03 at 12:27 +0100, Tvrtko Ursulin wrote:
> On 01/07/2026 09:59, Philipp Stanner wrote:
> > The entity->last_scheduled field has always been set and read with special RCU functions in addition to memory barriers.
> >
> > This was added in
> >
> > commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled")
> >
> > however, no proper justification for that mechanism was provided. There seems to be no obvious reason, since the entity lock is available and taken at all places that evaluate the last_scheduled field. The only exception is drm_sched_entity_error(), which is not performance critical in any way.
> >
> > Improve robustness, readability and maintainability by replacing RCU and barriers with the lock.
>
> First thing, and regardless of other strands of discussion, I think it should be squashed with 3/5 instead of that one undoing the introduction of lock-unlock-lock-unlock.

I agree that there should not be a do-undo pattern, but I don't want to squash that; it's quite a distinct action. One patch adds locks, the other moves them.

But what I can do is move that patch before №1 here so that it becomes understandable as a preparatory commit.

> As far as the main topic is concerned, I really like the removal of all the rcu_dereference_check(, true) lines and the memory barriers.
>
> But I also think the commit message should explain better what code paths are now taking an extra lock - under which circumstances is the lock now taken for all scheduler users, and which amdgpu paths use drm_sched_entity_error() a lot so could be affected. I doubt it creates a measurable performance impact but it needs to be explained.

I think it can detail which functions will now be locked; but mentioning the users would be overkill and is uncommon for API reworks.

> I am also happy to give it a spin on the Steam Deck to see if I can observe anything.

Could be interesting.

> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ...
> > @@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
> >  
> >  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >  {
> > +	/* Helper to avoid dropping the reference while the entity lock is held,
> > +	 * just to have some more robustness.
> > +	 */
>
> I don't get this comment. Neither the placement nor the content.

It explains the purpose of the variable 'prev_last_scheduled', which exists so that a reference does not drop under lock protection.
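
The role of such a helper variable can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: toy_fence, toy_entity, toy_fence_get(), toy_fence_put() and toy_pop_job() are hypothetical names, the lock is an atomic_int standing in for entity->lock, and the released_under_lock field exists purely to make the effect observable. In the kernel, the final dma_fence_put() can end up running the fence's release path, which is presumably what one would rather not do inside the critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_fence {
	atomic_int refcount;
	int released;            /* 1 once the last reference was dropped */
	int released_under_lock; /* 1 if that happened with the lock held */
};

struct toy_entity {
	atomic_int lock;                  /* models entity->lock */
	struct toy_fence *last_scheduled; /* only touched with lock held */
};

static void toy_entity_init(struct toy_entity *e)
{
	atomic_init(&e->lock, 0);
	e->last_scheduled = NULL;
}

static void toy_fence_init(struct toy_fence *f)
{
	atomic_init(&f->refcount, 1); /* creator holds the first reference */
	f->released = 0;
	f->released_under_lock = 0;
}

static void toy_lock(struct toy_entity *e)
{
	while (atomic_exchange_explicit(&e->lock, 1, memory_order_acquire))
		; /* spin */
}

static void toy_unlock(struct toy_entity *e)
{
	atomic_store_explicit(&e->lock, 0, memory_order_release);
}

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	return f;
}

/* Drop a reference; note whether the final put ran with the lock held. */
static void toy_fence_put(struct toy_entity *e, struct toy_fence *f)
{
	if (!f || atomic_fetch_sub(&f->refcount, 1) != 1)
		return;
	f->released = 1;
	f->released_under_lock = atomic_load(&e->lock);
}

/* Swap the pointer under the lock, drop the old reference afterwards. */
static void toy_pop_job(struct toy_entity *e, struct toy_fence *finished)
{
	struct toy_fence *prev_last_scheduled;

	toy_lock(e);
	prev_last_scheduled = e->last_scheduled;
	e->last_scheduled = toy_fence_get(finished);
	toy_unlock(e);

	toy_fence_put(e, prev_last_scheduled);
}
```

The critical section only covers the pointer swap; whatever the final put triggers happens after the unlock.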

Tvrtko Ursulin

6 Jul

8:46 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On 03/07/2026 15:47, Philipp Stanner wrote:
> On Fri, 2026-07-03 at 12:27 +0100, Tvrtko Ursulin wrote:
> > On 01/07/2026 09:59, Philipp Stanner wrote:
> > > The entity->last_scheduled field has always been set and read with special RCU functions in addition to memory barriers.
> > >
> > > This was added in
> > >
> > > commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled")
> > >
> > > however, no proper justification for that mechanism was provided. There seems to be no obvious reason, since the entity lock is available and taken at all places that evaluate the last_scheduled field. The only exception is drm_sched_entity_error(), which is not performance critical in any way.
> > >
> > > Improve robustness, readability and maintainability by replacing RCU and barriers with the lock.
> >
> > First thing, and regardless of other strands of discussion, I think it should be squashed with 3/5 instead of that one undoing the introduction of lock-unlock-lock-unlock.
>
> I agree that there should not be a do-undo pattern, but I don't want to squash that; it's quite a distinct action. One patch adds locks, the other moves them.

Hm, maybe it is a semantic discussion whether there is any real adding of locks when the effective end result is just a widening of the lock's scope by pulling it out of the helper into the caller. But okay, a prep patch to move the lock out sounds like it could be acceptable.
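
The difference between a lock-unlock-lock-unlock sequence and a lock pulled out of the helper into the caller can be sketched in userspace C. Everything here is hypothetical and only illustrative (patch 3/5 itself is not shown in this thread): toy_entity, the prev_job_busy flag, toy_set_rq() and the two toy_select_rq_*() variants are invented names, and lock_cycles simply counts lock/unlock round trips.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; not a kernel type. */
struct toy_entity {
	atomic_int lock;
	unsigned int lock_cycles; /* lock/unlock round trips so far */
	int prev_job_busy;        /* models "last_scheduled not signaled" */
	int rq;                   /* models the selected run-queue */
};

static void toy_entity_init(struct toy_entity *e)
{
	atomic_init(&e->lock, 0);
	e->lock_cycles = 0;
	e->prev_job_busy = 0;
	e->rq = 0;
}

static void toy_lock(struct toy_entity *e)
{
	while (atomic_exchange_explicit(&e->lock, 1, memory_order_acquire))
		; /* spin */
	e->lock_cycles++; /* counted with the lock held */
}

static void toy_unlock(struct toy_entity *e)
{
	atomic_store_explicit(&e->lock, 0, memory_order_release);
}

/* Variant 1: the helper locks internally. */
static void toy_set_rq(struct toy_entity *e, int rq)
{
	toy_lock(e);
	e->rq = rq;
	toy_unlock(e);
}

/* The caller's own check then needs a separate lock cycle. */
static void toy_select_rq_two_cycles(struct toy_entity *e, int rq)
{
	int busy;

	toy_lock(e);
	busy = e->prev_job_busy;
	toy_unlock(e);

	if (busy)
		return;

	toy_set_rq(e, rq); /* second lock/unlock */
}

/* Variant 2: the helper expects the caller to hold the lock. */
static void toy_set_rq_locked(struct toy_entity *e, int rq)
{
	e->rq = rq;
}

static void toy_select_rq_one_cycle(struct toy_entity *e, int rq)
{
	toy_lock(e);
	if (!e->prev_job_busy)
		toy_set_rq_locked(e, rq);
	toy_unlock(e);
}
```

In this model the second variant takes the lock once instead of twice, and the check and the update happen in the same critical section, so the flag cannot change between them.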

> But what I can do is move that patch before №1 here so that it becomes understandable as a preparatory commit.
>
> > As far as the main topic is concerned, I really like the removal of all the rcu_dereference_check(, true) lines and the memory barriers.
> >
> > But I also think the commit message should explain better what code paths are now taking an extra lock - under which circumstances is the lock now taken for all scheduler users, and which amdgpu paths use drm_sched_entity_error() a lot so could be affected. I doubt it creates a measurable performance impact but it needs to be explained.
>
> I think it can detail which functions will now be locked; but mentioning the users would be overkill and is uncommon for API reworks.

Here I disagree quite strongly. Given the patch is making strong claims that the lockless access was added for no obvious reason, and that we have now established the lockless helper is in fact used on the submission paths, it is really required that those strong claims are backed by a concrete analysis instead of just saying "not performance critical in any way".

> > I am also happy to give it a spin on the Steam Deck to see if I can observe anything.
>
> Could be interesting.

Okay I'll try to do it in reasonable time. You can either respin or wait for it, I don't mind either way.

> > > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > > ...
> > > @@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
> > >  
> > >  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> > >  {
> > > +	/* Helper to avoid dropping the reference while the entity lock is held,
> > > +	 * just to have some more robustness.
> > > +	 */
> >
> > I don't get this comment. Neither the placement nor the content.
>
> It explains the purpose of the variable 'prev_last_scheduled', which exists so that a reference does not drop under lock protection.

Ah a helper _variable_, right, I was thrown off by the comment just below the function and did not even spot you added a new local.

Regards,

Tvrtko


Philipp Stanner

9:42 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On Mon, 2026-07-06 at 09:45 +0100, Tvrtko Ursulin wrote:
> On 03/07/2026 15:47, Philipp Stanner wrote:
> > I think it can detail which functions will now be locked; but mentioning the users would be overkill and is uncommon for API reworks.
>
> Here I disagree quite strongly. Given the patch is making strong claims that the lockless access was added for no obvious reason, and that we have now established the lockless helper is in fact used on the submission paths, it is really required that those strong claims are backed by a concrete analysis instead of just saying "not performance critical in any way".

This is a strong case for the reversal of the burden of proof.

The entire code base of drm_sched has been designed on the computer science premise of locks being evil. That's why literally all synchronization primitives except for locks have been used where possible, including undefined behavior. The designers tried as hard as they could to avoid locks.

That is clearly proven by the fact that in all original data type definitions, the only components that were locked were always lists, since those are the structures where you really cannot avoid a lock in most cases.

The aversion to locking was so great that they designed spsc_queue, which uses at least as many expensive instructions as a lock + list would have needed, and its correctness is not proven, nor are its behavior and rules documented.

It's not up to the faction who wants to use correct locking and phase out UB to prove that the locklessness is bad, but up to whoever added the locklessness to prove why it is good, i.e., necessary, which was not done here, neither in comments nor commit message. So the reasonable assumption is that it's simply a leftover from a flawed, broken design.

And the reason the kernel workflow keeps things on-list for a while before they are merged is so that parties who have concerns and can point out problems have time to do so. Which is of course open to you: do you see a performance-regression problem with this patch, and if so, where?

Anyways:
* Correct me if I'm wrong, but it would seem the only driver usage which could see a *new* lock in its path is drm_sched_entity_error(), for which you yourself agree that it's irrelevant performance-wise. Should we still list the users of that function?
* The other relevant user path, drm_sched_job_arm() via drm_sched_entity_select_rq(), must already be called under a common driver lock for drm_sched_entity_push_job(), and _select_rq() already takes the entity lock. So any significant regression here is hyper unlikely.
* The only other contender is the job pull path, which runs serially, with one work item at any point in time.
* drm_sched_entity_kill() / _fini() are used in the user context teardown path. Performance irrelevant.

I can offer to add the list above for the justification of why removing the half-undefined behavior is good.

Or what exactly would you want to see documented? "amdgpu uses drm_sched_job_arm() and now sees a lock-critical section longer by 3 instructions. etnaviv uses drm_sched_job_arm() and now…"?

Tvrtko Ursulin

10:59 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On 06/07/2026 10:42, Philipp Stanner wrote:
> On Mon, 2026-07-06 at 09:45 +0100, Tvrtko Ursulin wrote:
> > On 03/07/2026 15:47, Philipp Stanner wrote:
> > > I think it can detail which functions will now be locked; but mentioning the users would be overkill and is uncommon for API reworks.
> >
> > Here I disagree quite strongly. Given the patch is making strong claims that the lockless access was added for no obvious reason, and that we have now established the lockless helper is in fact used on the submission paths, it is really required that those strong claims are backed by a concrete analysis instead of just saying "not performance critical in any way".
>
> This is a strong case for the reversal of the burden of proof.

I did not want us to digress into philosophical side arguments about how and why the code base got to where it is. The story at hand is much simpler and narrower, so excuse me for skipping the majority of the below.

Put differently, what I was trying to express is this: Let's not write a poor commit message because the original one was poor. Aka two wrongs do not make a right.

Also, I am not putting a burden of proof on you; in fact, I offered to test your series.

As to your closing question about what I suggest the commit message needs to add, that is simple, and yes, the first bullet point you list is what I have already asked for in one of the previous replies. So say something along the lines of:

"""
drm_sched_entity_error(), which is significantly used from various stages of the amdgpu job submit path, either directly or via amdgpu_vm_generation(), is changed from lockless to taking the entity->lock. As performance was not listed as a reason the lockless approach was chosen in the above referenced commit, although it is suspected that might have been the motivation, it is now thought that the new lock cycles to those paths will not add any measurable overhead.

For other drivers no new lock cycles are added to the submit path, given drm_sched_entity_select_rq() via the job arm path already bails out early due to all drivers apart from AMD only passing a single scheduler list to the entity.
"""

Bonus point if you can spend the time to count how many extra lock-unlock cycles it is adding between its CS submit, prepare job and run job entry points.

For me, that is not the burden of proof but some minimum standard of a commit message which shows some due diligence was done.

Regards,

Tvrtko

...

The entire code base of drm_sched has been designed on the computer science premise of locks being evil. That's why literally all synchronization primitives except for locks have been used where possible, including undefined behavior. The designers tried as hard as they could to avoid locks.

That is clearly proven by the fact that in all original data type definitions, the only components that were locked were always lists, since those are the structures where you really cannot avoid a lock in most cases.

The aversion to locking was so great that they designed spsc_queue, which uses at least as many as expensive instructions as a lock + list would have needed, and its correctness is not proven, nor are its behavior and rules neither documented or proven.

It's not up to the faction who wants to use correct locking and phase out UB to prove that the locklessness is bad, but to whomever added the locklessness to prove why it is good, i.e., necessary – which was not done here, neither in comments nor commit message. So the reasonable assumption is that it's simply a leftover from a flawed, broken design.

And the kernel-workflow is that things are always on-list for a while before being merged is that parties who do have concerns and who can point out problems have time to do so. Which is of course open to you: do you see a performance-regression problem with this patch, and if so, where?

Anyways:

Correct me if I'm wrong, but it would seem the only driver-usage which could see a *new* lock in its path is drm_sched_entity_error(), for which you yourself agree that it's irrelevant performance-wise. Should we still list the users of that function?

The other relevant user path, drm_sched_job_arm() via drm_sched_entity_select_rq(), must already be called under a common driver lock for drm_sched_entity_push_job(), and _select_rq() already takes the entity lock. So any significant regression here is hyper unlikely.

The only other contender is the job pull path, which runs serially, with only one work item active at any point in time.

drm_sched_entity_kill() / _fini() are used in user context teardown path. Performance irrelevant.

I can offer to add the list above for the justification of why removing the half-undefined behavior is good.

Or what exactly would you want to see documented? "amdgpu uses drm_sched_job_arm() and now sees a lock-critical section longer by 3 instructions. etnaviv uses drm_sched_job_arm() and now…"?

P.

Tvrtko Ursulin

2:38 p.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On 06/07/2026 09:45, Tvrtko Ursulin wrote:

...

On 03/07/2026 15:47, Philipp Stanner wrote:

...
On Fri, 2026-07-03 at 12:27 +0100, Tvrtko Ursulin wrote:

8><

...

...
...
I am also happy to give it a spin on the Steam Deck to see if I can observe anything.

Could be interesting.

Okay I'll try to do it in reasonable time. You can either respin or wait for it, I don't mind either way.

On the topic of benchmarking, I gave it a quick spin against four unsync instances of vkgears. Point being seeing if something can be shown on more datacenter deployments with many cores submitting and large aggregate "fps".

x stock.fps
+ phasta.fps
[ministat ASCII histogram]
    N           Min           Max        Median           Avg        Stddev
x 218      5446.984      5862.578      5656.121     5642.9429     76.667613
+ 227      5510.432      5762.926      5620.999     5620.3585     45.407235
Difference at 95.0% confidence
	-22.5844 +/- 11.6534
	-0.400224% +/- 0.206513%
	(Student's t, pooled s = 62.6985)

Numbers are average FPS per vkgears instance. Total run each is around 40 seconds.

More locking does appear to show a small decrease in throughput and, curiously, a tighter range between min and max. Whether or not that is telling us something about the lock cycles and inter core synchronisation I am not sure. Could be just noise and that more runs are needed. I can do that tomorrow.

Regards,

Tvrtko

Tvrtko Ursulin

7 Jul

9:22 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On 06/07/2026 15:37, Tvrtko Ursulin wrote:

...

On 06/07/2026 09:45, Tvrtko Ursulin wrote:

...
On 03/07/2026 15:47, Philipp Stanner wrote:

...
On Fri, 2026-07-03 at 12:27 +0100, Tvrtko Ursulin wrote:

8><

...
...
...
I am also happy to give it a spin on the Steam Deck to see if I can observe anything.

Could be interesting.

Okay I'll try to do it in reasonable time. You can either respin or wait for it, I don't mind either way.

On the topic of benchmarking, I gave it a quick spin against four unsync instances of vkgears. Point being seeing if something can be shown on more datacenter deployments with many cores submitting and large aggregate "fps".

x stock.fps
+ phasta.fps
[ministat ASCII histogram]
    N           Min           Max        Median           Avg        Stddev
x 218      5446.984      5862.578      5656.121     5642.9429     76.667613
+ 227      5510.432      5762.926      5620.999     5620.3585     45.407235
Difference at 95.0% confidence
	-22.5844 +/- 11.6534
	-0.400224% +/- 0.206513%
	(Student's t, pooled s = 62.6985)

Numbers are average FPS per vkgears instance. Total run each is around 40 seconds.

More locking does appear to show a small decrease in throughput and, curiously, a tighter range between min and max. Whether or not that is telling us something about the lock cycles and inter core synchronisation I am not sure. Could be just noise and that more runs are needed. I can do that tomorrow.

I think it's noise. Repeated much longer run, with double the clients, and this time round got this:

    N           Min           Max        Median           Avg        Stddev
x 900      2483.751      2796.339      2620.184     2623.3684     49.850651
+ 900      2496.926      2773.642      2633.114     2632.9949     48.848675
Difference at 95.0% confidence
	9.6265 +/- 4.55991
	0.366952% +/- 0.173819%
	(Student's t, pooled s = 49.3522)

So it flip-flopped compared to the last run with a similar relative difference.

Therefore, as far as I am concerned, it is okay to go ahead with this simplification. Apart from the improved commit message I think Christian should still ack on behalf of AMD though.

Regards,

Tvrtko

Philipp Stanner

16 Jul

11:45 a.m.

New subject: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

On Tue, 2026-07-07 at 10:21 +0100, Tvrtko Ursulin wrote:

...

Therefore, as far as I am concerned, it is okay to go ahead with this simplification.

It is not a "simplification", it is the establishment of proper computer science standards, which trumps -2% fps, lock cycles and microbenchmarks in any case.

Philipp Stanner

1 Jul

8:59 a.m.

New subject: [PATCH 2/5] drm/sched: Lock spsc_queue in drm_sched_entity_pop_job()

Cleanup work in the preceding commit added locking to drm_sched_entity_pop_job(). This cleanup causes a slightly sub-optimal lock cycle with drm_sched_rq_pop_entity().

sched_entity also utilizes the lockless spsc_queue (partially already used simultaneously with locks), which was marked for removal in

commit 6e7eb171ac96 ("Documentation: drm: Add entry for removing spsc_queue to TODO list")

To remove the lock-cycle mentioned above, the unlock must be moved downwards, also locking the lockless queue.

Guard spsc_queue_pop() in drm_sched_entity_pop_job() with the lock and document why that is being done.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 91aec20611ad..5cf0af91faf2 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -529,9 +529,17 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	spin_lock(&entity->lock);
 	prev_last_scheduled = entity->last_scheduled;
 	entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
-	spin_unlock(&entity->lock);
 
+	/* Preceding cleanup work made it necessary to add the spinlock
+	 * to this function. spsc_queue, a lockless queue, is now
+	 * counterintuitively guarded by the lock as well. spsc_queue is queued
+	 * for removal (see DRM TODO list), so this somewhat serves as a
+	 * preparational step.
+	 *
+	 * TODO: Replace spsc_queue completely with a locked (h)list.
+	 */
 	spsc_queue_pop(&entity->job_queue);
+	spin_unlock(&entity->lock);
 
 	dma_fence_put(prev_last_scheduled);
 	drm_sched_rq_pop_entity(entity);

-- 
2.54.0

Philipp Stanner

8:59 a.m.

New subject: [PATCH 3/5] drm/sched: Avoid lock cycle for sched_entity

Previous cleanup commits created a slightly sub-optimal lock-cycle between the two functions drm_sched_entity_pop_job() and drm_sched_rq_pop_entity().

Avoid the lock-cycle by moving the locking from drm_sched_rq_pop_entity() to drm_sched_entity_pop_job(). Add the appropriate lockdep check.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
 drivers/gpu/drm/scheduler/sched_rq.c     | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 5cf0af91faf2..0fc1213a0d3f 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -539,10 +539,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	 * TODO: Replace spsc_queue completely with a locked (h)list.
 	 */
 	spsc_queue_pop(&entity->job_queue);
+	drm_sched_rq_pop_entity(entity);
 	spin_unlock(&entity->lock);
 
 	dma_fence_put(prev_last_scheduled);
-	drm_sched_rq_pop_entity(entity);
 
 	/* Jobs and entities might have different lifecycles. Since we're
 	 * removing the job from the entities queue, set the jobs entity pointer
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 044546bcb5f8..97363f9ef8bc 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -319,11 +319,12 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 	struct drm_sched_job *next_job;
 	struct drm_sched_rq *rq;
 
+	lockdep_assert_held(&entity->lock);
+
 	/*
 	 * Update the entity's location in the min heap according to
 	 * the timestamp of the next job, if any.
 	 */
-	spin_lock(&entity->lock);
 	rq = entity->rq;
 	spin_lock(&rq->lock);
 	next_job = drm_sched_entity_queue_peek(entity);
@@ -340,7 +341,6 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 		drm_sched_entity_save_vruntime(entity, min_vruntime);
 	}
 	spin_unlock(&rq->lock);
-	spin_unlock(&entity->lock);
 }
 
 /**

-- 
2.54.0
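The pattern in this patch, where the caller takes the lock once and the callee merely asserts that it is held, can be tried out in userspace. The sketch below is a toy model under stated assumptions: struct toy_entity, its fields and all toy_* functions are hypothetical and are not drm_sched code, a pthread mutex stands in for the spinlock, and a plain flag stands in for lockdep_assert_held() (it does not track which thread owns the lock).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an entity; not the drm_sched structure. */
struct toy_entity {
	pthread_mutex_t lock;
	bool lock_held;           /* crude substitute for lockdep */
	int queued;               /* number of queued jobs */
	int position;             /* stand-in for the run-queue position */
	unsigned int lock_cycles; /* lock/unlock pairs, for illustration */
};

static void toy_lock(struct toy_entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->lock_held = true;
	e->lock_cycles++;
}

static void toy_unlock(struct toy_entity *e)
{
	e->lock_held = false;
	pthread_mutex_unlock(&e->lock);
}

/*
 * Callee: relies on the caller holding e->lock instead of taking it
 * again, which is what would otherwise cost a second lock cycle.
 */
static void toy_rq_pop_entity(struct toy_entity *e)
{
	assert(e->lock_held);
	e->position = e->queued;
}

/* Caller: one critical section covers the pop and the reposition. */
static bool toy_entity_pop_job(struct toy_entity *e)
{
	bool popped = false;

	toy_lock(e);
	if (e->queued > 0) {
		e->queued--;
		popped = true;
		toy_rq_pop_entity(e);
	}
	toy_unlock(e);

	return popped;
}
```

In this model each call to toy_entity_pop_job() costs exactly one lock cycle, whereas a callee that locked on its own would make it two per popped job.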

Philipp Stanner

8:59 a.m.

New subject: [PATCH 4/5] drm/sched: Lock drm_sched_entity_is_idle()

drm_sched_entity_is_idle() contains a badly documented memory barrier and an invalid lockless access to entity->stopped.

This function is in no way performance critical, so it is safer, more readable and more maintainable to take the spinlock. This also enables future cleanup work where the entity can be fully synchronized via its spinlock.

Add locking to drm_sched_entity_is_idle().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0fc1213a0d3f..cb03d6a36578 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -178,14 +178,18 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
 static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
 {
-	rmb(); /* for list_empty to work without lock */
+	bool idle = false;
+
+	spin_lock(&entity->lock);
 
 	if (list_empty(&entity->list) ||
 	    spsc_queue_count(&entity->job_queue) == 0 ||
 	    entity->stopped)
-		return true;
+		idle = true;
 
-	return false;
+	spin_unlock(&entity->lock);
+
+	return idle;
 }
 
 /**

-- 
2.54.0

Tvrtko Ursulin

9:47 a.m.

New subject: [PATCH 4/5] drm/sched: Lock drm_sched_entity_is_idle()

On 01/07/2026 09:59, Philipp Stanner wrote:

...

drm_sched_entity_is_idle() contains a badly documented memory barrier and an invalid lockless access to entity->stopped.

This function is in no way performance critical, so it is safer, more readable and more maintainable to take the spinlock. This also enables future cleanup work where the entity can be fully synchronized via its spinlock.

Add locking to drm_sched_entity_is_idle().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0fc1213a0d3f..cb03d6a36578 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -178,14 +178,18 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
 static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
 {
-	rmb(); /* for list_empty to work without lock */
+	bool idle = false;
+
+	spin_lock(&entity->lock);
 
 	if (list_empty(&entity->list) ||
 	    spsc_queue_count(&entity->job_queue) == 0 ||
 	    entity->stopped)
-		return true;
+		idle = true;
 
-	return false;
+	spin_unlock(&entity->lock);
+
+	return idle;
 }
 
 /**

I think this is fine and indeed not performance critical in any way so:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko

Philipp Stanner

8:59 a.m.

New subject: [PATCH 5/5] drm/sched: Remove entity->entity_idle

The completion entity->entity_idle only existed because the entity was not properly locked through its spinlock. The completion served to inform waiters about whether the entity is actually idle, which is something locking (previously added to drm_sched_entity_is_idle()) can fully achieve.

Remove the surplus completion.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 9 ---------
 drivers/gpu/drm/scheduler/sched_main.c   | 2 --
 drivers/gpu/drm/scheduler/sched_rq.c     | 1 -
 include/drm/gpu_scheduler.h              | 7 -------
 4 files changed, 19 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index cb03d6a36578..23536dcfa96a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -136,10 +136,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
 	entity->rq = &sched_list[0]->rq;
 	RB_CLEAR_NODE(&entity->rb_tree_node);
-	init_completion(&entity->entity_idle);
-
-	/* We start in an idle state. */
-	complete_all(&entity->entity_idle);
 
 	spin_lock_init(&entity->lock);
 	spsc_queue_init(&entity->job_queue);
@@ -285,12 +281,7 @@ void drm_sched_entity_kill(struct drm_sched_entity *entity)
 	spin_lock(&entity->lock);
 	entity->stopped = true;
 	drm_sched_rq_remove_entity(entity->rq, entity);
-	spin_unlock(&entity->lock);
 
-	/* Make sure this entity is not used by the scheduler at the moment */
-	wait_for_completion(&entity->entity_idle);
-
-	spin_lock(&entity->lock);
 	prev = entity->last_scheduled;
 	dma_fence_get(prev);
 	spin_unlock(&entity->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d2ca01b31ee4..b90220794a14 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -997,7 +997,6 @@ static void drm_sched_run_job_work(struct work_struct *w)
 
 	sched_job = drm_sched_entity_pop_job(entity);
 	if (!sched_job) {
-		complete_all(&entity->entity_idle);
 		drm_sched_run_job_queue(sched);
 		return;
 	}
@@ -1013,7 +1012,6 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	 * refcount has been incremented for the scheduler already.
 	 */
 	fence = sched->ops->run_job(sched_job);
-	complete_all(&entity->entity_idle);
 	drm_sched_fence_scheduled(s_fence, fence);
 
 	if (!IS_ERR_OR_NULL(fence)) {
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 97363f9ef8bc..54aba1ef0d7a 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -373,7 +373,6 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 				return ERR_PTR(-ENOSPC);
 			}
 
-			reinit_completion(&entity->entity_idle);
 			break;
 		}
 	}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 176ff1f936cd..55260cbe880a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -221,13 +221,6 @@ struct drm_sched_entity {
 	 */
 	bool stopped;
 
-	/**
-	 * @entity_idle:
-	 *
-	 * Signals when entity is not in use, used to sequence entity cleanup in
-	 * drm_sched_entity_fini().
-	 */
-	struct completion entity_idle;
 
 	/**
 	 * @oldest_job_waiting:

-- 
2.54.0

Philipp Stanner

9:38 a.m.

New subject: [PATCH 5/5] drm/sched: Remove entity->entity_idle

On Wed, 2026-07-01 at 10:59 +0200, Philipp Stanner wrote:

...

The completion entity->entity_idle only existed because the entity was not properly locked through its spinlock. The completion served to inform waiters about whether the entity is actually idle, which is something locking (previously added to drm_sched_entity_is_idle()) can fully achieve.

Remove the surplus completion.

[…]

...

/* Make sure this entity is not used by the scheduler at the moment */

wait_for_completion(&entity->entity_idle);

Alright, my bad, turns out I had a bit too much steam on the kettle: we cannot remove it, because drm_sched_entity_flush() can perform an asynchronous kill while the scheduler work item is still running.

But I think we could probably put Tvrtko's flush_work() [1] here to get the same result.

Opinions?

[1] https://lore.kernel.org/dri-devel/20260611123423.39819-1-tvrtko.ursulin@igal...

linaro-mm-sig@lists.linaro.org