This fix is the deadline version of the change made to the rt scheduler here:

https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@nutanix.com/

Please see the original change for more details on the issue.
In this fix we bail out or retry in push_dl_task() if the task is no longer at the head of the pushable tasks list, because the list changed while we were trying to lock the runqueue of the other CPU.
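To make the race window concrete, the problematic interleaving looks roughly like this (a condensed reconstruction based on the referenced rt fix, not an exact trace):

CPU0 (push_dl_task)                      CPU1
-------------------                      ----
next_task = pick_next_pushable_dl_task(rq)
find_lock_later_rq(next_task, rq)
  double_lock_balance(rq, later_rq)
    /* rq->lock dropped here */          locks rq and moves next_task
                                         off rq's pushable tasks list
    /* locks reacquired */
later_rq was found, but next_task is
no longer the head of rq's pushable
list; pushing it from rq is now stale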
Signed-off-by: Harshit Agarwal harshit@nutanix.com
Cc: stable@vger.kernel.org
---
 kernel/sched/deadline.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *later_rq;
+	struct task_struct *task;
 	int ret = 0;
 
 	next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
 	/* Will lock the rq it'll find */
 	later_rq = find_lock_later_rq(next_task, rq);
-	if (!later_rq) {
-		struct task_struct *task;
+	task = pick_next_pushable_dl_task(rq);
+	if (later_rq && (!task || task != next_task)) {
+		/*
+		 * We must check all this again, since
+		 * find_lock_later_rq releases rq->lock and it is
+		 * then possible that next_task has migrated and
+		 * is no longer at the head of the pushable list.
+		 */
+		double_unlock_balance(rq, later_rq);
+		if (!task) {
+			/* No more tasks */
+			goto out;
+		}
 
+		put_task_struct(next_task);
+		next_task = task;
+		goto retry;
+	}
+	if (!later_rq) {
 		/*
 		 * We must check all this again, since
 		 * find_lock_later_rq releases rq->lock and it is
 		 * then possible that next_task has migrated.
 		 */
-		task = pick_next_pushable_dl_task(rq);
 		if (task == next_task) {
 			/*
 			 * The task is still there. We don't try
 			 * again, since the other cpus will pull from us
@@ -2751,9 +2767,10 @@ static int push_dl_task(struct rq *rq)
 			goto out;
 		}
 
-		if (!task)
+		if (!task) {
 			/* No more tasks */
 			goto out;
+		}
 
 		put_task_struct(next_task);
 		next_task = task;
Hi Harshit,
Thanks for this!
On 07/03/25 20:42, Harshit Agarwal wrote:
This fix is the deadline version of the change made to the rt scheduler here:

https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@nutanix.com/

Please see the original change for more details on the issue.
I don't think we want this kind of URL in the changelog, as URLs might disappear while the history remains (at least usually a little longer :). Maybe you could add a very condensed version of the description of the problem you have on the other fix?
In this fix we bail out or retry in push_dl_task() if the task is no longer at the head of the pushable tasks list, because the list changed while we were trying to lock the runqueue of the other CPU.
Signed-off-by: Harshit Agarwal harshit@nutanix.com
Cc: stable@vger.kernel.org

 kernel/sched/deadline.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *later_rq;
+	struct task_struct *task;
 	int ret = 0;
 
 	next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
 	/* Will lock the rq it'll find */
 	later_rq = find_lock_later_rq(next_task, rq);
-	if (!later_rq) {
-		struct task_struct *task;
+	task = pick_next_pushable_dl_task(rq);
+	if (later_rq && (!task || task != next_task)) {
+		/*
+		 * We must check all this again, since
+		 * find_lock_later_rq releases rq->lock and it is
+		 * then possible that next_task has migrated and
+		 * is no longer at the head of the pushable list.
+		 */
+		double_unlock_balance(rq, later_rq);
+		if (!task) {
+			/* No more tasks */
+			goto out;
+		}
 
+		put_task_struct(next_task);
+		next_task = task;
+		goto retry;
I fear we might hit a pathological condition that can lead us into a never-ending (or very long) loop. find_lock_later_rq() tries to find a later_rq for at most DL_MAX_TRIES and it bails out if it can't.
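For reference, the loop in question looks roughly like this (an abridged sketch of mainline find_lock_later_rq(), from memory, so details may differ across versions):

static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
{
	struct rq *later_rq = NULL;
	int tries, cpu;

	for (tries = 0; tries < DL_MAX_TRIES; tries++) {
		cpu = find_later_rq(task);
		if ((cpu == -1) || (cpu == rq->cpu))
			break;

		later_rq = cpu_rq(cpu);

		/* Retry if something changed while rq->lock was dropped. */
		if (double_lock_balance(rq, later_rq)) {
			/* ... revalidate task; on failure unlock and bail out ... */
		}

		/* ... keep later_rq if it is a good target, else unlock and retry ... */
	}

	return later_rq;
}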
Maybe, to discern between find_lock_later_rq() callers, we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., a fix similar to the rt.c patch in case the task is not throttled (so the caller is push_dl_task()), and not rely on pick_next_pushable_dl_task() if the task is throttled.
What do you think?
Thanks, Juri
Thanks Juri, for taking a look.
On Mar 12, 2025, at 2:42 AM, Juri Lelli juri.lelli@redhat.com wrote:
Hi Harshit,
Thanks for this!
I don't think we want this kind of URL in the changelog, as URLs might disappear while the history remains (at least usually a little longer :). Maybe you could add a very condensed version of the description of the problem you have on the other fix?
Sorry about this and thanks for pointing it out. I will fix it in the next version of the patch.
In this fix we bail out or retry in push_dl_task() if the task is no longer at the head of the pushable tasks list, because the list changed while we were trying to lock the runqueue of the other CPU.
Signed-off-by: Harshit Agarwal harshit@nutanix.com
Cc: stable@vger.kernel.org

 kernel/sched/deadline.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *later_rq;
+	struct task_struct *task;
 	int ret = 0;
 
 	next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
 	/* Will lock the rq it'll find */
 	later_rq = find_lock_later_rq(next_task, rq);
-	if (!later_rq) {
-		struct task_struct *task;
+	task = pick_next_pushable_dl_task(rq);
+	if (later_rq && (!task || task != next_task)) {
+		/*
+		 * We must check all this again, since
+		 * find_lock_later_rq releases rq->lock and it is
+		 * then possible that next_task has migrated and
+		 * is no longer at the head of the pushable list.
+		 */
+		double_unlock_balance(rq, later_rq);
+		if (!task) {
+			/* No more tasks */
+			goto out;
+		}
 
+		put_task_struct(next_task);
+		next_task = task;
+		goto retry;
I fear we might hit a pathological condition that can lead us into a never-ending (or very long) loop. find_lock_later_rq() tries to find a later_rq for at most DL_MAX_TRIES and it bails out if it can't.
This pathological case exists today as well and will be there even if we move this check inside find_lock_later_rq(). This check just broadens the scenarios in which we retry, where we would have panicked otherwise (the bug). If this check is moved inside find_lock_later_rq() then the function will return NULL, and the caller here will do the same, i.e. retry, or bail out if no tasks are available. Specifically, it will execute the if (!later_rq) block here. The number of retries is bounded by the number of tasks in the pushable tasks list.
Maybe, to discern between find_lock_later_rq() callers, we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., a fix similar to the rt.c patch in case the task is not throttled (so the caller is push_dl_task()), and not rely on pick_next_pushable_dl_task() if the task is throttled.
Sure, I can do this as well, but like I mentioned above I don't think it will be any different from this patch, unless we want to handle the race for the offline migration case, or you prefer this in find_lock_later_rq() just to keep it more in line with the rt patch. I just found the current approach to be less risky :)
Let me know your thoughts.
Regards, Harshit
Hi,
On 12/03/25 18:46, Harshit Agarwal wrote:
Thanks Juri, for taking a look.
Of course! Thanks to you for working on this.
On Mar 12, 2025, at 2:42 AM, Juri Lelli juri.lelli@redhat.com wrote:
Hi Harshit,
Thanks for this!
I don't think we want this kind of URL in the changelog, as URLs might disappear while the history remains (at least usually a little longer :). Maybe you could add a very condensed version of the description of the problem you have on the other fix?
Sorry about this and thanks for pointing it out. I will fix it in the next version of the patch.
No worries and thanks.
In this fix we bail out or retry in push_dl_task() if the task is no longer at the head of the pushable tasks list, because the list changed while we were trying to lock the runqueue of the other CPU.
Signed-off-by: Harshit Agarwal harshit@nutanix.com
Cc: stable@vger.kernel.org

 kernel/sched/deadline.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *later_rq;
+	struct task_struct *task;
 	int ret = 0;
 
 	next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
 	/* Will lock the rq it'll find */
 	later_rq = find_lock_later_rq(next_task, rq);
-	if (!later_rq) {
-		struct task_struct *task;
+	task = pick_next_pushable_dl_task(rq);
+	if (later_rq && (!task || task != next_task)) {
+		/*
+		 * We must check all this again, since
+		 * find_lock_later_rq releases rq->lock and it is
+		 * then possible that next_task has migrated and
+		 * is no longer at the head of the pushable list.
+		 */
+		double_unlock_balance(rq, later_rq);
+		if (!task) {
+			/* No more tasks */
+			goto out;
+		}
 
+		put_task_struct(next_task);
+		next_task = task;
+		goto retry;
I fear we might hit a pathological condition that can lead us into a never-ending (or very long) loop. find_lock_later_rq() tries to find a later_rq for at most DL_MAX_TRIES and it bails out if it can't.
This pathological case exists today as well and will be there even if we move this check inside find_lock_later_rq(). This check just broadens the scenarios in which we retry, where we would have panicked otherwise (the bug). If this check is moved inside find_lock_later_rq() then the function will return NULL, and the caller here will do the same, i.e. retry, or bail out if no tasks are available. Specifically, it will execute the if (!later_rq) block here. The number of retries is bounded by the number of tasks in the pushable tasks list.
Maybe, to discern between find_lock_later_rq() callers, we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., a fix similar to the rt.c patch in case the task is not throttled (so the caller is push_dl_task()), and not rely on pick_next_pushable_dl_task() if the task is throttled.
Sure, I can do this as well, but like I mentioned above I don't think it will be any different from this patch, unless we want to handle the race for the offline migration case, or you prefer this in find_lock_later_rq() just to keep it more in line with the rt patch. I just found the current approach to be less risky :)
What do you mean by "handle the race for offline migration case"?
And I am honestly conflicted. I think I like the encapsulation better if we can find a solution inside find_lock_later_rq(), as it also aligns better with rt.c, but you fear it's more fragile?
Best, Juri
Maybe, to discern between find_lock_later_rq() callers, we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., a fix similar to the rt.c patch in case the task is not throttled (so the caller is push_dl_task()), and not rely on pick_next_pushable_dl_task() if the task is throttled.
Sure, I can do this as well, but like I mentioned above I don't think it will be any different from this patch, unless we want to handle the race for the offline migration case, or you prefer this in find_lock_later_rq() just to keep it more in line with the rt patch. I just found the current approach to be less risky :)
What do you mean by "handle the race for offline migration case"?
By offline migration I meant the dl_task_offline_migration() path, which also calls find_lock_later_rq(). So unless we think the same race that this fix addresses for push_dl_task() can happen for dl_task_offline_migration() as well, there is one less reason to encapsulate this in find_lock_later_rq().
And I am honestly conflicted. I think I like the encapsulation better if we can find a solution inside find_lock_later_rq(), as it also aligns better with rt.c, but you fear it's more fragile?
Yes, I agree that encapsulation in find_lock_later_rq() would be ideal, but by keeping it limited to push_dl_task() I wanted to keep the change more targeted, to avoid any possible side effect on the dl_task_offline_migration() call path.
Let's say we go ahead with making the change in find_lock_later_rq() itself: then we have to fall back to the current checks for the throttled case, and for the rest use the task != pick_next_pushable_dl_task(rq) check. Below is a diff of how it would look:
 	/* Retry if something changed. */
 	if (double_lock_balance(rq, later_rq)) {
-		if (unlikely(task_rq(task) != rq ||
+		if (unlikely(is_migration_disabled(task) ||
 			     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
-			     task_on_cpu(rq, task) ||
-			     !dl_task(task) ||
-			     is_migration_disabled(task) ||
-			     !task_on_rq_queued(task))) {
+			     (task->dl.dl_throttled &&
+			      (task_rq(task) != rq ||
+			       task_on_cpu(rq, task) ||
+			       !dl_task(task) ||
+			       !task_on_rq_queued(task))) ||
+			     (!task->dl.dl_throttled &&
+			      task != pick_next_pushable_dl_task(rq)))) {
 			double_unlock_balance(rq, later_rq);
 			later_rq = NULL;
 			break;
Let me know your thoughts and I can send v2 patch accordingly.
Thanks, Harshit
On 13/03/25 19:38, Harshit Agarwal wrote:
Maybe, to discern between find_lock_later_rq() callers, we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., a fix similar to the rt.c patch in case the task is not throttled (so the caller is push_dl_task()), and not rely on pick_next_pushable_dl_task() if the task is throttled.
Sure, I can do this as well, but like I mentioned above I don't think it will be any different from this patch, unless we want to handle the race for the offline migration case, or you prefer this in find_lock_later_rq() just to keep it more in line with the rt patch. I just found the current approach to be less risky :)
What do you mean by "handle the race for offline migration case"?
By offline migration I meant the dl_task_offline_migration() path, which also calls find_lock_later_rq(). So unless we think the same race that this fix addresses for push_dl_task() can happen for dl_task_offline_migration() as well, there is one less reason to encapsulate this in find_lock_later_rq().
And I am honestly conflicted. I think I like the encapsulation better if we can find a solution inside find_lock_later_rq(), as it also aligns better with rt.c, but you fear it's more fragile?
Yes, I agree that encapsulation in find_lock_later_rq() would be ideal, but by keeping it limited to push_dl_task() I wanted to keep the change more targeted, to avoid any possible side effect on the dl_task_offline_migration() call path.
Let's say we go ahead with making the change in find_lock_later_rq() itself: then we have to fall back to the current checks for the throttled case, and for the rest use the task != pick_next_pushable_dl_task(rq) check. Below is a diff of how it would look:
 	/* Retry if something changed. */
 	if (double_lock_balance(rq, later_rq)) {
-		if (unlikely(task_rq(task) != rq ||
+		if (unlikely(is_migration_disabled(task) ||
 			     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
-			     task_on_cpu(rq, task) ||
-			     !dl_task(task) ||
-			     is_migration_disabled(task) ||
-			     !task_on_rq_queued(task))) {
+			     (task->dl.dl_throttled &&
+			      (task_rq(task) != rq ||
+			       task_on_cpu(rq, task) ||
+			       !dl_task(task) ||
+			       !task_on_rq_queued(task))) ||
+			     (!task->dl.dl_throttled &&
+			      task != pick_next_pushable_dl_task(rq)))) {
 			double_unlock_balance(rq, later_rq);
 			later_rq = NULL;
 			break;
Let me know your thoughts and I can send v2 patch accordingly.
So, it looks definitely more complicated (and fragile?), but I think I still like it better. Maybe you could add a comment in the code documenting the two different paths and the associated checks, so that we don't forget. :)
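Such a comment could look something like this (an illustrative sketch layered on the diff above, nothing more):

		/*
		 * double_lock_balance() may have released rq->lock, so
		 * the task must be revalidated. Two callers reach this
		 * point:
		 *
		 * 1. push_dl_task(): task was the head of rq's pushable
		 *    tasks list, so it is enough to check that it still
		 *    is; task != pick_next_pushable_dl_task(rq) means it
		 *    was moved or pushed by someone else, so give up.
		 * 2. dl_task_offline_migration(): task is throttled and
		 *    not on the pushable list, so fall back to checking
		 *    task_rq()/task_on_cpu()/dl_task()/task_on_rq_queued().
		 */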
What do others think?
Thanks! Juri
So, it looks definitely more complicated (and fragile?), but I think I still like it better. Maybe you could add a comment in the code documenting the two different paths and the associated checks, so that we don't forget. :)
Sent the v2 with this approach: https://lore.kernel.org/lkml/20250317022325.52791-1-harshit@nutanix.com/T/#u
Please take a look.
Thanks, Harshit