Hi Harshit,
Thanks for this!
On 07/03/25 20:42, Harshit Agarwal wrote:
> This fix is the deadline version of the change made to the rt scheduler
> here:
> https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@nutanix.com/
> Please go through the original change for more details on the issue.
I don't think we want this kind of URL in the changelog, as URLs might disappear while the history remains (at least usually a little longer :). Maybe you could add a very condensed version of the problem description you have in the other fix?
> In this fix we bail out or retry in the push_dl_task, if the task is no
> longer at the head of pushable tasks list because this list changed
> while trying to lock the runqueue of the other CPU.
> Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
> Cc: stable@vger.kernel.org
>  kernel/sched/deadline.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 38e4537790af..c5048969c640 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
>  {
>  	struct task_struct *next_task;
>  	struct rq *later_rq;
> +	struct task_struct *task;
>  	int ret = 0;
>  
>  	next_task = pick_next_pushable_dl_task(rq);
> @@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
>  	/* Will lock the rq it'll find */
>  	later_rq = find_lock_later_rq(next_task, rq);
> -	if (!later_rq) {
> -		struct task_struct *task;
> -
> +	task = pick_next_pushable_dl_task(rq);
> +	if (later_rq && (!task || task != next_task)) {
>  		/*
>  		 * We must check all this again, since
>  		 * find_lock_later_rq releases rq->lock and it is
>  		 * then possible that next_task has migrated and
>  		 * is no longer at the head of the pushable list.
>  		 */
> +		double_unlock_balance(rq, later_rq);
> +		if (!task) {
> +			/* No more tasks */
> +			goto out;
> +		}
>  		put_task_struct(next_task);
>  		next_task = task;
>  		goto retry;
I fear we might hit a pathological condition that leads us into a never-ending (or very long) loop: find_lock_later_rq() tries to find a later_rq for at most DL_MAX_TRIES and bails out if it can't, but the goto retry path above has no such bound, so we keep retrying for as long as the head of the pushable list keeps changing under us.
Maybe to discern between find_lock_later_rq() callers we can use the dl_throttled flag in dl_se and still implement the fix in find_lock_later_rq()? I.e., fix similar to the rt.c patch in case the task is not throttled (so caller is push_dl_task()) and not rely on pick_next_pushable_dl_task() if the task is throttled.
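If I understand it right, that would look roughly like the following untested sketch inside find_lock_later_rq()'s DL_MAX_TRIES loop, after double_lock_balance() may have dropped rq->lock (using task->dl.dl_throttled to tell the callers apart is a guess on my side, not something I've verified covers all paths):

```c
/*
 * Untested sketch, not a patch: push_dl_task() pushes runnable tasks,
 * while the offline-migration path deals with throttled ones, so the
 * dl_throttled flag could discern the caller here.
 */
if (double_lock_balance(rq, later_rq)) {
	if (!task->dl.dl_throttled) {
		/* push_dl_task() caller: recheck the pushable head, as in rt.c */
		if (task != pick_next_pushable_dl_task(rq)) {
			double_unlock_balance(rq, later_rq);
			later_rq = NULL;
			break;
		}
	} else {
		/* throttled: keep the existing task_rq()/cpumask re-checks */
	}
}
```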
What do you think?
Thanks,
Juri