Hi Harshit,
On 25/02/25 18:05, Harshit Agarwal wrote:
> Overview
>
> When a CPU calls push_rt_task() and picks a task to push to another
> CPU's runqueue, it calls find_lock_lowest_rq(), which takes a double
> lock on both CPUs' runqueues. If one of the locks isn't readily
> available, it may drop the current runqueue lock and reacquire both
> locks at once. During this window it is possible that the task has
> already migrated and is running on some other CPU. These cases are
> already handled. However, if the task has migrated, has already
> finished running, and another CPU is now trying to wake it up (ttwu)
> so that it is queued again on a runqueue (on_rq is 1), and if it ends
> up queued on the same CPU it was being pushed from, then the current
> checks will pass even though the task was migrated out and is no
> longer in the pushable tasks list.
>
> ...
> kernel/sched/rt.c | 54 +++++++++++++++++++++++------------------------
> 1 file changed, 26 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4b8e33c615b1..4762dd3f50c5 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1885,6 +1885,27 @@ static int find_lowest_rq(struct task_struct *task)
>  	return -1;
>  }
>
> +static struct task_struct *pick_next_pushable_task(struct rq *rq)
> +{
> +	struct task_struct *p;
> +
> +	if (!has_pushable_tasks(rq))
> +		return NULL;
> +
> +	p = plist_first_entry(&rq->rt.pushable_tasks,
> +			      struct task_struct, pushable_tasks);
> +
> +	BUG_ON(rq->cpu != task_cpu(p));
> +	BUG_ON(task_current(rq, p));
> +	BUG_ON(task_current_donor(rq, p));
> +	BUG_ON(p->nr_cpus_allowed <= 1);
> +
> +	BUG_ON(!task_on_rq_queued(p));
> +	BUG_ON(!rt_task(p));
> +
> +	return p;
> +}
> +
>  /* Will lock the rq it finds */
>  static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>  {
> @@ -1915,18 +1936,16 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>  			/*
>  			 * We had to unlock the run queue. In
>  			 * the mean time, task could have
> -			 * migrated already or had its affinity changed.
> -			 * Also make sure that it wasn't scheduled on its rq.
> +			 * migrated already or had its affinity changed,
> +			 * therefore check if the task is still at the
> +			 * head of the pushable tasks list.
>  			 * It is possible the task was scheduled, set
>  			 * "migrate_disabled" and then got preempted, so we must
>  			 * check the task migration disable flag here too.
>  			 */
> -			if (unlikely(task_rq(task) != rq ||
> +			if (unlikely(is_migration_disabled(task) ||
>  				     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
> -				     task_on_cpu(rq, task) ||
> -				     !rt_task(task) ||
> -				     is_migration_disabled(task) ||
> -				     !task_on_rq_queued(task))) {
> +				     task != pick_next_pushable_task(rq))) {
>
>  				double_unlock_balance(rq, lowest_rq);
>  				lowest_rq = NULL;
> @@ -1946,27 +1965,6 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>  	return lowest_rq;
>  }
>
> -static struct task_struct *pick_next_pushable_task(struct rq *rq)
> -{
> -	struct task_struct *p;
> -
> -	if (!has_pushable_tasks(rq))
> -		return NULL;
> -
> -	p = plist_first_entry(&rq->rt.pushable_tasks,
> -			      struct task_struct, pushable_tasks);
> -
> -	BUG_ON(rq->cpu != task_cpu(p));
> -	BUG_ON(task_current(rq, p));
> -	BUG_ON(task_current_donor(rq, p));
> -	BUG_ON(p->nr_cpus_allowed <= 1);
> -
> -	BUG_ON(!task_on_rq_queued(p));
> -	BUG_ON(!rt_task(p));
> -
> -	return p;
> -}
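To restate the window being closed, here is a quick sketch of the
interleaving as I read the changelog above (my own reconstruction, so
the exact sequence of events may differ):

  CPU0 (pusher)                         elsewhere
  -------------                         ---------
  push_rt_task()
    next_task = pick_next_pushable_task(rq)
    find_lock_lowest_rq(next_task, rq)
      double_lock_balance(rq, lowest_rq)
        releases rq->lock
                                        next_task migrates away, runs
                                        and blocks; a later ttwu()
                                        enqueues it on this rq again
                                        (on_rq == 1)
        retakes both locks
      old checks (task_rq(task) == rq,
      queued, not running, rt_task,
      !migration_disabled) all pass,
      yet next_task may no longer be
      on rq's pushable list

With the change, the whole validity question collapses into "is this
task still exactly what pick_next_pushable_task(rq) would return?",
which subsumes each condition the old check listed (the BUG_ON()s in
the helper document those invariants).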
As usual, we have essentially the same code in deadline.c. Do you think we should/could implement the same fix proactively there as well? Steve?
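Just to make the suggestion concrete, and purely as an untested
strawman on my side: assuming pick_next_pushable_dl_task() were
similarly moved above find_lock_later_rq(), the equivalent re-check
there would presumably collapse to something like:

	if (unlikely(is_migration_disabled(task) ||
		     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
		     task != pick_next_pushable_dl_task(rq))) {
		double_unlock_balance(rq, later_rq);
		later_rq = NULL;
		break;
	}

(names taken from today's deadline.c; I haven't checked whether
anything deadline-specific needs extra care in that path).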
Thanks,
Juri