On Tue, Feb 25, 2025 at 06:05:53PM +0000, Harshit Agarwal wrote:
Details
Let's look at the following scenario to understand this race.
- CPU A enters push_rt_task
  a) CPU A has chosen next_task = task p.
  b) CPU A calls find_lock_lowest_rq(task p, CPU Z’s rq).
  c) CPU A identifies CPU X as a destination CPU (X < Z).
  d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
  e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
     already locked CPU X’s rq, so CPU A must wait (see the lock-ordering
     sketch below).
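A minimal sketch of the lock ordering behind steps (d) and (e), assuming a
simplified model of the kernel's double_lock_balance()/rq_order_less()
helpers (the real code lives in kernel/sched/sched.h and differs in
detail). It only illustrates why CPU A has to drop CPU Z's rq lock before
it can take CPU X's, which opens the window for this race:

static int double_lock_balance_sketch(struct rq *this_rq, struct rq *busiest)
{
	/* Fast path: the second lock is free, nothing is dropped. */
	if (raw_spin_rq_trylock(busiest))
		return 0;

	if (rq_order_less(busiest, this_rq)) {
		/*
		 * busiest (CPU X) orders before this_rq (CPU Z), so to
		 * respect the lock ordering we must drop this_rq first.
		 * Anything validated about the task under this_rq's lock
		 * can go stale while we wait here for busiest's lock.
		 */
		raw_spin_rq_unlock(this_rq);
		raw_spin_rq_lock(busiest);
		raw_spin_rq_lock(this_rq);
		return 1;	/* locks were dropped: caller must revalidate */
	}

	raw_spin_rq_lock(busiest);
	return 0;
}

The nonzero return is what find_lock_lowest_rq keys off to decide whether
it needs to recheck the task; that recheck is what this patch strengthens.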
- At CPU Z
  a) The previous task has completed execution, so CPU Z enters schedule
     and locks its own rq after CPU A releases it.
  b) CPU Z dequeues the previous task and begins executing task p.
  c) CPU Z unlocks its rq.
  d) Task p yields the CPU (e.g. by doing IO or waiting to acquire a
     lock), which triggers the schedule function on CPU Z.
  e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
  f) As part of the dequeue, it sets p.on_rq = 0 and unlocks its rq (see
     the sketch below).
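The effect of steps (e) and (f) can be condensed as follows. This is not
the actual __schedule() code, just the two state changes that matter for
the race: p comes off CPU Z's runqueue and p.on_rq is cleared, while
task_cpu(p) still reads Z until a later wakeup migrates it.

/*
 * Condensed effect of CPU Z blocking task p in __schedule();
 * not the real scheduler code, only the writes that matter here.
 */
static void block_task_sketch(struct rq *rq, struct task_struct *p)
{
	lockdep_assert_rq_held(rq);

	dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
	WRITE_ONCE(p->on_rq, 0);	/* step (f): no longer queued anywhere */
	/* task_cpu(p) still returns Z until try_to_wake_up() moves it. */
}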
- At CPU B
  a) CPU B enters try_to_wake_up with input task p.
  b) Since CPU Z dequeued task p, p.on_rq = 0, so CPU B updates
     p.state = WAKING.
  c) CPU B, via select_task_rq, determines CPU Y as the target CPU (see
     the sketch of this path below).
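A condensed sketch of the try_to_wake_up() path CPU B takes in this
scenario. The helper names match the upstream ones in kernel/sched/core.c,
but the flow is heavily simplified and omits the many other cases the real
function handles:

static int try_to_wake_up_sketch(struct task_struct *p)
{
	unsigned long flags;
	int cpu;

	raw_spin_lock_irqsave(&p->pi_lock, flags);
	if (READ_ONCE(p->on_rq)) {
		/* Still queued somewhere; not the case in this scenario. */
		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
		return 0;
	}

	WRITE_ONCE(p->__state, TASK_WAKING);		/* step (b) */
	cpu = select_task_rq(p, p->wake_cpu, WF_TTWU);	/* step (c): CPU Y */
	set_task_cpu(p, cpu);				/* p.cpu = Y */
	ttwu_queue(p, cpu, WF_TTWU);			/* enqueue on Y, p.on_rq = 1 */
	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
	return 1;
}

Note that nothing on this path takes CPU Z's rq lock, which is why holding
that lock alone cannot keep task p in place for CPU A.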
- The race
  a) CPU A acquires CPU X’s lock and relocks CPU Z.
  b) CPU A reads task p.cpu = Z and incorrectly concludes that task p is
     still on CPU Z.
  c) CPU A fails to notice that task p was dequeued from CPU Z while
     CPU A was waiting for locks in double_lock_balance. Had CPU A known
     that task p had been dequeued, find_lock_lowest_rq would have
     returned NULL, forcing push_rt_task to give up migrating task p.
  d) CPU B updates task p.cpu = Y and calls ttwu_queue.
  e) CPU B locks CPU Y’s rq, enqueues task p onto it, and sets
     task p.on_rq = 1.
  f) CPU B unlocks CPU Y’s rq, triggering memory synchronization.
  g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
     has not migrated.
  h) CPU A decides to migrate p to CPU X (see the check sketched below
     that lets this happen).
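The reason step (c) goes unnoticed: the revalidation find_lock_lowest_rq()
does after double_lock_balance() reads fields of task p that are not
stabilized by the two rq locks CPU A holds. Paraphrasing the pre-fix check
(helper and field names as in recent kernel/sched/rt.c; details vary by
kernel version):

	if (double_lock_balance(rq, lowest_rq)) {
		/*
		 * The locks were dropped; recheck that p is still pushable
		 * from rq.  Problem: with the interleaving above, CPU A can
		 * read a stale p->cpu (still Z, so task_rq(task) == rq) and
		 * a fresh p->on_rq (1, from the enqueue on CPU Y), so every
		 * test below passes even though p has left CPU Z.
		 */
		if (unlikely(task_rq(task) != rq ||
			     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
			     task_on_cpu(rq, task) ||
			     !rt_task(task) ||
			     is_migration_disabled(task) ||
			     !task_on_rq_queued(task))) {
			double_unlock_balance(rq, lowest_rq);
			lowest_rq = NULL;
			break;
		}
	}

Only p's pi_lock or the rq lock of the CPU p is actually queued on would
make those reads stable, and CPU A holds neither for CPU Y.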
This leads to CPU A dequeuing task p from CPU Y’s runqueue and to various crashes down the line.
Solution
The solution here is fairly simple. After CPU A reacquires the locks (step a of "The race" above), the check in find_lock_lowest_rq is enhanced to make sure that the task is still at the head of the rq's pushable tasks list. If it is not, the task is no longer a suitable candidate for pushing, and find_lock_lowest_rq gives up as described above.
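Based on that description, a sketch of the strengthened check, assuming the
existing pick_next_pushable_task() helper in kernel/sched/rt.c (the actual
patch may differ in detail):

	if (double_lock_balance(rq, lowest_rq)) {
		/*
		 * The locks were dropped.  rq->rt.pushable_tasks is only
		 * modified under rq's lock, so if p was dequeued from rq
		 * (CPU Z) in the meantime it is no longer the head of that
		 * list, no matter what p->cpu or p->on_rq currently read.
		 */
		if (unlikely(task != pick_next_pushable_task(rq) ||
			     is_migration_disabled(task) ||
			     !cpumask_test_cpu(lowest_rq->cpu,
					       &task->cpus_mask))) {
			double_unlock_balance(rq, lowest_rq);
			lowest_rq = NULL;
			break;
		}
	}

The key property is that the new test only consults state protected by the
lock CPU A already holds (CPU Z's rq lock), so it cannot be fooled by a
concurrent wakeup that never takes that lock.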
Testing
The fix was tested on a cluster of 3 nodes where panics due to this race were hit every couple of days. A fix similar to this one was deployed on that cluster and remained stable for more than 30 days.
Co-developed-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Co-developed-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
Signed-off-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
Co-developed-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
Signed-off-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
Tested-by: Will Ton <william.ton@nutanix.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Thanks, I've picked this up to land after -rc1.