On Thu, Nov 06, 2025 at 11:59:56AM +0100, Juri Lelli wrote:
Hi,
On 29/10/25 20:08, Andrea Righi wrote:
sched_ext currently suffers starvation due to RT. The same workload when converted to EXT can get zero runtime if RT is 100% running, causing EXT processes to stall. Fix it by adding a DL server for EXT.
A kselftest is also provided later to verify:
# ./runner -t rt_stall
===== START =====
TEST: rt_stall
DESCRIPTION: Verify that RT tasks cannot stall SCHED_EXT tasks
OUTPUT:
# Runtime of EXT task (PID 23338) is 0.250000 seconds
# Runtime of RT task (PID 23339) is 4.750000 seconds
# EXT task got 5.00% of total runtime
ok 1 PASS: EXT task got more than 4.00% of runtime
===== END =====
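For reference, the 5.00% figure in the output is the EXT runtime divided by the combined runtime of both tasks. A minimal sketch of that arithmetic follows; ext_share_percent() is a hypothetical helper written for illustration, and it is only an assumption that the selftest computes the share this way.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the selftest: percentage of the
 * combined runtime that went to the EXT task.
 */
static double ext_share_percent(double ext_runtime_s, double rt_runtime_s)
{
	double total = ext_runtime_s + rt_runtime_s;

	/* A zero total would make the ratio meaningless. */
	assert(total > 0.0);
	return 100.0 * ext_runtime_s / total;
}
```

With the values from the run above, ext_share_percent(0.25, 4.75) evaluates 100 * 0.25 / 5.00, which is above the 4.00% threshold the test checks.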
v3:
 - clarify that fair is not the only dl_server (Juri Lelli)
 - remove explicit stop to reduce timer reprogramming overhead (Juri Lelli)
 - do not restart pick_task() when it's invoked by the dl_server (Tejun Heo)
 - depend on CONFIG_SCHED_CLASS_EXT (Andrea Righi)

v2:
 - drop ->balance() now that pick_task() has an rf argument (Andrea Righi)
Cc: Luigi De Matteis <ldematteis123@gmail.com>
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
...
@@ -1409,6 +1412,15 @@ static void enqueue_task_scx(struct rq *rq, struct task_struct *p, int enq_flags
  if (enq_flags & SCX_ENQ_WAKEUP)
    touch_core_sched(rq, p);
- if (rq->scx.nr_running == 1) {
/* Account for idle runtime */
if (!rq->nr_running)

Hummm, didn't we just add_nr_running(rq, 1) before getting here?
Oh, good catch, let me run some tests to see what happens here. :)
But looking at the code, it seems that we definitely need to move add_nr_running() after this part.
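To make the ordering problem concrete, here is a small userspace sketch, not kernel code. struct toy_rq and the toy_* functions are hypothetical stand-ins; the sketch assumes rq->scx.nr_running has already been incremented when the check runs, and it models add_nr_running(rq, 1) as a plain increment of nr_running.

```c
#include <assert.h>

/* Toy stand-in for the counters discussed here; not the kernel's struct rq. */
struct toy_rq {
	unsigned int nr_running;	/* all runnable tasks on the rq */
	unsigned int scx_nr_running;	/* runnable sched_ext tasks */
	unsigned int idle_updates;	/* times the idle-time branch ran */
	unsigned int server_starts;	/* times the server start path ran */
};

/* Models the block added by the patch. */
static void toy_server_check(struct toy_rq *rq)
{
	/* Callers bump the sched_ext count before calling this. */
	assert(rq->scx_nr_running >= 1);

	if (rq->scx_nr_running == 1) {
		/* Account for idle runtime */
		if (!rq->nr_running)
			rq->idle_updates++;

		/* Start dl_server if this is the first task being enqueued */
		rq->server_starts++;
	}
}

/* Ordering questioned in the review: the rq-wide count is bumped first. */
static void toy_enqueue_count_first(struct toy_rq *rq)
{
	rq->scx_nr_running++;
	rq->nr_running++;		/* models add_nr_running(rq, 1) */
	toy_server_check(rq);		/* nr_running is already >= 1 here */
}

/* Ordering suggested in the reply: bump the rq-wide count after the check. */
static void toy_enqueue_count_last(struct toy_rq *rq)
{
	rq->scx_nr_running++;
	toy_server_check(rq);		/* nr_running can still be 0 here */
	rq->nr_running++;		/* models add_nr_running(rq, 1) */
}
```

Starting from an empty toy_rq, toy_enqueue_count_first() never takes the idle-time branch, because nr_running is already nonzero when it is tested, while toy_enqueue_count_last() takes it once. Both orderings still reach the server start path.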
Thanks!
-Andrea
dl_server_update_idle_time(rq, rq->curr, &rq->ext_server);

/* Start dl_server if this is the first task being enqueued */
dl_server_start(&rq->ext_server);
- }
Thanks,
Juri