On Wed, Nov 05, 2025 at 03:20:50PM +0100, Juri Lelli wrote:
On 05/11/25 14:59, Peter Zijlstra wrote:
On Wed, Nov 05, 2025 at 02:47:39PM +0100, Andrea Righi wrote:
On Wed, Oct 29, 2025 at 08:08:37PM +0100, Andrea Righi wrote:
sched_ext tasks can be starved by long-running RT tasks, especially since RT throttling was replaced by deadline servers that boost only SCHED_NORMAL tasks.
Several users in the community have reported RT tasks stalling sched_ext tasks. This is fairly common on distributions or in environments where applications such as video compositors and audio services run as RT tasks by default.
Example trace (showing a per-CPU kthread stalled due to the sway Wayland compositor running as an RT task):
 runnable task stall (kworker/0:0[106377] failed to run for 5.043s)
 ...
 CPU 0   : nr_run=3 flags=0xd cpu_rel=0 ops_qseq=20646200 pnt_seq=45388738
           curr=sway[994] class=rt_sched_class
   R kworker/0:0[106377] -5043ms
       scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0
       sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=0 slice=20000000
       cpus=01
This is often perceived as a bug in the BPF schedulers, but in reality schedulers can't do much: RT tasks run outside their control and can potentially consume 100% of the CPU bandwidth.
Fix this by adding a sched_ext deadline server, so that sched_ext tasks are also boosted and do not suffer starvation.
Two kselftests are also provided to verify that the starvation is fixed and that the bandwidth allocation is correct.
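To make the bandwidth argument concrete, here is a toy model (not kernel code) of what a deadline server reservation would mean for the stalled task in the trace above. It assumes a server that guarantees a fixed runtime in every period on one CPU. The 50 ms / 1000 ms parameters and the function names are assumptions chosen for illustration, not values taken from these patches.

```python
# Toy model only: this is not how the kernel implements dl_server.
# Assumption: the server guarantees `runtime_ns` of CPU time in every
# `period_ns` window on a single CPU, even under a 100% RT load.

NSEC_PER_MSEC = 1_000_000


def reserved_fraction(runtime_ns: int, period_ns: int) -> float:
    """Fraction of one CPU reserved by the server (runtime / period)."""
    if period_ns <= 0 or not 0 <= runtime_ns <= period_ns:
        raise ValueError("need period > 0 and 0 <= runtime <= period")
    return runtime_ns / period_ns


def guaranteed_ns(window_ns: int, runtime_ns: int, period_ns: int) -> int:
    """CPU time guaranteed over `window_ns`, counting only full periods."""
    reserved_fraction(runtime_ns, period_ns)  # reuse the parameter checks
    return (window_ns // period_ns) * runtime_ns


# Hypothetical server parameters: 50 ms of runtime every 1000 ms.
RUNTIME_NS = 50 * NSEC_PER_MSEC
PERIOD_NS = 1000 * NSEC_PER_MSEC
# The 5.043 s stall reported in the trace.
STALL_NS = 5043 * NSEC_PER_MSEC

share = reserved_fraction(RUNTIME_NS, PERIOD_NS)
floor_ns = guaranteed_ns(STALL_NS, RUNTIME_NS, PERIOD_NS)
print(f"reserved share: {share:.1%}")
print(f"guaranteed over stall window: {floor_ns // NSEC_PER_MSEC} ms")
```

Under these assumed parameters, the kworker would have received CPU time in each of the five full periods inside the stall window instead of none at all. The real server behaviour (deferral, replenishment, per-CPU accounting) is more involved than this model.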
Peter, Juri, this has now been tested quite extensively on our side and we're considering applying these patches to Tejun's sched_ext branch.
Do you have any objections or concerns?
Yeah, I want to finish this first:
https://lkml.kernel.org/r/20251101000057.GA2184199@noisy.programming.kicks-a...
Because as is, the whole dl_server stuff isn't quite right.
And I'm spending time on "[PATCH 04/11] sched/deadline: Add support to initialize and remove dl_server bandwidth", which I am still not 100% sure is correct (nor am I sure that the way we handle setting runtime to 0 for fair_server today is correct). Apologies, I had some travelling and PTO, but I should be able to write more about it in the next few days.
No problem and no rush, I just wanted to follow up to make sure I didn't miss anything. :)
Thank you both,
-Andrea