6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fernand Sieber sieberf@amazon.com
[ Upstream commit 79104becf42baeeb4a3f2b106f954b9fc7c10a3c ]
If a task yields, the scheduler may decide to pick it again. The task in turn may decide to yield immediately or shortly after, leading to a tight loop of yields.
If there's another runnable task as this point, the deadline will be increased by the slice at each loop. This can cause the deadline to runaway pretty quickly, and subsequent elevated run delays later on as the task doesn't get picked again. The reason the scheduler can pick the same task again and again despite its deadline increasing is because it may be the only eligible task at that point.
Fix this by making the task forfeiting its remaining vruntime and pushing the deadline one slice ahead. This implements yield behavior more authentically.
We limit the forfeiting to eligible tasks. This is because core scheduling prefers running ineligible tasks rather than force idling. As such, without the condition, we can end up on a yield loop which makes the vruntime increase rapidly, leading to anomalous run delays later down the line.
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy") Signed-off-by: Fernand Sieber sieberf@amazon.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4770d25ae2406..35b98cad06515 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8919,7 +8919,19 @@ static void yield_task_fair(struct rq *rq) */ rq_clock_skip_update(rq);
- se->deadline += calc_delta_fair(se->slice, se); + /* + * Forfeit the remaining vruntime, only if the entity is eligible. This + * condition is necessary because in core scheduling we prefer to run + * ineligible tasks rather than force idling. If this happens we may + * end up in a loop where the core scheduler picks the yielding task, + * which yields immediately again; without the condition the vruntime + * ends up quickly running away. + */ + if (entity_eligible(cfs_rq, se)) { + se->vruntime = se->deadline; + se->deadline += calc_delta_fair(se->slice, se); + update_min_vruntime(cfs_rq); + } }
static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)