On Wed, Oct 15, 2025 at 06:03:59AM +0000, K Prateek Nayak wrote:
Dequeuing a fair task on a throttled hierarchy returns early on encountering a throttled cfs_rq since the throttle path has already dequeued the hierarchy above and has adjusted the h_nr_* accounting till the root cfs_rq.
dequeue_entities() crucially misses calling __block_task() for delayed tasks being dequeued on the throttled hierarchies, but this was mostly harmless until commit b7ca5743a260 ("sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks") since all existing cases would re-enqueue the task if task_on_rq_queued() returned true and the task would eventually be blocked at pick after the hierarchy was unthrottled.
wait_task_inactive() is special as it expects the delayed task on a throttled hierarchy to reach the blocked state on dequeue, but since __block_task() is never called, task_on_rq_queued() continues to return true. Furthermore, since the task is now off the hierarchy, the pick never reaches it to fully block the task even after unthrottle, leading to wait_task_inactive() looping endlessly.
Remedy this by calling __block_task() if a delayed task is being dequeued on a throttled hierarchy.
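For readers following the thread, the failure mode described above can be sketched as a toy userspace model. This is not kernel code: the toy_* names, the struct fields and the bounded retry count are all assumptions made for illustration, and only the control flow (early return on a throttled hierarchy, block on delayed dequeue) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not a kernel symbol. */
struct toy_task {
	bool on_rq;         /* stands in for task_on_rq_queued(p) */
	bool sched_delayed; /* stands in for p->se.sched_delayed */
	bool in_hierarchy;  /* still reachable by the pick path */
};

/* Stands in for __block_task(): the task becomes fully blocked. */
static void toy_block_task(struct toy_task *p)
{
	assert(p->on_rq);
	p->on_rq = false;
}

/*
 * Stands in for dequeuing a task with DEQUEUE_DELAYED. On a throttled
 * hierarchy the model returns early; without the fix it does so before
 * the delayed task has been blocked.
 */
static void toy_dequeue(struct toy_task *p, bool throttled, bool with_fix)
{
	bool was_delayed = p->sched_delayed;

	p->sched_delayed = false;
	p->in_hierarchy = false; /* pick can no longer reach the task */

	if (throttled) {
		if (with_fix && was_delayed)
			toy_block_task(p); /* the proposed remedy */
		return;                    /* early return */
	}

	if (was_delayed)
		toy_block_task(p);
}

/*
 * Stands in for the retry loop in wait_task_inactive(), bounded by
 * max_tries so the model terminates. Returns true once the task is
 * observed as not queued.
 */
static bool toy_wait_inactive(struct toy_task *p, bool throttled,
			      bool with_fix, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!p->on_rq)
			return true;
		if (p->sched_delayed)
			toy_dequeue(p, throttled, with_fix);
	}
	return !p->on_rq;
}
```

In this model, a delayed task on a throttled hierarchy never becomes inactive without the fix, however many retries are allowed, because the second and later iterations see sched_delayed already cleared and on_rq still set. With the fix, or on an unthrottled hierarchy, the second iteration observes the task as blocked.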
This fix is only required for stable kernels implementing delay dequeue (>= v6.12) before v6.18 since upstream commit e1fad12dcb66 ("sched/fair: Switch to task based throttle model") indirectly fixes this by removing the early return conditions in dequeue_entities() as part of the per-task throttle feature.
Cc: stable@vger.kernel.org
Reported-by: Matt Fleming <matt@readmodwrite.com>
Closes: https://lore.kernel.org/all/20250925133310.1843863-1-matt@readmodwrite.com/
Fixes: b7ca5743a260 ("sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks")
Tested-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Greg, Sasha,
This fix cleanly applies on top of v6.16.y and v6.17.y stable kernels too when cherry-picked from v6.12.y branch (or with 'git am -3'). Let me know if you would like me to send a separate patch for each.
As mentioned above, upstream fixes this as part of a larger feature and we would only like these bits backported. If there are any future conflicts in this area during backporting, I would be more than happy to help resolve them.
Why not just backport all of the mainline changes instead? As I say a lot, whenever we do these "one off" changes, it's almost always wrong and causes problems over the years going forward as other changes around the same area can not be backported either.
So please, try to just backport the original commits.
thanks,
greg k-h