Currently, RCU boost testing in rcutorture is broken because it relies on having RT throttling disabled. This means the test will always pass (or rarely fail). This occurs because recently, RT throttling was replaced by DL server which boosts CFS tasks even when rcutorture tried to disable throttling (see rcu_torture_disable_rt_throttle()). However, the systctl_sched_rt_runtime variable is not considered thus still allowing RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not want DL server to be interfering.
Cc: stable@vger.kernel.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server") Signed-off-by: Joel Fernandes joelagnelf@nvidia.com --- v1->v2: Updated Fixes tag (Steven) Moved the stoppage of DL server to fair (Juri)
kernel/sched/fair.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1c0ef435a7aa..d7ba333393f2 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1242,7 +1242,7 @@ static void update_curr(struct cfs_rq *cfs_rq) * against fair_server such that it can account for this time * and possibly avoid running this period. */ - if (dl_server_active(&rq->fair_server)) + if (dl_server_active(&rq->fair_server) && rt_bandwidth_enabled()) dl_server_update(&rq->fair_server, delta_exec); }
@@ -5957,7 +5957,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq) sub_nr_running(rq, queued_delta);
/* Stop the fair server if throttling resulted in no runnable tasks */ - if (rq_h_nr_queued && !rq->cfs.h_nr_queued) + if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server)) dl_server_stop(&rq->fair_server); done: /* @@ -6056,7 +6056,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq) }
/* Start the fair server if un-throttling resulted in new runnable tasks */ - if (!rq_h_nr_queued && rq->cfs.h_nr_queued) + if (!rq_h_nr_queued && rq->cfs.h_nr_queued && rt_bandwidth_enabled()) dl_server_start(&rq->fair_server);
/* At this point se is NULL and we are at root level*/ @@ -7005,9 +7005,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!rq_h_nr_queued && rq->cfs.h_nr_queued) { /* Account for idle runtime */ - if (!rq->nr_running) + if (!rq->nr_running && rt_bandwidth_enabled()) dl_server_update_idle_time(rq, rq->curr); - dl_server_start(&rq->fair_server); + + if (rt_bandwidth_enabled()) + dl_server_start(&rq->fair_server); }
/* At this point se is NULL and we are at root level*/ @@ -7134,7 +7136,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
sub_nr_running(rq, h_nr_queued);
- if (rq_h_nr_queued && !rq->cfs.h_nr_queued) + if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server)) dl_server_stop(&rq->fair_server);
/* balance early to pull high priority tasks */
Hi Joel,
On 05/03/25 20:10, Joel Fernandes wrote:
Currently, RCU boost testing in rcutorture is broken because it relies on having RT throttling disabled. This means the test will always pass (or rarely fail). This occurs because recently, RT throttling was replaced by DL server which boosts CFS tasks even when rcutorture tried to disable throttling (see rcu_torture_disable_rt_throttle()). However, the systctl_sched_rt_runtime variable is not considered thus still allowing RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not want DL server to be interfering.
Cc: stable@vger.kernel.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server") Signed-off-by: Joel Fernandes joelagnelf@nvidia.com
v1->v2: Updated Fixes tag (Steven) Moved the stoppage of DL server to fair (Juri)
I think what I suggested/wondered (sorry if I wasn't clear) is that we might need a link between sched_rt_runtime and the fair_server per-cpu runtime under sched/debug (i.e., sched_fair_write(), etc), otherwise one can end up with DL server disabled and still non zero runtime on the debug interface. This is only if we want to make that link, though; which I am not entirely sure it is something we want to do, as we will be stuck with an old/legacy interface if we do. Peter?
Thanks, Juri
On 3/6/2025 4:35 AM, Juri Lelli wrote:
Hi Joel,
On 05/03/25 20:10, Joel Fernandes wrote:
Currently, RCU boost testing in rcutorture is broken because it relies on having RT throttling disabled. This means the test will always pass (or rarely fail). This occurs because recently, RT throttling was replaced by DL server which boosts CFS tasks even when rcutorture tried to disable throttling (see rcu_torture_disable_rt_throttle()). However, the systctl_sched_rt_runtime variable is not considered thus still allowing RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not want DL server to be interfering.
Cc: stable@vger.kernel.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server") Signed-off-by: Joel Fernandes joelagnelf@nvidia.com
v1->v2: Updated Fixes tag (Steven) Moved the stoppage of DL server to fair (Juri)
I think what I suggested/wondered (sorry if I wasn't clear) is that we might need a link between sched_rt_runtime and the fair_server per-cpu runtime under sched/debug (i.e., sched_fair_write(), etc), otherwise one can end up with DL server disabled and still non zero runtime on the debug interface. This is only if we want to make that link, though; which I am not entirely sure it is something we want to do, as we will be stuck with an old/legacy interface if we do. Peter?
This is true. I was thinking more from a FAIR PoV. rt_bandwidth_enabled() is a generic function that's also called from the DL code. So I didn't per-se look at it like an DL server interface thing.
But you also do have a point and curious to hear what Peter has to say.
thanks,
- Joel
On Thu, Mar 06, 2025 at 09:35:27AM +0000, Juri Lelli wrote:
Hi Joel,
On 05/03/25 20:10, Joel Fernandes wrote:
Currently, RCU boost testing in rcutorture is broken because it relies on having RT throttling disabled. This means the test will always pass (or rarely fail). This occurs because recently, RT throttling was replaced by DL server which boosts CFS tasks even when rcutorture tried to disable throttling (see rcu_torture_disable_rt_throttle()). However, the systctl_sched_rt_runtime variable is not considered thus still allowing RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not want DL server to be interfering.
Cc: stable@vger.kernel.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Steven Rostedt rostedt@goodmis.org Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server") Signed-off-by: Joel Fernandes joelagnelf@nvidia.com
v1->v2: Updated Fixes tag (Steven) Moved the stoppage of DL server to fair (Juri)
I think what I suggested/wondered (sorry if I wasn't clear) is that we might need a link between sched_rt_runtime and the fair_server per-cpu runtime under sched/debug (i.e., sched_fair_write(), etc), otherwise one can end up with DL server disabled and still non zero runtime on the debug interface. This is only if we want to make that link, though; which I am not entirely sure it is something we want to do, as we will be stuck with an old/legacy interface if we do. Peter?
While we are discussing, would it be acceptable to provide a sysctl that disabled DL server? We could set this via boot parameters in rcutorture. That would unblock the testing. If feel that is OK to do considering another sysctl systctl_sched_rt_runtime is already there to disable RT throttling, so we are just adding a similar one for DL server. What do you think?
thanks,
- Joel
linux-stable-mirror@lists.linaro.org