On Thu, Mar 06, 2025 at 09:35:27AM +0000, Juri Lelli wrote:
Hi Joel,
On 05/03/25 20:10, Joel Fernandes wrote:
Currently, RCU boost testing in rcutorture is broken because it relies on having RT throttling disabled. As a result, the test always passes (or only rarely fails). This happens because RT throttling was recently replaced by the DL server, which boosts CFS tasks even when rcutorture tries to disable throttling (see rcu_torture_disable_rt_throttle()). The DL server does not take sysctl_sched_rt_runtime into account, so RT tasks can still be preempted by CFS tasks.
Therefore, this patch prevents the DL server from starting when rcutorture sets sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT use cases where users do not want the DL server to interfere.
Cc: stable@vger.kernel.org
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server")
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
v1->v2:
- Updated Fixes tag (Steven)
- Moved the stoppage of DL server to fair (Juri)
I think what I suggested/wondered (sorry if I wasn't clear) is that we might need a link between sched_rt_runtime and the fair_server per-cpu runtime under sched/debug (i.e., sched_fair_write(), etc.); otherwise one can end up with the DL server disabled while the debug interface still shows a non-zero runtime. This only matters if we want to make that link, though, and I am not entirely sure it is something we want to do, as we will be stuck with an old/legacy interface if we do. Peter?
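One possible shape for such a link, sketched in user space purely for illustration: `fair_server_params`, `fair_server[]`, `sched_rt_runtime_changed()`, `NR_CPUS_MODEL` and the 50 ms default runtime are all hypothetical or assumed here, not the actual sched/debug code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4				/* hypothetical CPU count */
#define FAIR_SERVER_DEFAULT_RUNTIME_NS 50000000ULL	/* assumed 50 ms */

/* Hypothetical per-CPU fair server parameters as seen under sched/debug. */
struct fair_server_params {
	uint64_t runtime_ns;
	bool enabled;
};

static struct fair_server_params fair_server[NR_CPUS_MODEL];

/*
 * Sketch of the link: when the legacy knob is set to -1, disable every
 * per-CPU server and report a zero runtime, so the debug interface never
 * shows a non-zero runtime for a disabled server. Any other value
 * re-enables the servers with the default runtime.
 */
static void sched_rt_runtime_changed(int rt_runtime_us)
{
	bool disable = (rt_runtime_us == -1);

	/* In this model, only -1 or a non-negative value is accepted. */
	assert(rt_runtime_us >= -1);

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		fair_server[cpu].enabled = !disable;
		fair_server[cpu].runtime_ns =
			disable ? 0 : FAIR_SERVER_DEFAULT_RUNTIME_NS;
	}
}
```

Restoring the default runtime discards any custom per-CPU value written through the debug interface; remembering the previous value would avoid that, at the cost of coupling fair_server state even more tightly to the legacy knob.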
While we are discussing, would it be acceptable to provide a sysctl that disables the DL server? We could set it via boot parameters in rcutorture, which would unblock the testing. I feel that is OK to do, considering another sysctl, sysctl_sched_rt_runtime, already exists to disable RT throttling, so we would just be adding a similar one for the DL server. What do you think?
thanks,
- Joel