Re: [PATCH 04/11] sched/deadline: Add support to initialize and remove dl_server bandwidth

6 Nov 2025

      Hi,
On 29/10/25 20:08, Andrea Righi wrote:
...
During switching from sched_ext to fair tasks and vice-versa, we need
support for intializing and removing the bandwidth contribution of
either DL server.
My first and more general/design question is do we strictly need this
automagic bandwidth management. We seem to agree [1] that we want to
move towards explicit dl-server(s) and tasks bandwidth handling, so we
might want to consider leaving the burden completely to whomever might
be configuring the system.
...
Add support for handling these transitions.
Anyway, if we still want to do this :) ...
...
Moreover, remove references specific to the fair server, in preparation
for adding the ext server.
v2: - wait for inactive_task_timer to fire before removing the bandwidth
      reservation (Juri Lelli)
    - add WARN_ON_ONCE(!cpus) sanity check in dl_server_apply_params()
      (Andrea Righi)
Co-developed-by: Joel Fernandes joelagnelf@nvidia.com
Signed-off-by: Joel Fernandes joelagnelf@nvidia.com
Signed-off-by: Andrea Righi arighi@nvidia.com

...
...
+/**

dl_server_remove_params - Remove bandwidth reservation for a DL server

@dl_se: The DL server entity to remove bandwidth for

This function removes the bandwidth reservation for a DL server entity,

cleaning up all bandwidth accounting and server state.

Returns: 0 on success, negative error code on failure

*/

+int dl_server_remove_params(struct sched_dl_entity *dl_se,

	    struct rq *rq, struct rq_flags *rf)

+{

if (!dl_se->dl_server)
return 0; /* Already disabled */

/*
* First dequeue if still queued. It should not be queued since

* we call this only after the last dl_server_stop().

*/

if (WARN_ON_ONCE(on_dl_rq(dl_se)))
dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);

if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == -1) {
rq_unlock_irqrestore(rq, rf);

This seems racy. I fear the moment we release the rq lock something can
slip in and the server(s) state might change?
...

hrtimer_cancel(&dl_se->inactive_timer);

I am not sure we actually need to force cancel the timer (but still
contradicting myself every time I go back at staring at code :). The way
I believe this should work 'in theory' is
- we remove a server (either automagic or user sets runtime to 0 -
   which is probably to fix/look at in current implementation as well
   btw)
 - current bandwidth is retained and only freed (and server reset) at
   0-lag (when inactive_timer fires)
 - if server is activated back before 0-lag it will use it's current
   parameters
 - after 0-lag it's a new instance with new parameters
In inactive_timer() we have this behavior for simple tasks, but we skip
__dl_sub() etc for servers (since we clear it up immediately).
In all this I essentially fear that if we clear parameters immediately
one could be able to trick the system by quickly disabling/enabling a
dl-server to let fair/scx tasks execute more than what requested (as
each new enable will be seen as a new instance). But, again, I wasn't
yet able to demonstrate this and I am still uncomfortably uncertain.
Please Peter and others keep me honest.
Also, server parameters changes are root only, so maybe not a big deal?
For scx automagic as well?
Thanks!
Juri
1 - https://lore.kernel.org/lkml/aQiE1ULtInJS6X4R@jlelli-thinkpadt14gen4.remote....

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 04/11] sched/deadline: Add support to initialize and remove dl_server bandwidth