From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.4 ] Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 186e7d78ded5..e40f32e3ab06 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW;
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index e40f32e3ab06..fea993827c4f 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2738,6 +2738,9 @@ int sched_rr_handler(struct ctl_table *table, int write, sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz [ pvorel: rebased for 5.4 ] Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 5 +---- kernel/sysctl.c | 4 ++++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index fea993827c4f..873f364cb3c6 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2659,9 +2659,6 @@ static int sched_rt_global_constraints(void)
static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -2693,7 +2690,7 @@ int sched_rt_handler(struct ctl_table *table, int write, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate(); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 4f85f7ed42fc..865e539c7354 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -465,6 +465,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -472,6 +474,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = &neg_one, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rr_timeslice_ms",
linux-stable-mirror@lists.linaro.org