Hi,
maybe you will not like introducing 'static int int_max = INT_MAX;' for this old kernel which EOL in 10 months.
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 5 +++++ 2 files changed, 10 insertions(+), 5 deletions(-)
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 4.19 ] Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 394c66442cff..ce4594215728 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index ce4594215728..2ea4da8c5f3a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2735,6 +2735,9 @@ int sched_rr_handler(struct ctl_table *table, int write, sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz [ pvorel: rebased for 4.19 ] Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 5 +---- kernel/sysctl.c | 5 +++++ 2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 2ea4da8c5f3a..deb9366e4f30 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2658,9 +2658,6 @@ static int sched_rt_global_constraints(void)
static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && (sysctl_sched_rt_runtime > sysctl_sched_rt_period)) return -EINVAL; @@ -2690,7 +2687,7 @@ int sched_rt_handler(struct ctl_table *table, int write, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate(); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 4bb194f096ec..6ce9f10b9c7d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -127,6 +127,7 @@ static int zero; static int __maybe_unused one = 1; static int __maybe_unused two = 2; static int __maybe_unused four = 4; +static int int_max = INT_MAX; static unsigned long zero_ul; static unsigned long one_ul = 1; static unsigned long long_max = LONG_MAX; @@ -464,6 +465,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = &one, + .extra2 = &int_max, }, { .procname = "sched_rt_runtime_us", @@ -471,6 +474,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = &neg_one, + .extra2 = &int_max, }, { .procname = "sched_rr_timeslice_ms",
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 4 ++++ 2 files changed, 9 insertions(+), 5 deletions(-)
Hi all,
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
I'm sorry for the noise, while this patchset is working for 5.15 and 5.10, it fails to compile on 5.4 due missing SYSCTL_ONE:
kernel/sysctl.c:477:14: error: ‘SYSCTL_NEG_ONE’ undeclared here (not in a function); did you mean ‘SYSCTL_ONE’? .extra1 = SYSCTL_NEG_ONE, ^~~~~~~~~~~~~~ SYSCTL_ONE
This is because 78e36f3b0dae ("sysctl: move some boundary constants from sysctl.c to sysctl_vals") was backported from 5.17 to 5.15 and 5.10 but not to 5.4. If you agree on the same approach I took for 4.19 (create variable in kernel/sysctl.c) and send patch for 5.4.
Or, we can backport just to >= 5.10. Cyril and others who merged this obviously did not consider this important enough to backport. OTOH we have 2 LTP tests which are testing these:
* sched_rr_get_interval01 tests c7fcb99877f9 (on non-default CONFIG_HZ) * the other two commits are tested by proc_sched_rt01
Therefore I thought fixing at least 6.* would be good.
Kind regards, Petr
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 4 ++++ 2 files changed, 9 insertions(+), 5 deletions(-)
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.15, 5.10, 5.4 ] Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 7045595aacac..3394b7f923a0 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW;
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 3394b7f923a0..b05684202b20 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2883,6 +2883,9 @@ int sched_rr_handler(struct ctl_table *table, int write, void *buffer, sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz [ pvorel: rebased for 5.15, 5.10, 5.4 ] Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 5 +---- kernel/sysctl.c | 4 ++++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index b05684202b20..5fc99dce5145 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2806,9 +2806,6 @@ static int sched_rt_global_constraints(void)
static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -2839,7 +2836,7 @@ int sched_rt_handler(struct ctl_table *table, int write, void *buffer, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate(); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 928798f89ca1..4554e80c4272 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1821,6 +1821,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -1828,6 +1830,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_NEG_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_deadline_period_max_us",
Hi,
FYI c7fcb99877f9 has already been cherry-picked as 5fce29ab20cb05797239e4628fb683d2ee9aa140.
Cyril Hrubis (2): sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 76bafa8d331a..f79a6f36777a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -3047,6 +3047,9 @@ static int sched_rr_handler(struct ctl_table *table, int write, void *buffer, sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f79a6f36777a..3a2335bc1d58 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -37,6 +37,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -44,6 +46,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_NEG_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rr_timeslice_ms", @@ -2970,9 +2974,6 @@ static int sched_rt_global_constraints(void) #ifdef CONFIG_SYSCTL static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -3003,7 +3004,7 @@ static int sched_rt_handler(struct ctl_table *table, int write, void *buffer, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate();
Cyril Hrubis (1): sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
From: Cyril Hrubis chrubis@suse.cz
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz --- kernel/sched/rt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 904dd8534597..4ac36eb4cdee 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -37,6 +37,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -44,6 +46,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_NEG_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rr_timeslice_ms", @@ -2989,9 +2993,6 @@ static int sched_rt_global_constraints(void) #ifdef CONFIG_SYSCTL static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -3022,7 +3023,7 @@ static int sched_rt_handler(struct ctl_table *table, int write, void *buffer, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate();
On Thu, Feb 22, 2024 at 04:13:21PM +0100, Petr Vorel wrote:
Hi,
maybe you will not like introducing 'static int int_max = INT_MAX;' for this old kernel which EOL in 10 months.
That's fine, not a big deal :)
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 5 +++++ 2 files changed, 10 insertions(+), 5 deletions(-)
Thanks for the patches, but they all got connected into the same thread, making it impossible to detect which ones are for what branches :(
Can you put the version in the [PATCH X/Y] section like [PATCH 4.14 X/Y] or just make separate threads so we have a chance?
thanks,
greg k-h
Hi Greg,
On Thu, Feb 22, 2024 at 04:13:21PM +0100, Petr Vorel wrote:
Hi,
maybe you will not like introducing 'static int int_max = INT_MAX;' for this old kernel which EOL in 10 months.
That's fine, not a big deal :)
Thanks for a quick info. I guess this is a reply to my question about SYSCTL_NEG_ONE failure on missing SYSCTL_NEG_ONE. Therefore I'll create static int __maybe_unused neg_one = -1; (which was used before 78e36f3b0dae).
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 5 +++++ 2 files changed, 10 insertions(+), 5 deletions(-)
Thanks for the patches, but they all got connected into the same thread, making it impossible to detect which ones are for what branches :(
Can you put the version in the [PATCH X/Y] section like [PATCH 4.14 X/Y] or just make separate threads so we have a chance?
I'm sorry, I'll resent all patches properly.
Kind regards, Petr
thanks,
greg k-h
On Thu, Feb 22, 2024 at 05:50:49PM +0100, Petr Vorel wrote:
Hi Greg,
On Thu, Feb 22, 2024 at 04:13:21PM +0100, Petr Vorel wrote:
Hi,
maybe you will not like introducing 'static int int_max = INT_MAX;' for this old kernel which EOL in 10 months.
That's fine, not a big deal :)
Thanks for a quick info. I guess this is a reply to my question about SYSCTL_NEG_ONE failure on missing SYSCTL_NEG_ONE. Therefore I'll create static int __maybe_unused neg_one = -1; (which was used before 78e36f3b0dae).
Great.
Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 5 +++++ 2 files changed, 10 insertions(+), 5 deletions(-)
Thanks for the patches, but they all got connected into the same thread, making it impossible to detect which ones are for what branches :(
Can you put the version in the [PATCH X/Y] section like [PATCH 4.14 X/Y] or just make separate threads so we have a chance?
I'm sorry, I'll resent all patches properly.
No worries, the second round looks good. I'll queue them up after this latest set of stable kernels goes out in a few days.
thanks!
greg k-h
linux-stable-mirror@lists.linaro.org