Allow sysctl override of sched_tick_max_deferment in order to ease finding/fixing the remaining issues with full nohz.
The value to be written is in jiffies, and -1 means the max deferment is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
Cc: Frederic Weisbecker fweisbec@gmail.com
Signed-off-by: Kevin Hilman khilman@linaro.org
---
 include/linux/sched/sysctl.h | 3 +++
 kernel/sched/core.c          | 6 +++++-
 kernel/sched/debug.c         | 1 +
 kernel/sysctl.c              | 9 +++++++++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index bf8086b..2ad07bb 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -57,6 +57,9 @@ extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
 extern unsigned int sysctl_timer_migration;
 extern unsigned int sysctl_sched_shares_window;
+#ifdef CONFIG_NO_HZ_FULL
+extern unsigned int sysctl_sched_tick_max_deferment;
+#endif
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *length,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1a27f9..b5d3f99 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2751,12 +2751,16 @@ void scheduler_tick(void)
  * balancing, etc... continue to move forward, even
  * with a very low granularity.
  */
+unsigned int sysctl_sched_tick_max_deferment = HZ;
 u64 scheduler_tick_max_deferment(void)
 {
 	struct rq *rq = this_rq();
 	unsigned long next, now = ACCESS_ONCE(jiffies);
 
-	next = rq->last_sched_tick + HZ;
+	if (sysctl_sched_tick_max_deferment == -1)
+		return KTIME_MAX;
+
+	next = rq->last_sched_tick + sysctl_sched_tick_max_deferment;
 
 	if (time_before_eq(next, now))
 		return 0;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 75024a6..f445ab9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -374,6 +374,7 @@ static void sched_debug_header(struct seq_file *m)
 	PN(sysctl_sched_wakeup_granularity);
 	P(sysctl_sched_child_runs_first);
 	P(sysctl_sched_features);
+	P(sysctl_sched_tick_max_deferment);
 #undef PN
 #undef P
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9edcf45..fb0b7d8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -393,6 +393,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif /* CONFIG_NUMA_BALANCING */
+#ifdef CONFIG_NO_HZ_FULL
+	{
+		.procname	= "sched_tick_max_deferment",
+		.data		= &sysctl_sched_tick_max_deferment,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif /* CONFIG_NO_HZ_FULL */
 #endif /* CONFIG_SCHED_DEBUG */
 	{
 		.procname	= "sched_rt_period_us",
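
For readers who want to experiment with the deferment logic outside the kernel, here is a minimal userspace sketch of what the patched scheduler_tick_max_deferment() computes. It is a model, not the kernel code: the names prefixed with model_ / MODEL_ are hypothetical, jiffies are modelled as 32-bit counters (as on a 32-bit platform), the tick rate of 100 is an arbitrary assumption, and INT64_MAX merely stands in for KTIME_MAX.

```c
#include <stdint.h>

/* All MODEL_* / model_* names are hypothetical stand-ins, not kernel symbols. */
#define MODEL_HZ                 100u        /* assumed tick rate (ticks per second) */
#define MODEL_KTIME_MAX          INT64_MAX   /* stand-in for the kernel's KTIME_MAX */
#define MODEL_DEFERMENT_DISABLED UINT32_MAX  /* what writing -1 leaves in an unsigned int */

/*
 * Wrap-safe "a is before or equal to b", in the spirit of time_before_eq().
 * Relies on the usual two's-complement conversion of the difference.
 */
static int model_time_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/*
 * Return how long (in nanoseconds) the tick may still be deferred.
 * last_tick and now are jiffies values; max_deferment is in jiffies.
 */
static uint64_t model_max_deferment_ns(uint32_t last_tick, uint32_t now,
				       uint32_t max_deferment)
{
	uint32_t next;

	/* -1 written through the sysctl: no upper bound on the deferment. */
	if (max_deferment == MODEL_DEFERMENT_DISABLED)
		return (uint64_t)MODEL_KTIME_MAX;

	next = last_tick + max_deferment;

	/* The deadline has already passed: the tick is due now. */
	if (model_time_before_eq(next, now))
		return 0;

	/* Widen before multiplying so the result cannot wrap at 32 bits. */
	return (uint64_t)(next - now) * (1000000000ull / MODEL_HZ);
}
```

With these assumptions, a last tick at jiffy 1000, a current time of 1040 and a deferment of 100 jiffies leaves 60 jiffies, or 600 ms, before the next forced tick.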
The conversion of the max deferment from usecs to nsecs can easily overflow on platforms where a long is 32 bits. To fix, cast the usecs value to u64 before multiplying by NSEC_PER_USEC.
This was discovered on a 32-bit ARM platform when extending the max deferment value.
Cc: Frederic Weisbecker fweisbec@gmail.com
Signed-off-by: Kevin Hilman khilman@linaro.org
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5d3f99..b506722 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2765,7 +2765,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 #endif
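
To make the overflow concrete, the following userspace sketch reproduces the arithmetic with fixed-width types. It assumes the 32-bit case described above, where both the usecs value and the multiplier are 32 bits wide; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Multiplier from microseconds to nanoseconds. */
#define MODEL_NSEC_PER_USEC 1000u

/*
 * Pre-fix behaviour as it would play out with 32-bit arithmetic:
 * the product is computed in 32 bits and wraps modulo 2^32 before
 * it is widened to the 64-bit return type.
 */
static uint64_t usecs_to_ns_wrapping(uint32_t usecs)
{
	uint32_t product = usecs * MODEL_NSEC_PER_USEC; /* may wrap */

	return product;
}

/*
 * Post-fix behaviour: widen the operand first, so the whole
 * multiplication is carried out in 64 bits.
 */
static uint64_t usecs_to_ns_widened(uint32_t usecs)
{
	return (uint64_t)usecs * MODEL_NSEC_PER_USEC;
}
```

One second (1000000 usecs) still fits, since 10^9 is below 2^32, which would explain why the default of HZ jiffies did not expose the problem. Five seconds (5000000 usecs) does not fit: the wrapping version yields 705032704 ns instead of 5000000000 ns.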
On Tue, Jun 18, 2013 at 04:58:29PM -0700, Kevin Hilman wrote:
> The conversion of the max deferment from usecs to nsecs can easily overflow on platforms where a long is 32 bits. To fix, cast the usecs value to u64 before multiplying by NSEC_PER_USEC.
>
> This was discovered on a 32-bit ARM platform when extending the max deferment value.
>
> Cc: Frederic Weisbecker fweisbec@gmail.com
> Signed-off-by: Kevin Hilman khilman@linaro.org
Right, if we make it tunable we need that patch.
Thanks!
Acked-by: Frederic Weisbecker fweisbec@gmail.com
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b5d3f99..b506722 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2765,7 +2765,7 @@ u64 scheduler_tick_max_deferment(void)
>  	if (time_before_eq(next, now))
>  		return 0;
>
> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>  }
>  #endif
> --
> 1.8.3
On Tue, Jun 18, 2013 at 04:58:28PM -0700, Kevin Hilman wrote:
> Allow sysctl override of sched_tick_max_deferment in order to ease finding/fixing the remaining issues with full nohz.
>
> The value to be written is in jiffies, and -1 means the max deferment is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
>
> Cc: Frederic Weisbecker fweisbec@gmail.com
> Signed-off-by: Kevin Hilman khilman@linaro.org
This looks like a useful thing but I wonder if a debugfs file would be more appropriate than sysctl.
The scheduler tick max deferment is supposed to be a temporary hack so we probably don't want to bring a real user ABI for that.
I believe sysctl is for permanent ABIs, right?
Frederic Weisbecker fweisbec@gmail.com writes:
> On Tue, Jun 18, 2013 at 04:58:28PM -0700, Kevin Hilman wrote:
> > Allow sysctl override of sched_tick_max_deferment in order to ease finding/fixing the remaining issues with full nohz.
> >
> > The value to be written is in jiffies, and -1 means the max deferment is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
> >
> > Cc: Frederic Weisbecker fweisbec@gmail.com
> > Signed-off-by: Kevin Hilman khilman@linaro.org
>
> This looks like a useful thing but I wonder if a debugfs file would be more appropriate than sysctl.
>
> The scheduler tick max deferment is supposed to be a temporary hack so we probably don't want to bring a real user ABI for that.
I wondered about that as well, but I wasn't sure if the existing knobs under CONFIG_SCHED_DEBUG (sched_min_granularity_ns, sched_latency_ns, etc.) are considered permanent ABI, or optional debugging tools.
This new option is inside CONFIG_SCHED_DEBUG along with the others, but if debugfs is preferred I can move it there. It seems strange, though, to have just this knob in debugfs and the rest in sysctl under CONFIG_SCHED_DEBUG.
Thanks,
Kevin
linaro-kernel@lists.linaro.org