Hi Ingo/Thomas,
Here are few cleanups around timer-core initialization. The first one is suggested by Peter Zijlstra [1] and the other two make few more changes in order to simplify it.
Please see if they look fine.
-- viresh
[1] http://lkml.iu.edu/hypermail/linux/kernel/1503.3/04091.html
Peter Zijlstra (1): timer: Allocate per-cpu tvec_base's statically
Viresh Kumar (2): timer: Limit the scope of __tvec_bases to init_timers_cpu() timer: Don't initialize tvec_base on hotplug
kernel/time/timer.c | 121 +++++++++++++++++++--------------------------------- 1 file changed, 45 insertions(+), 76 deletions(-)
From: Peter Zijlstra peterz@infradead.org
Memory for tvec_base is allocated separately for boot CPU (statically) and non-boot CPUs (dynamically).
The reason is because __TIMER_INITIALIZER() needs to set ->base to a valid pointer (because we've made NULL special, hint: lock_timer_base()) and we cannot get a compile time pointer to per-cpu entries because we don't know where we'll map the section, even for the boot cpu.
This can be simplified a bit by statically allocating per-cpu memory. The only disadvantage is that memory for one of the structures will stay unused, i.e. for the boot CPU, which uses boot_tvec_bases.
This will also guarantee that tvec_base is cacheline aligned. Even though tvec_base has ____cacheline_aligned stuck on, kzalloc_node() does not actually respect that (but guarantees a minimum u64 alignment).
Signed-off-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/timer.c | 36 ++++++++---------------------------- 1 file changed, 8 insertions(+), 28 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 2d3f5c504939..6e8220ec8a62 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -93,6 +93,7 @@ struct tvec_base { struct tvec_base boot_tvec_bases; EXPORT_SYMBOL(boot_tvec_bases); static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases; +static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
/* Functions below help us manage 'deferrable' flag */ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) @@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible);
static int init_timers_cpu(int cpu) { - int j; - struct tvec_base *base; + struct tvec_base *base = per_cpu(tvec_bases, cpu); static char tvec_base_done[NR_CPUS]; + int j;
if (!tvec_base_done[cpu]) { static char boot_done;
- if (boot_done) { - /* - * The APs use this path later in boot - */ - base = kzalloc_node(sizeof(*base), GFP_KERNEL, - cpu_to_node(cpu)); - if (!base) - return -ENOMEM; - - /* Make sure tvec_base has TIMER_FLAG_MASK bits free */ - if (WARN_ON(base != tbase_get_base(base))) { - kfree(base); - return -ENOMEM; - } - per_cpu(tvec_bases, cpu) = base; + if (!boot_done) { + boot_done = 1; /* skip the boot cpu */ } else { - /* - * This is for the boot CPU - we use compile-time - * static initialisation because per-cpu memory isn't - * ready yet and because the memory allocators are not - * initialised either. - */ - boot_done = 1; - base = &boot_tvec_bases; + base = per_cpu_ptr(&__tvec_bases, cpu); + per_cpu(tvec_bases, cpu) = base; } + spin_lock_init(&base->lock); tvec_base_done[cpu] = 1; base->cpu = cpu; - } else { - base = per_cpu(tvec_bases, cpu); }
- for (j = 0; j < TVN_SIZE; j++) { INIT_LIST_HEAD(base->tv5.vec + j); INIT_LIST_HEAD(base->tv4.vec + j);
* Viresh Kumar viresh.kumar@linaro.org wrote:
From: Peter Zijlstra peterz@infradead.org
Memory for tvec_base is allocated separately for boot CPU (statically) and non-boot CPUs (dynamically).
The reason is because __TIMER_INITIALIZER() needs to set ->base to a valid pointer (because we've made NULL special, hint: lock_timer_base()) and we cannot get a compile time pointer to per-cpu entries because we don't know where we'll map the section, even for the boot cpu.
This can be simplified a bit by statically allocating per-cpu memory. The only disadvantage is that memory for one of the structures will stay unused, i.e. for the boot CPU, which uses boot_tvec_bases.
This will also guarantee that tvec_base is cacheline aligned. Even though tvec_base has ____cacheline_aligned stuck on, kzalloc_node() does not actually respect that (but guarantees a minimum u64 alignment).
Signed-off-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
kernel/time/timer.c | 36 ++++++++---------------------------- 1 file changed, 8 insertions(+), 28 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 2d3f5c504939..6e8220ec8a62 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -93,6 +93,7 @@ struct tvec_base { struct tvec_base boot_tvec_bases; EXPORT_SYMBOL(boot_tvec_bases); static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases; +static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); /* Functions below help us manage 'deferrable' flag */ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) @@ -1534,46 +1535,25 @@ EXPORT_SYMBOL(schedule_timeout_uninterruptible); static int init_timers_cpu(int cpu) {
- int j;
- struct tvec_base *base;
- struct tvec_base *base = per_cpu(tvec_bases, cpu); static char tvec_base_done[NR_CPUS];
- int j;
if (!tvec_base_done[cpu]) { static char boot_done;
if (!boot_done) {
boot_done = 1; /* skip the boot cpu */
So it would be a lot more descriptive to name this flag 'boot_cpu_skipped'?
} else {
base = per_cpu_ptr(&__tvec_bases, cpu);
}per_cpu(tvec_bases, cpu) = base;
- spin_lock_init(&base->lock); tvec_base_done[cpu] = 1; base->cpu = cpu; }
Also, I'd put a description about the PER_CPU background into comments as well, because it's not obvious at first sight at all what the whole (boot_tvec_bases, tvec_bases, __tvec_bases) dance does.
Thanks,
Ingo
On 31 March 2015 at 13:15, Ingo Molnar mingo@kernel.org wrote:
- Viresh Kumar viresh.kumar@linaro.org wrote:
From: Peter Zijlstra peterz@infradead.org
if (!boot_done) {
boot_done = 1; /* skip the boot cpu */
So it would be a lot more descriptive to name this flag 'boot_cpu_skipped'?
Yes.
Also, I'd put a description about the PER_CPU background into comments as well, because it's not obvious at first sight at all what the whole (boot_tvec_bases, tvec_bases, __tvec_bases) dance does.
Yeah, so I did that with a one liner in the last patch, but that doesn't look good enough. I will try to do something better in V2.
Thanks for your reviews.
per-cpu variable '__tvec_bases' is used for all CPUs except the boot CPU, which uses 'boot_tvec_bases'. Because of this special case, we shouldn't make direct references to __tvec_bases from timer core as there are chances of using the wrong instance for boot CPU.
To force that, lets move __tvec_bases to the only routine which can use it directly (to set tvec_bases).
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 6e8220ec8a62..40918a5d3e1d 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -93,7 +93,6 @@ struct tvec_base { struct tvec_base boot_tvec_bases; EXPORT_SYMBOL(boot_tvec_bases); static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases; -static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
/* Functions below help us manage 'deferrable' flag */ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) @@ -1537,6 +1536,7 @@ static int init_timers_cpu(int cpu) { struct tvec_base *base = per_cpu(tvec_bases, cpu); static char tvec_base_done[NR_CPUS]; + static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); int j;
if (!tvec_base_done[cpu]) {
There is no need to call init_timers_cpu() on every cpu hotplug event, there is not much we need to reset.
- Timer-lists are already empty at the end of migrate_timers(). - timer_jiffies will be refreshed while adding a new timer, after the CPU is online again. - active_timers and all_timers can be reset from migrate_timers().
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- kernel/time/timer.c | 101 +++++++++++++++++++++++----------------------------- 1 file changed, 45 insertions(+), 56 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 40918a5d3e1d..aa02d49b3748 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1532,44 +1532,6 @@ signed long __sched schedule_timeout_uninterruptible(signed long timeout) } EXPORT_SYMBOL(schedule_timeout_uninterruptible);
-static int init_timers_cpu(int cpu) -{ - struct tvec_base *base = per_cpu(tvec_bases, cpu); - static char tvec_base_done[NR_CPUS]; - static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); - int j; - - if (!tvec_base_done[cpu]) { - static char boot_done; - - if (!boot_done) { - boot_done = 1; /* skip the boot cpu */ - } else { - base = per_cpu_ptr(&__tvec_bases, cpu); - per_cpu(tvec_bases, cpu) = base; - } - - spin_lock_init(&base->lock); - tvec_base_done[cpu] = 1; - base->cpu = cpu; - } - - for (j = 0; j < TVN_SIZE; j++) { - INIT_LIST_HEAD(base->tv5.vec + j); - INIT_LIST_HEAD(base->tv4.vec + j); - INIT_LIST_HEAD(base->tv3.vec + j); - INIT_LIST_HEAD(base->tv2.vec + j); - } - for (j = 0; j < TVR_SIZE; j++) - INIT_LIST_HEAD(base->tv1.vec + j); - - base->timer_jiffies = jiffies; - base->next_timer = base->timer_jiffies; - base->active_timers = 0; - base->all_timers = 0; - return 0; -} - #ifdef CONFIG_HOTPLUG_CPU static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head) { @@ -1611,6 +1573,9 @@ static void migrate_timers(int cpu) migrate_timer_list(new_base, old_base->tv5.vec + i); }
+ old_base->active_timers = 0; + old_base->all_timers = 0; + spin_unlock(&old_base->lock); spin_unlock_irq(&new_base->lock); put_cpu_var(tvec_bases); @@ -1620,25 +1585,16 @@ static void migrate_timers(int cpu) static int timer_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { - long cpu = (long)hcpu; - int err; - - switch(action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: - err = init_timers_cpu(cpu); - if (err < 0) - return notifier_from_errno(err); - break; #ifdef CONFIG_HOTPLUG_CPU + switch (action) { case CPU_DEAD: case CPU_DEAD_FROZEN: - migrate_timers(cpu); + migrate_timers((long)hcpu); break; -#endif default: break; } +#endif return NOTIFY_OK; }
@@ -1646,18 +1602,51 @@ static struct notifier_block timers_nb = { .notifier_call = timer_cpu_notify, };
+static void __init init_timer_cpu(struct tvec_base *base, int cpu) +{ + int j;
-void __init init_timers(void) + base->cpu = cpu; + per_cpu(tvec_bases, cpu) = base; + spin_lock_init(&base->lock); + + for (j = 0; j < TVN_SIZE; j++) { + INIT_LIST_HEAD(base->tv5.vec + j); + INIT_LIST_HEAD(base->tv4.vec + j); + INIT_LIST_HEAD(base->tv3.vec + j); + INIT_LIST_HEAD(base->tv2.vec + j); + } + for (j = 0; j < TVR_SIZE; j++) + INIT_LIST_HEAD(base->tv1.vec + j); + + base->timer_jiffies = jiffies; + base->next_timer = base->timer_jiffies; +} + +static void __init init_timer_cpus(void) { - int err; + /* Used for all CPUs, except boot CPU (which uses boot_tvec_bases) */ + static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); + struct tvec_base *base; + int local_cpu = smp_processor_id(); + int cpu; + + for_each_possible_cpu(cpu) { + if (likely(cpu != local_cpu)) + base = per_cpu_ptr(&__tvec_bases, cpu); + else + base = &boot_tvec_bases; + + init_timer_cpu(base, cpu); + } +}
+void __init init_timers(void) +{ /* ensure there are enough low bits for flags in timer->base pointer */ BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK);
- err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, - (void *)(long)smp_processor_id()); - BUG_ON(err != NOTIFY_OK); - + init_timer_cpus(); init_timer_stats(); register_cpu_notifier(&timers_nb); open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
On Mon, Mar 30, 2015 at 10:47:16AM +0530, Viresh Kumar wrote:
Hi Ingo/Thomas,
Here are few cleanups around timer-core initialization. The first one is suggested by Peter Zijlstra [1] and the other two make few more changes in order to simplify it.
I did the below on top; I'll sit on these patches for a wee while in the hope of the blackfin maintainer explaining his reasons for wrecking ____cacheline_aligned.
--- Subject: timer: Further simplify SMP and HOTPLUG logic From: Peter Zijlstra peterz@infradead.org Date: Mon Mar 30 10:09:19 CEST 2015
Remove one CONFIG_HOTPLUG_CPU #ifdef in trade for introducing one CONFIG_SMP #ifdef.
The CONFIG_SMP ifdef avoids declaring the per-cpu __tvec_bases storage on UP systems since they already have boot_tvec_bases.
Also (re)add a runtime check on the base alignment -- for the paranoid amongst us :-)
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org --- kernel/time/timer.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
--- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1580,12 +1580,10 @@ static void migrate_timers(int cpu) spin_unlock_irq(&new_base->lock); put_cpu_var(tvec_bases); } -#endif /* CONFIG_HOTPLUG_CPU */
static int timer_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { -#ifdef CONFIG_HOTPLUG_CPU switch (action) { case CPU_DEAD: case CPU_DEAD_FROZEN: @@ -1594,18 +1592,17 @@ static int timer_cpu_notify(struct notif default: break; } -#endif + return NOTIFY_OK; } - -static struct notifier_block timers_nb = { - .notifier_call = timer_cpu_notify, -}; +#endif /* CONFIG_HOTPLUG_CPU */
static void __init init_timer_cpu(struct tvec_base *base, int cpu) { int j;
+ BUG_ON(base != tbase_get_base(base)); + base->cpu = cpu; per_cpu(tvec_bases, cpu) = base; spin_lock_init(&base->lock); @@ -1625,18 +1622,19 @@ static void __init init_timer_cpu(struct
static void __init init_timer_cpus(void) { - /* Used for all CPUs, except boot CPU (which uses boot_tvec_bases) */ - static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); - struct tvec_base *base; int local_cpu = smp_processor_id(); + struct tvec_base *base; int cpu;
for_each_possible_cpu(cpu) { - if (likely(cpu != local_cpu)) - base = per_cpu_ptr(&__tvec_bases, cpu); - else + if (cpu == local_cpu) base = &boot_tvec_bases; - +#ifdef CONFIG_SMP + else { + static DEFINE_PER_CPU(struct tvec_base, __tvec_bases); + base = per_cpu_ptr(&__tvec_bases, cpu); + } +#endif init_timer_cpu(base, cpu); } } @@ -1648,7 +1646,7 @@ void __init init_timers(void)
init_timer_cpus(); init_timer_stats(); - register_cpu_notifier(&timers_nb); + cpu_notifier(timer_cpu_notify, 0); open_softirq(TIMER_SOFTIRQ, run_timer_softirq); }
On 30 March 2015 at 13:48, Peter Zijlstra peterz@infradead.org wrote:
I did the below on top; I'll sit on these patches for a wee while in the hope of the blackfin maintainer explaining his reasons for wrecking ____cacheline_aligned.
Two things I wanted to mention:
1.) We may need to drop the patch 2/3 as that breaks build for xtensa and tile.
kernel/time/timer.c: In function 'init_timers_cpu': kernel/time/timer.c:1539:1: error: section attribute cannot be specified for local variables kernel/time/timer.c:1539:1: error: section attribute cannot be specified for local variables
kernel/time/timer.c:1539:1: error: declaration of '__pcpu_unique___tvec_bases' with no linkage follows extern declaration
kernel/time/timer.c:1539:1: note: previous declaration of '__pcpu_unique___tvec_bases' was here kernel/time/timer.c:1539:9: error: section attribute cannot be specified for local variables kernel/time/timer.c:1539:9: error: section attribute cannot be specified for local variables kernel/time/timer.c:1539:9: error: weak declaration of '__tvec_bases' must be public kernel/time/timer.c:1539:9: error: declaration of '__tvec_bases' with no linkage follows extern declaration kernel/time/timer.c:1539:9: note: previous declaration of '__tvec_bases' was here
Can these be fixed somehow ?
2.) We probably don't need to wait for the blackfin guys as we haven't broken anything yet. It will be broken only once we start using an extra bit from base pointer, which will be done by a separate patch trying to migrate running timers.
Subject: timer: Further simplify SMP and HOTPLUG logic From: Peter Zijlstra peterz@infradead.org Date: Mon Mar 30 10:09:19 CEST 2015
Remove one CONFIG_HOTPLUG_CPU #ifdef in trade for introducing one CONFIG_SMP #ifdef.
The CONFIG_SMP ifdef avoids declaring the per-cpu __tvec_bases storage on UP systems since they already have boot_tvec_bases.
Also (re)add a runtime check on the base alignment -- for the paranoid amongst us :-)
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
kernel/time/timer.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
Reviewed-by: Viresh Kumar viresh.kumar@linaro.org
On Mon, Mar 30, 2015 at 03:25:36PM +0530, Viresh Kumar wrote:
On 30 March 2015 at 13:48, Peter Zijlstra peterz@infradead.org wrote:
I did the below on top; I'll sit on these patches for a wee while in the hope of the blackfin maintainer explaining his reasons for wrecking ____cacheline_aligned.
Two things I wanted to mention:
1.) We may need to drop the patch 2/3 as that breaks build for xtensa and tile.
kernel/time/timer.c: In function 'init_timers_cpu': kernel/time/timer.c:1539:1: error: section attribute cannot be specified for local variables
Can these be fixed somehow ?
Hmm, is your xtensa/tile compiler different from your others? But no, I think we need to move it back to global scope. Lemme go do that.
On 30 March 2015 at 15:57, Peter Zijlstra peterz@infradead.org wrote:
Hmm, is your xtensa/tile compiler different from your others? But no, I think we need to move it back to global scope. Lemme go do that.
These were generated by Fenguang's build-bot.
linaro-kernel@lists.linaro.org