Initial booting is setting the task flag to idle (PF_IDLE) by the call path sched_init() -> init_idle(). Having the task idle and calling call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be set. Subsequent calls to any cond_resched() will enable IRQs, potentially earlier than the IRQ setup has completed. Recent changes have caused just this scenario and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled before they are fully set up.
Fix this issue by clearing the PF_IDLE flag on return from sched_init() and restore the flag in rest_init(). Although the boot task was marked as idle since (at least) d80e4fda576d, I am not sure that it is wrong to do so. The forced context-switch on idle task was introduced in the tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/ Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+Zou... Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task") Cc: stable@vger.kernel.org Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Christophe Leroy christophe.leroy@csgroup.eu Cc: Andreas Schwab schwab@linux-m68k.org Cc: Matthew Wilcox willy@infradead.org Cc: Peng Zhang zhangpeng.00@bytedance.com Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@redhat.com Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Andrew Morton akpm@linux-foundation.org Cc: "Mike Rapoport (IBM)" rppt@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Liam R. Howlett Liam.Howlett@oracle.com --- init/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/init/main.c b/init/main.c index ad920fac325c..f74772acf612 100644 --- a/init/main.c +++ b/init/main.c @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void) */ rcu_read_lock(); tsk = find_task_by_pid_ns(pid, &init_pid_ns); - tsk->flags |= PF_NO_SETAFFINITY; + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE; set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id())); rcu_read_unlock();
@@ -938,6 +938,8 @@ void start_kernel(void) * time - but meanwhile we still have a functioning scheduler. */ sched_init(); + /* Avoid early context switch, rest_init() restores PF_IDLE */ + current->flags &= ~PF_IDLE;
if (WARN(!irqs_disabled(), "Interrupts were enabled *very* early, fixing it\n"))