Hi,
When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
kernel: [ 1734.412974] smpboot: CPU 2 is now offline
kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P OE 6.11.0-rc1-linan-4 #292
kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
kernel: [ 1734.414970] Call Trace:
kernel: [ 1734.414974] <TASK>
kernel: [ 1734.414978] dump_stack_lvl+0x60/0x80
kernel: [ 1734.414982] check_preemption_disabled+0xce/0xe0
kernel: [ 1734.414987] hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414992] ? __pfx_takedown_cpu+0x10/0x10
kernel: [ 1734.414996] takedown_cpu+0x97/0x130
kernel: [ 1734.414999] cpuhp_invoke_callback+0xf8/0x450
kernel: [ 1734.415004] __cpuhp_invoke_callback_range+0x78/0xe0
kernel: [ 1734.415008] _cpu_down+0xf4/0x360
kernel: [ 1734.415012] freeze_secondary_cpus+0xae/0x290
kernel: [ 1734.415016] suspend_devices_and_enter+0x1da/0x920
kernel: [ 1734.415022] pm_suspend+0x1fa/0x500
kernel: [ 1734.415025] state_store+0x68/0xd0
kernel: [ 1734.415028] kernfs_fop_write_iter+0x169/0x1f0
kernel: [ 1734.415034] vfs_write+0x269/0x440
kernel: [ 1734.415041] ksys_write+0x63/0xe0
kernel: [ 1734.415044] do_syscall_64+0x4b/0x110
kernel: [ 1734.415048] entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
kernel: [ 1734.415071] </TASK>
I confirmed that this was introduced by commit f7d43dd206e7e18c182f200e67a8db8c209907fa ("tick/broadcast: Make takeover of broadcast hrtimer reliable"); reverting that commit fixes it.
Thanks
David
On Tue, Jul 30 2024 at 22:25, David Wang wrote:
> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
The below should fix that.
Thanks,
tglx
---
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}
On 2024/7/30 23:07, Thomas Gleixner wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
> The below should fix that.
> Thanks,
> tglx
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
>  #ifdef CONFIG_HOTPLUG_CPU
>  void hotplug_cpu__broadcast_tick_pull(int deadcpu)
>  {
> -	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>  	struct clock_event_device *bc;
>  	unsigned long flags;
>  
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
>  		 * device to avoid the starvation.
>  		 */
>  		if (tick_check_broadcast_expired()) {
> +			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
>  			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
>  			tick_program_event(td->evtdev->next_event, 1);
>  		}
Sorry for causing this issue. I have tested the patch on an x86 machine, and it fixes the issue.
Tested-by: Yu Liao <liaoyu15@huawei.com>
Hi,
At 2024-07-30 23:07:41, "Thomas Gleixner" <tglx@linutronix.de> wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
> The below should fix that.
> Thanks,
I thought the offending line was the smp_processor_id() call passed to cpumask_clear_cpu(), so I was confused by this patch... never mind.
Sorry for the delay. I applied the patch and it does fix the issue.
FYI
David
> tglx
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
>  #ifdef CONFIG_HOTPLUG_CPU
>  void hotplug_cpu__broadcast_tick_pull(int deadcpu)
>  {
> -	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>  	struct clock_event_device *bc;
>  	unsigned long flags;
>  
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
>  		 * device to avoid the starvation.
>  		 */
>  		if (tick_check_broadcast_expired()) {
> +			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
>  			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
>  			tick_program_event(td->evtdev->next_event, 1);
>  		}
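On the question raised in the reply above about which line is the offender: as far as I understand it, on a kernel built with CONFIG_DEBUG_PREEMPT the this_cpu_ptr() accessor resolves the per CPU offset through the checked smp_processor_id(), so the report names smp_processor_id() even though the source line says this_cpu_ptr(). The explicit smp_processor_id() passed to cpumask_clear_cpu() already runs inside the locked region and passes the check. The sketch below is a userspace model of that relationship only; every sketch_* name is hypothetical and none of it is the kernel's real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins (assumptions): 0 means the code is preemptible. */
static int sketch_preempt_count;
/* Remembers which accessor name the last debug report blamed. */
static const char *sketch_last_report;

/* Models the debug check: record a report when called preemptibly. */
static int sketch_check_preemption_disabled(const char *what)
{
	if (sketch_preempt_count == 0)
		sketch_last_report = what;
	return 0; /* pretend we always run on CPU 0 */
}

/* Models the checked smp_processor_id(). */
static int sketch_smp_processor_id(void)
{
	return sketch_check_preemption_disabled("smp_processor_id");
}

/* One "CPU" with offset 0, plus a stand-in per CPU variable. */
static long sketch_per_cpu_offset[1] = { 0 };
static int sketch_tick_cpu_device = 7;

/*
 * Models a debug-build this_cpu_ptr(): the offset lookup goes through
 * the checked CPU id, so the report blames "smp_processor_id".
 */
static int *sketch_this_cpu_ptr(int *base)
{
	return (int *)((char *)base +
		       sketch_per_cpu_offset[sketch_smp_processor_id()]);
}
```

Calling sketch_this_cpu_ptr() with sketch_preempt_count at zero records a report attributed to "smp_processor_id", while calling sketch_smp_processor_id() with a non-zero count records nothing.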
The recent fix for making the take over of the broadcast timer more reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible region where the pointer is actually used. But of course it's valid that the compiler keeps it at the place where the code puts it which rightfully triggers:
  BUG: using smp_processor_id() in preemptible [00000000] code:
  caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 kernel/time/tick-broadcast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}
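The changelog above turns on where the per CPU pointer is fetched relative to the locked region. A minimal userspace sketch of the two orderings follows. It is an illustration under stated assumptions, not kernel code: all model_* names are hypothetical, the counter stands in for the preemption state, and the lock helpers only model the fact that the locked region is non-preemptible.

```c
#include <stdio.h>

/* Userspace stand-ins (assumptions): 0 means the code is preemptible. */
static int model_preempt_count;
/* Counts how often the modeled debug check fired. */
static int model_bug_reports;

struct model_tick_device { long next_event; };
static struct model_tick_device model_tick_cpu_device = { .next_event = 42 };

/* Models a checked per CPU pointer access. */
static struct model_tick_device *model_this_cpu_ptr(void)
{
	if (model_preempt_count == 0) {
		model_bug_reports++;
		fprintf(stderr, "BUG: per CPU access in preemptible code\n");
	}
	return &model_tick_cpu_device;
}

/* Model only the "not preemptible while held" property of the lock. */
static void model_lock_irqsave(void)      { model_preempt_count++; }
static void model_unlock_irqrestore(void) { model_preempt_count--; }

/* Shape before the fix: pointer fetched at function entry. */
static long model_pull_before_fix(void)
{
	struct model_tick_device *td = model_this_cpu_ptr(); /* preemptible */
	long ev;

	model_lock_irqsave();
	ev = td->next_event;
	model_unlock_irqrestore();
	return ev;
}

/* Shape after the fix: pointer fetched at the usage site. */
static long model_pull_after_fix(void)
{
	long ev;

	model_lock_irqsave();
	{
		struct model_tick_device *td = model_this_cpu_ptr(); /* atomic */

		ev = td->next_event;
	}
	model_unlock_irqrestore();
	return ev;
}
```

Both variants return the same value. Only the first trips the modeled check, which mirrors why the original code worked whenever the compiler happened to hoist the access into the locked region.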
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID:     6881e75237a84093d0986f56223db3724619f26e
Gitweb:        https://git.kernel.org/tip/6881e75237a84093d0986f56223db3724619f26e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 31 Jul 2024 12:23:51 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 12:37:43 +02:00
tick/broadcast: Move per CPU pointer access into the atomic section
The recent fix for making the take over of the broadcast timer more reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible region where the pointer is actually used. But of course it's valid that the compiler keeps it at the place where the code puts it which rightfully triggers:
  BUG: using smp_processor_id() in preemptible [00000000] code:
  caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
---
 kernel/time/tick-broadcast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b484309..ed58eeb 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(void)
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}