Re: [PATCH] sched/core: Fix potential deadlock on rq lock

11 Sep 2025


On Thu, Sep 11, 2025 at 05:02:45PM +0200 Frederic Weisbecker wrote:
...
On Thu, Sep 11, 2025 at 03:53:58PM +0200, Peter Zijlstra wrote:
...
On Thu, Sep 11, 2025 at 12:42:49PM +0000, Wang Tao wrote:
...
When CPU 1 enters the nohz_full state, the kworker on CPU 0 executes
sched_tick_remote on its behalf. While holding the lock on CPU1's rq, it
triggers the warning WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3).
Printing the warning message takes the console_sem semaphore. Meanwhile,
the print task on CPU1's rq cannot acquire console_sem, so it joins the
wait queue and enters the UNINTERRUPTIBLE state until console_sem is
released. When the task on CPU 0 releases console_sem, it wakes up the
waiting task. In try_to_wake_up, it attempts to acquire the lock on
CPU1's rq again, resulting in a deadlock.
The triggering scenario is as follows:
CPU0                                            CPU1
sched_tick_remote
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)
report_bug                                      con_write
printk
console_unlock
                                                do_con_write
                                                console_lock
                                                down(&console_sem)
                                                list_add_tail(&waiter.list, &sem->wait_list);
up(&console_sem)
wake_up_q(&wake_q)
try_to_wake_up
__task_rq_lock
_raw_spin_lock
This patch fixes the issue by deferring all printk console printing
during the lock holding period.
Fixes: d84b31313ef8 ("sched/isolation: Offload residual 1Hz scheduler tick")
Signed-off-by: Wang Tao <wangtao554@huawei.com>
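
For readers following the trace, here is a minimal single-threaded userspace sketch of the problem and of the deferral idea described in the commit message. It is not the patch itself, which is not shown in this thread. Every name prefixed with model_ is hypothetical and is not a kernel API; the rq lock is reduced to a flag, and "would spin forever" is counted instead of actually hanging.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only: none of these names are kernel APIs. */
static bool rq_locked;        /* stands in for CPU1's rq lock */
static int defer_depth;       /* > 0 means "queue messages, do not print now" */
static char pending[4][64];   /* tiny queue of deferred messages */
static int npending;
static int self_deadlocks;    /* times we would have spun on our own lock */
static int printed;           /* messages that reached the modeled console */

/* Waking a console waiter needs the waiter's rq lock, as in try_to_wake_up. */
static void model_wake_waiter(void)
{
	if (rq_locked) {      /* a real spinlock would spin forever here */
		self_deadlocks++;
		return;
	}
	rq_locked = true;     /* take and drop the lock to enqueue the waiter */
	rq_locked = false;
}

/* Modeled console output: releasing the console wakes the waiter. */
static void model_console_flush(const char *msg)
{
	(void)msg;            /* real code would write to the console */
	printed++;
	model_wake_waiter();
}

/* Print now, or queue the message while deferral is active. */
static void model_printk(const char *msg)
{
	if (defer_depth > 0 && npending < 4) {
		strncpy(pending[npending], msg, sizeof(pending[0]) - 1);
		pending[npending][sizeof(pending[0]) - 1] = '\0';
		npending++;
		return;
	}
	model_console_flush(msg);
}

/* Emit everything that was queued while the lock was held. */
static void model_flush_deferred(void)
{
	for (int i = 0; i < npending; i++)
		model_console_flush(pending[i]);
	npending = 0;
}

/* Models the remote tick: warn while holding the rq lock. */
static int model_tick_remote(bool defer)
{
	self_deadlocks = 0;
	rq_locked = true;
	if (defer)
		defer_depth++;
	model_printk("WARN: delta too large");
	if (defer)
		defer_depth--;
	rq_locked = false;
	model_flush_deferred();   /* safe now: the rq lock is no longer held */
	return self_deadlocks;
}
```

Calling model_tick_remote(false) counts one self-deadlock, because the wakeup runs while the lock flag is still set. Calling model_tick_remote(true) counts none, because the message is only flushed after the lock is dropped. The message is still printed in both cases.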
I fundamentally hate that deferred thing and consider it a printk bug.
But really, if you trip that WARN, fix it and the problem goes away.
And probably it triggers a lot of false positives. An overloaded housekeeping
CPU can easily be off for 2 seconds. We should make it 30 seconds.
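
As a rough illustration of the threshold being discussed, the sketch below reproduces the comparison from the WARN_ON_ONCE condition quoted above with a configurable limit. The helper name model_delta_exceeds and the macro MODEL_NSEC_PER_SEC are hypothetical; only the comparison shape comes from the thread. The 64-bit cast matters because 30 * 1000000000 does not fit in a 32-bit long.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's NSEC_PER_SEC; the name is local to this sketch. */
#define MODEL_NSEC_PER_SEC 1000000000L

/*
 * Returns true when the time since the last tick exceeds the limit.
 * Widening to 64 bits before multiplying avoids overflow on 32-bit longs.
 */
static bool model_delta_exceeds(uint64_t delta_ns, unsigned int limit_sec)
{
	return delta_ns > (uint64_t)MODEL_NSEC_PER_SEC * limit_sec;
}
```

With a hypothetical 4 second stall, model_delta_exceeds(4000000000, 3) is true while model_delta_exceeds(4000000000, 30) is false, which is the kind of false positive a 30 second limit would filter out.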
It does trigger pretty easily. We've done some work to try to make it better
(spreading HK work around, for example) but you can still hit it, especially
if there are virtualization layers involved...
Increasing that time a bit would be great :)
Cheers,
Phil
...
Thanks.
-- 
Frederic Weisbecker
SUSE Labs