TSC could be reset in deep ACPI sleep states, even with invariant TSC. That's the reason we have sched_clock() save/restore functions, to deal with this situation. But what happens is that such functions are guarded with a check for the stability of sched_clock - if not considered stable, the save/restore routines aren't executed.
On top of that, we have a clear comment on native_sched_clock() saying that *even* with TSC unstable, we continue using TSC for sched_clock due to its speed. In other words, if we have a situation of TSC getting detected as unstable, it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles could bring bogus sched_clock values due to the lack of the save/restore mechanism, causing warnings like this:
[22.954918] ------------[ cut here ]------------
[22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
[22.954923] If you just came from a suspend/resume,
[22.954923] please switch to the trace global clock:
[22.954923] echo global > /sys/kernel/tracing/trace_clock
[22.954923] or add trace_clock=global to the kernel command line
[22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0
Notice that the above was reproduced even with "trace_clock=global".
The fix for that is to _always_ save/restore the sched_clock on suspend cycle _if TSC is used_ as sched_clock - only if we fall back to jiffies does the sched_clock_stable() check become relevant to save/restore the sched_clock.
Cc: stable@vger.kernel.org
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/x86/kernel/tsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 34dec0b72ea8..88e5a4ed9db3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -959,7 +959,7 @@ static unsigned long long cyc2ns_suspend;
 
 void tsc_save_sched_clock_state(void)
 {
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	cyc2ns_suspend = sched_clock();
@@ -979,7 +979,7 @@ void tsc_restore_sched_clock_state(void)
 	unsigned long flags;
 	int cpu;
 
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	local_irq_save(flags);
On 15/02/2025 17:58, Guilherme G. Piccoli wrote:
> TSC could be reset in deep ACPI sleep states, even with invariant TSC. That's the reason we have sched_clock() save/restore functions, to deal with this situation. But what happens is that such functions are guarded with a check for the stability of sched_clock - if not considered stable, the save/restore routines aren't executed.
>
> On top of that, we have a clear comment on native_sched_clock() saying that *even* with TSC unstable, we continue using TSC for sched_clock due to its speed. In other words, if we have a situation of TSC getting detected as unstable, it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles could bring bogus sched_clock values due to the lack of the save/restore mechanism, causing warnings like this:
>
> [22.954918] ------------[ cut here ]------------
> [22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
> [22.954923] If you just came from a suspend/resume,
> [22.954923] please switch to the trace global clock:
> [22.954923] echo global > /sys/kernel/tracing/trace_clock
> [22.954923] or add trace_clock=global to the kernel command line
> [22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0
>
> Notice that the above was reproduced even with "trace_clock=global".
>
> The fix for that is to _always_ save/restore the sched_clock on suspend cycle _if TSC is used_ as sched_clock - only if we fall back to jiffies does the sched_clock_stable() check become relevant to save/restore the sched_clock.
Hi folks, I would like to ask if it is possible to add the following tag:

Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>

Cascardo helped me a lot in debugging this issue, but I forgot to add the tag earlier, so it is only fair to add it now!
Thanks,
Guilherme
> Cc: stable@vger.kernel.org
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> [...]
The following commit has been merged into the sched/core branch of tip:
Commit-ID:     d90c9de9de2f1712df56de6e4f7d6982d358cabe
Gitweb:        https://git.kernel.org/tip/d90c9de9de2f1712df56de6e4f7d6982d358cabe
Author:        Guilherme G. Piccoli <gpiccoli@igalia.com>
AuthorDate:    Sat, 15 Feb 2025 17:58:16 -03:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 21 Feb 2025 15:27:38 +01:00
x86/tsc: Always save/restore TSC sched_clock() on suspend/resume
TSC could be reset in deep ACPI sleep states, even with invariant TSC.
That's the reason we have sched_clock() save/restore functions, to deal with this situation. But what happens is that such functions are guarded with a check for the stability of sched_clock - if not considered stable, the save/restore routines aren't executed.
On top of that, we have a clear comment in native_sched_clock() saying that *even* with TSC unstable, we continue using TSC for sched_clock due to its speed.
In other words, if we have a situation of TSC getting detected as unstable, it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles could bring bogus sched_clock values due to the lack of the save/restore mechanism, causing warnings like this:
[22.954918] ------------[ cut here ]------------
[22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
[22.954923] If you just came from a suspend/resume,
[22.954923] please switch to the trace global clock:
[22.954923] echo global > /sys/kernel/tracing/trace_clock
[22.954923] or add trace_clock=global to the kernel command line
[22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0
Notice that the above was reproduced even with "trace_clock=global".
The fix for that is to _always_ save/restore the sched_clock on suspend cycle _if TSC is used_ as sched_clock - only if we fall back to jiffies does the sched_clock_stable() check become relevant to save/restore the sched_clock.
Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250215210314.351480-1-gpiccoli@igalia.com
---
 arch/x86/kernel/tsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 34dec0b..88e5a4e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -959,7 +959,7 @@ static unsigned long long cyc2ns_suspend;
 
 void tsc_save_sched_clock_state(void)
 {
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	cyc2ns_suspend = sched_clock();
@@ -979,7 +979,7 @@ void tsc_restore_sched_clock_state(void)
 	unsigned long flags;
 	int cpu;
 
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;
 
 	local_irq_save(flags);
On Fri, Feb 21, 2025 at 02:37:42PM -0000, tip-bot2 for Guilherme G. Piccoli wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID:     d90c9de9de2f1712df56de6e4f7d6982d358cabe
> Gitweb:        https://git.kernel.org/tip/d90c9de9de2f1712df56de6e4f7d6982d358cabe
> Author:        Guilherme G. Piccoli <gpiccoli@igalia.com>
> AuthorDate:    Sat, 15 Feb 2025 17:58:16 -03:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Fri, 21 Feb 2025 15:27:38 +01:00
>
> x86/tsc: Always save/restore TSC sched_clock() on suspend/resume
Should this not go into x86/core or somesuch?
linux-stable-mirror@lists.linaro.org