Hi folks,
A few Fedora users have reported[0] a regression starting in v4.16.8 where the boot will hang ~1/3 of the time with the following RCU stall warning:
INFO: rcu_sched detected stalls on CPUs/tasks:
o1-...!: (0 ticks this GP) idle=688/0/0 softirq=171/171 fqs=0
o(detected by 0, t=60002 jiffies, g=-142, c=-143, q=9)
Sending NMI from CPU 0 to CPU 1:
NMI backtrace for cpu 1 skipped: idling at acpi_processor_ffh_cstate_enter+0x65/0xb0
rcu_sched kthread starved for 60002 jiffies! g18446744073709551474 c1844674407370955143 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 -> cpu=1
RCU grace-period kthread stack dump:
rcu_sched I 0 9 2 0x80000000
Call Trace:
? __schedule+0x234/0x850
schedule+0x28/0x80
schedule_timeout+0x166/0x380
? __next_timer_interrupt+0xc0/0xc0
rcu_gp_kthread+0x368/0x830
? rcu_process_callbacks+0x4f0/0x4f0
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
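For anyone decoding the warning above: the g= value on the "detected by" line and the g value on the "kthread starved" line appear to be the same grace-period counter, printed once as a signed and once as an unsigned 64-bit number. A minimal Python sketch of that conversion, plus the stall duration under an assumed tick rate of HZ=1000 (the report does not state the tick rate, so treat that figure as an assumption):

```python
# Reinterpret a signed 64-bit counter as unsigned, which is how the
# "kthread starved" line of the stall report prints it.
def as_unsigned64(value: int) -> int:
    return value & 0xFFFFFFFFFFFFFFFF

# g=-142 is taken from the "detected by" line of the report.
g_unsigned = as_unsigned64(-142)
print(g_unsigned)  # 18446744073709551474, matching "g18446744073709551474"

# Stall length in seconds.
# ASSUMPTION: CONFIG_HZ=1000; the report does not say which tick rate is in use.
ASSUMED_HZ = 1000
stall_seconds = 60002 / ASSUMED_HZ
print(stall_seconds)
```

By the same conversion, c=-143 would print as 18446744073709551473. The c value as quoted in the report is one digit shorter than that, so it may have lost a digit somewhere along the way.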
A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4 ("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter, explicitly setting "tsc=" on the kernel command line causes the boot to always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1579925
Thanks, Jeremy
On Mon, Jun 11, 2018 at 01:59:15PM +0000, Jeremy Cline wrote:
> A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4 ("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter, explicitly setting "tsc=" on the kernel command line causes the boot to always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
Weird. So Core2 typically triggers mark_tsc_unstable() in either intel_idle or processor_idle. ISTR testing that when I did the patches.
When I make that mark_tsc_unstable() call in the idle drivers unconditional and boot my IVB with that, it doesn't want to fail. I've booted the machine 5 consecutive times without issue.
Let me try and checkout -stable, maybe something's up with that.
On Mon, Jun 11, 2018 at 04:17:42PM +0200, Peter Zijlstra wrote:
> Let me try and checkout -stable, maybe something's up with that.
Nope, -stable seems to be working as well on the IVB (with the modification). I just dug up my T500 and that's actually still running the test kernel. Let me try and build the -stable kernel for that.
On Mon, Jun 11, 2018 at 04:38:01PM +0200, Peter Zijlstra wrote:
> Nope, -stable seems to be working as well on the IVB (with the modification). I just dug up my T500 and that's actually still running the test kernel. Let me try and build the -stable kernel for that.
4.16.8 works without issue on my T500 with a Debian/Ubuntu-like distro config.
On 06/11/2018 11:30 AM, Peter Zijlstra wrote:
> 4.16.8 works without issue on my T500 with a Debian/Ubuntu-like distro config.
Adding mmarget (who bisected the problem) to the CC.
It might well be something Fedora-specific, then. I just noticed mmarget commented over the weekend noting that they couldn't reproduce the problem without using the initramfs generated during the RPM install of the kernel. mmarget's theory was that it's a race condition that doesn't occur when the initramfs takes long enough to unpack, but I don't know enough about the early boot process *or* how Fedora's generating the initramfs for RPM installs vs "make install" yet to know how likely that is. I'm going to have to do some research.
Thanks for looking into this so quickly and also sorry if this turns out to be a Fedora problem :(
Thanks, Jeremy
On 06/11/2018 01:56 PM, Jeremy Cline wrote:
Attached is the Fedora configuration for 4.16.8, as well, in case you'd like to test it with that.
Thanks, Jeremy
On Mon, Jun 11, 2018 at 3:11 PM, Jeremy Cline <jeremy@jcline.org> wrote:
> Attached is the Fedora configuration for 4.16.8, as well, in case you'd like to test it with that.
Hi Jeremy,
I've compiled 4.16.8 with your config and booted my machine about 10 times with this kernel, and I'm unable to reproduce the issue.
Maybe it's an issue with the Fedora initramfs?
Diego
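As a rough sanity check on how much weight a run of clean boots carries, here is a small Python sketch. It assumes each boot hangs independently with the "~1/3" rate from the original report and that the test machine is affected at all; neither assumption is confirmed anywhere in this thread.

```python
from fractions import Fraction

# ASSUMPTION: every boot hangs independently with probability 1/3
# (the "~1/3 of the time" figure from the original report).
HANG_RATE = Fraction(1, 3)

def chance_of_no_hang(boots: int, hang_rate: Fraction = HANG_RATE) -> Fraction:
    """Probability that `boots` consecutive boots all succeed."""
    return (1 - hang_rate) ** boots

for boots in (5, 10):
    p = chance_of_no_hang(boots)
    print(f"{boots} clean boots: {p} (about {float(p):.3f})")
# 5 clean boots: 32/243 (about 0.132)
# 10 clean boots: 1024/59049 (about 0.017)
```

Under those assumptions, 5 clean boots would still happen by chance roughly one time in eight on an affected machine, while 10 clean boots would be much less likely.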
On 06/11/2018 03:23 PM, Diego Viola wrote:
> Hi Jeremy,
> I've compiled 4.16.8 with your config and booted my machine about 10 times with this kernel, and I'm unable to reproduce the issue.
Thanks for confirming.
> Maybe it's an issue with the Fedora initramfs?
Indeed, I'll dig into what exactly is different about the RPM-created initramfs and the one created with "make install" to see if we can narrow this down some more.
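One possible way to start that comparison, sketched in Python. It assumes a plain list of file paths has already been extracted from each image (for example with a listing tool such as dracut's lsinitrd); the function name and the sample paths below are hypothetical and made up purely for illustration.

```python
def compare_listings(rpm_paths, make_install_paths):
    """Return (only_in_rpm, only_in_make_install) as sorted lists of paths."""
    rpm_set = {p.strip() for p in rpm_paths if p.strip()}
    make_set = {p.strip() for p in make_install_paths if p.strip()}
    return sorted(rpm_set - make_set), sorted(make_set - rpm_set)

# HYPOTHETICAL listings, invented for this example only; they do not
# describe the contents of any real Fedora initramfs.
rpm_listing = ["usr/bin/bash", "usr/lib/modules/example-a.ko", "etc/example.conf"]
make_listing = ["usr/bin/bash", "usr/lib/modules/example-b.ko"]

only_rpm, only_make = compare_listings(rpm_listing, make_listing)
print("only in RPM image:", only_rpm)
print("only in make-install image:", only_make)
```

A path-level diff like this only shows which files differ between the two images; it says nothing about unpack time, which is what mmarget's race-condition theory hinges on.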
Thanks, Jeremy
On Mon, Jun 11, 2018 at 10:59 AM, Jeremy Cline <jeremy@jcline.org> wrote:
> A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4 ("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter, explicitly setting "tsc=" on the kernel command line causes the boot to always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
Everything works fine here with 4.16.8+ on my desktop with E5500 CPU.
[diego@dualcore ~]$ uname -a
Linux dualcore 4.16.13-2-ARCH #1 SMP PREEMPT Fri Jun 1 18:46:11 UTC 2018 x86_64 GNU/Linux
[diego@dualcore ~]$
linux-stable-mirror@lists.linaro.org