Re: [Linux-stable-mirror] 4.14.9 doesn't boot (regression)

30 Dec 2017


...
On Dec 29, 2017, at 3:53 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
...
On Fri, Dec 29, 2017 at 2:30 PM, Toralf Förster <toralf.foerster@gmx.de> wrote:
The bad news: the issue is not solved with the changed cflags.
The good news: I eventually managed to compile a working config for my desktop (works fine with 4.14.10 with generic CPU) that has a higher screen resolution during boot.
So I did a "make distclean", followed by "sudo zcat /proc/config.gz > .config", changed the .config to use MCORE2 instead of GENERIC, and defined the string "-local" to ensure that the modules directory is really unique.
Then I ran "time make -j4 && sudo make modules_install && sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-0 && sudo grub-mkconfig -o /boot/grub/grub.cfg", booted, and took 3 photos, which were uploaded to [1]; look for IMG_*
Ok, so what does seem to be consistent for everybody is that
double-fault in the NMI backtrace.
So the fact that the NMI always hits on a double-fault does make me
suspect that it's an infinite stream of double-faults, and that is
presumably also what causes the RCU timeout.
And as I pointed out elsewhere (damn two threads), I think that it
would help to simply catch the *first* double-fault.
And I *think* that the only thing that can make a double-fault
silently be re-tried is the CONFIG_X86_ESPFIX64 case, so if you can
build a failing kernel with the CONFIG_X86_ESPFIX64 case disabled in
arch/x86/kernel/traps.c do_double_fault(), that would be interesting.
Double faults use IST, so a double fault that double faults will effectively just start over rather than eventually running out of stack and triple faulting.
But check out the registers. We have RSP = ...28fd8 and CR2 = ...27f08. IOW, the double fault stack is ...28000 - ...28fff, and we're somehow getting a failed page fault a couple hundred bytes below the bottom of the IST stack. IOW, I think we're just stuck in a never-ending loop of stack overflows.
(Also, Josh, the oops code should have printed the contents of the struct pt_regs at the top of the DF stack. Any idea why it didn't?)
Toralf, can you send the complete output of:

  objdump -dr arch/x86/kernel/traps.o

from the build tree of a nonworking kernel?
Also, you wouldn't happen to be using Gentoo, perchance? I already have two reports of a Gentoo system miscompiling the vDSO due to Gentoo enabling -fstack-check and GCC generating stack check code that is highly suboptimal, actively incorrect, and doesn't even manage to check the stack in a particularly helpful way.
If this is indeed what's going on, I'm going to try to come up with a patch to outright fail the build on these buggy systems. We could probably fudge the build options to avoid the problem, but Gentoo really just needs to fix its toolchain.
