* Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
On Wed, Jan 03, 2018 at 12:46:00AM -0800, Benjamin Gilbert wrote:
[resending with less web]
(adding lkml and x86 developers)
Hi all,
In our regression tests on kernel 4.14.11, we're occasionally seeing a run of "bad pmd" messages during boot, followed by a "BUG: unable to handle kernel paging request". This happens on no more than a couple percent of boots, but we've seen it on AWS HVM, GCE, Oracle Cloud VMs, and local QEMU instances. It always happens immediately after "Loading compiled-in X.509 certificates". I can't reproduce it on 4.14.10, nor, so far, on 4.14.11 with pti=off. Here's a sample backtrace:
A few other things to check:
First, please test the latest WIP.x86/pti branch, which has a couple of fixes.
In a -stable kernel tree you should be able to do:
git pull --no-tags git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/pti
In particular, this recent fix from a couple of hours ago might make a difference:
52994c256df3: x86/pti: Make sure the user/kernel PTEs match
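After the pull, one way to confirm that this fix actually landed in the tree being built is to ask git whether the commit is an ancestor of HEAD. The following is only a sketch: has_commit is a hypothetical helper name, and it assumes the abbreviated ID above still resolves (a WIP branch may be rebased, which would change the ID).

```shell
#!/bin/sh
# has_commit is a hypothetical helper, not a git command.
# It succeeds when commit $1 is reachable from HEAD of the current tree.
has_commit() {
    git merge-base --is-ancestor "$1" HEAD 2>/dev/null
}

commit=52994c256df3   # the PTE fix mentioned above
if has_commit "$commit"; then
    echo "$commit: present in HEAD"
else
    echo "$commit: not found (pull failed, or the branch was rebased)"
fi
```

Run it from the top of the kernel tree after the pull; outside a git repository it simply reports "not found".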
Note that this commit:
694d99d40972: x86/cpu, x86/pti: Do not enable PTI on AMD processors
disables PTI on AMD CPUs, so if you'd like to test it more broadly on all CPUs you'll need to add "pti=on" to your boot command line.
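When comparing runs with pti=on, pti=off and no setting at all, it can help to record which value a given boot actually used. The sketch below parses a kernel command line string for a pti= word. pti_setting is a hypothetical helper name, and reporting the last pti= word seen is a simplification, not a statement about how the kernel resolves duplicates.

```shell
#!/bin/sh
# pti_setting is a hypothetical helper: print the value of the last pti=
# word in the command line string $1, or "unset" if there is none.
# The body runs in a subshell so that "set -f" (no globbing while the
# string is split into words) does not leak into the caller.
pti_setting() (
    set -f
    value=unset
    for word in $1; do
        case "$word" in
            pti=*) value=${word#pti=} ;;
        esac
    done
    echo "$value"
)

# On a running system the boot command line is exposed in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    echo "pti: $(pti_setting "$(cat /proc/cmdline)")"
fi
```

For example, pti_setting 'ro quiet pti=on' prints "on", and a command line without the parameter prints "unset".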
Thanks,
Ingo