On Mon, Nov 11, 2019 at 06:48:59PM +0100, Greg Kroah-Hartman wrote:
On Mon, Nov 11, 2019 at 09:37:57AM -0800, Sean Christopherson wrote:
On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote:
On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote:
From: Sean Christopherson sean.j.christopherson@intel.com
commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
Manually generate the PDPTR reserved bit mask when explicitly loading PDPTRs. The reserved bits that are being tracked by the MMU reflect the
It seems that a backport of this to stable and distro kernels tickled out some issue[0] for KVM Linux 64bit guests on older than about 8-10 year old Intel CPUs[1].
It manifests specifically when running with EPT disabled (no surprise there). Actually, it probably would reproduce simply with unrestricted guest disabled, but that's beside the point.
The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM to incorrectly load PDPTRs in 64-bit mode and inject a #GP. It's a sneaky little bugger because the "if (is_long_mode() ..." makes it appear to be correct at first glance.
if (is_long_mode(vcpu) && (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63))) return 1; else if (is_pae(vcpu) && is_paging(vcpu) && <--- needs !is_long_mode() !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) return 1;
With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't trigger the buggy code. This doesn't fail upstream because the offending code was refactored to encapsulate the PAE checks in a single helper, precisely to avoid this type of headache.
commit bf03d4f9334728bf7c8ffc7de787df48abd6340e Author: Paolo Bonzini pbonzini@redhat.com Date: Thu Jun 6 18:52:44 2019 +0200
KVM: x86: introduce is_pae_paging Checking for 32-bit PAE is quite common around code that fiddles with the PDPTRs. Add a function to compress all checks into a single invocation.
Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But, the revelant changes in x86.c do apply cleanly, and I've quadruple checked that the PAE checks in vmx.c are correct, i.e. applying the patch and ignoring the nested.c/vmx.c conflicts would be a viable lazy option.
Basically, booting this kernel as host, then running an KVM guest distro or kernel fails it that guest kernel early in the boot phase without any error or other log to serial console, earlyprintk.
...
[1]: models tested as problematic are: intel core2duo E8500; Xeon E5420; so westmere, conroe and that stuff. AFAICT anything from about pre-2010 which has VMX support (i.e. is 64bit based)
Note, not Westmere, which has EPT and unrestricted guest. Xeon E5420 is Harpertown, a.k.a. Penryn, the shrink of Conroe.
Thanks for figuring this out, can you send us a patch that we can apply to fix this issue in the stable tree?
Can do. A custom backport will be need for 4.20 and earlier, not 4.19 and earlier. I misremembered when we did the VMX refactoring.
For 5.0, 5.1 and 5.2, commit bf03d4f93347 can be applied directly.