From: Joerg Roedel jroedel@suse.de
The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff range of kernel memory (with 4-level paging). This range contains the kernels text+data+bss mappings and the module mapping space, but not the direct mapping and the vmalloc area.
This is enough to get an application processors out of real-mode, but for code that switches back to real-mode the trampoline_pgd is missing important parts of the address space. For example, consider this code from arch/x86/kernel/reboot.c, function machine_real_restart() for a 64-bit kernel:
#ifdef CONFIG_X86_32 load_cr3(initial_page_table); #else write_cr3(real_mode_header->trampoline_pgd);
/* Exiting long mode will fail if CR4.PCIDE is set. */ if (boot_cpu_has(X86_FEATURE_PCID)) cr4_clear_bits(X86_CR4_PCIDE); #endif
/* Jump to the identity-mapped low memory code */ #ifdef CONFIG_X86_32 asm volatile("jmpl *%0" : : "rm" (real_mode_header->machine_real_restart_asm), "a" (type)); #else asm volatile("ljmpl *%0" : : "m" (real_mode_header->machine_real_restart_asm), "D" (type)); #endif
The code switches to the trampoline_pgd, which unmaps the direct mapping and also the kernel stack. The call to cr4_clear_bits() will find no stack and crash the machine. The real_mode_header pointer below points into the direct mapping, and dereferencing it also causes a crash.
The reason this does not crash always is only that kernel mappings are global and the CR3 switch does not flush those mappings. But if theses mappings are not in the TLB already, the above code will crash before it can jump to the real-mode stub.
Extend the trampoline_pgd to contain all kernel mappings to prevent these crashes and to make code which runs on this page-table more robust.
Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel jroedel@suse.de --- arch/x86/realmode/init.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 6d98609387ba..c5e29db02a46 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -98,6 +98,7 @@ static void __init setup_real_mode(void) #ifdef CONFIG_X86_64 u64 *trampoline_pgd; u64 efer; + int i; #endif
base = (unsigned char *)real_mode_header; @@ -154,8 +155,17 @@ static void __init setup_real_mode(void) trampoline_header->flags = 0;
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); + + /* Map the real mode stub as virtual == physical */ trampoline_pgd[0] = trampoline_pgd_entry.pgd; - trampoline_pgd[511] = init_top_pgt[511].pgd; + + /* + * Include the entirety of the kernel mapping into the trampoline + * PGD. This way, all mappings present in the normal kernel page + * tables are usable while running on trampoline_pgd. + */ + for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++) + trampoline_pgd[i] = init_top_pgt[i].pgd; #endif
sme_sev_setup_real_mode(trampoline_header);
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 51523ed1c26758de1af7e58730a656875f72f783 Gitweb: https://git.kernel.org/tip/51523ed1c26758de1af7e58730a656875f72f783 Author: Joerg Roedel jroedel@suse.de AuthorDate: Thu, 02 Dec 2021 16:32:26 +01:00 Committer: Borislav Petkov bp@suse.de CommitterDate: Fri, 03 Dec 2021 09:11:43 +01:00
x86/64/mm: Map all kernel memory into trampoline_pgd
The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff range of kernel memory (with 4-level paging). This range contains the kernel's text+data+bss mappings and the module mapping space but not the direct mapping and the vmalloc area.
This is enough to get the application processors out of real-mode, but for code that switches back to real-mode the trampoline_pgd is missing important parts of the address space. For example, consider this code from arch/x86/kernel/reboot.c, function machine_real_restart() for a 64-bit kernel:
#ifdef CONFIG_X86_32 load_cr3(initial_page_table); #else write_cr3(real_mode_header->trampoline_pgd);
/* Exiting long mode will fail if CR4.PCIDE is set. */ if (boot_cpu_has(X86_FEATURE_PCID)) cr4_clear_bits(X86_CR4_PCIDE); #endif
/* Jump to the identity-mapped low memory code */ #ifdef CONFIG_X86_32 asm volatile("jmpl *%0" : : "rm" (real_mode_header->machine_real_restart_asm), "a" (type)); #else asm volatile("ljmpl *%0" : : "m" (real_mode_header->machine_real_restart_asm), "D" (type)); #endif
The code switches to the trampoline_pgd, which unmaps the direct mapping and also the kernel stack. The call to cr4_clear_bits() will find no stack and crash the machine. The real_mode_header pointer below points into the direct mapping, and dereferencing it also causes a crash.
The reason this does not crash always is only that kernel mappings are global and the CR3 switch does not flush those mappings. But if theses mappings are not in the TLB already, the above code will crash before it can jump to the real-mode stub.
Extend the trampoline_pgd to contain all kernel mappings to prevent these crashes and to make code which runs on this page-table more robust.
Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211202153226.22946-5-joro@8bytes.org --- arch/x86/realmode/init.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 4a3da75..38d24d2 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -72,6 +72,7 @@ static void __init setup_real_mode(void) #ifdef CONFIG_X86_64 u64 *trampoline_pgd; u64 efer; + int i; #endif
base = (unsigned char *)real_mode_header; @@ -128,8 +129,17 @@ static void __init setup_real_mode(void) trampoline_header->flags = 0;
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); + + /* Map the real mode stub as virtual == physical */ trampoline_pgd[0] = trampoline_pgd_entry.pgd; - trampoline_pgd[511] = init_top_pgt[511].pgd; + + /* + * Include the entirety of the kernel mapping into the trampoline + * PGD. This way, all mappings present in the normal kernel page + * tables are usable while running on trampoline_pgd. + */ + for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++) + trampoline_pgd[i] = init_top_pgt[i].pgd; #endif
sme_sev_setup_real_mode(trampoline_header);
linux-stable-mirror@lists.linaro.org