Hi Greg,
This is a backport to v4.4 of the RFI flush series that went upstream recently. There's also a few other commits I noticed had not made it to v4.4 due to needing manual backports.
cheers
From: "Naveen N. Rao" naveen.n.rao@linux.vnet.ibm.com
commit 844e3be47693f92a108cb1fb3b0606bf25e9c7a6 upstream.
Backport required due to rename of HAVE_BPF_JIT to HAVE_CBPF_JIT upstream - mpe.
Classic BPF JIT was never ported completely to work on little endian powerpc. However, it can be enabled and will crash the system when used. As such, disable use of BPF JIT on ppc64le.
Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Reported-by: Thadeu Lima de Souza Cascardo cascardo@redhat.com Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Acked-by: Thadeu Lima de Souza Cascardo cascardo@redhat.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index dfb1ee8c3e06..0c26025e18f9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -129,7 +129,7 @@ config PPC select IRQ_FORCED_THREADING select HAVE_RCU_TABLE_FREE if SMP select HAVE_SYSCALL_TRACEPOINTS - select HAVE_BPF_JIT + select HAVE_BPF_JIT if CPU_BIG_ENDIAN select HAVE_ARCH_JUMP_LABEL select ARCH_HAVE_NMI_SAFE_CMPXCHG select ARCH_HAS_GCOV_PROFILE_ALL
From: Oliver O'Halloran oohall@gmail.com
commit 8f5f525d5b83f7d76a6baf9c4e94d4bf312ea7f6 upstream.
When the kernel is compiled to use 64bit ABIv2 the _GLOBAL() macro does not include a global entry point. A function's global entry point is used when the function is called from a different TOC context and in the kernel this typically means a call from a module into the vmlinux (or vice-versa).
There are a few exported asm functions declared with _GLOBAL() and calling them from a module will likely crash the kernel since any TOC relative load will yield garbage.
flush_icache_range() and flush_dcache_range() are both exported to modules, and use the TOC, so must use _GLOBAL_TOC().
[mpe: We can't use _GLOBAL_TOC() in 4.4 for flush_icache_range() because the function needs to be in the .kprobes.text section, which is done by the _KPROBE() macro. So we have to add a _KPROBE_TOC() macro, which does the TOC setup and also puts the function in the .kprobes.text section]
Fixes: 721aeaa9fdf3 ("powerpc: Build little endian ppc64 kernel with ABIv2") Signed-off-by: Oliver O'Halloran oohall@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/include/asm/ppc_asm.h | 12 ++++++++++++ arch/powerpc/kernel/misc_64.S | 4 ++-- 2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index dd0fc18d8103..160bb2311bbb 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -224,6 +224,16 @@ name: \ .globl name; \ name:
+#define _KPROBE_TOC(name) \ + .section ".kprobes.text","a"; \ + .align 2 ; \ + .type name,@function; \ + .globl name; \ +name: \ +0: addis r2,r12,(.TOC.-0b)@ha; \ + addi r2,r2,(.TOC.-0b)@l; \ + .localentry name,.-name + #define DOTSYM(a) a
#else @@ -261,6 +271,8 @@ name: \ .type GLUE(.,name),@function; \ GLUE(.,name):
+#define _KPROBE_TOC(n) _KPROBE(n) + #define DOTSYM(a) GLUE(.,a)
#endif diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index db475d41b57a..415e58565745 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -66,7 +66,7 @@ PPC64_CACHES: * flush all bytes from start through stop-1 inclusive */
-_KPROBE(flush_icache_range) +_KPROBE_TOC(flush_icache_range) BEGIN_FTR_SECTION PURGE_PREFETCHED_INS blr @@ -117,7 +117,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) * * flush all bytes from start to stop-1 inclusive */ -_GLOBAL(flush_dcache_range) +_GLOBAL_TOC(flush_dcache_range)
/* * Flush the data cache to memory
From: Benjamin Herrenschmidt benh@kernel.crashing.org
commit 5a69aec945d27e78abac9fd032533d3aaebf7c1e upstream.
VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers.
Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled.
Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled.
Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Signed-off-by: Benjamin Herrenschmidt benh@kernel.crashing.org [mpe: Backported due to changed context] Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/process.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index cf788d7d7e56..a9b10812cbfd 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -209,7 +209,8 @@ void enable_kernel_vsx(void) WARN_ON(preemptible());
#ifdef CONFIG_SMP - if (current->thread.regs && (current->thread.regs->msr & MSR_VSX)) + if (current->thread.regs && + (current->thread.regs->msr & (MSR_VSX|MSR_VEC|MSR_FP))) giveup_vsx(current); else giveup_vsx(NULL); /* just enable vsx for kernel - force */ @@ -231,7 +232,7 @@ void flush_vsx_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { preempt_disable(); - if (tsk->thread.regs->msr & MSR_VSX) { + if (tsk->thread.regs->msr & (MSR_VSX|MSR_VEC|MSR_FP)) { #ifdef CONFIG_SMP BUG_ON(tsk != current); #endif
From: Alan Modra amodra@gmail.com
commit c153693d7eb9eeb28478aa2deaaf0b4e7b5ff5e9 upstream.
PowerPC64 uses the symbol .TOC. much as other targets use _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in powerpc parlance, the TOC pointer). Global offset tables are generally local to an executable or shared library, or in the kernel, module. Thus it does not make sense for a module to resolve a relocation against .TOC. to the kernel's .TOC. value. A module has its own .TOC., and indeed the powerpc64 module relocation processing ignores the kernel value of .TOC. and instead calculates a module-local value.
This patch removes code involved in exporting the kernel .TOC., tweaks modpost to ignore an undefined .TOC., and the module loader to twiddle the section symbol so that .TOC. isn't seen as undefined.
Note that if the kernel was compiled with -msingle-pic-base then ELFv2 would not have function global entry code setting up r2. In that case the module call stubs would need to be modified to set up r2 using the kernel .TOC. value, requiring some of this code to be reinstated.
mpe: Furthermore a change in binutils master (not yet released) causes the current way we handle the TOC to no longer work when building with MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be loaded due to there being no version found for TOC.
Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Alan Modra amodra@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/misc_64.S | 28 ---------------------------- arch/powerpc/kernel/module_64.c | 12 +++++++++--- scripts/mod/modpost.c | 3 ++- 3 files changed, 11 insertions(+), 32 deletions(-)
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 415e58565745..107588295b39 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -701,31 +701,3 @@ _GLOBAL(kexec_sequence) li r5,0 blr /* image->start(physid, image->start, 0); */ #endif /* CONFIG_KEXEC */ - -#ifdef CONFIG_MODULES -#if defined(_CALL_ELF) && _CALL_ELF == 2 - -#ifdef CONFIG_MODVERSIONS -.weak __crc_TOC. -.section "___kcrctab+TOC.","a" -.globl __kcrctab_TOC. -__kcrctab_TOC.: - .llong __crc_TOC. -#endif - -/* - * Export a fake .TOC. since both modpost and depmod will complain otherwise. - * Both modpost and depmod strip the leading . so we do the same here. - */ -.section "__ksymtab_strings","a" -__kstrtab_TOC.: - .asciz "TOC." - -.section "___ksymtab+TOC.","a" -/* This symbol name is important: it's used by modpost to find exported syms */ -.globl __ksymtab_TOC. -__ksymtab_TOC.: - .llong 0 /* .value */ - .llong __kstrtab_TOC. -#endif /* ELFv2 */ -#endif /* MODULES */ diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index e4f7d4eed20c..08b7a40de5f8 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -326,7 +326,10 @@ static void dedotify_versions(struct modversion_info *vers, } }
-/* Undefined symbols which refer to .funcname, hack to funcname (or .TOC.) */ +/* + * Undefined symbols which refer to .funcname, hack to funcname. Make .TOC. + * seem to be defined (value set later). + */ static void dedotify(Elf64_Sym *syms, unsigned int numsyms, char *strtab) { unsigned int i; @@ -334,8 +337,11 @@ static void dedotify(Elf64_Sym *syms, unsigned int numsyms, char *strtab) for (i = 1; i < numsyms; i++) { if (syms[i].st_shndx == SHN_UNDEF) { char *name = strtab + syms[i].st_name; - if (name[0] == '.') + if (name[0] == '.') { + if (strcmp(name+1, "TOC.") == 0) + syms[i].st_shndx = SHN_ABS; syms[i].st_name++; + } } } } @@ -351,7 +357,7 @@ static Elf64_Sym *find_dot_toc(Elf64_Shdr *sechdrs, numsyms = sechdrs[symindex].sh_size / sizeof(Elf64_Sym);
for (i = 1; i < numsyms; i++) { - if (syms[i].st_shndx == SHN_UNDEF + if (syms[i].st_shndx == SHN_ABS && strcmp(strtab + syms[i].st_name, "TOC.") == 0) return &syms[i]; } diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index e080746e1a6b..48958d3cec9e 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -594,7 +594,8 @@ static int ignore_undef_symbol(struct elf_info *info, const char *symname) if (strncmp(symname, "_restgpr0_", sizeof("_restgpr0_") - 1) == 0 || strncmp(symname, "_savegpr0_", sizeof("_savegpr0_") - 1) == 0 || strncmp(symname, "_restvr_", sizeof("_restvr_") - 1) == 0 || - strncmp(symname, "_savevr_", sizeof("_savevr_") - 1) == 0) + strncmp(symname, "_savevr_", sizeof("_savevr_") - 1) == 0 || + strcmp(symname, ".TOC.") == 0) return 1; /* Do not ignore this symbol */ return 0;
From: Michael Neuling mikey@neuling.org
commit 191eccb1580939fb0d47deb405b82a85b0379070 upstream.
A new hypervisor call has been defined to communicate various characteristics of the CPU to guests. Add definitions for the hcall number, flags and a wrapper function.
Signed-off-by: Michael Neuling mikey@neuling.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir fixed conflicts in backport] Signed-off-by: Balbir Singh bsingharora@gmail.com --- arch/powerpc/include/asm/hvcall.h | 17 +++++++++++++++++ arch/powerpc/include/asm/plpar_wrappers.h | 14 ++++++++++++++ 2 files changed, 31 insertions(+)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 85bc8c0d257b..51adbde09845 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -239,6 +239,7 @@ #define H_GET_HCA_INFO 0x1B8 #define H_GET_PERF_COUNT 0x1BC #define H_MANAGE_TRACE 0x1C0 +#define H_GET_CPU_CHARACTERISTICS 0x1C8 #define H_FREE_LOGICAL_LAN_BUFFER 0x1D4 #define H_QUERY_INT_STATE 0x1E4 #define H_POLL_PENDING 0x1D8 @@ -285,6 +286,17 @@ #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE 3 #define H_SET_MODE_RESOURCE_LE 4
+/* H_GET_CPU_CHARACTERISTICS return values */ +#define H_CPU_CHAR_SPEC_BAR_ORI31 (1ull << 63) // IBM bit 0 +#define H_CPU_CHAR_BCCTRL_SERIALISED (1ull << 62) // IBM bit 1 +#define H_CPU_CHAR_L1D_FLUSH_ORI30 (1ull << 61) // IBM bit 2 +#define H_CPU_CHAR_L1D_FLUSH_TRIG2 (1ull << 60) // IBM bit 3 +#define H_CPU_CHAR_L1D_THREAD_PRIV (1ull << 59) // IBM bit 4 + +#define H_CPU_BEHAV_FAVOUR_SECURITY (1ull << 63) // IBM bit 0 +#define H_CPU_BEHAV_L1D_FLUSH_PR (1ull << 62) // IBM bit 1 +#define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ull << 61) // IBM bit 2 + #ifndef __ASSEMBLY__
/** @@ -423,6 +435,11 @@ extern long pseries_big_endian_exceptions(void);
#endif /* CONFIG_PPC_PSERIES */
+struct h_cpu_char_result { + u64 character; + u64 behaviour; +}; + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_HVCALL_H */ diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index 67859edbf8fd..6e05cb397a5c 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -323,4 +323,18 @@ static inline long plapr_set_watchpoint0(unsigned long dawr0, unsigned long dawr return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR, dawr0, dawrx0); }
+static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long rc; + + rc = plpar_hcall(H_GET_CPU_CHARACTERISTICS, retbuf); + if (rc == H_SUCCESS) { + p->character = retbuf[0]; + p->behaviour = retbuf[1]; + } + + return rc; +} + #endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */
On Sun, Feb 04, 2018 at 03:59:59PM +1100, Michael Ellerman wrote:
From: Michael Neuling mikey@neuling.org
commit 191eccb1580939fb0d47deb405b82a85b0379070 upstream.
A new hypervisor call has been defined to communicate various characteristics of the CPU to guests. Add definitions for the hcall number, flags and a wrapper function.
Signed-off-by: Michael Neuling mikey@neuling.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir fixed conflicts in backport] Signed-off-by: Balbir Singh bsingharora@gmail.com
arch/powerpc/include/asm/hvcall.h | 17 +++++++++++++++++ arch/powerpc/include/asm/plpar_wrappers.h | 14 ++++++++++++++ 2 files changed, 31 insertions(+)
Also applied to 4.9.y as it was missing from there.
thanks,
greg k-h
From: Nicholas Piggin npiggin@gmail.com
commit 50e51c13b3822d14ff6df4279423e4b7b2269bc3 upstream.
The rfid/hrfid ((Hypervisor) Return From Interrupt) instruction is used for switching from the kernel to userspace, and from the hypervisor to the guest kernel. However it can and is also used for other transitions, eg. from real mode kernel code to virtual mode kernel code, and it's not always clear from the code what the destination context is.
To make it clearer when reading the code, add macros which encode the expected destination context.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/include/asm/exception-64e.h | 6 ++++++ arch/powerpc/include/asm/exception-64s.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+)
diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index a703452d67b6..555e22d5e07f 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -209,5 +209,11 @@ exc_##label##_book3e: ori r3,r3,vector_offset@l; \ mtspr SPRN_IVOR##vector_number,r3;
+#define RFI_TO_KERNEL \ + rfi + +#define RFI_TO_USER \ + rfi + #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 77f52b26dad6..c8c8a81e3976 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -50,6 +50,35 @@ #define EX_PPR 88 /* SMT thread status register (priority) */ #define EX_CTR 96
+/* Macros for annotating the expected destination of (h)rfid */ + +#define RFI_TO_KERNEL \ + rfid + +#define RFI_TO_USER \ + rfid + +#define RFI_TO_USER_OR_KERNEL \ + rfid + +#define RFI_TO_GUEST \ + rfid + +#define HRFI_TO_KERNEL \ + hrfid + +#define HRFI_TO_USER \ + hrfid + +#define HRFI_TO_USER_OR_KERNEL \ + hrfid + +#define HRFI_TO_GUEST \ + hrfid + +#define HRFI_TO_UNKNOWN \ + hrfid + #ifdef CONFIG_RELOCATABLE #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) \ ld r12,PACAKBASE(r13); /* get high part of &label */ \
On Sun, Feb 04, 2018 at 04:00:00PM +1100, Michael Ellerman wrote:
From: Nicholas Piggin npiggin@gmail.com
commit 50e51c13b3822d14ff6df4279423e4b7b2269bc3 upstream.
The rfid/hrfid ((Hypervisor) Return From Interrupt) instruction is used for switching from the kernel to userspace, and from the hypervisor to the guest kernel. However it can and is also used for other transitions, eg. from real mode kernel code to virtual mode kernel code, and it's not always clear from the code what the destination context is.
To make it clearer when reading the code, add macros which encode the expected destination context.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Also applied to 4.9.y, thanks.
greg k-h
From: Nicholas Piggin npiggin@gmail.com
commit 222f20f140623ef6033491d0103ee0875fe87d35 upstream.
This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir fixed issues with backporting to stable] Signed-off-by: Balbir Singh bsingharora@gmail.com --- arch/powerpc/include/asm/exception-64s.h | 2 +- arch/powerpc/kernel/entry_64.S | 14 +++++++++----- arch/powerpc/kernel/exceptions-64s.S | 18 +++++++++--------- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 7 +++---- arch/powerpc/kvm/book3s_rmhandlers.S | 7 +++++-- arch/powerpc/kvm/book3s_segment.S | 4 ++-- 6 files changed, 29 insertions(+), 23 deletions(-)
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index c8c8a81e3976..6dfb6e928f81 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -220,7 +220,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtspr SPRN_##h##SRR0,r12; \ mfspr r12,SPRN_##h##SRR1; /* and SRR1 */ \ mtspr SPRN_##h##SRR1,r10; \ - h##rfid; \ + h##RFI_TO_KERNEL; \ b . /* prevent speculative execution */ #define EXCEPTION_PROLOG_PSERIES_1(label, h) \ __EXCEPTION_PROLOG_PSERIES_1(label, h) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index f6fd0332c3a2..fb2da020e0ee 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -36,6 +36,11 @@ #include <asm/hw_irq.h> #include <asm/context_tracking.h> #include <asm/tm.h> +#ifdef CONFIG_PPC_BOOK3S +#include <asm/exception-64s.h> +#else +#include <asm/exception-64e.h> +#endif
/* * System calls. @@ -353,8 +358,7 @@ tabort_syscall: mtmsrd r10, 1 mtspr SPRN_SRR0, r11 mtspr SPRN_SRR1, r12 - - rfid + RFI_TO_USER b . /* prevent speculative execution */ #endif
@@ -1077,7 +1081,7 @@ _GLOBAL(enter_rtas) mtspr SPRN_SRR0,r5 mtspr SPRN_SRR1,r6 - rfid + RFI_TO_KERNEL b . /* prevent speculative execution */
rtas_return_loc: @@ -1102,7 +1106,7 @@ rtas_return_loc:
mtspr SPRN_SRR0,r3 mtspr SPRN_SRR1,r4 - rfid + RFI_TO_KERNEL b . /* prevent speculative execution */
.align 3 @@ -1173,7 +1177,7 @@ _GLOBAL(enter_prom) LOAD_REG_IMMEDIATE(r12, MSR_SF | MSR_ISF | MSR_LE) andc r11,r11,r12 mtsrr1 r11 - rfid + RFI_TO_KERNEL #endif /* CONFIG_PPC_BOOK3E */
1: /* Return from OF */ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b81ccc5fb32d..e3a3b81df363 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -46,7 +46,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ mtspr SPRN_SRR0,r10 ; \ ld r10,PACAKMSR(r13) ; \ mtspr SPRN_SRR1,r10 ; \ - rfid ; \ + RFI_TO_KERNEL ; \ b . ; /* prevent speculative execution */
#define SYSCALL_PSERIES_3 \ @@ -54,7 +54,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ 1: mfspr r12,SPRN_SRR1 ; \ xori r12,r12,MSR_LE ; \ mtspr SPRN_SRR1,r12 ; \ - rfid ; /* return to userspace */ \ + RFI_TO_USER ; /* return to userspace */ \ b . ; /* prevent speculative execution */
#if defined(CONFIG_RELOCATABLE) @@ -507,7 +507,7 @@ BEGIN_FTR_SECTION LOAD_HANDLER(r12, machine_check_handle_early) 1: mtspr SPRN_SRR0,r12 mtspr SPRN_SRR1,r11 - rfid + RFI_TO_KERNEL b . /* prevent speculative execution */ 2: /* Stack overflow. Stay on emergency stack and panic. @@ -601,7 +601,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) ld r11,PACA_EXGEN+EX_R11(r13) ld r12,PACA_EXGEN+EX_R12(r13) ld r13,PACA_EXGEN+EX_R13(r13) - HRFID + HRFI_TO_UNKNOWN b . #endif
@@ -666,7 +666,7 @@ masked_##_H##interrupt: \ ld r10,PACA_EXGEN+EX_R10(r13); \ ld r11,PACA_EXGEN+EX_R11(r13); \ GET_SCRATCH0(r13); \ - ##_H##rfid; \ + ##_H##RFI_TO_KERNEL; \ b . MASKED_INTERRUPT() @@ -756,7 +756,7 @@ kvmppc_skip_interrupt: addi r13, r13, 4 mtspr SPRN_SRR0, r13 GET_SCRATCH0(r13) - rfid + RFI_TO_KERNEL b .
kvmppc_skip_Hinterrupt: @@ -768,7 +768,7 @@ kvmppc_skip_Hinterrupt: addi r13, r13, 4 mtspr SPRN_HSRR0, r13 GET_SCRATCH0(r13) - hrfid + HRFI_TO_KERNEL b . #endif
@@ -1439,7 +1439,7 @@ machine_check_handle_early: li r3,MSR_ME andc r10,r10,r3 /* Turn off MSR_ME */ mtspr SPRN_SRR1,r10 - rfid + RFI_TO_KERNEL b . 2: /* @@ -1457,7 +1457,7 @@ machine_check_handle_early: */ bl machine_check_queue_event MACHINE_CHECK_HANDLER_WINDUP - rfid + RFI_TO_USER_OR_KERNEL 9: /* Deliver the machine check to host kernel in V mode. */ MACHINE_CHECK_HANDLER_WINDUP diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index ffab9269bfe4..4463718ae614 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -64,7 +64,7 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline) mtmsrd r0,1 /* clear RI in MSR */ mtsrr0 r5 mtsrr1 r6 - RFI + RFI_TO_KERNEL
kvmppc_call_hv_entry: ld r4, HSTATE_KVM_VCPU(r13) @@ -170,7 +170,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtsrr0 r8 mtsrr1 r7 beq cr1, 13f /* machine check */ - RFI + RFI_TO_KERNEL
/* On POWER7, we have external interrupts set to use HSRR0/1 */ 11: mtspr SPRN_HSRR0, r8 @@ -965,8 +965,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r0, VCPU_GPR(R0)(r4) ld r4, VCPU_GPR(R4)(r4) - - hrfid + HRFI_TO_GUEST b .
secondary_too_late: diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 16c4d88ba27d..a328f99a887c 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -46,6 +46,9 @@
#define FUNC(name) name
+#define RFI_TO_KERNEL RFI +#define RFI_TO_GUEST RFI + .macro INTERRUPT_TRAMPOLINE intno
.global kvmppc_trampoline_\intno @@ -141,7 +144,7 @@ kvmppc_handler_skip_ins: GET_SCRATCH0(r13)
/* And get back into the code */ - RFI + RFI_TO_KERNEL #endif
/* @@ -164,6 +167,6 @@ _GLOBAL_TOC(kvmppc_entry_trampoline) ori r5, r5, MSR_EE mtsrr0 r7 mtsrr1 r6 - RFI + RFI_TO_KERNEL
#include "book3s_segment.S" diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index ca8f174289bb..7c982956d709 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -156,7 +156,7 @@ no_dcbz32_on: PPC_LL r9, SVCPU_R9(r3) PPC_LL r3, (SVCPU_R3)(r3)
- RFI + RFI_TO_GUEST kvmppc_handler_trampoline_enter_end:
@@ -389,5 +389,5 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) cmpwi r12, BOOK3S_INTERRUPT_DOORBELL beqa BOOK3S_INTERRUPT_DOORBELL
- RFI + RFI_TO_KERNEL kvmppc_handler_trampoline_exit_end:
On Sun, Feb 04, 2018 at 04:00:01PM +1100, Michael Ellerman wrote:
From: Nicholas Piggin npiggin@gmail.com
commit 222f20f140623ef6033491d0103ee0875fe87d35 upstream.
This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir fixed issues with backporting to stable] Signed-off-by: Balbir Singh bsingharora@gmail.com
Can you provide a backport to 4.9.y for this?
thanks,
greg k-h
From: Nicholas Piggin npiggin@gmail.com
commit a08f828cf47e6c605af21d2cdec68f84e799c318 upstream.
Similar to the syscall return path, in fast_exception_return we may be returning to user or kernel context. We already have a test for that, because we conditionally restore r13. So use that existing test and branch, and bifurcate the return based on that.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/entry_64.S | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index fb2da020e0ee..8efd8c402cf7 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -891,7 +891,7 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ACCOUNT_CPU_USER_EXIT(r2, r4) REST_GPR(13, r1) -1: + mtspr SPRN_SRR1,r3
ld r2,_CCR(r1) @@ -904,8 +904,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r3,GPR3(r1) ld r4,GPR4(r1) ld r1,GPR1(r1) + RFI_TO_USER + b . /* prevent speculative execution */ + +1: mtspr SPRN_SRR1,r3 + + ld r2,_CCR(r1) + mtcrf 0xFF,r2 + ld r2,_NIP(r1) + mtspr SPRN_SRR0,r2
- rfid + ld r0,GPR0(r1) + ld r2,GPR2(r1) + ld r3,GPR3(r1) + ld r4,GPR4(r1) + ld r1,GPR1(r1) + RFI_TO_KERNEL b . /* prevent speculative execution */
#endif /* CONFIG_PPC_BOOK3E */
From: Nicholas Piggin npiggin@gmail.com
commit b8e90cb7bc04a509e821e82ab6ed7a8ef11ba333 upstream.
In the syscall exit path we may be returning to user or kernel context. We already have a test for that, because we conditionally restore r13. So use that existing test and branch, and bifurcate the return based on that.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/entry_64.S | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 8efd8c402cf7..2837232bbffb 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -230,13 +230,23 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) ACCOUNT_CPU_USER_EXIT(r11, r12) HMT_MEDIUM_LOW_HAS_PPR ld r13,GPR13(r1) /* only restore r13 if returning to usermode */ + ld r2,GPR2(r1) + ld r1,GPR1(r1) + mtlr r4 + mtcr r5 + mtspr SPRN_SRR0,r7 + mtspr SPRN_SRR1,r8 + RFI_TO_USER + b . /* prevent speculative execution */ + + /* exit to kernel */ 1: ld r2,GPR2(r1) ld r1,GPR1(r1) mtlr r4 mtcr r5 mtspr SPRN_SRR0,r7 mtspr SPRN_SRR1,r8 - RFI + RFI_TO_KERNEL b . /* prevent speculative execution */
syscall_error:
From: Nicholas Piggin npiggin@gmail.com
commit c7305645eb0c1621351cfc104038831ae87c0053 upstream.
In the SLB miss handler we may be returning to user or kernel. We need to add a check early on and save the result in the cr4 register, and then we bifurcate the return path based on that.
Signed-off-by: Nicholas Piggin npiggin@gmail.com [mpe: Backport to 4.4 based on patch from Balbir] Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/exceptions-64s.S | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e3a3b81df363..2c6494351604 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1503,6 +1503,8 @@ slb_miss_realmode:
andi. r10,r12,MSR_RI /* check for unrecoverable exception */ beq- 2f + andi. r10,r12,MSR_PR /* check for user mode (PR != 0) */ + bne 1f
.machine push .machine "power4" @@ -1516,7 +1518,23 @@ slb_miss_realmode: ld r11,PACA_EXSLB+EX_R11(r13) ld r12,PACA_EXSLB+EX_R12(r13) ld r13,PACA_EXSLB+EX_R13(r13) - rfid + RFI_TO_KERNEL + b . /* prevent speculative execution */ + +1: +.machine push +.machine "power4" + mtcrf 0x80,r9 + mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */ +.machine pop + + RESTORE_PPR_PACA(PACA_EXSLB, r9) + ld r9,PACA_EXSLB+EX_R9(r13) + ld r10,PACA_EXSLB+EX_R10(r13) + ld r11,PACA_EXSLB+EX_R11(r13) + ld r12,PACA_EXSLB+EX_R12(r13) + ld r13,PACA_EXSLB+EX_R13(r13) + RFI_TO_USER b . /* prevent speculative execution */
2: mfspr r11,SPRN_SRR0 @@ -1525,7 +1543,7 @@ slb_miss_realmode: mtspr SPRN_SRR0,r10 ld r10,PACAKMSR(r13) mtspr SPRN_SRR1,r10 - rfid + RFI_TO_KERNEL b .
unrecov_slb:
On Sun, Feb 04, 2018 at 04:00:04PM +1100, Michael Ellerman wrote:
From: Nicholas Piggin npiggin@gmail.com
commit c7305645eb0c1621351cfc104038831ae87c0053 upstream.
In the SLB miss handler we may be returning to user or kernel. We need to add a check early on and save the result in the cr4 register, and then we bifurcate the return path based on that.
Signed-off-by: Nicholas Piggin npiggin@gmail.com [mpe: Backport to 4.4 based on patch from Balbir] Signed-off-by: Michael Ellerman mpe@ellerman.id.au
arch/powerpc/kernel/exceptions-64s.S | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
also applied to 4.9.y
commit aa8a5e0062ac940f7659394f4817c948dc8c0667 upstream.
On some CPUs we can prevent the Meltdown vulnerability by flushing the L1-D cache on exit from kernel to user mode, and from hypervisor to guest.
This is known to be the case on at least Power7, Power8 and Power9. At this time we do not know the status of the vulnerability on other CPUs such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale CPUs. As more information comes to light we can enable this, or other mechanisms on those CPUs.
The vulnerability occurs when the load of an architecturally inaccessible memory region (eg. userspace load of kernel memory) is speculatively executed to the point where its result can influence the address of a subsequent speculatively executed load.
In order for that to happen, the first load must hit in the L1, because before the load is sent to the L2 the permission check is performed. Therefore if no kernel addresses hit in the L1 the vulnerability can not occur. We can ensure that is the case by flushing the L1 whenever we return to userspace. Similarly for hypervisor vs guest.
In order to flush the L1-D cache on exit, we add a section of nops at each (h)rfi location that returns to a lower privileged context, and patch that with some sequence. Newer firmwares are able to advertise to us that there is a special nop instruction that flushes the L1-D. If we do not see that advertised, we fall back to doing a displacement flush in software.
For guest kernels we support migration between some CPU versions, and different CPUs may use different flush instructions. So that we are prepared to migrate to a machine with a different flush instruction activated, we may have to patch more than one flush instruction at boot if the hypervisor tells us to.
In the end this patch is mostly the work of Nicholas Piggin and Michael Ellerman. However a cast of thousands contributed to analysis of the issue, earlier versions of the patch, back ports testing etc. Many thanks to all of them.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir - back ported to stable with changes] Signed-off-by: Balbir Singh bsingharora@gmail.com --- arch/powerpc/include/asm/exception-64s.h | 40 +++++++++++--- arch/powerpc/include/asm/feature-fixups.h | 15 ++++++ arch/powerpc/include/asm/paca.h | 10 ++++ arch/powerpc/include/asm/setup.h | 13 +++++ arch/powerpc/kernel/asm-offsets.c | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 86 +++++++++++++++++++++++++++++++ arch/powerpc/kernel/setup_64.c | 79 ++++++++++++++++++++++++++++ arch/powerpc/kernel/vmlinux.lds.S | 9 ++++ arch/powerpc/lib/feature-fixups.c | 42 +++++++++++++++ 9 files changed, 290 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 6dfb6e928f81..9bddbec441b8 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -50,34 +50,58 @@ #define EX_PPR 88 /* SMT thread status register (priority) */ #define EX_CTR 96
-/* Macros for annotating the expected destination of (h)rfid */ +/* + * Macros for annotating the expected destination of (h)rfid + * + * The nop instructions allow us to insert one or more instructions to flush the + * L1-D cache when returning to userspace or a guest. + */ +#define RFI_FLUSH_SLOT \ + RFI_FLUSH_FIXUP_SECTION; \ + nop; \ + nop; \ + nop
#define RFI_TO_KERNEL \ rfid
#define RFI_TO_USER \ - rfid + RFI_FLUSH_SLOT; \ + rfid; \ + b rfi_flush_fallback
#define RFI_TO_USER_OR_KERNEL \ - rfid + RFI_FLUSH_SLOT; \ + rfid; \ + b rfi_flush_fallback
#define RFI_TO_GUEST \ - rfid + RFI_FLUSH_SLOT; \ + rfid; \ + b rfi_flush_fallback
#define HRFI_TO_KERNEL \ hrfid
#define HRFI_TO_USER \ - hrfid + RFI_FLUSH_SLOT; \ + hrfid; \ + b hrfi_flush_fallback
#define HRFI_TO_USER_OR_KERNEL \ - hrfid + RFI_FLUSH_SLOT; \ + hrfid; \ + b hrfi_flush_fallback
#define HRFI_TO_GUEST \ - hrfid + RFI_FLUSH_SLOT; \ + hrfid; \ + b hrfi_flush_fallback
#define HRFI_TO_UNKNOWN \ - hrfid + RFI_FLUSH_SLOT; \ + hrfid; \ + b hrfi_flush_fallback
#ifdef CONFIG_RELOCATABLE #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) \ diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index 9a67a38bf7b9..7068bafbb2d6 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -184,4 +184,19 @@ label##3: \ FTR_ENTRY_OFFSET label##1b-label##3b; \ .popsection;
+#define RFI_FLUSH_FIXUP_SECTION \ +951: \ + .pushsection __rfi_flush_fixup,"a"; \ + .align 2; \ +952: \ + FTR_ENTRY_OFFSET 951b-952b; \ + .popsection; + + +#ifndef __ASSEMBLY__ + +extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; + +#endif + #endif /* __ASM_POWERPC_FEATURE_FIXUPS_H */ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 70bd4381f8e6..45e2aefece16 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -192,6 +192,16 @@ struct paca_struct { #endif struct kvmppc_host_state kvm_hstate; #endif +#ifdef CONFIG_PPC_BOOK3S_64 + /* + * rfi fallback flush must be in its own cacheline to prevent + * other paca data leaking into the L1d + */ + u64 exrfi[13] __aligned(0x80); + void *rfi_flush_fallback_area; + u64 l1d_flush_congruence; + u64 l1d_flush_sets; +#endif };
extern struct paca_struct *paca; diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index e9d384cbd021..7916b56f2e60 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -26,6 +26,19 @@ void initmem_init(void); void setup_panic(void); #define ARCH_PANIC_TIMEOUT 180
+void rfi_flush_enable(bool enable); + +/* These are bit flags */ +enum l1d_flush_type { + L1D_FLUSH_NONE = 0x1, + L1D_FLUSH_FALLBACK = 0x2, + L1D_FLUSH_ORI = 0x4, + L1D_FLUSH_MTTRIG = 0x8, +}; + +void __init setup_rfi_flush(enum l1d_flush_type, bool enable); +void do_rfi_flush_fixups(enum l1d_flush_type types); + #endif /* !__ASSEMBLY__ */
#endif /* _ASM_POWERPC_SETUP_H */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 40da69163d51..d92705e3a0c1 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -243,6 +243,10 @@ int main(void) #ifdef CONFIG_PPC_BOOK3S_64 DEFINE(PACAMCEMERGSP, offsetof(struct paca_struct, mc_emergency_sp)); DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce)); + DEFINE(PACA_RFI_FLUSH_FALLBACK_AREA, offsetof(struct paca_struct, rfi_flush_fallback_area)); + DEFINE(PACA_EXRFI, offsetof(struct paca_struct, exrfi)); + DEFINE(PACA_L1D_FLUSH_CONGRUENCE, offsetof(struct paca_struct, l1d_flush_congruence)); + DEFINE(PACA_L1D_FLUSH_SETS, offsetof(struct paca_struct, l1d_flush_sets)); #endif DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state)); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 2c6494351604..938a30fef031 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1564,6 +1564,92 @@ power4_fixup_nap: blr #endif
+ .globl rfi_flush_fallback +rfi_flush_fallback: + SET_SCRATCH0(r13); + GET_PACA(r13); + std r9,PACA_EXRFI+EX_R9(r13) + std r10,PACA_EXRFI+EX_R10(r13) + std r11,PACA_EXRFI+EX_R11(r13) + std r12,PACA_EXRFI+EX_R12(r13) + std r8,PACA_EXRFI+EX_R13(r13) + mfctr r9 + ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) + ld r11,PACA_L1D_FLUSH_SETS(r13) + ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ + addi r12,r12,8 + mtctr r11 + DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ + + /* order ld/st prior to dcbt stop all streams with flushing */ + sync +1: li r8,0 + .rept 8 /* 8-way set associative */ + ldx r11,r10,r8 + add r8,r8,r12 + xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not + add r8,r8,r11 // Add 0, this creates a dependency on the ldx + .endr + addi r10,r10,128 /* 128 byte cache line */ + bdnz 1b + + mtctr r9 + ld r9,PACA_EXRFI+EX_R9(r13) + ld r10,PACA_EXRFI+EX_R10(r13) + ld r11,PACA_EXRFI+EX_R11(r13) + ld r12,PACA_EXRFI+EX_R12(r13) + ld r8,PACA_EXRFI+EX_R13(r13) + GET_SCRATCH0(r13); + rfid + + .globl hrfi_flush_fallback +hrfi_flush_fallback: + SET_SCRATCH0(r13); + GET_PACA(r13); + std r9,PACA_EXRFI+EX_R9(r13) + std r10,PACA_EXRFI+EX_R10(r13) + std r11,PACA_EXRFI+EX_R11(r13) + std r12,PACA_EXRFI+EX_R12(r13) + std r8,PACA_EXRFI+EX_R13(r13) + mfctr r9 + ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) + ld r11,PACA_L1D_FLUSH_SETS(r13) + ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ + addi r12,r12,8 + mtctr r11 + DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ + + /* order ld/st prior to dcbt stop all streams with flushing */ + sync +1: li r8,0 + .rept 8 /* 8-way set associative */ + ldx r11,r10,r8 + add r8,r8,r12 + xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not + add r8,r8,r11 // Add 0, this creates a dependency on the ldx + .endr + addi r10,r10,128 /* 128 byte cache line */ + bdnz 1b + + mtctr r9 + ld r9,PACA_EXRFI+EX_R9(r13) + ld r10,PACA_EXRFI+EX_R10(r13) + ld r11,PACA_EXRFI+EX_R11(r13) + ld r12,PACA_EXRFI+EX_R12(r13) + ld r8,PACA_EXRFI+EX_R13(r13) + GET_SCRATCH0(r13); + hrfid + /* * Hash table stuff */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index a20823210ac0..1a3bd6a35c44 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -834,4 +834,83 @@ static int __init disable_hardlockup_detector(void) return 0; } early_initcall(disable_hardlockup_detector); + +#ifdef CONFIG_PPC_BOOK3S_64 +static enum l1d_flush_type enabled_flush_types; +static void *l1d_flush_fallback_area; +bool rfi_flush; + +static void do_nothing(void *unused) +{ + /* + * We don't need to do the flush explicitly, just enter+exit kernel is + * sufficient, the RFI exit handlers will do the right thing. + */ +} + +void rfi_flush_enable(bool enable) +{ + if (rfi_flush == enable) + return; + + if (enable) { + do_rfi_flush_fixups(enabled_flush_types); + on_each_cpu(do_nothing, NULL, 1); + } else + do_rfi_flush_fixups(L1D_FLUSH_NONE); + + rfi_flush = enable; +} + +static void init_fallback_flush(void) +{ + u64 l1d_size, limit; + int cpu; + + l1d_size = ppc64_caches.dsize; + limit = min(safe_stack_limit(), ppc64_rma_size); + + /* + * Align to L1d size, and size it at 2x L1d size, to catch possible + * hardware prefetch runoff. We don't have a recipe for load patterns to + * reliably avoid the prefetcher. + */ + l1d_flush_fallback_area = __va(memblock_alloc_base(l1d_size * 2, l1d_size, limit)); + memset(l1d_flush_fallback_area, 0, l1d_size * 2); + + for_each_possible_cpu(cpu) { + /* + * The fallback flush is currently coded for 8-way + * associativity. Different associativity is possible, but it + * will be treated as 8-way and may not evict the lines as + * effectively. + * + * 128 byte lines are mandatory. + */ + u64 c = l1d_size / 8; + + paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area; + paca[cpu].l1d_flush_congruence = c; + paca[cpu].l1d_flush_sets = c / 128; + } +} + +void __init setup_rfi_flush(enum l1d_flush_type types, bool enable) +{ + if (types & L1D_FLUSH_FALLBACK) { + pr_info("rfi-flush: Using fallback displacement flush\n"); + init_fallback_flush(); + } + + if (types & L1D_FLUSH_ORI) + pr_info("rfi-flush: Using ori type flush\n"); + + if (types & L1D_FLUSH_MTTRIG) + pr_info("rfi-flush: Using mttrig type flush\n"); + + enabled_flush_types = types; + + rfi_flush_enable(enable); +} +#endif /* CONFIG_PPC_BOOK3S_64 */ #endif diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index d41fd0af8980..072a23a17350 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -72,6 +72,15 @@ SECTIONS /* Read-only data */ RODATA
+#ifdef CONFIG_PPC64 + . = ALIGN(8); + __rfi_flush_fixup : AT(ADDR(__rfi_flush_fixup) - LOAD_OFFSET) { + __start___rfi_flush_fixup = .; + *(__rfi_flush_fixup) + __stop___rfi_flush_fixup = .; + } +#endif + EXCEPTION_TABLE(0)
NOTES :kernel :notes diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 7ce3870d7ddd..a18d648d31a6 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -20,6 +20,7 @@ #include <asm/code-patching.h> #include <asm/page.h> #include <asm/sections.h> +#include <asm/setup.h>
struct fixup_entry { @@ -113,6 +114,47 @@ void do_feature_fixups(unsigned long value, void *fixup_start, void *fixup_end) } }
+#ifdef CONFIG_PPC_BOOK3S_64 +void do_rfi_flush_fixups(enum l1d_flush_type types) +{ + unsigned int instrs[3], *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___rfi_flush_fixup), + end = PTRRELOC(&__stop___rfi_flush_fixup); + + instrs[0] = 0x60000000; /* nop */ + instrs[1] = 0x60000000; /* nop */ + instrs[2] = 0x60000000; /* nop */ + + if (types & L1D_FLUSH_FALLBACK) + /* b .+16 to fallback flush */ + instrs[0] = 0x48000010; + + i = 0; + if (types & L1D_FLUSH_ORI) { + instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + instrs[i++] = 0x63de0000; /* ori 30,30,0 L1d flush*/ + } + + if (types & L1D_FLUSH_MTTRIG) + instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */ + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + + patch_instruction(dest, instrs[0]); + patch_instruction(dest + 1, instrs[1]); + patch_instruction(dest + 2, instrs[2]); + } + + printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i); +} +#endif /* CONFIG_PPC_BOOK3S_64 */ + void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) { long *start, *end;
On Sun, Feb 04, 2018 at 04:00:05PM +1100, Michael Ellerman wrote:
commit aa8a5e0062ac940f7659394f4817c948dc8c0667 upstream.
On some CPUs we can prevent the Meltdown vulnerability by flushing the L1-D cache on exit from kernel to user mode, and from hypervisor to guest.
This is known to be the case on at least Power7, Power8 and Power9. At this time we do not know the status of the vulnerability on other CPUs such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale CPUs. As more information comes to light we can enable this, or other mechanisms on those CPUs.
The vulnerability occurs when the load of an architecturally inaccessible memory region (eg. userspace load of kernel memory) is speculatively executed to the point where its result can influence the address of a subsequent speculatively executed load.
In order for that to happen, the first load must hit in the L1, because before the load is sent to the L2 the permission check is performed. Therefore if no kernel addresses hit in the L1 the vulnerability can not occur. We can ensure that is the case by flushing the L1 whenever we return to userspace. Similarly for hypervisor vs guest.
In order to flush the L1-D cache on exit, we add a section of nops at each (h)rfi location that returns to a lower privileged context, and patch that with some sequence. Newer firmwares are able to advertise to us that there is a special nop instruction that flushes the L1-D. If we do not see that advertised, we fall back to doing a displacement flush in software.
For guest kernels we support migration between some CPU versions, and different CPUs may use different flush instructions. So that we are prepared to migrate to a machine with a different flush instruction activated, we may have to patch more than one flush instruction at boot if the hypervisor tells us to.
In the end this patch is mostly the work of Nicholas Piggin and Michael Ellerman. However a cast of thousands contributed to analysis of the issue, earlier versions of the patch, back ports testing etc. Many thanks to all of them.
Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au [Balbir - back ported to stable with changes] Signed-off-by: Balbir Singh bsingharora@gmail.com
Also applied to 4.9.y
commit bc9c9304a45480797e13a8e1df96ffcf44fb62fe upstream.
Because there may be some performance overhead of the RFI flush, add kernel command line options to disable it.
We add a sensibly named 'no_rfi_flush' option, but we also hijack the x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we see 'nopti' we can guess that the user is trying to avoid any overhead of Meltdown mitigations, and it means we don't have to educate every one about a different command line option.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/kernel/setup_64.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 1a3bd6a35c44..7db464c2b129 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -838,8 +838,29 @@ early_initcall(disable_hardlockup_detector); #ifdef CONFIG_PPC_BOOK3S_64 static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; +static bool no_rfi_flush; bool rfi_flush;
+static int __init handle_no_rfi_flush(char *p) +{ + pr_info("rfi-flush: disabled on command line."); + no_rfi_flush = true; + return 0; +} +early_param("no_rfi_flush", handle_no_rfi_flush); + +/* + * The RFI flush is not KPTI, but because users will see doco that says to use + * nopti we hijack that option here to also disable the RFI flush. + */ +static int __init handle_no_pti(char *p) +{ + pr_info("rfi-flush: disabling due to 'nopti' on command line.\n"); + handle_no_rfi_flush(NULL); + return 0; +} +early_param("nopti", handle_no_pti); + static void do_nothing(void *unused) { /* @@ -910,7 +931,8 @@ void __init setup_rfi_flush(enum l1d_flush_type types, bool enable)
enabled_flush_types = types;
- rfi_flush_enable(enable); + if (!no_rfi_flush) + rfi_flush_enable(enable); } #endif /* CONFIG_PPC_BOOK3S_64 */ #endif
From: Michael Neuling mikey@neuling.org
commit 8989d56878a7735dfdb234707a2fee6faf631085 upstream.
A new hypervisor call is available which tells the guest settings related to the RFI flush. Use it to query the appropriate flush instruction(s), and whether the flush is required.
Signed-off-by: Michael Neuling mikey@neuling.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/platforms/pseries/setup.c | 37 +++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 36df46eaba24..dd2545fc9947 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -499,6 +499,39 @@ static void __init find_and_init_phbs(void) of_pci_check_probe_only(); }
+static void pseries_setup_rfi_flush(void) +{ + struct h_cpu_char_result result; + enum l1d_flush_type types; + bool enable; + long rc; + + /* Enable by default */ + enable = true; + + rc = plpar_get_cpu_characteristics(&result); + if (rc == H_SUCCESS) { + types = L1D_FLUSH_NONE; + + if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) + types |= L1D_FLUSH_MTTRIG; + if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) + types |= L1D_FLUSH_ORI; + + /* Use fallback if nothing set in hcall */ + if (types == L1D_FLUSH_NONE) + types = L1D_FLUSH_FALLBACK; + + if (!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) + enable = false; + } else { + /* Default to fallback if case hcall is not available */ + types = L1D_FLUSH_FALLBACK; + } + + setup_rfi_flush(types, enable); +} + static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -515,7 +548,9 @@ static void __init pSeries_setup_arch(void)
fwnmi_init();
- /* By default, only probe PCI (can be overriden by rtas_pci) */ + pseries_setup_rfi_flush(); + + /* By default, only probe PCI (can be overridden by rtas_pci) */ pci_add_flags(PCI_PROBE_ONLY);
/* Find and initialize PCI host bridges */
From: Oliver O'Halloran oohall@gmail.com
commit 6e032b350cd1fdb830f18f8320ef0e13b4e24094 upstream.
New device-tree properties are available which tell the hypervisor settings related to the RFI flush. Use them to determine the appropriate flush instruction to use, and whether the flush is required.
Signed-off-by: Oliver O'Halloran oohall@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/platforms/powernv/setup.c | 50 ++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index f48afc06ba14..30c6b3b7be90 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -35,13 +35,63 @@ #include <asm/opal.h> #include <asm/kexec.h> #include <asm/smp.h> +#include <asm/tm.h> +#include <asm/setup.h>
#include "powernv.h"
+static void pnv_setup_rfi_flush(void) +{ + struct device_node *np, *fw_features; + enum l1d_flush_type type; + int enable; + + /* Default to fallback in case fw-features are not available */ + type = L1D_FLUSH_FALLBACK; + enable = 1; + + np = of_find_node_by_name(NULL, "ibm,opal"); + fw_features = of_get_child_by_name(np, "fw-features"); + of_node_put(np); + + if (fw_features) { + np = of_get_child_by_name(fw_features, "inst-l1d-flush-trig2"); + if (np && of_property_read_bool(np, "enabled")) + type = L1D_FLUSH_MTTRIG; + + of_node_put(np); + + np = of_get_child_by_name(fw_features, "inst-l1d-flush-ori30,30,0"); + if (np && of_property_read_bool(np, "enabled")) + type = L1D_FLUSH_ORI; + + of_node_put(np); + + /* Enable unless firmware says NOT to */ + enable = 2; + np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-hv-1-to-0"); + if (np && of_property_read_bool(np, "disabled")) + enable--; + + of_node_put(np); + + np = of_get_child_by_name(fw_features, "needs-l1d-flush-msr-pr-0-to-1"); + if (np && of_property_read_bool(np, "disabled")) + enable--; + + of_node_put(np); + of_node_put(fw_features); + } + + setup_rfi_flush(type, enable > 0); +} + static void __init pnv_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
+ pnv_setup_rfi_flush(); + /* Initialize SMP */ pnv_smp_init();
On Sun, Feb 04, 2018 at 04:00:08PM +1100, Michael Ellerman wrote:
From: Oliver O'Halloran oohall@gmail.com
commit 6e032b350cd1fdb830f18f8320ef0e13b4e24094 upstream.
New device-tree properties are available which tell the hypervisor settings related to the RFI flush. Use them to determine the appropriate flush instruction to use, and whether the flush is required.
Signed-off-by: Oliver O'Halloran oohall@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au
arch/powerpc/platforms/powernv/setup.c | 50 ++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
Also applied to 4.9.y
commit fd6e440f20b1a4304553775fc55938848ff617c9 upstream.
The recent commit 87590ce6e373 ("sysfs/cpu: Add vulnerability folder") added a generic folder and set of files for reporting information on CPU vulnerabilities. One of those was for meltdown:
/sys/devices/system/cpu/vulnerabilities/meltdown
This commit wires up that file for 64-bit Book3S powerpc.
For now we default to "Vulnerable" unless the RFI flush is enabled. That may not actually be true on all hardware, further patches will refine the reporting based on the CPU/platform etc. But for now we default to being pessimists.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/setup_64.c | 8 ++++++++ 2 files changed, 9 insertions(+)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 0c26025e18f9..e23e555adc3a 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -135,6 +135,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select GENERIC_SMP_IDLE_THREAD select GENERIC_CMOS_UPDATE + select GENERIC_CPU_VULNERABILITIES if PPC_BOOK3S_64 select GENERIC_TIME_VSYSCALL_OLD select GENERIC_CLOCKEVENTS select GENERIC_CLOCKEVENTS_BROADCAST if SMP diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 7db464c2b129..cd7bfec17654 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -934,5 +934,13 @@ void __init setup_rfi_flush(enum l1d_flush_type types, bool enable) if (!no_rfi_flush) rfi_flush_enable(enable); } + +ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) +{ + if (rfi_flush) + return sprintf(buf, "Mitigation: RFI Flush\n"); + + return sprintf(buf, "Vulnerable\n"); +} #endif /* CONFIG_PPC_BOOK3S_64 */ #endif
On Sun, Feb 04, 2018 at 04:00:09PM +1100, Michael Ellerman wrote:
commit fd6e440f20b1a4304553775fc55938848ff617c9 upstream.
The recent commit 87590ce6e373 ("sysfs/cpu: Add vulnerability folder") added a generic folder and set of files for reporting information on CPU vulnerabilities. One of those was for meltdown:
/sys/devices/system/cpu/vulnerabilities/meltdown
This commit wires up that file for 64-bit Book3S powerpc.
For now we default to "Vulnerable" unless the RFI flush is enabled. That may not actually be true on all hardware, further patches will refine the reporting based on the CPU/platform etc. But for now we default to being pessimists.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Also applied to 4.9.y and 4.14.y, thanks.
greg k-h
commit 236003e6b5443c45c18e613d2b0d776a9f87540e upstream.
Expose the state of the RFI flush (enabled/disabled) via debugfs, and allow it to be enabled/disabled at runtime.
eg: $ cat /sys/kernel/debug/powerpc/rfi_flush 1 $ echo 0 > /sys/kernel/debug/powerpc/rfi_flush $ cat /sys/kernel/debug/powerpc/rfi_flush 0
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Reviewed-by: Nicholas Piggin npiggin@gmail.com --- arch/powerpc/kernel/setup_64.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index cd7bfec17654..311b1f560f8a 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -38,7 +38,9 @@ #include <linux/hugetlb.h> #include <linux/memory.h> #include <linux/nmi.h> +#include <linux/debugfs.h>
+#include <asm/debug.h> #include <asm/io.h> #include <asm/kdump.h> #include <asm/prom.h> @@ -935,6 +937,35 @@ void __init setup_rfi_flush(enum l1d_flush_type types, bool enable) rfi_flush_enable(enable); }
+#ifdef CONFIG_DEBUG_FS +static int rfi_flush_set(void *data, u64 val) +{ + if (val == 1) + rfi_flush_enable(true); + else if (val == 0) + rfi_flush_enable(false); + else + return -EINVAL; + + return 0; +} + +static int rfi_flush_get(void *data, u64 *val) +{ + *val = rfi_flush ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_rfi_flush, rfi_flush_get, rfi_flush_set, "%llu\n"); + +static __init int rfi_flush_debugfs_init(void) +{ + debugfs_create_file("rfi_flush", 0600, powerpc_debugfs_root, NULL, &fops_rfi_flush); + return 0; +} +device_initcall(rfi_flush_debugfs_init); +#endif + ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { if (rfi_flush)
On Sun, Feb 04, 2018 at 04:00:10PM +1100, Michael Ellerman wrote:
commit 236003e6b5443c45c18e613d2b0d776a9f87540e upstream.
Expose the state of the RFI flush (enabled/disabled) via debugfs, and allow it to be enabled/disabled at runtime.
eg: $ cat /sys/kernel/debug/powerpc/rfi_flush 1 $ echo 0 > /sys/kernel/debug/powerpc/rfi_flush $ cat /sys/kernel/debug/powerpc/rfi_flush 0
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Reviewed-by: Nicholas Piggin npiggin@gmail.com
arch/powerpc/kernel/setup_64.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)
Also applied to 4.9.y and 4.14.y, thanks.
greg k-h
On Sun, Feb 04, 2018 at 03:59:54PM +1100, Michael Ellerman wrote:
Hi Greg,
This is a backport to v4.4 of the RFI flush series that went upstream recently. There's also a few other commits I noticed had not made it to v4.4 due to needing manual backports.
I've queued these up, and filled in for the 4.9.y and 4.14.y trees where needed. I think I'm only missing one 4.9.y backport, and have responded to that specific patch.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org