This is a note to let you know that I've just added the patch titled
x86/kvm: Update spectre-v1 mitigation
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86kvm_Update_spectre-v1_mitigation.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/kvm: Update spectre-v1 mitigation
From: Dan Williams dan.j.williams(a)intel.com
Date: Wed Jan 31 17:47:03 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit 085331dfc6bbe3501fb936e657331ca943827600
Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
added a raw 'asm("lfence");' to prevent a bounds check bypass of
'vmcs_field_to_offset_table'.
The lfence can be avoided in this path by using the array_index_nospec()
helper designed for these types of fixes.
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Andrew Honig <ahonig(a)google.com>
Cc: kvm(a)vger.kernel.org
Cc: Jim Mattson <jmattson(a)google.com>
Link: https://lkml.kernel.org/r/151744959670.6342.3001723920950249067.stgit@dwill…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/vmx.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -34,6 +34,7 @@
#include <linux/tboot.h>
#include <linux/hrtimer.h>
#include <linux/frame.h>
+#include <linux/nospec.h>
#include "kvm_cache_regs.h"
#include "x86.h"
@@ -898,21 +899,18 @@ static const unsigned short vmcs_field_t
static inline short vmcs_field_to_offset(unsigned long field)
{
- BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
+ const size_t size = ARRAY_SIZE(vmcs_field_to_offset_table);
+ unsigned short offset;
- if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
+ BUILD_BUG_ON(size > SHRT_MAX);
+ if (field >= size)
return -ENOENT;
- /*
- * FIXME: Mitigation for CVE-2017-5753. To be replaced with a
- * generic mechanism.
- */
- asm("lfence");
-
- if (vmcs_field_to_offset_table[field] == 0)
+ field = array_index_nospec(field, size);
+ offset = vmcs_field_to_offset_table[field];
+ if (offset == 0)
return -ENOENT;
-
- return vmcs_field_to_offset_table[field];
+ return offset;
}
static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
This is a note to let you know that I've just added the patch titled
x86/get_user: Use pointer masking to limit speculation
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86get_user_Use_pointer_masking_to_limit_speculation.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/get_user: Use pointer masking to limit speculation
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:54 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit c7f631cb07e7da06ac1d231ca178452339e32a94
Quoting Linus:
I do think that it would be a good idea to very expressly document
the fact that it's not that the user access itself is unsafe. I do
agree that things like "get_user()" want to be protected, but not
because of any direct bugs or problems with get_user() and friends,
but simply because get_user() is an excellent source of a pointer
that is obviously controlled from a potentially attacking user
space. So it's a prime candidate for then finding _subsequent_
accesses that can then be used to perturb the cache.
Unlike the __get_user() case get_user() includes the address limit check
near the pointer de-reference. With that locality the speculation can be
mitigated with pointer narrowing rather than a barrier, i.e.
array_index_nospec(). Where the narrowing is performed by:
cmp %limit, %ptr
sbb %mask, %mask
and %mask, %ptr
With respect to speculation the value of %ptr is either less than %limit
or NULL.
Co-developed-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: Kees Cook <keescook(a)chromium.org>
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: torvalds(a)linux-foundation.org
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/lib/getuser.S | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,8 @@ ENTRY(__get_user_1)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
1: movzbl (%_ASM_AX),%edx
xor %eax,%eax
@@ -54,6 +56,8 @@ ENTRY(__get_user_2)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
2: movzwl -1(%_ASM_AX),%edx
xor %eax,%eax
@@ -68,6 +72,8 @@ ENTRY(__get_user_4)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
3: movl -3(%_ASM_AX),%edx
xor %eax,%eax
@@ -83,6 +89,8 @@ ENTRY(__get_user_8)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
4: movq -7(%_ASM_AX),%rdx
xor %eax,%eax
@@ -94,6 +102,8 @@ ENTRY(__get_user_8)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user_8
+ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
+ and %_ASM_DX, %_ASM_AX
ASM_STAC
4: movl -7(%_ASM_AX),%edx
5: movl -3(%_ASM_AX),%ecx
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86/entry/64: Remove the SYSCALL64 fast path
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86entry64_Remove_the_SYSCALL64_fast_path.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/entry/64: Remove the SYSCALL64 fast path
From: Andy Lutomirski luto(a)kernel.org
Date: Sun Jan 28 10:38:49 2018 -0800
From: Andy Lutomirski luto(a)kernel.org
commit 21d375b6b34ff511a507de27bf316b3dde6938d9
The SYCALLL64 fast path was a nice, if small, optimization back in the good
old days when syscalls were actually reasonably fast. Now there is PTI to
slow everything down, and indirect branches are verboten, making everything
messier. The retpoline code in the fast path is particularly nasty.
Just get rid of the fast path. The slow path is barely slower.
[ tglx: Split out the 'push all extra regs' part ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Kernel Hardening <kernel-hardening(a)lists.openwall.com>
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.15171644…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 117 --------------------------------------------
arch/x86/entry/syscall_64.c | 7 --
2 files changed, 2 insertions(+), 122 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -241,86 +241,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
TRACE_IRQS_OFF
- /*
- * If we need to do entry work or if we guess we'll need to do
- * exit work, go straight to the slow path.
- */
- movq PER_CPU_VAR(current_task), %r11
- testl $_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
- jnz entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
- /*
- * Easy case: enable interrupts and issue the syscall. If the syscall
- * needs pt_regs, we'll call a stub that disables interrupts again
- * and jumps to the slow path.
- */
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
- cmpq $__NR_syscall_max, %rax
-#else
- andl $__SYSCALL_MASK, %eax
- cmpl $__NR_syscall_max, %eax
-#endif
- ja 1f /* return -ENOSYS (already in pt_regs->ax) */
- movq %r10, %rcx
-
- /*
- * This call instruction is handled specially in stub_ptregs_64.
- * It might end up jumping to the slow path. If it jumps, RAX
- * and all argument registers are clobbered.
- */
-#ifdef CONFIG_RETPOLINE
- movq sys_call_table(, %rax, 8), %rax
- call __x86_indirect_thunk_rax
-#else
- call *sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
- movq %rax, RAX(%rsp)
-1:
-
- /*
- * If we get here, then we know that pt_regs is clean for SYSRET64.
- * If we see that no exit work is required (which we are required
- * to check with IRQs off), then we can go straight to SYSRET64.
- */
- DISABLE_INTERRUPTS(CLBR_ANY)
- TRACE_IRQS_OFF
- movq PER_CPU_VAR(current_task), %r11
- testl $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
- jnz 1f
-
- LOCKDEP_SYS_EXIT
- TRACE_IRQS_ON /* user mode is traced as IRQs on */
- movq RIP(%rsp), %rcx
- movq EFLAGS(%rsp), %r11
- addq $6*8, %rsp /* skip extra regs -- they were preserved */
- UNWIND_HINT_EMPTY
- jmp .Lpop_c_regs_except_rcx_r11_and_sysret
-
-1:
- /*
- * The fast path looked good when we started, but something changed
- * along the way and we need to switch to the slow path. Calling
- * raise(3) will trigger this, for example. IRQs are off.
- */
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_ANY)
- SAVE_EXTRA_REGS
- movq %rsp, %rdi
- call syscall_return_slowpath /* returns with IRQs disabled */
- jmp return_from_SYSCALL_64
-
-entry_SYSCALL64_slow_path:
/* IRQs are off. */
SAVE_EXTRA_REGS
movq %rsp, %rdi
call do_syscall_64 /* returns with IRQs disabled */
-return_from_SYSCALL_64:
TRACE_IRQS_IRETQ /* we're about to change IF */
/*
@@ -393,7 +318,6 @@ syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
UNWIND_HINT_EMPTY
POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
popq %rsi /* skip r11 */
popq %r10
popq %r9
@@ -424,47 +348,6 @@ syscall_return_via_sysret:
USERGS_SYSRET64
END(entry_SYSCALL_64)
-ENTRY(stub_ptregs_64)
- /*
- * Syscalls marked as needing ptregs land here.
- * If we are on the fast path, we need to save the extra regs,
- * which we achieve by trying again on the slow path. If we are on
- * the slow path, the extra regs are already saved.
- *
- * RAX stores a pointer to the C function implementing the syscall.
- * IRQs are on.
- */
- cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
- jne 1f
-
- /*
- * Called from fast path -- disable IRQs again, pop return address
- * and jump to slow path
- */
- DISABLE_INTERRUPTS(CLBR_ANY)
- TRACE_IRQS_OFF
- popq %rax
- UNWIND_HINT_REGS extra=0
- jmp entry_SYSCALL64_slow_path
-
-1:
- JMP_NOSPEC %rax /* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
- UNWIND_HINT_FUNC
- leaq \func(%rip), %rax
- jmp stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include <asm/syscalls_64.h>
-
/*
* %rdi: prev task
* %rsi: next task
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,11 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
-#define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
-
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#include <asm/syscalls_64.h>
#undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Mark_constant_arrays_as___initconst.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86/entry/64: Push extra regs right away
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86entry64_Push_extra_regs_right_away.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/entry/64: Push extra regs right away
From: Andy Lutomirski luto(a)kernel.org
Date: Sun Jan 28 10:38:49 2018 -0800
From: Andy Lutomirski luto(a)kernel.org
commit d1f7732009e0549eedf8ea1db948dc37be77fd46
With the fast path removed there is no point in splitting the push of the
normal and the extra register set. Just push the extra regs right away.
[ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Kernel Hardening <kernel-hardening(a)lists.openwall.com>
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.15171644…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -236,13 +236,17 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq %r9 /* pt_regs->r9 */
pushq %r10 /* pt_regs->r10 */
pushq %r11 /* pt_regs->r11 */
- sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
- UNWIND_HINT_REGS extra=0
+ pushq %rbx /* pt_regs->rbx */
+ pushq %rbp /* pt_regs->rbp */
+ pushq %r12 /* pt_regs->r12 */
+ pushq %r13 /* pt_regs->r13 */
+ pushq %r14 /* pt_regs->r14 */
+ pushq %r15 /* pt_regs->r15 */
+ UNWIND_HINT_REGS
TRACE_IRQS_OFF
/* IRQs are off. */
- SAVE_EXTRA_REGS
movq %rsp, %rdi
call do_syscall_64 /* returns with IRQs disabled */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Mark_constant_arrays_as___initconst.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
From: David Woodhouse dwmw(a)amazon.co.uk
Date: Tue Jan 30 14:30:23 2018 +0000
From: David Woodhouse dwmw(a)amazon.co.uk
commit 7fcae1118f5fd44a862aa5c3525248e35ee67c3b
Despite the fact that all the other code there seems to be doing it, just
using set_cpu_cap() in early_intel_init() doesn't actually work.
For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
c->c_init() has set those feature bits. That resets those bits back to what
was queried from the hardware.
Turning the bits off for bad microcode is easy to fix. That can just use
setup_clear_cpu_cap() to force them off for all CPUs.
I was less keen on forcing the feature bits *on* that way, just in case
of inconsistencies. I appreciate that the kernel is going to get this
utterly wrong if CPU features are not consistent, because it has already
applied alternatives by the time secondary CPUs are brought up.
But at least if setup_force_cpu_cap() isn't being used, we might have a
chance of *detecting* the lack of the corresponding bit and either
panicking or refusing to bring the offending CPU online.
So ensure that the appropriate feature bits are set within get_cpu_cap()
regardless of how many extra times it's called.
Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: karahmed(a)amazon.de
Cc: peterz(a)infradead.org
Cc: bp(a)alien8.de
Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/common.c | 21 +++++++++++++++++++++
arch/x86/kernel/cpu/intel.c | 27 ++++++++-------------------
2 files changed, 29 insertions(+), 19 deletions(-)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -750,6 +750,26 @@ static void apply_forced_caps(struct cpu
}
}
+static void init_speculation_control(struct cpuinfo_x86 *c)
+{
+ /*
+ * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
+ * and they also have a different bit for STIBP support. Also,
+ * a hypervisor might have set the individual AMD bits even on
+ * Intel CPUs, for finer-grained selection of what's available.
+ *
+ * We use the AMD bits in 0x8000_0008 EBX as the generic hardware
+ * features, which are visible in /proc/cpuinfo and used by the
+ * kernel. So set those accordingly from the Intel bits.
+ */
+ if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
+ set_cpu_cap(c, X86_FEATURE_IBRS);
+ set_cpu_cap(c, X86_FEATURE_IBPB);
+ }
+ if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
+ set_cpu_cap(c, X86_FEATURE_STIBP);
+}
+
void get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
@@ -844,6 +864,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
init_scattered_cpuid_features(c);
+ init_speculation_control(c);
/*
* Clear/Set all flags overridden by options, after probe.
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -175,28 +175,17 @@ static void early_init_intel(struct cpui
if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
c->microcode = intel_get_microcode_revision();
- /*
- * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
- * and they also have a different bit for STIBP support. Also,
- * a hypervisor might have set the individual AMD bits even on
- * Intel CPUs, for finer-grained selection of what's available.
- */
- if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
- set_cpu_cap(c, X86_FEATURE_IBRS);
- set_cpu_cap(c, X86_FEATURE_IBPB);
- }
- if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
- set_cpu_cap(c, X86_FEATURE_STIBP);
-
/* Now if any of them are set, check the blacklist and clear the lot */
- if ((cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
+ if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
+ cpu_has(c, X86_FEATURE_INTEL_STIBP) ||
+ cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) {
pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n");
- clear_cpu_cap(c, X86_FEATURE_IBRS);
- clear_cpu_cap(c, X86_FEATURE_IBPB);
- clear_cpu_cap(c, X86_FEATURE_STIBP);
- clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
- clear_cpu_cap(c, X86_FEATURE_INTEL_STIBP);
+ setup_clear_cpu_cap(X86_FEATURE_IBRS);
+ setup_clear_cpu_cap(X86_FEATURE_IBPB);
+ setup_clear_cpu_cap(X86_FEATURE_STIBP);
+ setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
+ setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
}
/*
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.15/x86spectre_Simplify_spectre_v2_command_line_parsing.patch
queue-4.15/x86pti_Mark_constant_arrays_as___initconst.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch
queue-4.15/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
queue-4.15/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch
This is a note to let you know that I've just added the patch titled
x86/asm: Move 'status' from thread_struct to thread_info
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86asm_Move_status_from_thread_struct_to_thread_info.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/asm: Move 'status' from thread_struct to thread_info
From: Andy Lutomirski luto(a)kernel.org
Date: Sun Jan 28 10:38:50 2018 -0800
From: Andy Lutomirski luto(a)kernel.org
commit 37a8f7c38339b22b69876d6f5a0ab851565284e3
The TS_COMPAT bit is very hot and is accessed from code paths that mostly
also touch thread_info::flags. Move it into struct thread_info to improve
cache locality.
The only reason it was in thread_struct is that there was a brief period
during which arch-specific fields were not allowed in struct thread_info.
Linus suggested further changing:
ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
to:
if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))
ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
on the theory that frequently dirtying the cacheline even in pure 64-bit
code that never needs to modify status hurts performance. That could be a
reasonable followup patch, but I suspect it matters less on top of this
patch.
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Kernel Hardening <kernel-hardening(a)lists.openwall.com>
Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.15171644…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/common.c | 4 ++--
arch/x86/include/asm/processor.h | 2 --
arch/x86/include/asm/syscall.h | 6 +++---
arch/x86/include/asm/thread_info.h | 3 ++-
arch/x86/kernel/process_64.c | 4 ++--
arch/x86/kernel/ptrace.c | 2 +-
arch/x86/kernel/signal.c | 2 +-
7 files changed, 11 insertions(+), 12 deletions(-)
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -206,7 +206,7 @@ __visible inline void prepare_exit_to_us
* special case only applies after poking regs and before the
* very next return to user mode.
*/
- current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
+ ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
#endif
user_enter_irqoff();
@@ -304,7 +304,7 @@ static __always_inline void do_syscall_3
unsigned int nr = (unsigned int)regs->orig_ax;
#ifdef CONFIG_IA32_EMULATION
- current->thread.status |= TS_COMPAT;
+ ti->status |= TS_COMPAT;
#endif
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -460,8 +460,6 @@ struct thread_struct {
unsigned short gsindex;
#endif
- u32 status; /* thread synchronous flags */
-
#ifdef CONFIG_X86_64
unsigned long fsbase;
unsigned long gsbase;
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -60,7 +60,7 @@ static inline long syscall_get_error(str
* TS_COMPAT is set for 32-bit syscall entries and then
* remains set until we return to user mode.
*/
- if (task->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
+ if (task->thread_info.status & (TS_COMPAT|TS_I386_REGS_POKED))
/*
* Sign-extend the value so (int)-EFOO becomes (long)-EFOO
* and will match correctly in comparisons.
@@ -116,7 +116,7 @@ static inline void syscall_get_arguments
unsigned long *args)
{
# ifdef CONFIG_IA32_EMULATION
- if (task->thread.status & TS_COMPAT)
+ if (task->thread_info.status & TS_COMPAT)
switch (i) {
case 0:
if (!n--) break;
@@ -177,7 +177,7 @@ static inline void syscall_set_arguments
const unsigned long *args)
{
# ifdef CONFIG_IA32_EMULATION
- if (task->thread.status & TS_COMPAT)
+ if (task->thread_info.status & TS_COMPAT)
switch (i) {
case 0:
if (!n--) break;
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -55,6 +55,7 @@ struct task_struct;
struct thread_info {
unsigned long flags; /* low level flags */
+ u32 status; /* thread synchronous flags */
};
#define INIT_THREAD_INFO(tsk) \
@@ -221,7 +222,7 @@ static inline int arch_within_stack_fram
#define in_ia32_syscall() true
#else
#define in_ia32_syscall() (IS_ENABLED(CONFIG_IA32_EMULATION) && \
- current->thread.status & TS_COMPAT)
+ current_thread_info()->status & TS_COMPAT)
#endif
/*
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -557,7 +557,7 @@ static void __set_personality_x32(void)
* Pretend to come from a x32 execve.
*/
task_pt_regs(current)->orig_ax = __NR_x32_execve | __X32_SYSCALL_BIT;
- current->thread.status &= ~TS_COMPAT;
+ current_thread_info()->status &= ~TS_COMPAT;
#endif
}
@@ -571,7 +571,7 @@ static void __set_personality_ia32(void)
current->personality |= force_personality32;
/* Prepare the first "return" to user space */
task_pt_regs(current)->orig_ax = __NR_ia32_execve;
- current->thread.status |= TS_COMPAT;
+ current_thread_info()->status |= TS_COMPAT;
#endif
}
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -935,7 +935,7 @@ static int putreg32(struct task_struct *
*/
regs->orig_ax = value;
if (syscall_get_nr(child, regs) >= 0)
- child->thread.status |= TS_I386_REGS_POKED;
+ child->thread_info.status |= TS_I386_REGS_POKED;
break;
case offsetof(struct user32, regs.eflags):
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -787,7 +787,7 @@ static inline unsigned long get_nr_resta
* than the tracee.
*/
#ifdef CONFIG_IA32_EMULATION
- if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
+ if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86: Introduce barrier_nospec
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86_Introduce_barrier_nospec.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86: Introduce barrier_nospec
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:33 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit b3d7ad85b80bbc404635dca80f5b129f6242bc7a
Rename the open coded form of this instruction sequence from
rdtsc_ordered() into a generic barrier primitive, barrier_nospec().
One of the mitigations for Spectre variant1 vulnerabilities is to fence
speculative execution after successfully validating a bounds check. I.e.
force the result of a bounds check to resolve in the instruction pipeline
to ensure speculative execution honors that result before potentially
operating on out-of-bounds data.
No functional changes.
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Suggested-by: Andi Kleen <ak(a)linux.intel.com>
Suggested-by: Ingo Molnar <mingo(a)redhat.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwil…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/barrier.h | 4 ++++
arch/x86/include/asm/msr.h | 3 +--
2 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -48,6 +48,10 @@ static inline unsigned long array_index_
/* Override the default implementation from linux/nospec.h. */
#define array_index_mask_nospec array_index_mask_nospec
+/* Prevent speculative execution past this barrier. */
+#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+ "lfence", X86_FEATURE_LFENCE_RDTSC)
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long lon
* that some other imaginary CPU is updating continuously with a
* time stamp.
*/
- alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
- "lfence", X86_FEATURE_LFENCE_RDTSC);
+ barrier_nospec();
return rdtsc();
}
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:39 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit b3bbfb3fb5d25776b8e3f361d2eedaabb0b496cd
For __get_user() paths, do not allow the kernel to speculate on the value
of a user controlled pointer. In addition to the 'stac' instruction for
Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the
access_ok() result to resolve in the pipeline before the CPU might take any
speculative action on the pointer value. Given the cost of 'stac' the
speculation barrier is placed after 'stac' to hopefully overlap the cost of
disabling SMAP with the cost of flushing the instruction pipeline.
Since __get_user is a major kernel interface that deals with user
controlled pointers, the __uaccess_begin_nospec() mechanism will prevent
speculative execution past an access_ok() permission check. While
speculative execution past access_ok() is not enough to lead to a kernel
memory leak, it is a necessary precondition.
To be clear, __uaccess_begin_nospec() is addressing a class of potential
problems near __get_user() usages.
Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used
to protect __get_user(), pointer masking similar to array_index_nospec()
will be used for get_user() since it incorporates a bounds check near the
usage.
uaccess_try_nospec provides the same mechanism for get_user_try.
No functional changes.
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Suggested-by: Andi Kleen <ak(a)linux.intel.com>
Suggested-by: Ingo Molnar <mingo(a)redhat.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwil…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/uaccess.h | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
#define __uaccess_begin() stac()
#define __uaccess_end() clac()
+#define __uaccess_begin_nospec() \
+({ \
+ stac(); \
+ barrier_nospec(); \
+})
/*
* This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@ struct __large_struct { unsigned long bu
__uaccess_begin(); \
barrier();
+#define uaccess_try_nospec do { \
+ current->thread.uaccess_err = 0; \
+ __uaccess_begin_nospec(); \
+
#define uaccess_catch(err) \
__uaccess_end(); \
(err) |= (current->thread.uaccess_err ? -EFAULT : 0); \
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
x86: Implement array_index_mask_nospec
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86_Implement_array_index_mask_nospec.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86: Implement array_index_mask_nospec
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:28 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit babdde2698d482b6c0de1eab4f697cf5856c5859
array_index_nospec() uses a mask to sanitize user controllable array
indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask
otherwise. While the default array_index_mask_nospec() handles the
carry-bit from the (index - size) result in software.
The x86 array_index_mask_nospec() does the same, but the carry-bit is
handled in the processor CF flag without conditional instructions in the
control flow.
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwil…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/barrier.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,30 @@
#define wmb() asm volatile("sfence" ::: "memory")
#endif
+/**
+ * array_index_mask_nospec() - generate a mask that is ~0UL when the
+ * bounds check succeeds and 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ *
+ * Returns:
+ * 0 - (index < size)
+ */
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+ unsigned long size)
+{
+ unsigned long mask;
+
+ asm ("cmp %1,%2; sbb %0,%0;"
+ :"=r" (mask)
+ :"r"(size),"r" (index)
+ :"cc");
+ return mask;
+}
+
+/* Override the default implementation from linux/nospec.h. */
+#define array_index_mask_nospec array_index_mask_nospec
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
vfs, fdtable: Prevent bounds-check bypass via speculative execution
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: vfs, fdtable: Prevent bounds-check bypass via speculative execution
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:03:05 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit 56c30ba7b348b90484969054d561f711ba196507
'fd' is a user controlled value that is used as a data dependency to
read from the 'fdt->fd' array. In order to avoid potential leaks of
kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid 'file *' returned from
__fcheck_files.
Co-developed-by: Elena Reshetova <elena.reshetova(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: torvalds(a)linux-foundation.org
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fdtable.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/nospec.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/fs.h>
@@ -82,8 +83,10 @@ static inline struct file *__fcheck_file
{
struct fdtable *fdt = rcu_dereference_raw(files->fdt);
- if (fd < fdt->max_fds)
+ if (fd < fdt->max_fds) {
+ fd = array_index_nospec(fd, fdt->max_fds);
return rcu_dereference_raw(fdt->fd[fd]);
+ }
return NULL;
}
Patches currently in stable-queue which might be from elena.reshetova(a)intel.com are
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
This is a note to let you know that I've just added the patch titled
objtool: Warn on stripped section symbol
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
objtool_Warn_on_stripped_section_symbol.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: objtool: Warn on stripped section symbol
From: Josh Poimboeuf jpoimboe(a)redhat.com
Date: Mon Jan 29 22:00:41 2018 -0600
From: Josh Poimboeuf jpoimboe(a)redhat.com
commit 830c1e3d16b2c1733cd1ec9c8f4d47a398ae31bc
With the following fix:
2a0098d70640 ("objtool: Fix seg fault with gold linker")
... a seg fault was avoided, but the original seg fault condition in
objtool wasn't fixed. Replace the seg fault with an error message.
Suggested-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/dc4585a70d6b975c99fc51d1957ccdde7bd52f3a.151728434…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/objtool/orc_gen.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -98,6 +98,11 @@ static int create_orc_entry(struct secti
struct orc_entry *orc;
struct rela *rela;
+ if (!insn_sec->sym) {
+ WARN("missing symbol for section %s", insn_sec->name);
+ return -1;
+ }
+
/* populate ORC data */
orc = (struct orc_entry *)u_sec->data->d_buf + idx;
memcpy(orc, o, sizeof(*orc));
Patches currently in stable-queue which might be from mingo(a)kernel.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
objtool: Improve retpoline alternative handling
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
objtool_Improve_retpoline_alternative_handling.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: objtool: Improve retpoline alternative handling
From: Josh Poimboeuf jpoimboe(a)redhat.com
Date: Mon Jan 29 22:00:39 2018 -0600
From: Josh Poimboeuf jpoimboe(a)redhat.com
commit a845c7cf4b4cb5e9e3b2823867892b27646f3a98
Currently objtool requires all retpolines to be:
a) patched in with alternatives; and
b) annotated with ANNOTATE_NOSPEC_ALTERNATIVE.
If you forget to do both of the above, objtool segfaults trying to
dereference a NULL 'insn->call_dest' pointer.
Avoid that situation and print a more helpful error message:
quirks.o: warning: objtool: efi_delete_dummy_variable()+0x99: unsupported intra-function call
quirks.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
Future improvements can be made to make objtool smarter with respect to
retpolines, but this is a good incremental improvement for now.
Reported-and-tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/819e50b6d9c2e1a22e34c1a636c0b2057cc8c6e5.151728434…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/objtool/check.c | 36 ++++++++++++++++--------------------
1 file changed, 16 insertions(+), 20 deletions(-)
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -543,18 +543,14 @@ static int add_call_destinations(struct
dest_off = insn->offset + insn->len + insn->immediate;
insn->call_dest = find_symbol_by_offset(insn->sec,
dest_off);
- /*
- * FIXME: Thanks to retpolines, it's now considered
- * normal for a function to call within itself. So
- * disable this warning for now.
- */
-#if 0
- if (!insn->call_dest) {
- WARN_FUNC("can't find call dest symbol at offset 0x%lx",
- insn->sec, insn->offset, dest_off);
+
+ if (!insn->call_dest && !insn->ignore) {
+ WARN_FUNC("unsupported intra-function call",
+ insn->sec, insn->offset);
+ WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
return -1;
}
-#endif
+
} else if (rela->sym->type == STT_SECTION) {
insn->call_dest = find_symbol_by_offset(rela->sym->sec,
rela->addend+4);
@@ -648,6 +644,8 @@ static int handle_group_alt(struct objto
last_new_insn = insn;
+ insn->ignore = orig_insn->ignore_alts;
+
if (insn->type != INSN_JUMP_CONDITIONAL &&
insn->type != INSN_JUMP_UNCONDITIONAL)
continue;
@@ -729,10 +727,6 @@ static int add_special_section_alts(stru
goto out;
}
- /* Ignore retpoline alternatives. */
- if (orig_insn->ignore_alts)
- continue;
-
new_insn = NULL;
if (!special_alt->group || special_alt->new_len) {
new_insn = find_insn(file, special_alt->new_sec,
@@ -1089,11 +1083,11 @@ static int decode_sections(struct objtoo
if (ret)
return ret;
- ret = add_call_destinations(file);
+ ret = add_special_section_alts(file);
if (ret)
return ret;
- ret = add_special_section_alts(file);
+ ret = add_call_destinations(file);
if (ret)
return ret;
@@ -1720,10 +1714,12 @@ static int validate_branch(struct objtoo
insn->visited = true;
- list_for_each_entry(alt, &insn->alts, list) {
- ret = validate_branch(file, alt->insn, state);
- if (ret)
- return 1;
+ if (!insn->ignore_alts) {
+ list_for_each_entry(alt, &insn->alts, list) {
+ ret = validate_branch(file, alt->insn, state);
+ if (ret)
+ return 1;
+ }
}
switch (insn->type) {
Patches currently in stable-queue which might be from linux(a)roeck-us.net are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
objtool: Add support for alternatives at the end of a section
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: objtool: Add support for alternatives at the end of a section
From: Josh Poimboeuf jpoimboe(a)redhat.com
Date: Mon Jan 29 22:00:40 2018 -0600
From: Josh Poimboeuf jpoimboe(a)redhat.com
commit 17bc33914bcc98ba3c6b426fd1c49587a25c0597
Now that the previous patch gave objtool the ability to read retpoline
alternatives, it shows a new warning:
arch/x86/entry/entry_64.o: warning: objtool: .entry_trampoline: don't know how to handle alternatives at end of section
This is due to the JMP_NOSPEC in entry_SYSCALL_64_trampoline().
Previously, objtool ignored this situation because it wasn't needed, and
it would have required a bit of extra code. Now that this case exists,
add proper support for it.
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/2a30a3c2158af47d891a76e69bb1ef347e0443fd.151728434…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/objtool/check.c | 53 +++++++++++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 22 deletions(-)
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -594,7 +594,7 @@ static int handle_group_alt(struct objto
struct instruction *orig_insn,
struct instruction **new_insn)
{
- struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump;
+ struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
unsigned long dest_off;
last_orig_insn = NULL;
@@ -610,28 +610,30 @@ static int handle_group_alt(struct objto
last_orig_insn = insn;
}
- if (!next_insn_same_sec(file, last_orig_insn)) {
- WARN("%s: don't know how to handle alternatives at end of section",
- special_alt->orig_sec->name);
- return -1;
- }
-
- fake_jump = malloc(sizeof(*fake_jump));
- if (!fake_jump) {
- WARN("malloc failed");
- return -1;
+ if (next_insn_same_sec(file, last_orig_insn)) {
+ fake_jump = malloc(sizeof(*fake_jump));
+ if (!fake_jump) {
+ WARN("malloc failed");
+ return -1;
+ }
+ memset(fake_jump, 0, sizeof(*fake_jump));
+ INIT_LIST_HEAD(&fake_jump->alts);
+ clear_insn_state(&fake_jump->state);
+
+ fake_jump->sec = special_alt->new_sec;
+ fake_jump->offset = -1;
+ fake_jump->type = INSN_JUMP_UNCONDITIONAL;
+ fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
+ fake_jump->ignore = true;
}
- memset(fake_jump, 0, sizeof(*fake_jump));
- INIT_LIST_HEAD(&fake_jump->alts);
- clear_insn_state(&fake_jump->state);
-
- fake_jump->sec = special_alt->new_sec;
- fake_jump->offset = -1;
- fake_jump->type = INSN_JUMP_UNCONDITIONAL;
- fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
- fake_jump->ignore = true;
if (!special_alt->new_len) {
+ if (!fake_jump) {
+ WARN("%s: empty alternative at end of section",
+ special_alt->orig_sec->name);
+ return -1;
+ }
+
*new_insn = fake_jump;
return 0;
}
@@ -654,8 +656,14 @@ static int handle_group_alt(struct objto
continue;
dest_off = insn->offset + insn->len + insn->immediate;
- if (dest_off == special_alt->new_off + special_alt->new_len)
+ if (dest_off == special_alt->new_off + special_alt->new_len) {
+ if (!fake_jump) {
+ WARN("%s: alternative jump to end of section",
+ special_alt->orig_sec->name);
+ return -1;
+ }
insn->jump_dest = fake_jump;
+ }
if (!insn->jump_dest) {
WARN_FUNC("can't find alternative jump destination",
@@ -670,7 +678,8 @@ static int handle_group_alt(struct objto
return -1;
}
- list_add(&fake_jump->list, &last_new_insn->list);
+ if (fake_jump)
+ list_add(&fake_jump->list, &last_new_insn->list);
return 0;
}
Patches currently in stable-queue which might be from jpoimboe(a)redhat.com are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
nl80211: Sanitize array index in parse_txq_params
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nl80211_Sanitize_array_index_in_parse_txq_params.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: nl80211: Sanitize array index in parse_txq_params
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:03:15 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit 259d8c1e984318497c84eef547bbb6b1d9f4eb05
Wireless drivers rely on parse_txq_params to validate that txq_params->ac
is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx()
handler is called. Use a new helper, array_index_nospec(), to sanitize
txq_params->ac with respect to speculation. I.e. ensure that any
speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).
Reported-by: Christian Lamparter <chunkeey(a)gmail.com>
Reported-by: Elena Reshetova <elena.reshetova(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Johannes Berg <johannes(a)sipsolutions.net>
Cc: linux-arch(a)vger.kernel.org
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: linux-wireless(a)vger.kernel.org
Cc: torvalds(a)linux-foundation.org
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwil…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/wireless/nl80211.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16,6 +16,7 @@
#include <linux/nl80211.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
+#include <linux/nospec.h>
#include <linux/etherdevice.h>
#include <net/net_namespace.h>
#include <net/genetlink.h>
@@ -2056,20 +2057,22 @@ static const struct nla_policy txq_param
static int parse_txq_params(struct nlattr *tb[],
struct ieee80211_txq_params *txq_params)
{
+ u8 ac;
+
if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
!tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
!tb[NL80211_TXQ_ATTR_AIFS])
return -EINVAL;
- txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+ ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
- if (txq_params->ac >= NL80211_NUM_ACS)
+ if (ac >= NL80211_NUM_ACS)
return -EINVAL;
-
+ txq_params->ac = array_index_nospec(ac, NL80211_NUM_ACS);
return 0;
}
Patches currently in stable-queue which might be from chunkeey(a)gmail.com are
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
This is a note to let you know that I've just added the patch titled
array_index_nospec: Sanitize speculative array de-references
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
array_index_nospec_Sanitize_speculative_array_de-references.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: array_index_nospec: Sanitize speculative array de-references
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:22 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit f3804203306e098dae9ca51540fcd5eb700d7f40
array_index_nospec() is proposed as a generic mechanism to mitigate
against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary
checks via speculative execution. The array_index_nospec()
implementation is expected to be safe for current generation CPUs across
multiple architectures (ARM, x86).
Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.
Co-developed-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Co-developed-by: Alexei Starovoitov <ast(a)kernel.org>
Suggested-by: Cyril Novikov <cnovikov(a)lynx.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: kernel-hardening(a)lists.openwall.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: gregkh(a)linuxfoundation.org
Cc: torvalds(a)linux-foundation.org
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/nospec.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Linus Torvalds. All rights reserved.
+// Copyright(c) 2018 Alexei Starovoitov. All rights reserved.
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef _LINUX_NOSPEC_H
+#define _LINUX_NOSPEC_H
+
+/**
+ * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ *
+ * When @index is out of bounds (@index >= @size), the sign bit will be
+ * set. Extend the sign bit to all bits and invert, giving a result of
+ * zero for an out of bounds index, or ~0 if within bounds [0, @size).
+ */
+#ifndef array_index_mask_nospec
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+ unsigned long size)
+{
+ /*
+ * Warn developers about inappropriate array_index_nospec() usage.
+ *
+ * Even if the CPU speculates past the WARN_ONCE branch, the
+ * sign bit of @index is taken into account when generating the
+ * mask.
+ *
+ * This warning is compiled out when the compiler can infer that
+ * @index and @size are less than LONG_MAX.
+ */
+ if (WARN_ONCE(index > LONG_MAX || size > LONG_MAX,
+ "array_index_nospec() limited to range of [0, LONG_MAX]\n"))
+ return 0;
+
+ /*
+ * Always calculate and emit the mask even if the compiler
+ * thinks the mask is not needed. The compiler does not take
+ * into account the value of @index under speculation.
+ */
+ OPTIMIZER_HIDE_VAR(index);
+ return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/*
+ * array_index_nospec - sanitize an array index after a bounds check
+ *
+ * For a code sequence like:
+ *
+ * if (index < size) {
+ * index = array_index_nospec(index, size);
+ * val = array[index];
+ * }
+ *
+ * ...if the CPU speculates past the bounds check then
+ * array_index_nospec() will clamp the index within the range of [0,
+ * size).
+ */
+#define array_index_nospec(index, size) \
+({ \
+ typeof(index) _i = (index); \
+ typeof(size) _s = (size); \
+ unsigned long _mask = array_index_mask_nospec(_i, _s); \
+ \
+ BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \
+ BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \
+ \
+ _i &= _mask; \
+ _i; \
+})
+#endif /* _LINUX_NOSPEC_H */
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86_Introduce_barrier_nospec.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/x86alternative_Print_unadorned_pointers.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.15/Documentation_Document_array_index_nospec.patch
queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/x86_Implement_array_index_mask_nospec.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86nospec_Fix_header_guards_names.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.15/x86entry64_Push_extra_regs_right_away.patch
queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/objtool_Warn_on_stripped_section_symbol.patch
queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.15/objtool_Improve_retpoline_alternative_handling.patch
This is a note to let you know that I've just added the patch titled
KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
From: KarimAllah Ahmed karahmed(a)amazon.de
Date: Thu Feb 1 22:59:42 2018 +0100
From: KarimAllah Ahmed karahmed(a)amazon.de
commit b7b27aa011a1df42728d1768fc181d9ce69e6911
[dwmw2: Stop using KF() for bits in it, too]
Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Cc: kvm(a)vger.kernel.org
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/cpuid.c | 8 +++-----
arch/x86/kvm/cpuid.h | 1 +
2 files changed, 4 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
#define F(x) bit(X86_FEATURE_##x)
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
#define KF(x) bit(KVM_CPUID_BIT_##x)
int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
- entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+ cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cp
[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+ [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
};
static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
Patches currently in stable-queue which might be from karahmed(a)amazon.de are
queue-4.15/x86spectre_Simplify_spectre_v2_command_line_parsing.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
queue-4.15/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch
This is a note to let you know that I've just added the patch titled
KVM/x86: Add IBPB support
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVMx86_Add_IBPB_support.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM/x86: Add IBPB support
From: Ashok Raj ashok.raj(a)intel.com
Date: Thu Feb 1 22:59:43 2018 +0100
From: Ashok Raj ashok.raj(a)intel.com
commit 15d45071523d89b3fb7372e2135fbd72f6af9506
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.
Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.
IBPB helps mitigate against three potential attacks:
* Mitigate guests from being attacked by other guests.
- This is addressed by issing IBPB when we do a guest switch.
* Mitigate attacks from guest/ring3->host/ring3.
These would require a IBPB during context switch in host, or after
VMEXIT. The host process has two ways to mitigate
- Either it can be compiled with retpoline
- If its going through context switch, and has set !dumpable then
there is a IBPB in that path.
(Tim's patch: https://patchwork.kernel.org/patch/10192871)
- The case where after a VMEXIT you return back to Qemu might make
Qemu attackable from guest when Qemu isn't compiled with retpoline.
There are issues reported when doing IBPB on every VMEXIT that resulted
in some tsc calibration woes in guest.
* Mitigate guest/ring0->host/ring0 attacks.
When host kernel is using retpoline it is safe against these attacks.
If host kernel isn't using retpoline we might need to do a IBPB flush on
every VMEXIT.
Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.
* IBPB is issued only for SVM during svm_free_vcpu().
VMX has a vmclear and SVM doesn't. Follow discussion here:
https://lkml.org/lkml/2018/1/15/146
Please refer to the following spec for more details on the enumeration
and control.
Refer here to get documentation about mitigations.
https://software.intel.com/en-us/side-channel-security-support
[peterz: rebase and changelog rewrite]
[karahmed: - rebase
- vmx: expose PRED_CMD if guest has it in CPUID
- svm: only pass through IBPB if guest has it in CPUID
- vmx: support !cpu_has_vmx_msr_bitmap()]
- vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]
Signed-off-by: Ashok Raj <ashok.raj(a)intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: kvm(a)vger.kernel.org
Cc: Asit Mallick <asit.k.mallick(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.…
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/cpuid.c | 11 ++++++-
arch/x86/kvm/svm.c | 28 +++++++++++++++++
arch/x86/kvm/vmx.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 116 insertions(+), 3 deletions(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
+ /* cpuid 0x80000008.ebx */
+ const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+ F(IBPB);
+
/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
- entry->ebx = entry->edx = 0;
+ entry->edx = 0;
+ /* IBPB isn't necessarily present in hardware cpuid */
+ if (boot_cpu_has(X86_FEATURE_IBPB))
+ entry->ebx |= F(IBPB);
+ entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+ cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x80000019:
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_ms
{ .index = MSR_CSTAR, .always = true },
{ .index = MSR_SYSCALL_MASK, .always = true },
#endif
+ { .index = MSR_IA32_PRED_CMD, .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
@@ -529,6 +530,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
struct page *save_area;
+ struct vmcb *current_vmcb;
};
static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcp
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+ /*
+ * The vmcb page can be recycled, causing a false negative in
+ * svm_vcpu_load(). So do a full IBPB now.
+ */
+ indirect_branch_prediction_barrier();
}
static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;
if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcp
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+ if (sd->current_vmcb != svm->vmcb) {
+ sd->current_vmcb = svm->vmcb;
+ indirect_branch_prediction_barrier();
+ }
avic_vcpu_load(vcpu, cpu);
}
@@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+ case MSR_IA32_PRED_CMD:
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+ if (is_guest_mode(vcpu))
+ break;
+ set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+ break;
case MSR_STAR:
svm->vmcb->save.star = data;
break;
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,7 @@ struct vcpu_vmx {
u64 msr_host_kernel_gs_base;
u64 msr_guest_kernel_gs_base;
#endif
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -934,6 +935,8 @@ static void vmx_set_nmi_mask(struct kvm_
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+ u32 msr, int type);
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1905,6 +1908,29 @@ static void update_exception_bitmap(stru
vmcs_write32(EXCEPTION_BITMAP, eb);
}
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+ unsigned long *msr_bitmap;
+ int f = sizeof(unsigned long);
+
+ if (!cpu_has_vmx_msr_bitmap())
+ return true;
+
+ msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+ if (msr <= 0x1fff) {
+ return !!test_bit(msr, msr_bitmap + 0x800 / f);
+ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+ msr &= 0x1fff;
+ return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+ }
+
+ return true;
+}
+
static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
unsigned long entry, unsigned long exit)
{
@@ -2283,6 +2309,7 @@ static void vmx_vcpu_load(struct kvm_vcp
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+ indirect_branch_prediction_barrier();
}
if (!already_loaded) {
@@ -3340,6 +3367,34 @@ static int vmx_set_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+ case MSR_IA32_PRED_CMD:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+ /*
+ * For non-nested:
+ * When it's written (to non-zero) for the first time, pass
+ * it through.
+ *
+ * For nested:
+ * The handling of the MSR bitmap for L2 guests is done in
+ * nested_vmx_merge_msr_bitmap. We should not touch the
+ * vmcs02.msr_bitmap here since it gets completely overwritten
+ * in the merging.
+ */
+ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+ MSR_TYPE_W);
+ break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10042,9 +10097,23 @@ static inline bool nested_vmx_merge_msr_
struct page *page;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+ /*
+ * pred_cmd is trying to verify two things:
+ *
+ * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+ * ensures that we do not accidentally generate an L02 MSR bitmap
+ * from the L12 MSR bitmap that is too permissive.
+ * 2. That L1 or L2s have actually used the MSR. This avoids
+ * unnecessarily merging of the bitmap if the MSR is unused. This
+ * works properly because we only update the L01 MSR bitmap lazily.
+ * So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+ * updated to reflect this when L1 (or its L2s) actually write to
+ * the MSR.
+ */
+ bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
- /* This shortcut is ok because we support only x2APIC MSRs so far. */
- if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+ if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+ !pred_cmd)
return false;
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10077,6 +10146,13 @@ static inline bool nested_vmx_merge_msr_
MSR_TYPE_W);
}
}
+
+ if (pred_cmd)
+ nested_vmx_disable_intercept_for_msr(
+ msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PRED_CMD,
+ MSR_TYPE_W);
+
kunmap(page);
kvm_release_page_clean(page);
Patches currently in stable-queue which might be from ashok.raj(a)intel.com are
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
This is a note to let you know that I've just added the patch titled
KVM: nVMX: Eliminate vmcs02 pool
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVM_nVMX_Eliminate_vmcs02_pool.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM: nVMX: Eliminate vmcs02 pool
From: Jim Mattson jmattson(a)google.com
Date: Mon Nov 27 17:22:25 2017 -0600
From: Jim Mattson jmattson(a)google.com
commit de3a0021a60635de96aa92713c1a31a96747d72c
The potential performance advantages of a vmcs02 pool have never been
realized. To simplify the code, eliminate the pool. Instead, a single
vmcs02 is allocated per VCPU when the VCPU enters VMX operation.
Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Mark Kanda <mark.kanda(a)oracle.com>
Reviewed-by: Ameya More <ameya.more(a)oracle.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/vmx.c | 146 ++++++++---------------------------------------------
1 file changed, 23 insertions(+), 123 deletions(-)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -185,7 +185,6 @@ module_param(ple_window_max, int, S_IRUG
extern const ulong vmx_return;
#define NR_AUTOLOAD_MSRS 8
-#define VMCS02_POOL_SIZE 1
struct vmcs {
u32 revision_id;
@@ -226,7 +225,7 @@ struct shared_msr_entry {
* stored in guest memory specified by VMPTRLD, but is opaque to the guest,
* which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
* More than one of these structures may exist, if L1 runs multiple L2 guests.
- * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
+ * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the
* underlying hardware which will be used to run L2.
* This structure is packed to ensure that its layout is identical across
* machines (necessary for live migration).
@@ -409,13 +408,6 @@ struct __packed vmcs12 {
*/
#define VMCS12_SIZE 0x1000
-/* Used to remember the last vmcs02 used for some recently used vmcs12s */
-struct vmcs02_list {
- struct list_head list;
- gpa_t vmptr;
- struct loaded_vmcs vmcs02;
-};
-
/*
* The nested_vmx structure is part of vcpu_vmx, and holds information we need
* for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -440,15 +432,15 @@ struct nested_vmx {
*/
bool sync_shadow_vmcs;
- /* vmcs02_list cache of VMCSs recently used to run L2 guests */
- struct list_head vmcs02_pool;
- int vmcs02_num;
bool change_vmcs01_virtual_x2apic_mode;
/* L2 must run next, and mustn't decide to exit to L1. */
bool nested_run_pending;
+
+ struct loaded_vmcs vmcs02;
+
/*
- * Guest pages referred to in vmcs02 with host-physical pointers, so
- * we must keep them pinned while L2 runs.
+ * Guest pages referred to in the vmcs02 with host-physical
+ * pointers, so we must keep them pinned while L2 runs.
*/
struct page *apic_access_page;
struct page *virtual_apic_page;
@@ -6974,94 +6966,6 @@ static int handle_monitor(struct kvm_vcp
}
/*
- * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
- * We could reuse a single VMCS for all the L2 guests, but we also want the
- * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
- * allows keeping them loaded on the processor, and in the future will allow
- * optimizations where prepare_vmcs02 doesn't need to set all the fields on
- * every entry if they never change.
- * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
- * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
- *
- * The following functions allocate and free a vmcs02 in this pool.
- */
-
-/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */
-static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
-{
- struct vmcs02_list *item;
- list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
- if (item->vmptr == vmx->nested.current_vmptr) {
- list_move(&item->list, &vmx->nested.vmcs02_pool);
- return &item->vmcs02;
- }
-
- if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
- /* Recycle the least recently used VMCS. */
- item = list_last_entry(&vmx->nested.vmcs02_pool,
- struct vmcs02_list, list);
- item->vmptr = vmx->nested.current_vmptr;
- list_move(&item->list, &vmx->nested.vmcs02_pool);
- return &item->vmcs02;
- }
-
- /* Create a new VMCS */
- item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
- if (!item)
- return NULL;
- item->vmcs02.vmcs = alloc_vmcs();
- item->vmcs02.shadow_vmcs = NULL;
- if (!item->vmcs02.vmcs) {
- kfree(item);
- return NULL;
- }
- loaded_vmcs_init(&item->vmcs02);
- item->vmptr = vmx->nested.current_vmptr;
- list_add(&(item->list), &(vmx->nested.vmcs02_pool));
- vmx->nested.vmcs02_num++;
- return &item->vmcs02;
-}
-
-/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
-static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
-{
- struct vmcs02_list *item;
- list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
- if (item->vmptr == vmptr) {
- free_loaded_vmcs(&item->vmcs02);
- list_del(&item->list);
- kfree(item);
- vmx->nested.vmcs02_num--;
- return;
- }
-}
-
-/*
- * Free all VMCSs saved for this vcpu, except the one pointed by
- * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
- * must be &vmx->vmcs01.
- */
-static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
-{
- struct vmcs02_list *item, *n;
-
- WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
- list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
- /*
- * Something will leak if the above WARN triggers. Better than
- * a use-after-free.
- */
- if (vmx->loaded_vmcs == &item->vmcs02)
- continue;
-
- free_loaded_vmcs(&item->vmcs02);
- list_del(&item->list);
- kfree(item);
- vmx->nested.vmcs02_num--;
- }
-}
-
-/*
* The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(),
* set the success or error code of an emulated VMX instruction, as specified
* by Vol 2B, VMX Instruction Reference, "Conventions".
@@ -7242,6 +7146,12 @@ static int enter_vmx_operation(struct kv
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs *shadow_vmcs;
+ vmx->nested.vmcs02.vmcs = alloc_vmcs();
+ vmx->nested.vmcs02.shadow_vmcs = NULL;
+ if (!vmx->nested.vmcs02.vmcs)
+ goto out_vmcs02;
+ loaded_vmcs_init(&vmx->nested.vmcs02);
+
if (cpu_has_vmx_msr_bitmap()) {
vmx->nested.msr_bitmap =
(unsigned long *)__get_free_page(GFP_KERNEL);
@@ -7264,9 +7174,6 @@ static int enter_vmx_operation(struct kv
vmx->vmcs01.shadow_vmcs = shadow_vmcs;
}
- INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
- vmx->nested.vmcs02_num = 0;
-
hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL_PINNED);
vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
@@ -7281,6 +7188,9 @@ out_cached_vmcs12:
free_page((unsigned long)vmx->nested.msr_bitmap);
out_msr_bitmap:
+ free_loaded_vmcs(&vmx->nested.vmcs02);
+
+out_vmcs02:
return -ENOMEM;
}
@@ -7434,7 +7344,7 @@ static void free_nested(struct vcpu_vmx
vmx->vmcs01.shadow_vmcs = NULL;
}
kfree(vmx->nested.cached_vmcs12);
- /* Unpin physical memory we referred to in current vmcs02 */
+ /* Unpin physical memory we referred to in the vmcs02 */
if (vmx->nested.apic_access_page) {
kvm_release_page_dirty(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = NULL;
@@ -7450,7 +7360,7 @@ static void free_nested(struct vcpu_vmx
vmx->nested.pi_desc = NULL;
}
- nested_free_all_saved_vmcss(vmx);
+ free_loaded_vmcs(&vmx->nested.vmcs02);
}
/* Emulate the VMXOFF instruction */
@@ -7493,8 +7403,6 @@ static int handle_vmclear(struct kvm_vcp
vmptr + offsetof(struct vmcs12, launch_state),
&zero, sizeof(zero));
- nested_free_vmcs02(vmx, vmptr);
-
nested_vmx_succeed(vcpu);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -8406,10 +8314,11 @@ static bool nested_vmx_exit_reflected(st
/*
* The host physical addresses of some pages of guest memory
- * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
- * may write to these pages via their host physical address while
- * L2 is running, bypassing any address-translation-based dirty
- * tracking (e.g. EPT write protection).
+ * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
+ * Page). The CPU may write to these pages via their host
+ * physical address while L2 is running, bypassing any
+ * address-translation-based dirty tracking (e.g. EPT write
+ * protection).
*
* Mark them dirty on every exit from L2 to prevent them from
* getting out of sync with dirty tracking.
@@ -10903,20 +10812,15 @@ static int enter_vmx_non_root_mode(struc
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
- struct loaded_vmcs *vmcs02;
u32 msr_entry_idx;
u32 exit_qual;
- vmcs02 = nested_get_current_vmcs02(vmx);
- if (!vmcs02)
- return -ENOMEM;
-
enter_guest_mode(vcpu);
if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
- vmx_switch_vmcs(vcpu, vmcs02);
+ vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
vmx_segment_cache_clear(vmx);
if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) {
@@ -11534,10 +11438,6 @@ static void nested_vmx_vmexit(struct kvm
vm_exit_controls_reset_shadow(vmx);
vmx_segment_cache_clear(vmx);
- /* if no vmcs02 cache requested, remove the one we used */
- if (VMCS02_POOL_SIZE == 0)
- nested_free_vmcs02(vmx, vmx->nested.current_vmptr);
-
/* Update any VMCS fields that might have changed while L2 ran */
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
Patches currently in stable-queue which might be from jmattson(a)google.com are
queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch
queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
This is a note to let you know that I've just added the patch titled
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
From: KarimAllah Ahmed karahmed(a)amazon.de
Date: Thu Feb 1 22:59:45 2018 +0100
From: KarimAllah Ahmed karahmed(a)amazon.de
commit d28b387fb74da95d69d2615732f50cceb38e9a4d
[ Based on a patch from Ashok Raj <ashok.raj(a)intel.com> ]
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.
To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.
No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.
[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Jun Nakajima <jun.nakajima(a)intel.com>
Cc: kvm(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Asit Mallick <asit.k.mallick(a)intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Ashok Raj <ashok.raj(a)intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/cpuid.c | 9 ++--
arch/x86/kvm/vmx.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 2
3 files changed, 110 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct
/* cpuid 0x80000008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
- F(IBPB);
+ F(IBPB) | F(IBRS);
/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+ F(ARCH_CAPABILITIES);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
- /* IBPB isn't necessarily present in hardware cpuid */
+ /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+ if (boot_cpu_has(X86_FEATURE_IBRS))
+ entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -595,6 +595,7 @@ struct vcpu_vmx {
#endif
u64 arch_capabilities;
+ u64 spec_ctrl;
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -1911,6 +1912,29 @@ static void update_exception_bitmap(stru
}
/*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+ unsigned long *msr_bitmap;
+ int f = sizeof(unsigned long);
+
+ if (!cpu_has_vmx_msr_bitmap())
+ return true;
+
+ msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+ if (msr <= 0x1fff) {
+ return !!test_bit(msr, msr_bitmap + 0x800 / f);
+ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+ msr &= 0x1fff;
+ return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+ }
+
+ return true;
+}
+
+/*
* Check if MSR is intercepted for L01 MSR bitmap.
*/
static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3262,6 +3286,14 @@ static int vmx_get_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+ return 1;
+
+ msr_info->data = to_vmx(vcpu)->spec_ctrl;
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3375,6 +3407,37 @@ static int vmx_set_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+ return 1;
+
+ /* The STIBP bit doesn't fault even if it's not advertised */
+ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+ return 1;
+
+ vmx->spec_ctrl = data;
+
+ if (!data)
+ break;
+
+ /*
+ * For non-nested:
+ * When it's written (to non-zero) for the first time, pass
+ * it through.
+ *
+ * For nested:
+ * The handling of the MSR bitmap for L2 guests is done in
+ * nested_vmx_merge_msr_bitmap. We should not touch the
+ * vmcs02.msr_bitmap here since it gets completely overwritten
+ * in the merging. We update the vmcs01 here for L1 as well
+ * since it will end up touching the MSR anyway now.
+ */
+ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+ MSR_IA32_SPEC_CTRL,
+ MSR_TYPE_RW);
+ break;
case MSR_IA32_PRED_CMD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
@@ -5700,6 +5763,7 @@ static void vmx_vcpu_reset(struct kvm_vc
u64 cr0;
vmx->rmode.vm86_active = 0;
+ vmx->spec_ctrl = 0;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vcpu, 0);
@@ -9371,6 +9435,15 @@ static void __noclone vmx_vcpu_run(struc
vmx_arm_hv_timer(vcpu);
+ /*
+ * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+ * it's non-zero. Since vmentry is serialising on affected CPUs, there
+ * is no need to worry about the conditional branch over the wrmsr
+ * being speculatively taken.
+ */
+ if (vmx->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
vmx->__launched = vmx->loaded_vmcs->launched;
asm(
/* Store host registers */
@@ -9489,6 +9562,27 @@ static void __noclone vmx_vcpu_run(struc
#endif
);
+ /*
+ * We do not use IBRS in the kernel. If this vCPU has used the
+ * SPEC_CTRL MSR it may have left it on; save the value and
+ * turn it off. This is much more efficient than blindly adding
+ * it to the atomic save/restore list. Especially as the former
+ * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+ *
+ * For non-nested case:
+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ *
+ * For nested case:
+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ */
+ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+ rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+ if (vmx->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();
@@ -10113,7 +10207,7 @@ static inline bool nested_vmx_merge_msr_
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
/*
- * pred_cmd is trying to verify two things:
+ * pred_cmd & spec_ctrl are trying to verify two things:
*
* 1. L0 gave a permission to L1 to actually passthrough the MSR. This
* ensures that we do not accidentally generate an L02 MSR bitmap
@@ -10126,9 +10220,10 @@ static inline bool nested_vmx_merge_msr_
* the MSR.
*/
bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+ bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
- !pred_cmd)
+ !pred_cmd && !spec_ctrl)
return false;
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10162,6 +10257,12 @@ static inline bool nested_vmx_merge_msr_
}
}
+ if (spec_ctrl)
+ nested_vmx_disable_intercept_for_msr(
+ msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_SPEC_CTRL,
+ MSR_TYPE_R | MSR_TYPE_W);
+
if (pred_cmd)
nested_vmx_disable_intercept_for_msr(
msr_bitmap_l1, msr_bitmap_l0,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_ARCH_CAPABILITIES
+ MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
};
static unsigned num_msrs_to_save;
Patches currently in stable-queue which might be from ashok.raj(a)intel.com are
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
This is a note to let you know that I've just added the patch titled
KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
From: KarimAllah Ahmed karahmed(a)amazon.de
Date: Thu Feb 1 22:59:44 2018 +0100
From: KarimAllah Ahmed karahmed(a)amazon.de
commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.
[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Jun Nakajima <jun.nakajima(a)intel.com>
Cc: kvm(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Asit Mallick <asit.k.mallick(a)intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Ashok Raj <ashok.raj(a)intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 17 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,8 @@ struct vcpu_vmx {
u64 msr_guest_kernel_gs_base;
#endif
+ u64 arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3260,6 +3262,12 @@ static int vmx_get_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return 1;
+ msr_info->data = to_vmx(vcpu)->arch_capabilities;
+ break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3395,6 +3403,11 @@ static int vmx_set_msr(struct kvm_vcpu *
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
MSR_TYPE_W);
break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated)
+ return 1;
+ vmx->arch_capabilities = data;
+ break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5657,6 +5670,8 @@ static void vmx_vcpu_setup(struct vcpu_v
++vmx->nmsrs;
}
+ if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_ARCH_CAPABILITIES
};
static unsigned num_msrs_to_save;
Patches currently in stable-queue which might be from karahmed(a)amazon.de are
queue-4.15/x86spectre_Simplify_spectre_v2_command_line_parsing.patch
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
queue-4.15/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch
This is a note to let you know that I've just added the patch titled
KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
From: KarimAllah Ahmed karahmed(a)amazon.de
Date: Sat Feb 3 15:56:23 2018 +0100
From: KarimAllah Ahmed karahmed(a)amazon.de
commit b2ac58f90540e39324e7a29a7ad471407ae0bf48
[ Based on a patch from Paolo Bonzini <pbonzini(a)redhat.com> ]
... basically doing exactly what we do for VMX:
- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
actually used it.
Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Jun Nakajima <jun.nakajima(a)intel.com>
Cc: kvm(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Asit Mallick <asit.k.mallick(a)intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Ashok Raj <ashok.raj(a)intel.com>
Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/svm.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,8 @@ struct vcpu_svm {
u64 gs_base;
} host;
+ u64 spec_ctrl;
+
u32 *msrpm;
ulong nmi_iret_rip;
@@ -249,6 +251,7 @@ static const struct svm_direct_access_ms
{ .index = MSR_CSTAR, .always = true },
{ .index = MSR_SYSCALL_MASK, .always = true },
#endif
+ { .index = MSR_IA32_SPEC_CTRL, .always = false },
{ .index = MSR_IA32_PRED_CMD, .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
@@ -882,6 +885,25 @@ static bool valid_msr_intercept(u32 inde
return false;
}
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
+{
+ u8 bit_write;
+ unsigned long tmp;
+ u32 offset;
+ u32 *msrpm;
+
+ msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
+ to_svm(vcpu)->msrpm;
+
+ offset = svm_msrpm_offset(msr);
+ bit_write = 2 * (msr & 0x0f) + 1;
+ tmp = msrpm[offset];
+
+ BUG_ON(offset == MSR_INVALID);
+
+ return !!test_bit(bit_write, &tmp);
+}
+
static void set_msr_interception(u32 *msrpm, unsigned msr,
int read, int write)
{
@@ -1584,6 +1606,8 @@ static void svm_vcpu_reset(struct kvm_vc
u32 dummy;
u32 eax = 1;
+ svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3629,13 @@ static int svm_get_msr(struct kvm_vcpu *
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ msr_info->data = svm->spec_ctrl;
+ break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x01000065;
break;
@@ -3696,6 +3727,33 @@ static int svm_set_msr(struct kvm_vcpu *
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ /* The STIBP bit doesn't fault even if it's not advertised */
+ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+ return 1;
+
+ svm->spec_ctrl = data;
+
+ if (!data)
+ break;
+
+ /*
+ * For non-nested:
+ * When it's written (to non-zero) for the first time, pass
+ * it through.
+ *
+ * For nested:
+ * The handling of the MSR bitmap for L2 guests is done in
+ * nested_svm_vmrun_msrpm.
+ * We update the L1 MSR bit as well since it will end up
+ * touching the MSR anyway now.
+ */
+ set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+ break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5022,15 @@ static void svm_vcpu_run(struct kvm_vcpu
local_irq_enable();
+ /*
+ * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+ * it's non-zero. Since vmentry is serialising on affected CPUs, there
+ * is no need to worry about the conditional branch over the wrmsr
+ * being speculatively taken.
+ */
+ if (svm->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
asm volatile (
"push %%" _ASM_BP "; \n\t"
"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5056,6 +5123,27 @@ static void svm_vcpu_run(struct kvm_vcpu
#endif
);
+ /*
+ * We do not use IBRS in the kernel. If this vCPU has used the
+ * SPEC_CTRL MSR it may have left it on; save the value and
+ * turn it off. This is much more efficient than blindly adding
+ * it to the atomic save/restore list. Especially as the former
+ * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+ *
+ * For non-nested case:
+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ *
+ * For nested case:
+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ */
+ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+ rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+ if (svm->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();
Patches currently in stable-queue which might be from pbonzini(a)redhat.com are
queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch
queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.15/KVM_VMX_introduce_alloc_loaded_vmcs.patch
queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch
queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.15/KVMx86_Add_IBPB_support.patch
queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch
queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
This is a note to let you know that I've just added the patch titled
Documentation: Document array_index_nospec
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
Documentation_Document_array_index_nospec.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: Documentation: Document array_index_nospec
From: Mark Rutland mark.rutland(a)arm.com
Date: Mon Jan 29 17:02:16 2018 -0800
From: Mark Rutland mark.rutland(a)arm.com
commit f84a56f73dddaeac1dba8045b007f742f61cd2da
Document the rationale and usage of the new array_index_nospec() helper.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Cc: linux-arch(a)vger.kernel.org
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: gregkh(a)linuxfoundation.org
Cc: kernel-hardening(a)lists.openwall.com
Cc: torvalds(a)linux-foundation.org
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727413645.33451.15878817161436755393.stgit@dwi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/speculation.txt | 90 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 90 insertions(+)
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,90 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+ int load_array(int *array, unsigned int index)
+ {
+ if (index >= MAX_ARRAY_ELEMS)
+ return 0;
+ else
+ return array[index];
+ }
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+ CMP <index>, #MAX_ARRAY_ELEMS
+ B.LT less
+ MOV <returnval>, #0
+ RET
+ less:
+ LDR <returnval>, [<array>, <index>]
+ RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
+value will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may
+result in sensitive information being leaked. Consider the following
+code, building on the prior example:
+
+ int load_dependent_arrays(int *arr1, int *arr2, int index)
+ {
+ int val1, val2,
+
+ val1 = load_array(arr1, index);
+ val2 = load_array(arr2, val1);
+
+ return val2;
+ }
+
+Under speculation, the first call to load_array() may return the value
+of an out-of-bounds address, while the second call will influence
+microarchitectural state dependent on this value. This may provide an
+arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are
+respected even under speculation. Architectures which are affected by
+speculation-based side-channels are expected to implement these
+primitives.
+
+The array_index_nospec() helper in <linux/nospec.h> can be used to
+prevent information from being leaked via side-channels.
+
+A call to array_index_nospec(index, size) returns a sanitized index
+value that is bounded to [0, size) even under cpu speculation
+conditions.
+
+This can be used to protect the earlier load_array() example:
+
+ int load_array(int *array, unsigned int index)
+ {
+ if (index >= MAX_ARRAY_ELEMS)
+ return 0;
+ else {
+ index = array_index_nospec(index, MAX_ARRAY_ELEMS);
+ return array[index];
+ }
+ }
Patches currently in stable-queue which might be from mark.rutland(a)arm.com are
queue-4.15/Documentation_Document_array_index_nospec.patch
This is a note to let you know that I've just added the patch titled
x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
Subject: x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
From: Dan Williams dan.j.williams(a)intel.com
Date: Mon Jan 29 17:02:49 2018 -0800
From: Dan Williams dan.j.williams(a)intel.com
commit 304ec1b050310548db33063e567123fae8fd0301
Quoting Linus:
I do think that it would be a good idea to very expressly document
the fact that it's not that the user access itself is unsafe. I do
agree that things like "get_user()" want to be protected, but not
because of any direct bugs or problems with get_user() and friends,
but simply because get_user() is an excellent source of a pointer
that is obviously controlled from a potentially attacking user
space. So it's a prime candidate for then finding _subsequent_
accesses that can then be used to perturb the cache.
__uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the
limit check is far away from the user pointer de-reference. In those cases
a barrier_nospec() prevents speculation with a potential pointer to
privileged memory. uaccess_try_nospec covers get_user_try.
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Suggested-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-arch(a)vger.kernel.org
Cc: Kees Cook <keescook(a)chromium.org>
Cc: kernel-hardening(a)lists.openwall.com
Cc: gregkh(a)linuxfoundation.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: alan(a)linux.intel.com
Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/uaccess.h | 6 +++---
arch/x86/include/asm/uaccess_32.h | 6 +++---
arch/x86/include/asm/uaccess_64.h | 12 ++++++------
arch/x86/lib/usercopy_32.c | 4 ++--
4 files changed, 14 insertions(+), 14 deletions(-)
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do { \
({ \
int __gu_err; \
__inttype(*(ptr)) __gu_val; \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
__uaccess_end(); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
@@ -557,7 +557,7 @@ struct __large_struct { unsigned long bu
* get_user_ex(...);
* } get_user_catch(err)
*/
-#define get_user_try uaccess_try
+#define get_user_try uaccess_try_nospec
#define get_user_catch(err) uaccess_catch(err)
#define get_user_ex(x, ptr) do { \
@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
__typeof__(ptr) __uval = (uval); \
__typeof__(*(ptr)) __old = (old); \
__typeof__(*(ptr)) __new = (new); \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
switch (size) { \
case 1: \
{ \
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void
switch (n) {
case 1:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)to, from, ret,
"b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)to, from, ret,
"w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)to, from, ret,
"l", "k", "=r", 4);
__uaccess_end();
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void
return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
case 1:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
ret, "b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
ret, "w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
ret, "l", "k", "=r", 4);
__uaccess_end();
return ret;
case 8:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 8);
__uaccess_end();
return ret;
case 10:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 10);
if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void
__uaccess_end();
return ret;
case 16:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 16);
if (likely(!ret))
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,7 +331,7 @@ do { \
unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
{
- __uaccess_begin();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
unsigned long n)
{
- __uaccess_begin();
+ __uaccess_begin_nospec();
#ifdef CONFIG_X86_INTEL_USERCOPY
if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
n = __copy_user_intel_nocache(to, from, n);
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.14/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
queue-4.14/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch
queue-4.14/x86_Introduce_barrier_nospec.patch
queue-4.14/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
queue-4.14/x86get_user_Use_pointer_masking_to_limit_speculation.patch
queue-4.14/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.14/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch
queue-4.14/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch
queue-4.14/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
queue-4.14/KVM_VMX_Make_indirect_call_speculation_safe.patch
queue-4.14/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch
queue-4.14/x86alternative_Print_unadorned_pointers.patch
queue-4.14/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.14/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch
queue-4.14/array_index_nospec_Sanitize_speculative_array_de-references.patch
queue-4.14/Documentation_Document_array_index_nospec.patch
queue-4.14/x86entry64_Remove_the_SYSCALL64_fast_path.patch
queue-4.14/x86retpoline_Remove_the_esprsp_thunk.patch
queue-4.14/x86bugs_Drop_one_mitigation_from_dmesg.patch
queue-4.14/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch
queue-4.14/scripts-faddr2line-fix-cross_compile-unset-error.patch
queue-4.14/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
queue-4.14/x86asm_Move_status_from_thread_struct_to_thread_info.patch
queue-4.14/KVMx86_Add_IBPB_support.patch
queue-4.14/x86_Implement_array_index_mask_nospec.patch
queue-4.14/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
queue-4.14/nl80211_Sanitize_array_index_in_parse_txq_params.patch
queue-4.14/moduleretpoline_Warn_about_missing_retpoline_in_module.patch
queue-4.14/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch
queue-4.14/x86speculation_Simplify_indirect_branch_prediction_barrier().patch
queue-4.14/x86nospec_Fix_header_guards_names.patch
queue-4.14/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch
queue-4.14/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
queue-4.14/x86entry64_Push_extra_regs_right_away.patch
queue-4.14/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
queue-4.14/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
queue-4.14/x86retpoline_Simplify_vmexit_fill_RSB().patch
queue-4.14/objtool_Warn_on_stripped_section_symbol.patch
queue-4.14/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
queue-4.14/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch
queue-4.14/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
queue-4.14/objtool_Improve_retpoline_alternative_handling.patch