These patches fix some issues with the way KVM manages FPSIMD/SVE/SME state. The series supersedes my earlier attempt at fixing the host SVE state corruption issue:
https://lore.kernel.org/linux-arm-kernel/20250121100026.3974971-1-mark.rutla...
Patch 1 addresses the host SVE state corruption issue by always saving and unbinding the host state when loading a vCPU, as discussed on the earlier patch:
https://lore.kernel.org/linux-arm-kernel/Z4--YuG5SWrP_pW7@J2N7QTR9R3/ https://lore.kernel.org/linux-arm-kernel/86plkful48.wl-maz@kernel.org/
Patches 2 to 4 remove code made redundant by patch 1. These probably warrant backporting along with patch 1 as there is some historical brokenness in the code they remove.
Patches 5 to 7 are preparatory refactoring for patch 8, and are not intended to have any functional impact.
Patch 8 addresses some mismanagement of ZCR_EL{1,2} which can result in the host VMM unexpectedly receiving a SIGKILL. To fix this, we eagerly switch ZCR_EL{1,2} at guest<->host transitions, as discussed on another series:
https://lore.kernel.org/linux-arm-kernel/Z4pAMaEYvdLpmbg2@J2N7QTR9R3/ https://lore.kernel.org/linux-arm-kernel/86o6zzukwr.wl-maz@kernel.org/ https://lore.kernel.org/linux-arm-kernel/Z5Dc-WMu2azhTuMn@J2N7QTR9R3/
The end result is that KVM loses ~100 lines of code, and becomes a bit simpler to reason about.
I've pushed these patches to the arm64-kvm-fpsimd-fixes-20250206 tag on my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
The (unstable) arm64/kvm/fpsimd-fixes branch in that repo contains the fixes plus additional debug patches I've used for testing. I've given this some basic testing on a virtual platform, booting a host and a guest with and without constraining the guest's max SVE VL, with:
* kvm_arm.mode=vhe * kvm_arm.mode=nvhe * kvm_arm.mode=protected (IIUC this will default to hVHE)
Since v1 [1]: * Address some additional compiler warnings in patch 7 * Use ZCR_EL1 alias in VHE code * Fold in Tested-by and Reviewed-by tags * Fix typos
[1] https://lore.kernel.org/linux-arm-kernel/20250204152100.705610-1-mark.rutlan...
Mark.
Mark Rutland (8): KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Refactor exit handlers KVM: arm64: Mark some header functions as inline KVM: arm64: Eagerly switch ZCR_EL{1,2}
arch/arm64/include/asm/kvm_emulate.h | 42 -------- arch/arm64/include/asm/kvm_host.h | 22 +---- arch/arm64/kernel/fpsimd.c | 25 ----- arch/arm64/kvm/arm.c | 8 -- arch/arm64/kvm/fpsimd.c | 100 ++----------------- arch/arm64/kvm/hyp/include/hyp/switch.h | 125 +++++++++++++++++------- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++- arch/arm64/kvm/hyp/nvhe/switch.c | 91 ++++++++--------- arch/arm64/kvm/hyp/vhe/switch.c | 33 ++++--- 9 files changed, 174 insertions(+), 287 deletions(-)
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including:
* Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger:
https://issues.redhat.com/browse/RHEL-68997
* Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state.
* The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory.
Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME state when loading a vCPU such that KVM does not need to save any of the host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is removed and the necessary call to fpsimd_save_and_flush_cpu_state() is placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr' should not be used, they are set to NULL; all uses of these will be removed in subsequent patches.
Historical problems go back at least as far as v5.17, e.g. erroneous assumptions about TIF_SVE being clear in commit:
8383741ab2e773a9 ("KVM: arm64: Get rid of host SVE tracking/saving")
... and so this eager save+flush probably needs to be backported to ALL stable trees.
Fixes: 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests") Fixes: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: ef3be86021c3bdf3 ("KVM: arm64: Add save/restore support for FPMR") Reported-by: Eric Auger eauger@redhat.com Reported-by: Wilco Dijkstra wilco.dijkstra@arm.com Reviewed-by: Mark Brown broonie@kernel.org Tested-by: Eric Auger eric.auger@redhat.com Cc: stable@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Florian Weimer fweimer@redhat.com Cc: Fuad Tabba tabba@google.com Cc: Jeremy Linton jeremy.linton@arm.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Paolo Bonzini pbonzini@redhat.com Cc: Will Deacon will@kernel.org Signed-off-by: Mark Rutland mark.rutland@arm.com --- arch/arm64/kernel/fpsimd.c | 25 ------------------------- arch/arm64/kvm/fpsimd.c | 35 ++++++++++------------------------- 2 files changed, 10 insertions(+), 50 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 2b601d88762d4..8370d55f03533 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1694,31 +1694,6 @@ void fpsimd_signal_preserve_current_state(void) sve_to_fpsimd(current); }
-/* - * Called by KVM when entering the guest. - */ -void fpsimd_kvm_prepare(void) -{ - if (!system_supports_sve()) - return; - - /* - * KVM does not save host SVE state since we can only enter - * the guest from a syscall so the ABI means that only the - * non-saved SVE state needs to be saved. If we have left - * SVE enabled for performance reasons then update the task - * state to be FPSIMD only. - */ - get_cpu_fpsimd_context(); - - if (test_and_clear_thread_flag(TIF_SVE)) { - sve_to_fpsimd(current); - current->thread.fp_type = FP_STATE_FPSIMD; - } - - put_cpu_fpsimd_context(); -} - /* * Associate current's FPSIMD context with this cpu * The caller must have ownership of the cpu FPSIMD context before calling diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 4d3d1a2eb1570..ceeb0a4893aa7 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -54,16 +54,18 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) if (!system_supports_fpsimd()) return;
- fpsimd_kvm_prepare(); - /* - * We will check TIF_FOREIGN_FPSTATE just before entering the - * guest in kvm_arch_vcpu_ctxflush_fp() and override this to - * FP_STATE_FREE if the flag set. + * Ensure that any host FPSIMD/SVE/SME state is saved and unbound such + * that the host kernel is responsible for restoring this state upon + * return to userspace, and the hyp code doesn't need to save anything. + * + * When the host may use SME, fpsimd_save_and_flush_cpu_state() ensures + * that PSTATE.{SM,ZA} == {0,0}. */ - *host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED; - *host_data_ptr(fpsimd_state) = kern_hyp_va(¤t->thread.uw.fpsimd_state); - *host_data_ptr(fpmr_ptr) = kern_hyp_va(¤t->thread.uw.fpmr); + fpsimd_save_and_flush_cpu_state(); + *host_data_ptr(fp_owner) = FP_STATE_FREE; + *host_data_ptr(fpsimd_state) = NULL; + *host_data_ptr(fpmr_ptr) = NULL;
host_data_clear_flag(HOST_SVE_ENABLED); if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN) @@ -73,23 +75,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) host_data_clear_flag(HOST_SME_ENABLED); if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) host_data_set_flag(HOST_SME_ENABLED); - - /* - * If PSTATE.SM is enabled then save any pending FP - * state and disable PSTATE.SM. If we leave PSTATE.SM - * enabled and the guest does not enable SME via - * CPACR_EL1.SMEN then operations that should be valid - * may generate SME traps from EL1 to EL1 which we - * can't intercept and which would confuse the guest. - * - * Do the same for PSTATE.ZA in the case where there - * is state in the registers which has not already - * been saved, this is very unlikely to happen. - */ - if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) { - *host_data_ptr(fp_owner) = FP_STATE_FREE; - fpsimd_save_and_flush_cpu_state(); - } }
/*
On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including:
Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger:
Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state.
The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory.
How hard would it be to write tests for these three scenarios? If we had something to exercise the relevant paths then...
... and so this eager save+flush probably needs to be backported to ALL stable trees.
... this backporting might be a little easier to be sure about?
Will
On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including:
Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger:
Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state.
The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory.
How hard would it be to write tests for these three scenarios? If we had something to exercise the relevant paths then...
... and so this eager save+flush probably needs to be backported to ALL stable trees.
... this backporting might be a little easier to be sure about?
For the first case I have a quick and dirty test, which I've pushed to my arm64/kvm/fpsimd-tests branch in my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
For the last case it should be possible to do something similar, but I hadn't had the time to dig in to the KVM selftests infrastructure and figure out how to confiugre the guest appropriately.
For the ptrace case, the same symptoms can be provoked outside of KVM (and I'm currently working to fix that). From my PoV the important thing is that this fix happens to remove KVM from the set of cases the other fixes need to care about.
FWIW I was assuming that I'd be handling the upstream backports, and I'd be testing with the test above and some additional assertions hacked into the kernel for testing.
Mark.
On Fri, 07 Feb 2025 13:21:44 +0000, Mark Rutland mark.rutland@arm.com wrote:
On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including:
Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger:
Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state.
The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory.
How hard would it be to write tests for these three scenarios? If we had something to exercise the relevant paths then...
... and so this eager save+flush probably needs to be backported to ALL stable trees.
... this backporting might be a little easier to be sure about?
For the first case I have a quick and dirty test, which I've pushed to my arm64/kvm/fpsimd-tests branch in my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
For the last case it should be possible to do something similar, but I hadn't had the time to dig in to the KVM selftests infrastructure and figure out how to confiugre the guest appropriately.
For the ptrace case, the same symptoms can be provoked outside of KVM (and I'm currently working to fix that). From my PoV the important thing is that this fix happens to remove KVM from the set of cases the other fixes need to care about.
FWIW I was assuming that I'd be handling the upstream backports, and I'd be testing with the test above and some additional assertions hacked into the kernel for testing.
I agree that having the tests around would be great, if only to catch potential repressions.
However, I really don't want to gate the fixes on these tests. So unless someone shouts, I intend to take this series in very shortly. We can always merge the tests as a subsequent improvement.
Thanks,
M.
On Fri, Feb 07, 2025 at 01:21:44PM +0000, Mark Rutland wrote:
On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including:
Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger:
Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state.
The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory.
How hard would it be to write tests for these three scenarios? If we had something to exercise the relevant paths then...
... and so this eager save+flush probably needs to be backported to ALL stable trees.
... this backporting might be a little easier to be sure about?
For the first case I have a quick and dirty test, which I've pushed to my arm64/kvm/fpsimd-tests branch in my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
For the last case it should be possible to do something similar, but I hadn't had the time to dig in to the KVM selftests infrastructure and figure out how to confiugre the guest appropriately.
For the ptrace case, the same symptoms can be provoked outside of KVM (and I'm currently working to fix that). From my PoV the important thing is that this fix happens to remove KVM from the set of cases the other fixes need to care about.
FWIW I was assuming that I'd be handling the upstream backports, and I'd be testing with the test above and some additional assertions hacked into the kernel for testing.
The backporting for Android will probably diverge slightly from upstream, so if I'm able to run your tests as well then it would really handy. Thanks for sharing the stuff you currently have.
The patch looks fine:
Acked-by: Will Deacon will@kernel.org
Will
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained.
Remove the unused code and data structures.
To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/include/asm/kvm_host.h | 20 +++++------------- arch/arm64/kvm/arm.c | 8 ------- arch/arm64/kvm/fpsimd.c | 2 -- arch/arm64/kvm/hyp/include/hyp/switch.h | 25 ++++++++++++++++++++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 +- arch/arm64/kvm/hyp/nvhe/switch.c | 28 ------------------------- arch/arm64/kvm/hyp/vhe/switch.c | 8 ------- 7 files changed, 29 insertions(+), 64 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 7cfa024de4e34..f56c07568591f 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -624,23 +624,13 @@ struct kvm_host_data { struct kvm_cpu_context host_ctxt;
/* - * All pointers in this union are hyp VA. + * Hyp VA. * sve_state is only used in pKVM and if system_supports_sve(). */ - union { - struct user_fpsimd_state *fpsimd_state; - struct cpu_sve_state *sve_state; - }; - - union { - /* HYP VA pointer to the host storage for FPMR */ - u64 *fpmr_ptr; - /* - * Used by pKVM only, as it needs to provide storage - * for the host - */ - u64 fpmr; - }; + struct cpu_sve_state *sve_state; + + /* Used by pKVM only. */ + u64 fpmr;
/* Ownership of the FP regs */ enum { diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 646e806c6ca69..027c8e9c4741f 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2461,14 +2461,6 @@ static void finalize_init_hyp_mode(void) per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state = kern_hyp_va(sve_state); } - } else { - for_each_possible_cpu(cpu) { - struct user_fpsimd_state *fpsimd_state; - - fpsimd_state = &per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt.fp_regs; - per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->fpsimd_state = - kern_hyp_va(fpsimd_state); - } } }
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index ceeb0a4893aa7..332cb3904e68b 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -64,8 +64,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) */ fpsimd_save_and_flush_cpu_state(); *host_data_ptr(fp_owner) = FP_STATE_FREE; - *host_data_ptr(fpsimd_state) = NULL; - *host_data_ptr(fpmr_ptr) = NULL;
host_data_clear_flag(HOST_SVE_ENABLED); if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN) diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index f838a45665f26..c5b8a11ac4f50 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -375,7 +375,28 @@ static inline void __hyp_sve_save_host(void) true); }
-static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu); +static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) +{ + /* + * Non-protected kvm relies on the host restoring its sve state. + * Protected kvm restores the host's sve state as not to reveal that + * fpsimd was used by a guest nor leak upper sve bits. + */ + if (system_supports_sve()) { + __hyp_sve_save_host(); + + /* Re-enable SVE traps if not supported for the guest vcpu. */ + if (!vcpu_has_sve(vcpu)) + cpacr_clear_set(CPACR_EL1_ZEN, 0); + + } else { + __fpsimd_save_state(host_data_ptr(host_ctxt.fp_regs)); + } + + if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm))) + *host_data_ptr(fpmr) = read_sysreg_s(SYS_FPMR); +} +
/* * We trap the first access to the FP/SIMD to save the host context and @@ -425,7 +446,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) isb();
/* Write out the host state if it's in the registers */ - if (host_owns_fp_regs()) + if (is_protected_kvm_enabled() && host_owns_fp_regs()) kvm_hyp_save_fpsimd_host(vcpu);
/* Restore the guest state */ diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index 5c134520e1805..ad1abd5493862 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -83,7 +83,7 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu) if (system_supports_sve()) __hyp_sve_restore_host(); else - __fpsimd_restore_state(*host_data_ptr(fpsimd_state)); + __fpsimd_restore_state(host_data_ptr(host_ctxt.fp_regs));
if (has_fpmr) write_sysreg_s(*host_data_ptr(fpmr), SYS_FPMR); diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 6c846d033d24a..7a2d189176249 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -192,34 +192,6 @@ static bool kvm_handle_pvm_sys64(struct kvm_vcpu *vcpu, u64 *exit_code) kvm_handle_pvm_sysreg(vcpu, exit_code)); }
-static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) -{ - /* - * Non-protected kvm relies on the host restoring its sve state. - * Protected kvm restores the host's sve state as not to reveal that - * fpsimd was used by a guest nor leak upper sve bits. - */ - if (unlikely(is_protected_kvm_enabled() && system_supports_sve())) { - __hyp_sve_save_host(); - - /* Re-enable SVE traps if not supported for the guest vcpu. */ - if (!vcpu_has_sve(vcpu)) - cpacr_clear_set(CPACR_EL1_ZEN, 0); - - } else { - __fpsimd_save_state(*host_data_ptr(fpsimd_state)); - } - - if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm))) { - u64 val = read_sysreg_s(SYS_FPMR); - - if (unlikely(is_protected_kvm_enabled())) - *host_data_ptr(fpmr) = val; - else - **host_data_ptr(fpmr_ptr) = val; - } -} - static const exit_handler_fn hyp_exit_handlers[] = { [0 ... ESR_ELx_EC_MAX] = NULL, [ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32, diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index b5b9dbaf1fdd6..e8a07d4bb546b 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -413,14 +413,6 @@ static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code) return true; }
-static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) -{ - __fpsimd_save_state(*host_data_ptr(fpsimd_state)); - - if (kvm_has_fpmr(vcpu->kvm)) - **host_data_ptr(fpmr_ptr) = read_sysreg_s(SYS_FPMR); -} - static bool kvm_hyp_handle_tlbi_el2(struct kvm_vcpu *vcpu, u64 *exit_code) { int ret = -EINVAL;
On Thu, Feb 06, 2025 at 02:10:56PM +0000, Mark Rutland wrote:
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained.
Remove the unused code and data structures.
To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/include/asm/kvm_host.h | 20 +++++------------- arch/arm64/kvm/arm.c | 8 ------- arch/arm64/kvm/fpsimd.c | 2 -- arch/arm64/kvm/hyp/include/hyp/switch.h | 25 ++++++++++++++++++++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 +- arch/arm64/kvm/hyp/nvhe/switch.c | 28 ------------------------- arch/arm64/kvm/hyp/vhe/switch.c | 8 ------- 7 files changed, 29 insertions(+), 64 deletions(-)
[...]
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index f838a45665f26..c5b8a11ac4f50 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -375,7 +375,28 @@ static inline void __hyp_sve_save_host(void) true); } -static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu); +static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) +{
- /*
* Non-protected kvm relies on the host restoring its sve state.
* Protected kvm restores the host's sve state as not to reveal that
* fpsimd was used by a guest nor leak upper sve bits.
*/
- if (system_supports_sve()) {
__hyp_sve_save_host();
/* Re-enable SVE traps if not supported for the guest vcpu. */
if (!vcpu_has_sve(vcpu))
cpacr_clear_set(CPACR_EL1_ZEN, 0);
- } else {
__fpsimd_save_state(host_data_ptr(host_ctxt.fp_regs));
- }
- if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm)))
*host_data_ptr(fpmr) = read_sysreg_s(SYS_FPMR);
+}
/*
- We trap the first access to the FP/SIMD to save the host context and
@@ -425,7 +446,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) isb(); /* Write out the host state if it's in the registers */
- if (host_owns_fp_regs())
- if (is_protected_kvm_enabled() && host_owns_fp_regs()) kvm_hyp_save_fpsimd_host(vcpu);
I wondered briefly whether this would allow us to clean up the CPACR handling a little and avoid the conditional SVE trap re-enabling inside kvm_hyp_save_fpsimd_host() but I couldn't come up with a clean way to do it without an additional ISB. Hrm.
Anyway, as far as the patch goes:
Acked-by: Will Deacon will@kernel.org
Will
On Mon, Feb 10, 2025 at 04:12:43PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:56PM +0000, Mark Rutland wrote:
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained.
Remove the unused code and data structures.
To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/include/asm/kvm_host.h | 20 +++++------------- arch/arm64/kvm/arm.c | 8 ------- arch/arm64/kvm/fpsimd.c | 2 -- arch/arm64/kvm/hyp/include/hyp/switch.h | 25 ++++++++++++++++++++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 +- arch/arm64/kvm/hyp/nvhe/switch.c | 28 ------------------------- arch/arm64/kvm/hyp/vhe/switch.c | 8 ------- 7 files changed, 29 insertions(+), 64 deletions(-)
[...]
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index f838a45665f26..c5b8a11ac4f50 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -375,7 +375,28 @@ static inline void __hyp_sve_save_host(void) true); } -static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu); +static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) +{
- /*
* Non-protected kvm relies on the host restoring its sve state.
* Protected kvm restores the host's sve state as not to reveal that
* fpsimd was used by a guest nor leak upper sve bits.
*/
- if (system_supports_sve()) {
__hyp_sve_save_host();
/* Re-enable SVE traps if not supported for the guest vcpu. */
if (!vcpu_has_sve(vcpu))
cpacr_clear_set(CPACR_EL1_ZEN, 0);
- } else {
__fpsimd_save_state(host_data_ptr(host_ctxt.fp_regs));
- }
- if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm)))
*host_data_ptr(fpmr) = read_sysreg_s(SYS_FPMR);
+}
/*
- We trap the first access to the FP/SIMD to save the host context and
@@ -425,7 +446,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) isb(); /* Write out the host state if it's in the registers */
- if (host_owns_fp_regs())
- if (is_protected_kvm_enabled() && host_owns_fp_regs()) kvm_hyp_save_fpsimd_host(vcpu);
I wondered briefly whether this would allow us to clean up the CPACR handling a little and avoid the conditional SVE trap re-enabling inside kvm_hyp_save_fpsimd_host() but I couldn't come up with a clean way to do it without an additional ISB. Hrm.
Anyway, as far as the patch goes:
Acked-by: Will Deacon will@kernel.org
Thanks!
FWIW, I'd also considered that, and I'd concluded that if anything we could do a subsequent simplification by pulling that out of kvm_hyp_save_fpsimd_host() and have kvm_hyp_handle_fpsimd() do something like:
| static inline bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | { | ... | | /* Valid trap */ | | /* | * Enable everything EL2 might need to save/restore state. | * Maybe each of the bits should depend on system_has_xxx() | */ | cpacr_clear_set(0, CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN */ | isb(); | | ... | | /* Write out the host state if it's in the registers */ | if (is_protected_kvm_enabled() && host_owns_fp_regs()) | kvm_hyp_save_fpsimd_host(vcpu); | | /* Restore guest state */ | | ... | | /* | * Enable traps for the VCPU. The ERET will cause the traps to | * take effect in the guest, so no ISB is necessary. | */ | cpacr_guest = CPACR_EL1_FPEN; | if (vcpu_has_sve(vcpu)) | cpacr_guest |= CPACR_EL1_ZEN; | if (vcpu_has_sme(vcpu)) // whenever we add this | cpacr_guest |= CPACR_EL1_SMEN; | cpacr_clear_set(CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN, | cpacr_guest); | | return true; | }
... where we'd still have the CPACR write to re-enable traps, but it'd be unconditional, and wouldn't need an extra ISB.
If that makes sense to you, I can go spin that as a subsequent cleanup atop this series.
Mark.
On Mon, Feb 10, 2025 at 04:59:56PM +0000, Mark Rutland wrote:
On Mon, Feb 10, 2025 at 04:12:43PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:56PM +0000, Mark Rutland wrote:
| static inline bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | { | ... | | /* Valid trap */ | | /* | * Enable everything EL2 might need to save/restore state. | * Maybe each of the bits should depend on system_has_xxx() | */ | cpacr_clear_set(0, CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN */ | isb(); | | ... | | /* Write out the host state if it's in the registers */ | if (is_protected_kvm_enabled() && host_owns_fp_regs()) | kvm_hyp_save_fpsimd_host(vcpu); | | /* Restore guest state */ | | ... | | /* | * Enable traps for the VCPU. The ERET will cause the traps to | * take effect in the guest, so no ISB is necessary. | */ | cpacr_guest = CPACR_EL1_FPEN; | if (vcpu_has_sve(vcpu)) | cpacr_guest |= CPACR_EL1_ZEN; | if (vcpu_has_sme(vcpu)) // whenever we add this | cpacr_guest |= CPACR_EL1_SMEN; | cpacr_clear_set(CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN, | cpacr_guest); | | return true; | }
... where we'd still have the CPACR write to re-enable traps, but it'd be unconditional, and wouldn't need an extra ISB.
If that makes sense to you, I can go spin that as a subsequent cleanup atop this series.
That looks very clean, yes please! Don't forget to drop the part from kvm_hyp_save_fpsimd_host() too.
Will
On Mon, Feb 10, 2025 at 06:06:38PM +0000, Will Deacon wrote:
On Mon, Feb 10, 2025 at 04:59:56PM +0000, Mark Rutland wrote:
On Mon, Feb 10, 2025 at 04:12:43PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:10:56PM +0000, Mark Rutland wrote:
| static inline bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | { | ... | | /* Valid trap */ | | /* | * Enable everything EL2 might need to save/restore state. | * Maybe each of the bits should depend on system_has_xxx() | */ | cpacr_clear_set(0, CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN */ | isb(); | | ... | | /* Write out the host state if it's in the registers */ | if (is_protected_kvm_enabled() && host_owns_fp_regs()) | kvm_hyp_save_fpsimd_host(vcpu); | | /* Restore guest state */ | | ... | | /* | * Enable traps for the VCPU. The ERET will cause the traps to | * take effect in the guest, so no ISB is necessary. | */ | cpacr_guest = CPACR_EL1_FPEN; | if (vcpu_has_sve(vcpu)) | cpacr_guest |= CPACR_EL1_ZEN; | if (vcpu_has_sme(vcpu)) // whenever we add this | cpacr_guest |= CPACR_EL1_SMEN; | cpacr_clear_set(CPACR_EL1_FPEN | CPACR_EL1_ZEN | CPACR_EL1_SMEN, | cpacr_guest); | | return true; | }
... where we'd still have the CPACR write to re-enable traps, but it'd be unconditional, and wouldn't need an extra ISB.
If that makes sense to you, I can go spin that as a subsequent cleanup atop this series.
That looks very clean, yes please! Don't forget to drop the part from kvm_hyp_save_fpsimd_host() too.
Yep, that was the idea!
To avoid confusion: I've sent out v3 of this series *without* the change, and I'll prepare that as a follow-up.
Mark.
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic is currently redundant.
The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when returning to the host, permitting host kernel usage of SVE.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SVE trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/include/asm/kvm_host.h | 1 - arch/arm64/kvm/fpsimd.c | 16 ---------------- 2 files changed, 17 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f56c07568591f..ed6841bf21b22 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -615,7 +615,6 @@ struct cpu_sve_state { struct kvm_host_data { #define KVM_HOST_DATA_FLAG_HAS_SPE 0 #define KVM_HOST_DATA_FLAG_HAS_TRBE 1 -#define KVM_HOST_DATA_FLAG_HOST_SVE_ENABLED 2 #define KVM_HOST_DATA_FLAG_HOST_SME_ENABLED 3 #define KVM_HOST_DATA_FLAG_TRBE_ENABLED 4 #define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED 5 diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 332cb3904e68b..4ff0dee1a403f 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -65,10 +65,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) fpsimd_save_and_flush_cpu_state(); *host_data_ptr(fp_owner) = FP_STATE_FREE;
- host_data_clear_flag(HOST_SVE_ENABLED); - if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN) - host_data_set_flag(HOST_SVE_ENABLED); - if (system_supports_sme()) { host_data_clear_flag(HOST_SME_ENABLED); if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) @@ -202,18 +198,6 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) * when needed. */ fpsimd_save_and_flush_cpu_state(); - } else if (has_vhe() && system_supports_sve()) { - /* - * The FPSIMD/SVE state in the CPU has not been touched, and we - * have SVE (and VHE): CPACR_EL1 (alias CPTR_EL2) has been - * reset by kvm_reset_cptr_el2() in the Hyp code, disabling SVE - * for EL0. To avoid spurious traps, restore the trap state - * seen by kvm_arch_vcpu_load_fp(): - */ - if (host_data_test_flag(HOST_SVE_ENABLED)) - sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_ZEN_EL0EN); - else - sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0); }
local_irq_restore(flags);
On Thu, Feb 06, 2025 at 02:10:57PM +0000, Mark Rutland wrote:
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic is currently redundant.
The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when returning to the host, permitting host kernel usage of SVE.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SVE trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/include/asm/kvm_host.h | 1 - arch/arm64/kvm/fpsimd.c | 16 ---------------- 2 files changed, 17 deletions(-)
Acked-by: Will Deacon will@kernel.org
Will
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic has historically been broken, and is currently redundant.
This logic was originally introduced in commit:
861262ab86270206 ("KVM: arm64: Handle SME host state when running guests")
At the time, the VHE hyp code would reset CPTR_EL2.SMEN to 0b00 when returning to the host, trapping host access to SME state. Unfortunately, this was unsafe as the host could take a softirq before calling kvm_arch_vcpu_put_fp(), and if a softirq handler were to use kernel mode NEON the resulting attempt to save the live FPSIMD/SVE/SME state would result in a fatal trap.
That issue was limited to VHE mode. For nVHE/hVHE modes, KVM always saved/restored the host kernel's CPACR_EL1 value, and configured CPTR_EL2.TSM to 0b0, ensuring that host usage of SME would not be trapped.
The issue above was incidentally fixed by commit:
375110ab51dec5dc ("KVM: arm64: Fix resetting SME trap values on reset for (h)VHE")
That commit changed the VHE hyp code to configure CPTR_EL2.SMEN to 0b01 when returning to the host, permitting host kernel usage of SME, avoiding the issue described above. At the time, this was not identified as a fix for commit 861262ab86270206.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SME trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/include/asm/kvm_host.h | 1 - arch/arm64/kvm/fpsimd.c | 21 --------------------- 2 files changed, 22 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index ed6841bf21b22..c77acc9904576 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -615,7 +615,6 @@ struct cpu_sve_state { struct kvm_host_data { #define KVM_HOST_DATA_FLAG_HAS_SPE 0 #define KVM_HOST_DATA_FLAG_HAS_TRBE 1 -#define KVM_HOST_DATA_FLAG_HOST_SME_ENABLED 3 #define KVM_HOST_DATA_FLAG_TRBE_ENABLED 4 #define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED 5 unsigned long flags; diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 4ff0dee1a403f..f64724197958e 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -65,12 +65,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) fpsimd_save_and_flush_cpu_state(); *host_data_ptr(fp_owner) = FP_STATE_FREE;
- if (system_supports_sme()) { - host_data_clear_flag(HOST_SME_ENABLED); - if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) - host_data_set_flag(HOST_SME_ENABLED); - } - /* * If normal guests gain SME support, maintain this behavior for pKVM * guests, which don't support SME. @@ -141,21 +135,6 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
local_irq_save(flags);
- /* - * If we have VHE then the Hyp code will reset CPACR_EL1 to - * the default value and we need to reenable SME. - */ - if (has_vhe() && system_supports_sme()) { - /* Also restore EL0 state seen on entry */ - if (host_data_test_flag(HOST_SME_ENABLED)) - sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_SMEN); - else - sysreg_clear_set(CPACR_EL1, - CPACR_EL1_SMEN_EL0EN, - CPACR_EL1_SMEN_EL1EN); - isb(); - } - if (guest_owns_fp_regs()) { if (vcpu_has_sve(vcpu)) { u64 zcr = read_sysreg_el1(SYS_ZCR);
On Thu, Feb 06, 2025 at 02:10:58PM +0000, Mark Rutland wrote:
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic has historically been broken, and is currently redundant.
This logic was originally introduced in commit:
861262ab86270206 ("KVM: arm64: Handle SME host state when running guests")
At the time, the VHE hyp code would reset CPTR_EL2.SMEN to 0b00 when returning to the host, trapping host access to SME state. Unfortunately, this was unsafe as the host could take a softirq before calling kvm_arch_vcpu_put_fp(), and if a softirq handler were to use kernel mode NEON the resulting attempt to save the live FPSIMD/SVE/SME state would result in a fatal trap.
That issue was limited to VHE mode. For nVHE/hVHE modes, KVM always saved/restored the host kernel's CPACR_EL1 value, and configured CPTR_EL2.TSM to 0b0, ensuring that host usage of SME would not be trapped.
The issue above was incidentally fixed by commit:
375110ab51dec5dc ("KVM: arm64: Fix resetting SME trap values on reset for (h)VHE")
That commit changed the VHE hyp code to configure CPTR_EL2.SMEN to 0b01 when returning to the host, permitting host kernel usage of SME, avoiding the issue described above. At the time, this was not identified as a fix for commit 861262ab86270206.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SME trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/include/asm/kvm_host.h | 1 - arch/arm64/kvm/fpsimd.c | 21 --------------------- 2 files changed, 22 deletions(-)
Acked-by: Will Deacon will@kernel.org
Will
For historical reasons, the VHE and nVHE/hVHE implementations of __activate_cptr_traps() pair with a common implementation of __kvm_reset_cptr_el2(), which ideally would be named __deactivate_cptr_traps().
Rename __kvm_reset_cptr_el2() to __deactivate_cptr_traps(), and split it into separate VHE and nVHE/hVHE variants so that each can be paired with its corresponding implementation of __activate_cptr_traps().
At the same time, fold kvm_write_cptr_el2() into its callers. This makes it clear in-context whether a write is made to the CPACR_EL1 encoding or the CPTR_EL2 encoding, and removes the possibility of confusion as to whether kvm_write_cptr_el2() reformats the sysreg fields as cpacr_clear_set() does.
In the nVHE/hVHE implementation of __activate_cptr_traps(), placing the sysreg writes within the if-else blocks requires that the call to __activate_traps_fpsimd32() is moved earlier, but as this was always called before writing to CPTR_EL2/CPACR_EL1, this should not result in a functional change.
Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/include/asm/kvm_emulate.h | 42 ---------------------------- arch/arm64/kvm/hyp/nvhe/switch.c | 35 ++++++++++++++++++++--- arch/arm64/kvm/hyp/vhe/switch.c | 12 +++++++- 3 files changed, 42 insertions(+), 47 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index 47f2cf408eeda..78ec1ef2cfe82 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -605,48 +605,6 @@ static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu) __cpacr_to_cptr_set(clr, set));\ } while (0)
-static __always_inline void kvm_write_cptr_el2(u64 val) -{ - if (has_vhe() || has_hvhe()) - write_sysreg(val, cpacr_el1); - else - write_sysreg(val, cptr_el2); -} - -/* Resets the value of cptr_el2 when returning to the host. */ -static __always_inline void __kvm_reset_cptr_el2(struct kvm *kvm) -{ - u64 val; - - if (has_vhe()) { - val = (CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN); - if (cpus_have_final_cap(ARM64_SME)) - val |= CPACR_EL1_SMEN_EL1EN; - } else if (has_hvhe()) { - val = CPACR_EL1_FPEN; - - if (!kvm_has_sve(kvm) || !guest_owns_fp_regs()) - val |= CPACR_EL1_ZEN; - if (cpus_have_final_cap(ARM64_SME)) - val |= CPACR_EL1_SMEN; - } else { - val = CPTR_NVHE_EL2_RES1; - - if (kvm_has_sve(kvm) && guest_owns_fp_regs()) - val |= CPTR_EL2_TZ; - if (!cpus_have_final_cap(ARM64_SME)) - val |= CPTR_EL2_TSM; - } - - kvm_write_cptr_el2(val); -} - -#ifdef __KVM_NVHE_HYPERVISOR__ -#define kvm_reset_cptr_el2(v) __kvm_reset_cptr_el2(kern_hyp_va((v)->kvm)) -#else -#define kvm_reset_cptr_el2(v) __kvm_reset_cptr_el2((v)->kvm) -#endif - /* * Returns a 'sanitised' view of CPTR_EL2, translating from nVHE to the VHE * format if E2H isn't set. diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 7a2d189176249..5d79f63a4f861 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -39,6 +39,9 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) { u64 val = CPTR_EL2_TAM; /* Same bit irrespective of E2H */
+ if (!guest_owns_fp_regs()) + __activate_traps_fpsimd32(vcpu); + if (has_hvhe()) { val |= CPACR_EL1_TTA;
@@ -47,6 +50,8 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) if (vcpu_has_sve(vcpu)) val |= CPACR_EL1_ZEN; } + + write_sysreg(val, cpacr_el1); } else { val |= CPTR_EL2_TTA | CPTR_NVHE_EL2_RES1;
@@ -61,12 +66,34 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu)
if (!guest_owns_fp_regs()) val |= CPTR_EL2_TFP; + + write_sysreg(val, cptr_el2); } +}
- if (!guest_owns_fp_regs()) - __activate_traps_fpsimd32(vcpu); +static void __deactivate_cptr_traps(struct kvm_vcpu *vcpu) +{ + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
- kvm_write_cptr_el2(val); + if (has_hvhe()) { + u64 val = CPACR_EL1_FPEN; + + if (!kvm_has_sve(kvm) || !guest_owns_fp_regs()) + val |= CPACR_EL1_ZEN; + if (cpus_have_final_cap(ARM64_SME)) + val |= CPACR_EL1_SMEN; + + write_sysreg(val, cpacr_el1); + } else { + u64 val = CPTR_NVHE_EL2_RES1; + + if (kvm_has_sve(kvm) && guest_owns_fp_regs()) + val |= CPTR_EL2_TZ; + if (!cpus_have_final_cap(ARM64_SME)) + val |= CPTR_EL2_TSM; + + write_sysreg(val, cptr_el2); + } }
static void __activate_traps(struct kvm_vcpu *vcpu) @@ -119,7 +146,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
- kvm_reset_cptr_el2(vcpu); + __deactivate_cptr_traps(vcpu); write_sysreg(__kvm_hyp_host_vector, vbar_el2); }
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index e8a07d4bb546b..4748b1947ffa0 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -136,6 +136,16 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) write_sysreg(val, cpacr_el1); }
+static void __deactivate_cptr_traps(struct kvm_vcpu *vcpu) +{ + u64 val = CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN; + + if (cpus_have_final_cap(ARM64_SME)) + val |= CPACR_EL1_SMEN_EL1EN; + + write_sysreg(val, cpacr_el1); +} + static void __activate_traps(struct kvm_vcpu *vcpu) { u64 val; @@ -207,7 +217,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) */ asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
- kvm_reset_cptr_el2(vcpu); + __deactivate_cptr_traps(vcpu);
if (!arm64_kernel_unmapped_at_el0()) host_vectors = __this_cpu_read(this_cpu_vector);
On Thu, Feb 06, 2025 at 02:10:59PM +0000, Mark Rutland wrote:
For historical reasons, the VHE and nVHE/hVHE implementations of __activate_cptr_traps() pair with a common implementation of __kvm_reset_cptr_el2(), which ideally would be named __deactivate_cptr_traps().
Rename __kvm_reset_cptr_el2() to __deactivate_cptr_traps(), and split it into separate VHE and nVHE/hVHE variants so that each can be paired with its corresponding implementation of __activate_cptr_traps().
At the same time, fold kvm_write_cptr_el2() into its callers. This makes it clear in-context whether a write is made to the CPACR_EL1 encoding or the CPTR_EL2 encoding, and removes the possibility of confusion as to whether kvm_write_cptr_el2() reformats the sysreg fields as cpacr_clear_set() does.
In the nVHE/hVHE implementation of __activate_cptr_traps(), placing the sysreg writes within the if-else blocks requires that the call to __activate_traps_fpsimd32() is moved earlier, but as this was always called before writing to CPTR_EL2/CPACR_EL1, this should not result in a functional change.
Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
[...]
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 7a2d189176249..5d79f63a4f861 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -39,6 +39,9 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) { u64 val = CPTR_EL2_TAM; /* Same bit irrespective of E2H */
- if (!guest_owns_fp_regs())
__activate_traps_fpsimd32(vcpu);
- if (has_hvhe()) { val |= CPACR_EL1_TTA;
@@ -47,6 +50,8 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) if (vcpu_has_sve(vcpu)) val |= CPACR_EL1_ZEN; }
} else { val |= CPTR_EL2_TTA | CPTR_NVHE_EL2_RES1;write_sysreg(val, cpacr_el1);
@@ -61,12 +66,34 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu) if (!guest_owns_fp_regs()) val |= CPTR_EL2_TFP;
}write_sysreg(val, cptr_el2);
+}
- if (!guest_owns_fp_regs())
__activate_traps_fpsimd32(vcpu);
+static void __deactivate_cptr_traps(struct kvm_vcpu *vcpu) +{
- struct kvm *kvm = kern_hyp_va(vcpu->kvm);
nit: You could lose the local if you used vcpu_has_sve(vcpu) instead.
However, given that this gets removed _anyway_ when we eagerly switch ZCR later on:
Acked-by: Will Deacon will@kernel.org
Will
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header.
This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined | 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined | 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); | | ^~~~~~~~~~~~~~~~~
Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/kvm/hyp/include/hyp/switch.h | 30 +++++-------------------- arch/arm64/kvm/hyp/nvhe/switch.c | 30 ++++++++++++++----------- arch/arm64/kvm/hyp/vhe/switch.c | 9 ++++---- 3 files changed, 27 insertions(+), 42 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index c5b8a11ac4f50..46df5c2eeaf57 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -679,23 +679,16 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
typedef bool (*exit_handler_fn)(struct kvm_vcpu *, u64 *);
-static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); - -static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); - /* * Allow the hypervisor to handle the exit with an exit handler if it has one. * * Returns true if the hypervisor handled the exit, and control should go back * to the guest, or false if it hasn't. */ -static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code, + const exit_handler_fn *handlers) { - const exit_handler_fn *handlers = kvm_get_exit_handler_array(vcpu); - exit_handler_fn fn; - - fn = handlers[kvm_vcpu_trap_get_class(vcpu)]; - + exit_handler_fn fn = handlers[kvm_vcpu_trap_get_class(vcpu)]; if (fn) return fn(vcpu, exit_code);
@@ -725,20 +718,9 @@ static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu, u64 *exit_code * the guest, false when we should restore the host state and return to the * main run loop. */ -static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool __fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code, + const exit_handler_fn *handlers) { - /* - * Save PSTATE early so that we can evaluate the vcpu mode - * early on. - */ - synchronize_vcpu_pstate(vcpu, exit_code); - - /* - * Check whether we want to repaint the state one way or - * another. - */ - early_exit_filter(vcpu, exit_code); - if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
@@ -768,7 +750,7 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) goto exit;
/* Check if there's an exit handler and allow it to handle the exit. */ - if (kvm_hyp_handle_exit(vcpu, exit_code)) + if (kvm_hyp_handle_exit(vcpu, exit_code, handlers)) goto guest; exit: /* Return to the host kernel and handle the exit */ diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 5d79f63a4f861..324b62329c10b 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -250,20 +250,22 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu) return hyp_exit_handlers; }
-/* - * Some guests (e.g., protected VMs) are not be allowed to run in AArch32. - * The ARMv8 architecture does not give the hypervisor a mechanism to prevent a - * guest from dropping to AArch32 EL0 if implemented by the CPU. If the - * hypervisor spots a guest in such a state ensure it is handled, and don't - * trust the host to spot or fix it. The check below is based on the one in - * kvm_arch_vcpu_ioctl_run(). - * - * Returns false if the guest ran in AArch32 when it shouldn't have, and - * thus should exit to the host, or true if a the guest run loop can continue. - */ -static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) { - if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) { + const exit_handler_fn *handlers = kvm_get_exit_handler_array(vcpu); + + synchronize_vcpu_pstate(vcpu, exit_code); + + /* + * Some guests (e.g., protected VMs) are not be allowed to run in + * AArch32. The ARMv8 architecture does not give the hypervisor a + * mechanism to prevent a guest from dropping to AArch32 EL0 if + * implemented by the CPU. If the hypervisor spots a guest in such a + * state ensure it is handled, and don't trust the host to spot or fix + * it. The check below is based on the one in + * kvm_arch_vcpu_ioctl_run(). + */ + if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) { /* * As we have caught the guest red-handed, decide that it isn't * fit for purpose anymore by making the vcpu invalid. The VMM @@ -275,6 +277,8 @@ static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code) *exit_code &= BIT(ARM_EXIT_WITH_SERROR_BIT); *exit_code |= ARM_EXCEPTION_IL; } + + return __fixup_guest_exit(vcpu, exit_code, handlers); }
/* Switch to the guest for legacy non-VHE systems */ diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 4748b1947ffa0..c854d84458892 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -540,13 +540,10 @@ static const exit_handler_fn hyp_exit_handlers[] = { [ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops, };
-static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu) +static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) { - return hyp_exit_handlers; -} + synchronize_vcpu_pstate(vcpu, exit_code);
-static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code) -{ /* * If we were in HYP context on entry, adjust the PSTATE view * so that the usual helpers work correctly. @@ -566,6 +563,8 @@ static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code) *vcpu_cpsr(vcpu) &= ~(PSR_MODE_MASK | PSR_MODE32_BIT); *vcpu_cpsr(vcpu) |= mode; } + + return __fixup_guest_exit(vcpu, exit_code, hyp_exit_handlers); }
/* Switch to the guest for VHE systems running in EL2 */
On Thu, Feb 06, 2025 at 02:11:00PM +0000, Mark Rutland wrote:
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header.
This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined | 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined | 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); | | ^~~~~~~~~~~~~~~~~
Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/kvm/hyp/include/hyp/switch.h | 30 +++++-------------------- arch/arm64/kvm/hyp/nvhe/switch.c | 30 ++++++++++++++----------- arch/arm64/kvm/hyp/vhe/switch.c | 9 ++++---- 3 files changed, 27 insertions(+), 42 deletions(-)
Acked-by: Will Deacon will@kernel.org
Will
The shared hyp switch header has a number of static functions which might not be used by all files that include the header, and when unused they will provoke compiler warnings, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function] | 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function] | 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function] | 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function] | 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function] | 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~
Mark these functions as 'inline' to suppress this warning. This shouldn't result in any functional change.
At the same time, avoid the use of __alias() in the header and alias kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also marked as 'inline'.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/kvm/hyp/include/hyp/switch.h | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 46df5c2eeaf57..163867f7f7c52 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -326,7 +326,7 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu) return __get_fault_info(vcpu->arch.fault.esr_el2, &vcpu->arch.fault); }
-static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) { *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR); arm64_mops_reset_regs(vcpu_gp_regs(vcpu), vcpu->arch.fault.esr_el2); @@ -404,7 +404,7 @@ static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) * If FP/SIMD is not implemented, handle the trap and inject an undefined * instruction exception to the guest. Similarly for trapped SVE accesses. */ -static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) { bool sve_guest; u8 esr_ec; @@ -608,7 +608,7 @@ static bool handle_ampere1_tcr(struct kvm_vcpu *vcpu) return true; }
-static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) { if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) && handle_tx2_tvm(vcpu)) @@ -628,7 +628,7 @@ static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) return false; }
-static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) { if (static_branch_unlikely(&vgic_v3_cpuif_trap) && __vgic_v3_perform_cpuif_access(vcpu) == 1) @@ -637,19 +637,18 @@ static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) return false; }
-static bool kvm_hyp_handle_memory_fault(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_memory_fault(struct kvm_vcpu *vcpu, + u64 *exit_code) { if (!__populate_fault_info(vcpu)) return true;
return false; } -static bool kvm_hyp_handle_iabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) - __alias(kvm_hyp_handle_memory_fault); -static bool kvm_hyp_handle_watchpt_low(struct kvm_vcpu *vcpu, u64 *exit_code) - __alias(kvm_hyp_handle_memory_fault); +#define kvm_hyp_handle_iabt_low kvm_hyp_handle_memory_fault +#define kvm_hyp_handle_watchpt_low kvm_hyp_handle_memory_fault
-static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) +static inline bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) { if (kvm_hyp_handle_memory_fault(vcpu, exit_code)) return true;
On Thu, Feb 06, 2025 at 02:11:01PM +0000, Mark Rutland wrote:
The shared hyp switch header has a number of static functions which might not be used by all files that include the header, and when unused they will provoke compiler warnings, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function] | 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function] | 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function] | 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function] | 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function] | 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~
Mark these functions as 'inline' to suppress this warning. This shouldn't result in any functional change.
At the same time, avoid the use of __alias() in the header and alias kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also marked as 'inline'.
Signed-off-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Mark Brown broonie@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/kvm/hyp/include/hyp/switch.h | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
Acked-by: Will Deacon will@kernel.org
Will
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the CPU, the host's active SVE VL may differ from the guest's maximum SVE VL:
* For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained by the guest hypervisor, which may be less than or equal to that guest's maximum VL.
Note: in this case the value of ZCR_EL1 is immaterial due to E2H.
* For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest, which may be less than or greater than the guest's maximum VL.
Note: in this case hyp code traps host SVE usage and lazily restores ZCR_EL2 to the host's maximum VL, which may be greater than the guest's maximum VL.
This can be the case between exiting a guest and kvm_arch_vcpu_put_fp(). If a softirq is taken during this period and the softirq handler tries to use kernel-mode NEON, then the kernel will fail to save the guest's FPSIMD/SVE state, and will pend a SIGKILL for the current thread.
This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live FPSIMD/SVE state with the guest's maximum SVE VL, and fpsimd_save_user_state() verifies that the live SVE VL is as expected before attempting to save the register state:
| if (WARN_ON(sve_get_vl() != vl)) { | force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); | return; | }
Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/nVHVE __deactivate_cptr_traps() logic can be simplified enable host access to all present FPSIMD/SVE/SME features.
In protected nVHE/hVHVE modes, the host's state is always saved/restored by hyp, and the guest's state is saved prior to exit to the host, so from the host's PoV the guest never has live FPSIMD/SVE/SME state, and the host's ZCR_EL1 is never clobbered by hyp.
Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: stable@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org --- arch/arm64/kvm/fpsimd.c | 30 --------------- arch/arm64/kvm/hyp/include/hyp/switch.h | 51 +++++++++++++++++++++++++ arch/arm64/kvm/hyp/nvhe/hyp-main.c | 13 +++---- arch/arm64/kvm/hyp/nvhe/switch.c | 6 +-- arch/arm64/kvm/hyp/vhe/switch.c | 4 ++ 5 files changed, 63 insertions(+), 41 deletions(-)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index f64724197958e..3cbb999419af7 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -136,36 +136,6 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) local_irq_save(flags);
if (guest_owns_fp_regs()) { - if (vcpu_has_sve(vcpu)) { - u64 zcr = read_sysreg_el1(SYS_ZCR); - - /* - * If the vCPU is in the hyp context then ZCR_EL1 is - * loaded with its vEL2 counterpart. - */ - __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr; - - /* - * Restore the VL that was saved when bound to the CPU, - * which is the maximum VL for the guest. Because the - * layout of the data when saving the sve state depends - * on the VL, we need to use a consistent (i.e., the - * maximum) VL. - * Note that this means that at guest exit ZCR_EL1 is - * not necessarily the same as on guest entry. - * - * ZCR_EL2 holds the guest hypervisor's VL when running - * a nested guest, which could be smaller than the - * max for the vCPU. Similar to above, we first need to - * switch to a VL consistent with the layout of the - * vCPU's SVE state. KVM support for NV implies VHE, so - * using the ZCR_EL1 alias is safe. - */ - if (!has_vhe() || (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))) - sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, - SYS_ZCR_EL1); - } - /* * Flush (save and invalidate) the fpsimd/sve state so that if * the host tries to use fpsimd/sve, it's not using stale data diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 163867f7f7c52..bbec7cd38da33 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -375,6 +375,57 @@ static inline void __hyp_sve_save_host(void) true); }
+static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu) +{ + u64 zcr_el1, zcr_el2; + + if (!guest_owns_fp_regs()) + return; + + if (vcpu_has_sve(vcpu)) { + /* A guest hypervisor may restrict the effective max VL. */ + if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) + zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2); + else + zcr_el2 = vcpu_sve_max_vq(vcpu) - 1; + + write_sysreg_el2(zcr_el2, SYS_ZCR); + + zcr_el1 = __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)); + write_sysreg_el1(zcr_el1, SYS_ZCR); + } +} + +static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu) +{ + u64 zcr_el1, zcr_el2; + + if (!guest_owns_fp_regs()) + return; + + if (vcpu_has_sve(vcpu)) { + zcr_el1 = read_sysreg_el1(SYS_ZCR); + __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr_el1; + + /* + * The guest's state is always saved using the guest's max VL. + * Ensure that the host has the guest's max VL active such that + * the host can save the guest's state lazily, but don't + * artificially restrict the host to the guest's max VL. + */ + if (has_vhe()) { + zcr_el2 = vcpu_sve_max_vq(vcpu) - 1; + write_sysreg_el2(zcr_el2, SYS_ZCR); + } else { + zcr_el2 = sve_vq_from_vl(kvm_host_sve_max_vl) - 1; + write_sysreg_el2(zcr_el2, SYS_ZCR); + + zcr_el1 = vcpu_sve_max_vq(vcpu) - 1; + write_sysreg_el1(zcr_el1, SYS_ZCR); + } + } +} + static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu) { /* diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index ad1abd5493862..0c745a578aa7e 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -5,6 +5,7 @@ */
#include <hyp/adjust_pc.h> +#include <hyp/switch.h>
#include <asm/pgtable-types.h> #include <asm/kvm_asm.h> @@ -200,8 +201,12 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
sync_hyp_vcpu(hyp_vcpu); } else { + struct kvm_vcpu *vcpu = kern_hyp_va(host_vcpu); + /* The host is fully trusted, run its vCPU directly. */ - ret = __kvm_vcpu_run(kern_hyp_va(host_vcpu)); + fpsimd_lazy_switch_to_guest(vcpu); + ret = __kvm_vcpu_run(vcpu); + fpsimd_lazy_switch_to_host(vcpu); } out: cpu_reg(host_ctxt, 1) = ret; @@ -651,12 +656,6 @@ void handle_trap(struct kvm_cpu_context *host_ctxt) case ESR_ELx_EC_SMC64: handle_host_smc(host_ctxt); break; - case ESR_ELx_EC_SVE: - cpacr_clear_set(0, CPACR_EL1_ZEN); - isb(); - sve_cond_update_zcr_vq(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, - SYS_ZCR_EL2); - break; case ESR_ELx_EC_IABT_LOW: case ESR_ELx_EC_DABT_LOW: handle_host_mem_abort(host_ctxt); diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 324b62329c10b..eaeda9c8a1aa6 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -73,12 +73,10 @@ static void __activate_cptr_traps(struct kvm_vcpu *vcpu)
static void __deactivate_cptr_traps(struct kvm_vcpu *vcpu) { - struct kvm *kvm = kern_hyp_va(vcpu->kvm); - if (has_hvhe()) { u64 val = CPACR_EL1_FPEN;
- if (!kvm_has_sve(kvm) || !guest_owns_fp_regs()) + if (cpus_have_final_cap(ARM64_SVE)) val |= CPACR_EL1_ZEN; if (cpus_have_final_cap(ARM64_SME)) val |= CPACR_EL1_SMEN; @@ -87,7 +85,7 @@ static void __deactivate_cptr_traps(struct kvm_vcpu *vcpu) } else { u64 val = CPTR_NVHE_EL2_RES1;
- if (kvm_has_sve(kvm) && guest_owns_fp_regs()) + if (!cpus_have_final_cap(ARM64_SVE)) val |= CPTR_EL2_TZ; if (!cpus_have_final_cap(ARM64_SME)) val |= CPTR_EL2_TSM; diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index c854d84458892..647737d6e8d0b 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -579,6 +579,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_save_host_state_vhe(host_ctxt);
+ fpsimd_lazy_switch_to_guest(vcpu); + /* * Note that ARM erratum 1165522 requires us to configure both stage 1 * and stage 2 translation for the guest context before we clear @@ -603,6 +605,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__deactivate_traps(vcpu);
+ fpsimd_lazy_switch_to_host(vcpu); + sysreg_restore_host_state_vhe(host_ctxt);
if (guest_owns_fp_regs())
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
+static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
/* A guest hypervisor may restrict the effective max VL. */
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))
zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2);
else
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu));
write_sysreg_el1(zcr_el1, SYS_ZCR);
- }
+}
I don't think we should worry about it for this series but just for future reference:
These new functions do unconditional writes for EL2, the old code made use of sve_cond_update_zcr_vq() which suppresses the writes but didn't have the selection of actual sysreg that write_sysreg_el2() has. I believe this was done due to a concern about potential overheads from writes to the LEN value effective in the current EL. OTOH that also introduced an additional read to get the current value, and that was all done without practical systems to benchmark any actual impacts from noop writes so there's a reasonable chance it's just not a practical issue. We should check this on hardware, but that can be done separately.
On Thu, Feb 06, 2025 at 07:12:52PM +0000, Mark Brown wrote:
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
+static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
/* A guest hypervisor may restrict the effective max VL. */
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))
zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2);
else
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu));
write_sysreg_el1(zcr_el1, SYS_ZCR);
- }
+}
I don't think we should worry about it for this series but just for future reference:
These new functions do unconditional writes for EL2, the old code made use of sve_cond_update_zcr_vq() which suppresses the writes but didn't have the selection of actual sysreg that write_sysreg_el2() has. I believe this was done due to a concern about potential overheads from writes to the LEN value effective in the current EL. OTOH that also introduced an additional read to get the current value, and that was all done without practical systems to benchmark any actual impacts from noop writes so there's a reasonable chance it's just not a practical issue. We should check this on hardware, but that can be done separately.
Yep, I'm aware of that.
Mark.
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the CPU, the host's active SVE VL may differ from the guest's maximum SVE VL:
For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained by the guest hypervisor, which may be less than or equal to that guest's maximum VL.
Note: in this case the value of ZCR_EL1 is immaterial due to E2H.
For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest, which may be less than or greater than the guest's maximum VL.
Note: in this case hyp code traps host SVE usage and lazily restores ZCR_EL2 to the host's maximum VL, which may be greater than the guest's maximum VL.
This can be the case between exiting a guest and kvm_arch_vcpu_put_fp(). If a softirq is taken during this period and the softirq handler tries to use kernel-mode NEON, then the kernel will fail to save the guest's FPSIMD/SVE state, and will pend a SIGKILL for the current thread.
This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live FPSIMD/SVE state with the guest's maximum SVE VL, and fpsimd_save_user_state() verifies that the live SVE VL is as expected before attempting to save the register state:
| if (WARN_ON(sve_get_vl() != vl)) { | force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); | return; | }
Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/nVHVE
nit: nVHVE?
(also, note to Fuad: I think we're trapping FPSIMD/SVE from the host with pKVM in Android, so we'll want to fix that when we take this patch via -stable)
__deactivate_cptr_traps() logic can be simplified enable host access to
nit: to enable
all present FPSIMD/SVE/SME features.
In protected nVHE/hVHVE modes, the host's state is always saved/restored
nit: hVHVE
(something tells me these acronyms aren't particularly friendly!)
by hyp, and the guest's state is saved prior to exit to the host, so from the host's PoV the guest never has live FPSIMD/SVE/SME state, and the host's ZCR_EL1 is never clobbered by hyp.
Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: stable@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Fuad Tabba tabba@google.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Oliver Upton oliver.upton@linux.dev Cc: Will Deacon will@kernel.org
arch/arm64/kvm/fpsimd.c | 30 --------------- arch/arm64/kvm/hyp/include/hyp/switch.h | 51 +++++++++++++++++++++++++ arch/arm64/kvm/hyp/nvhe/hyp-main.c | 13 +++---- arch/arm64/kvm/hyp/nvhe/switch.c | 6 +-- arch/arm64/kvm/hyp/vhe/switch.c | 4 ++ 5 files changed, 63 insertions(+), 41 deletions(-)
[...]
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 163867f7f7c52..bbec7cd38da33 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -375,6 +375,57 @@ static inline void __hyp_sve_save_host(void) true); } +static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
/* A guest hypervisor may restrict the effective max VL. */
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))
zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2);
else
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu));
write_sysreg_el1(zcr_el1, SYS_ZCR);
- }
+}
+static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
zcr_el1 = read_sysreg_el1(SYS_ZCR);
__vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr_el1;
/*
* The guest's state is always saved using the guest's max VL.
* Ensure that the host has the guest's max VL active such that
* the host can save the guest's state lazily, but don't
* artificially restrict the host to the guest's max VL.
*/
if (has_vhe()) {
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
} else {
zcr_el2 = sve_vq_from_vl(kvm_host_sve_max_vl) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el1(zcr_el1, SYS_ZCR);
Do we need an ISB before this to make sure that the CPTR traps have been deactivated properly?
Will
On Mon, Feb 10, 2025 at 04:53:27PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/nVHVE
nit: nVHVE?
(also, note to Fuad: I think we're trapping FPSIMD/SVE from the host with pKVM in Android, so we'll want to fix that when we take this patch via -stable)
__deactivate_cptr_traps() logic can be simplified enable host access to
nit: to enable
all present FPSIMD/SVE/SME features.
In protected nVHE/hVHVE modes, the host's state is always saved/restored
nit: hVHVE
(something tells me these acronyms aren't particularly friendly!)
Aargh, sorry about those -- I've fixed those up and I'll give the series another once-over.
[...]
+static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
/* A guest hypervisor may restrict the effective max VL. */
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))
zcr_el2 = __vcpu_sys_reg(vcpu, ZCR_EL2);
else
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = __vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu));
write_sysreg_el1(zcr_el1, SYS_ZCR);
- }
+}
+static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
zcr_el1 = read_sysreg_el1(SYS_ZCR);
__vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr_el1;
/*
* The guest's state is always saved using the guest's max VL.
* Ensure that the host has the guest's max VL active such that
* the host can save the guest's state lazily, but don't
* artificially restrict the host to the guest's max VL.
*/
if (has_vhe()) {
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
} else {
zcr_el2 = sve_vq_from_vl(kvm_host_sve_max_vl) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el1(zcr_el1, SYS_ZCR);
Do we need an ISB before this to make sure that the CPTR traps have been deactivated properly?
Sorry, I had meant to add a comment here that this relies upon a subtlety that avoids the need for the ISB.
When the guest owns the FP regs here, we know:
* If the guest doesn't have SVE, then we're not poking anything, and so no ISB is necessary.
* If the guest has SVE, then either:
- The guest owned the FP regs when it was entered.
- The guest *didn't* own the FP regs when it was entered, but acquired ownership via a trap which executed kvm_hyp_handle_fpsimd().
... and in either case, *after* disabling the traps there's been an ERET to the guest and an exception back to hyp, either of which provides the necessary context synchronization such that the traps are disabled here.
Does that make sense to you?
Mark.
On Mon, Feb 10, 2025 at 05:21:59PM +0000, Mark Rutland wrote:
On Mon, Feb 10, 2025 at 04:53:27PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
+static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
zcr_el1 = read_sysreg_el1(SYS_ZCR);
__vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr_el1;
/*
* The guest's state is always saved using the guest's max VL.
* Ensure that the host has the guest's max VL active such that
* the host can save the guest's state lazily, but don't
* artificially restrict the host to the guest's max VL.
*/
if (has_vhe()) {
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
} else {
zcr_el2 = sve_vq_from_vl(kvm_host_sve_max_vl) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el1(zcr_el1, SYS_ZCR);
Do we need an ISB before this to make sure that the CPTR traps have been deactivated properly?
Sorry, I had meant to add a comment here that this relies upon a subtlety that avoids the need for the ISB.
Ah yes, it really all hinges on guest_owns_fp_regs() and so I think a comment would be helpful, thanks.
Just on this, though:
When the guest owns the FP regs here, we know:
If the guest doesn't have SVE, then we're not poking anything, and so no ISB is necessary.
If the guest has SVE, then either:
The guest owned the FP regs when it was entered.
The guest *didn't* own the FP regs when it was entered, but acquired ownership via a trap which executed kvm_hyp_handle_fpsimd().
... and in either case, *after* disabling the traps there's been an ERET to the guest and an exception back to hyp, either of which provides the necessary context synchronization such that the traps are disabled here.
What about the case where we find that there's an interrupt pending on return to the guest? In that case, I think we elide the ERET and go back to the host (see the check of isr_el1 in hyp/entry.S).
Will
On Mon, Feb 10, 2025 at 06:20:09PM +0000, Will Deacon wrote:
On Mon, Feb 10, 2025 at 05:21:59PM +0000, Mark Rutland wrote:
On Mon, Feb 10, 2025 at 04:53:27PM +0000, Will Deacon wrote:
On Thu, Feb 06, 2025 at 02:11:02PM +0000, Mark Rutland wrote:
+static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu) +{
- u64 zcr_el1, zcr_el2;
- if (!guest_owns_fp_regs())
return;
- if (vcpu_has_sve(vcpu)) {
zcr_el1 = read_sysreg_el1(SYS_ZCR);
__vcpu_sys_reg(vcpu, vcpu_sve_zcr_elx(vcpu)) = zcr_el1;
/*
* The guest's state is always saved using the guest's max VL.
* Ensure that the host has the guest's max VL active such that
* the host can save the guest's state lazily, but don't
* artificially restrict the host to the guest's max VL.
*/
if (has_vhe()) {
zcr_el2 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
} else {
zcr_el2 = sve_vq_from_vl(kvm_host_sve_max_vl) - 1;
write_sysreg_el2(zcr_el2, SYS_ZCR);
zcr_el1 = vcpu_sve_max_vq(vcpu) - 1;
write_sysreg_el1(zcr_el1, SYS_ZCR);
Do we need an ISB before this to make sure that the CPTR traps have been deactivated properly?
Sorry, I had meant to add a comment here that this relies upon a subtlety that avoids the need for the ISB.
Ah yes, it really all hinges on guest_owns_fp_regs() and so I think a comment would be helpful, thanks.
Just on this, though:
When the guest owns the FP regs here, we know:
If the guest doesn't have SVE, then we're not poking anything, and so no ISB is necessary.
If the guest has SVE, then either:
The guest owned the FP regs when it was entered.
The guest *didn't* own the FP regs when it was entered, but acquired ownership via a trap which executed kvm_hyp_handle_fpsimd().
... and in either case, *after* disabling the traps there's been an ERET to the guest and an exception back to hyp, either of which provides the necessary context synchronization such that the traps are disabled here.
What about the case where we find that there's an interrupt pending on return to the guest? In that case, I think we elide the ERET and go back to the host (see the check of isr_el1 in hyp/entry.S).
Ah; I had missed that, and evidently I had not looked at the entry code.
Given that, I think the options are:
(a) Add an ISB after disabling the traps, before returning to the guest.
(b) Add an ISB in fpsimd_lazy_switch_to_host() above.
(c) Add an ISB in that sequence in hyp/entry.S, just before the ret, to ensure that __guest_enter() always provides a context synchronization event even when it doesn't enter the guest, regardless of ARM64_HAS_RAS_EXTN.
I think (c) is probably the nicest, since that avoids the need for redundant barriers in the common case, and those short-circuited exits are hopefully rare.
Obviously that would mean adding comments in both __guest_enter() and fpsimd_lazy_switch_to_host().
Mark.
On Thu, Feb 06, 2025 at 02:10:54PM +0000, Mark Rutland wrote:
These patches fix some issues with the way KVM manages FPSIMD/SVE/SME state. The series supersedes my earlier attempt at fixing the host SVE state corruption issue:
...
Patch 8 addresses some mismanagement of ZCR_EL{1,2} which can result in the host VMM unexpectedly receiving a SIGKILL. To fix this, we eagerly switch ZCR_EL{1,2} at guest<->host transitions, as discussed on another series:
https://lore.kernel.org/linux-arm-kernel/Z4pAMaEYvdLpmbg2@J2N7QTR9R3/ https://lore.kernel.org/linux-arm-kernel/86o6zzukwr.wl-maz@kernel.org/ https://lore.kernel.org/linux-arm-kernel/Z5Dc-WMu2azhTuMn@J2N7QTR9R3/
The end result is that KVM loses ~100 lines of code, and becomes a bit simpler to reason about.
Quite a bit, and much more clearly too which makes the whole thing more managable and discoverable.
Reviewed-by: Mark Brown broonie@kernel.org Tested-by: Mark Brown broonie@kernel.org
although I'm not sure I was actually testing anything you weren't.
linux-stable-mirror@lists.linaro.org