As promised on the list [0], this series aims to backport 3 upstream commits [1,2,3] into 5.12-stable tree.
Patch #1 is already in the queue and therefore not included. Patch #2 can be applied now by manually adding the __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc macro (please review). Patch #3 can be applied cleanly then (after #2).
I've slightly tested it on my 920 (boot test and the whole kvm-unit-tests), on top of the latest linux-stable-rc/linux-5.12.y. Please consider taking them for 5.12-stable.
[0] https://lore.kernel.org/r/0d9f123c-e9f7-7481-143d-efd488873082@huawei.com [1] https://git.kernel.org/torvalds/c/f5e30680616a [2] https://git.kernel.org/torvalds/c/26778aaa134a [3] https://git.kernel.org/torvalds/c/e3e880bb1518
Marc Zyngier (1): KVM: arm64: Commit pending PC adjustemnts before returning to userspace
Zenghui Yu (1): KVM: arm64: Resolve all pending PC updates before immediate exit
arch/arm64/include/asm/kvm_asm.h | 1 + arch/arm64/kvm/arm.c | 20 +++++++++++++++++--- arch/arm64/kvm/hyp/exception.c | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++ 4 files changed, 28 insertions(+), 5 deletions(-)
From: Marc Zyngier maz@kernel.org
commit 26778aaa134a9aefdf5dbaad904054d7be9d656d upstream.
KVM currently updates PC (and the corresponding exception state) using a two phase approach: first by setting a set of flags, then by converting these flags into a state update when the vcpu is about to enter the guest.
However, this creates a disconnect with userspace if the vcpu thread returns there with any exception/PC flag set. In this case, the exposed context is wrong, as userspace doesn't have access to these flags (they aren't architectural). It also means that these flags are preserved across a reset, which isn't expected.
To solve this problem, force an explicit synchronisation of the exception state on vcpu exit to userspace. As an optimisation for nVHE systems, only perform this when there is something pending.
Reported-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Alexandru Elisei alexandru.elisei@arm.com Reviewed-by: Zenghui Yu yuzenghui@huawei.com Tested-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org # 5.11 [yuz: stable-5.12.y backport: add __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc macro manually and keep it consistent with mainline] Signed-off-by: Zenghui Yu yuzenghui@huawei.com --- arch/arm64/include/asm/kvm_asm.h | 1 + arch/arm64/kvm/arm.c | 11 +++++++++++ arch/arm64/kvm/hyp/exception.c | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++ 4 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index a8578d650bb6..d7f769bb6c9c 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -57,6 +57,7 @@ #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14 +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 21
#ifndef __ASSEMBLY__
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 84b5f79c9eab..c18740a1e541 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -892,6 +892,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_sigset_deactivate(vcpu);
+ /* + * In the unlikely event that we are returning to userspace + * with pending exceptions or PC adjustment, commit these + * adjustments in order to give userspace a consistent view of + * the vcpu state. Note that this relies on __kvm_adjust_pc() + * being preempt-safe on VHE. + */ + if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION | + KVM_ARM64_INCREMENT_PC))) + kvm_call_hyp(__kvm_adjust_pc, vcpu); + vcpu_put(vcpu); return ret; } diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c index 0812a496725f..11541b94b328 100644 --- a/arch/arm64/kvm/hyp/exception.c +++ b/arch/arm64/kvm/hyp/exception.c @@ -331,8 +331,8 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu) }
/* - * Adjust the guest PC on entry, depending on flags provided by EL1 - * for the purpose of emulation (MMIO, sysreg) or exception injection. + * Adjust the guest PC (and potentially exception state) depending on + * flags provided by the emulation code. */ void __kvm_adjust_pc(struct kvm_vcpu *vcpu) { diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index 936328207bde..e52582e14087 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -25,6 +25,13 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt) cpu_reg(host_ctxt, 1) = __kvm_vcpu_run(kern_hyp_va(vcpu)); }
+static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt) +{ + DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1); + + __kvm_adjust_pc(kern_hyp_va(vcpu)); +} + static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt) { __kvm_flush_vm_context(); @@ -112,6 +119,7 @@ typedef void (*hcall_t)(struct kvm_cpu_context *);
static const hcall_t host_hcall[] = { HANDLE_FUNC(__kvm_vcpu_run), + HANDLE_FUNC(__kvm_adjust_pc), HANDLE_FUNC(__kvm_flush_vm_context), HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa), HANDLE_FUNC(__kvm_tlb_flush_vmid),
Hi Zenghui,
Thanks for having a go at the backport.
On Tue, 01 Jun 2021 12:12:37 +0100, Zenghui Yu yuzenghui@huawei.com wrote:
From: Marc Zyngier maz@kernel.org
commit 26778aaa134a9aefdf5dbaad904054d7be9d656d upstream.
KVM currently updates PC (and the corresponding exception state) using a two phase approach: first by setting a set of flags, then by converting these flags into a state update when the vcpu is about to enter the guest.
However, this creates a disconnect with userspace if the vcpu thread returns there with any exception/PC flag set. In this case, the exposed context is wrong, as userspace doesn't have access to these flags (they aren't architectural). It also means that these flags are preserved across a reset, which isn't expected.
To solve this problem, force an explicit synchronisation of the exception state on vcpu exit to userspace. As an optimisation for nVHE systems, only perform this when there is something pending.
Reported-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Alexandru Elisei alexandru.elisei@arm.com Reviewed-by: Zenghui Yu yuzenghui@huawei.com Tested-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org # 5.11 [yuz: stable-5.12.y backport: add __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc macro manually and keep it consistent with mainline]
I'd rather you allocated a new number here, irrespective of what mainline has (rational below).
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
arch/arm64/include/asm/kvm_asm.h | 1 + arch/arm64/kvm/arm.c | 11 +++++++++++ arch/arm64/kvm/hyp/exception.c | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++ 4 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index a8578d650bb6..d7f769bb6c9c 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -57,6 +57,7 @@ #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14 +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 21
This is going to generate a larger than necessary host_hcall array in hyp/nvhe/hyp-main.c, which we're trying to keep tightly packed for obvious reasons.
With this nit fixed:
Reviewed-by: Marc Zyngier maz@kernel.org
Thanks,
M.
Hi Marc,
On 2021/6/1 19:44, Marc Zyngier wrote:
Hi Zenghui,
Thanks for having a go at the backport.
On Tue, 01 Jun 2021 12:12:37 +0100, Zenghui Yu yuzenghui@huawei.com wrote:
From: Marc Zyngier maz@kernel.org
commit 26778aaa134a9aefdf5dbaad904054d7be9d656d upstream.
KVM currently updates PC (and the corresponding exception state) using a two phase approach: first by setting a set of flags, then by converting these flags into a state update when the vcpu is about to enter the guest.
However, this creates a disconnect with userspace if the vcpu thread returns there with any exception/PC flag set. In this case, the exposed context is wrong, as userspace doesn't have access to these flags (they aren't architectural). It also means that these flags are preserved across a reset, which isn't expected.
To solve this problem, force an explicit synchronisation of the exception state on vcpu exit to userspace. As an optimisation for nVHE systems, only perform this when there is something pending.
Reported-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Alexandru Elisei alexandru.elisei@arm.com Reviewed-by: Zenghui Yu yuzenghui@huawei.com Tested-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org # 5.11 [yuz: stable-5.12.y backport: add __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc macro manually and keep it consistent with mainline]
I'd rather you allocated a new number here, irrespective of what mainline has (rational below).
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
arch/arm64/include/asm/kvm_asm.h | 1 + arch/arm64/kvm/arm.c | 11 +++++++++++ arch/arm64/kvm/hyp/exception.c | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++ 4 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index a8578d650bb6..d7f769bb6c9c 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -57,6 +57,7 @@ #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14 +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 21
This is going to generate a larger than necessary host_hcall array in hyp/nvhe/hyp-main.c, which we're trying to keep tightly packed for obvious reasons.
It isn't obvious to me ;-). But this creates some invalid entries (HVC handlers) in the host_hcall array, which is not good. I'll change __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc to 15. Thanks for your reminder.
With this nit fixed:
Reviewed-by: Marc Zyngier maz@kernel.org
Thanks!
Zenghui
commit e3e880bb1518eb10a4b4bb4344ed614d6856f190 upstream.
Commit 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") fixed the PC updating issue by forcing an explicit synchronisation of the exception state on vcpu exit to userspace.
However, we forgot to take into account the case where immediate_exit is set by userspace and KVM_RUN will exit immediately. Fix it by resolving all pending PC updates before returning to userspace.
Since __kvm_adjust_pc() relies on a loaded vcpu context, I moved the immediate_exit checking right after vcpu_load(). We will get some overhead if immediate_exit is true (which should hopefully be rare).
Fixes: 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210526141831.1662-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org # 5.11 --- arch/arm64/kvm/arm.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index c18740a1e541..7730b81aad6d 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -715,11 +715,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) return ret; }
- if (run->immediate_exit) - return -EINTR; - vcpu_load(vcpu);
+ if (run->immediate_exit) { + ret = -EINTR; + goto out; + } + kvm_sigset_activate(vcpu);
ret = 1; @@ -892,6 +894,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_sigset_deactivate(vcpu);
+out: /* * In the unlikely event that we are returning to userspace * with pending exceptions or PC adjustment, commit these
On Tue, 01 Jun 2021 12:12:38 +0100, Zenghui Yu yuzenghui@huawei.com wrote:
commit e3e880bb1518eb10a4b4bb4344ed614d6856f190 upstream.
Commit 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") fixed the PC updating issue by forcing an explicit synchronisation of the exception state on vcpu exit to userspace.
However, we forgot to take into account the case where immediate_exit is set by userspace and KVM_RUN will exit immediately. Fix it by resolving all pending PC updates before returning to userspace.
Since __kvm_adjust_pc() relies on a loaded vcpu context, I moved the immediate_exit checking right after vcpu_load(). We will get some overhead if immediate_exit is true (which should hopefully be rare).
Fixes: 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210526141831.1662-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org # 5.11
Reviewed-by: Marc Zyngier maz@kernel.org
M.
linux-stable-mirror@lists.linaro.org