The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 indicates that MTE_ASYNC is supported. On a host with ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support, a guest with the MTE capability enabled will incorrectly see MTE_ASYNC advertised as supported. This series fixes that.
This was found by inspection and the current behaviour is not known to break anything. Linux doesn't check MTE_frac and wrongly assumes MTE async faults can be generated whenever MTE is supported. That is a separate problem and is not addressed here.
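For reference, the decode a guest or VMM performs looks roughly like the sketch below. The shift macros are illustrative stand-ins, not kernel definitions; the field positions follow the Arm ARM (MTE is bits [11:8] and MTE_frac bits [43:40] of ID_AA64PFR1_EL1):

/*
 * Illustrative decode of MTE_ASYNC support from a raw ID_AA64PFR1_EL1
 * value. Macro names are made up for this sketch.
 */
#include <stdbool.h>
#include <stdint.h>

#define MTE_SHIFT	8	/* ID_AA64PFR1_EL1.MTE,      bits [11:8]  */
#define MTE_FRAC_SHIFT	40	/* ID_AA64PFR1_EL1.MTE_frac, bits [43:40] */

static bool mte_async_supported(uint64_t pfr1)
{
	unsigned int mte      = (pfr1 >> MTE_SHIFT) & 0xf;
	unsigned int mte_frac = (pfr1 >> MTE_FRAC_SHIFT) & 0xf;

	if (mte < 2)	/* no MTE2: MTE_frac is not meaningful */
		return false;

	/* MTE_frac == 0 advertises MTE_ASYNC; 0xf means not implemented */
	return mte_frac == 0;
}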
I am looking for feedback on whether this change is valuable or otherwise.
Ben Horgan (3):
  arm64/sysreg: Expose MTE_frac so that it is visible to KVM
  KVM: arm64: Make MTE_frac masking conditional on MTE capability
  KVM: selftests: Confirm exposing MTE_frac does not break migration
 arch/arm64/kernel/cpufeature.c                |  1 +
 arch/arm64/kvm/sys_regs.c                     | 26 ++++++-
 .../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++-
 3 files changed, 101 insertions(+), 3 deletions(-)
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
KVM exposes the sanitised ID registers to guests. Currently these ignore the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of zero.
This is a problem for platforms without the MTE_ASYNC feature where ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when no async fault will ever occur.
Before KVM can fix this, the arm64 cpufeature code needs to sanitise the MTE_frac field of this ID register.
Linux itself does not use the MTE_frac field and just assumes MTE async faults can be generated if MTE is supported.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 arch/arm64/kernel/cpufeature.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9c4d6d552b25..e952f4b07ce1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -298,6 +298,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_GCS), FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_GCS_SHIFT, 4, 0),
+	S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MTE_frac_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_SME_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MPAM_frac_SHIFT, 4, 0),
If MTE_frac is masked out unconditionally then the guest always sees ID_AA64PFR1_EL1.MTE_frac as 0. However, a value of 0 when ID_AA64PFR1_EL1.MTE is 2 indicates that MTE_ASYNC is supported. Hence, for a host with ID_AA64PFR1_EL1.MTE==2 and ID_AA64PFR1_EL1.MTE_frac==0xf (MTE_ASYNC unsupported), the guest would see MTE_ASYNC advertised as supported whilst the host does not support it. Instead, expose the sanitised value of MTE_frac to the guest and user-space.
As MTE_frac was previously hidden (always read as 0), and KVM must accept ID register values from user-space that KVM itself previously provided, allow user-space to set ID_AA64PFR1_EL1.MTE_frac to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore the field so as to avoid incorrectly claiming hardware support for MTE_ASYNC in the guest.
Note that Linux does not check the value of ID_AA64PFR1_EL1.MTE_frac and wrongly assumes that MTE async faults can be generated even on hardware that does not support them. This issue is not addressed here.
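Concretely, the case this rule protects is a VMM restoring the register on a migration destination. A minimal sketch, assuming a made-up restore_id_aa64pfr1() helper and the standard KVM_SET_ONE_REG encoding for ID_AA64PFR1_EL1 (op0=3, op1=0, CRn=0, CRm=4, op2=1); this is not code from this series:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * If the source kernel always exposed MTE_frac as 0, this write must
 * still succeed on a destination host whose MTE_frac is 0xf.
 */
static int restore_id_aa64pfr1(int vcpu_fd, uint64_t saved)
{
	struct kvm_one_reg reg = {
		.id   = ARM64_SYS_REG(3, 0, 0, 4, 1),	/* ID_AA64PFR1_EL1 */
		.addr = (uint64_t)&saved,
	};

	/* Returns -1 with errno EINVAL if KVM rejects the saved value */
	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}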
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 005ad28f7306..9ae647082684 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1600,13 +1600,14 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
 		val = sanitise_id_aa64pfr0_el1(vcpu, val);
 		break;
 	case SYS_ID_AA64PFR1_EL1:
-		if (!kvm_has_mte(vcpu->kvm))
+		if (!kvm_has_mte(vcpu->kvm)) {
 			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE);
+			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
+		}
 
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_RNDR_trap);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_NMI);
-		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_GCS);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_THE);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX);
@@ -1953,11 +1954,32 @@ static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu,
 {
 	u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
 	u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK;
+	u8 mte = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE, hw_val);
+	u8 user_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, user_val);
 
 	/* See set_id_aa64pfr0_el1 for comment about MPAM */
 	if ((hw_val & mpam_mask) == (user_val & mpam_mask))
 		user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK;
 
+	/*
+	 * Previously MTE_frac was hidden from the guest. However, if the
+	 * hardware supports MTE2 but not MTE_ASYM_FAULT then a value
+	 * of 0 for this field indicates that the hardware supports
+	 * MTE_ASYNC, whereas 0xf indicates MTE_ASYNC is not supported.
+	 *
+	 * As KVM must accept values it previously provided to
+	 * user-space, when ID_AA64PFR1_EL1.MTE is 2 allow user-space
+	 * to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to
+	 * avoid incorrectly claiming hardware support for MTE_ASYNC in
+	 * the guest.
+	 */
+	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
+	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
+		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
+		user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK;
+	}
+
 	return set_id_reg(vcpu, rd, user_val);
 }
On Mon, 14 Apr 2025 13:40:58 +0100, Ben Horgan <ben.horgan@arm.com> wrote:
[...]

> +	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
The spec says that MTE_frac is valid if ID_AA64PFR1_EL1.MTE >= 0b0010. Not strictly equal to 0b0010 (which represents MTE2). Crucially, MTE3 should receive the same treatment.
> +	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
> +		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
> +		user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK;
This means you are unconditionally propagating what the HW supports, which feels dodgy, especially considering that we don't know how MTE_frac is going to evolve in the future.
I think you should limit the fix to the exact case we're mitigating here, not blindly overwrite the guest's view with the HW's capability.
Thanks,
M.
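For illustration, a check narrowed along the lines of the review might look like the sketch below (untested, and not the final patch; hw_mte_frac is an assumed extra local, read the same way as user_mte_frac, and the >= comparison covers MTE3 as well as MTE2):

	/*
	 * Untested sketch: treat MTE_frac as valid for MTE >= MTE2 and
	 * only ignore a user write of ASYNC when the host itself
	 * reports NI, rather than always copying the hardware value.
	 */
	u8 hw_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, hw_val);

	if (mte >= ID_AA64PFR1_EL1_MTE_MTE2 &&
	    hw_mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI &&
	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
		user_val |= SYS_FIELD_PREP_ENUM(ID_AA64PFR1_EL1, MTE_frac, NI);
	}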
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2), ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported, and MTE_frac == 0 indicates it is supported.
As MTE_frac was previously unconditionally read as 0 by the guest and user-space, check that using SET_ONE_REG to set it to 0 succeeds but does not change MTE_frac from unsupported (0xF) to supported (0). This is required because values that originate from KVM must be accepted back from user-space to avoid breaking migration.
Also, to allow this MTE field to be tested, enable KVM_CAP_ARM_MTE for the set_id_regs test. No effect on existing tests is expected.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 .../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/arm64/set_id_regs.c b/tools/testing/selftests/kvm/arm64/set_id_regs.c
index 322b9d3b0125..34f4174e7285 100644
--- a/tools/testing/selftests/kvm/arm64/set_id_regs.c
+++ b/tools/testing/selftests/kvm/arm64/set_id_regs.c
@@ -15,6 +15,8 @@
 #include "test_util.h"
 #include <linux/bitfield.h>
 
+bool have_cap_arm_mte;
+
 enum ftr_type {
 	FTR_EXACT,			/* Use a predefined safe value */
 	FTR_LOWER_SAFE,			/* Smaller value is safe */
@@ -543,6 +545,70 @@ static void test_user_set_mpam_reg(struct kvm_vcpu *vcpu)
 		ksft_test_result_fail("ID_AA64PFR1_EL1.MPAM_frac value should not be ignored\n");
 }
 
+#define MTE_IDREG_TEST 1
+static void test_user_set_mte_reg(struct kvm_vcpu *vcpu)
+{
+	uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE];
+	struct reg_mask_range range = {
+		.addr = (__u64)masks,
+	};
+	uint64_t val;
+	uint64_t mte;
+	uint64_t mte_frac;
+	int idx, err;
+
+	if (!have_cap_arm_mte) {
+		ksft_test_result_skip("MTE capability not supported, nothing to test\n");
+		return;
+	}
+
+	/* Get writable masks for feature ID registers */
+	memset(range.reserved, 0, sizeof(range.reserved));
+	vm_ioctl(vcpu->vm, KVM_ARM_GET_REG_WRITABLE_MASKS, &range);
+
+	idx = encoding_to_range_idx(SYS_ID_AA64PFR1_EL1);
+	if ((masks[idx] & ID_AA64PFR1_EL1_MTE_frac_MASK) == ID_AA64PFR1_EL1_MTE_frac_MASK) {
+		ksft_test_result_skip("ID_AA64PFR1_EL1.MTE_frac is officially writable, nothing to test\n");
+		return;
+	}
+
+	/*
+	 * When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2)
+	 * ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
+	 * and MTE_frac == 0 indicates it is supported.
+	 *
+	 * As MTE_frac was previously unconditionally read as 0, check
+	 * that the set to 0 succeeds but does not change MTE_frac
+	 * from unsupported (0xF) to supported (0).
+	 */
+	val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1));
+
+	mte = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE), val);
+	mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val);
+	if (mte != ID_AA64PFR1_EL1_MTE_MTE2 ||
+	    mte_frac != ID_AA64PFR1_EL1_MTE_frac_NI) {
+		ksft_test_result_skip("MTE_ASYNC or MTE_ASYMM are supported, nothing to test\n");
+		return;
+	}
+
+	/* Try to set MTE_frac=0. */
+	val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
+	val |= FIELD_PREP(ID_AA64PFR1_EL1_MTE_frac_MASK, 0);
+	err = __vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1), val);
+	if (err) {
+		ksft_test_result_fail("ID_AA64PFR1_EL1.MTE_frac=0 was not accepted\n");
+		return;
+	}
+
+	val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1));
+	mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val);
+	if (mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI)
+		ksft_test_result_pass("ID_AA64PFR1_EL1.MTE_frac=0 accepted and still 0xF\n");
+	else
+		ksft_test_result_fail("ID_AA64PFR1_EL1.MTE_frac no longer 0xF\n");
+}
+
 static void test_guest_reg_read(struct kvm_vcpu *vcpu)
 {
 	bool done = false;
@@ -673,6 +739,14 @@ static void test_reset_preserves_id_regs(struct kvm_vcpu *vcpu)
 	ksft_test_result_pass("%s\n", __func__);
 }
 
+void kvm_arch_vm_post_create(struct kvm_vm *vm)
+{
+	if (vm_check_cap(vm, KVM_CAP_ARM_MTE)) {
+		vm_enable_cap(vm, KVM_CAP_ARM_MTE, 0);
+		have_cap_arm_mte = true;
+	}
+}
+
 int main(void)
 {
 	struct kvm_vcpu *vcpu;
@@ -701,7 +775,7 @@ int main(void)
 		       ARRAY_SIZE(ftr_id_aa64pfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr0_el1) +
 		       ARRAY_SIZE(ftr_id_aa64mmfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr2_el1) +
 		       ARRAY_SIZE(ftr_id_aa64zfr0_el1) - ARRAY_SIZE(test_regs) + 3 +
-		       MPAM_IDREG_TEST;
+		       MPAM_IDREG_TEST + MTE_IDREG_TEST;
 
 	ksft_set_plan(test_cnt);
 
@@ -709,6 +783,7 @@ int main(void)
 	test_vcpu_ftr_id_regs(vcpu);
 	test_vcpu_non_ftr_id_regs(vcpu);
 	test_user_set_mpam_reg(vcpu);
+	test_user_set_mte_reg(vcpu);
 
 	test_guest_reg_read(vcpu);