On Tue, 2 Sept 2025 at 12:45, Mark Brown <broonie@kernel.org> wrote:
SME, the Scalable Matrix Extension, is an arm64 extension which adds support for matrix operations, with core concepts patterned after SVE.
Hi; apologies for not having got round to looking at this earlier.
I haven't actually tried writing any code that uses this proposed ABI, but mostly it looks OK to me. I have a few nits below, but my main concern is the bits of text that say (or seem to say -- maybe I'm misinterpreting them) that various parts of how userspace accesses the guest state (e.g. the fp regs) depend on the current state of the vcpu, rather than being only a function of how the vcpu was configured. That seems to me like it's unnecessarily awkward. (More detail below.)
If SME is enabled for a guest without SVE then the FPSIMD Vn registers must be accessed via the low 128 bits of the SVE Zn registers, as is the case when SVE is enabled. This is not ideal but allows access to SVCR and the registers in any order without duplication or ambiguity about which values should take effect. This may be an issue for VMMs that are unaware of SME on systems that implement it without SVE: if they let SME be enabled, the lack of access to Vn may surprise them. However, this seems like an unusual implementation choice.
For SME-unaware VMMs on systems with both SVE and SME support, the SVE registers may be larger than expected. This should be less disruptive than on a system without SVE, as they will simply ignore the high bits of the registers.
I think that since enabling SME is something the VMM has to actively do, it isn't a big deal that they also need to do something in the fp or sve register access codepaths to handle SME. You can't get SME by surprise (same as you can't get SVE by surprise).
Signed-off-by: Mark Brown <broonie@kernel.org>
 Documentation/virt/kvm/api.rst | 115 +++++++++++++++++++++++++++++------------
 1 file changed, 81 insertions(+), 34 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6aa40ee05a4a..94a22407a1d4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -406,7 +406,7 @@ Errors:
              instructions from device memory (arm64)
   ENOSYS     data abort outside memslots with no syndrome info and
              KVM_CAP_ARM_NISV_TO_USER not enabled (arm64)
-  EPERM      SVE feature set but not finalized (arm64)
+  EPERM      SVE or SME feature set but not finalized (arm64)
   =======    ==============================================================

 This ioctl is used to run a guest virtual cpu.  While there are no
@@ -2601,11 +2601,11 @@ Specifically:
 ======================= ========= ===== =======================================

 .. [1] These encodings are not accepted for SVE-enabled vcpus.  See
-       :ref:`KVM_ARM_VCPU_INIT`.
+       :ref:`KVM_ARM_VCPU_INIT`.  They are also not accepted when SME is
+       enabled without SVE and the vcpu is in streaming mode.
Does this mean that on an SME-no-SVE VM the VMM needs to know if the vcpu is currently in streaming mode or not to determine whether to read the FP registers as fp_regs or sve regs? That seems unpleasant -- I was expecting this to be strictly a matter of how the VM was configured (as it is with SVE).
        The equivalent register content can be accessed via bits [127:0] of
-       the corresponding SVE Zn registers instead for vcpus that have SVE
-       enabled (see below).
+       the corresponding SVE Zn registers in these cases (see below).

 arm64 CCSIDR registers are demultiplexed by CSSELR value::
@@ -2636,24 +2636,34 @@ arm64 SVE registers have the following bit patterns::
   0x6050 0000 0015 060 <slice:5>        FFR bits[256*slice + 255 : 256*slice]
   0x6060 0000 0015 ffff                 KVM_REG_ARM64_SVE_VLS pseudo-register

-Access to register IDs where 2048 * slice >= 128 * max_vq will fail with
-ENOENT.  max_vq is the vcpu's maximum supported vector length in 128-bit
-quadwords: see [2]_ below.
+arm64 SME registers have the following bit patterns:
+  0x6080 0000 0017 00 <n:5> <slice:5>   ZA.H[n] bits[2048*slice + 2047 : 2048*slice]
+  0x60XX 0000 0017 0100                 ZT0
What's the XX here ?
+  0x6060 0000 0017 fffe                 KVM_REG_ARM64_SME_VLS pseudo-register
+Access to Z, P or ZA register IDs where 2048 * slice >= 128 * max_vq
+will fail with ENOENT.  max_vq is the vcpu's maximum supported vector
+length in 128-bit quadwords: see [2]_ below.
What about FFR registers ? Is their ENOENT condition the same, or different?
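For reference, the condition quoted above for Z, P and ZA slices can be sketched as a small predicate. This is only an illustration of the arithmetic in the proposed text; the function name is hypothetical, and it does not answer whether FFR follows the same rule.

```c
#include <stdbool.h>

/*
 * Sketch only (hypothetical helper, not kernel code): returns true when the
 * quoted documentation says an access to this slice fails with ENOENT, i.e.
 * when 2048 * slice >= 128 * max_vq.  max_vq is in 128-bit quadwords.
 */
static bool sketch_slice_is_enoent(unsigned int slice, unsigned int max_vq)
{
    return 2048u * slice >= 128u * max_vq;
}
```

With max_vq of 16 (a 2048-bit vector) only slice 0 passes the check, since 2048 * 1 is already equal to 128 * 16.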
+Access to the ZA and ZT0 registers is only available if SVCR.ZA is set
+to 1.

 These registers are only accessible on vcpus for which SVE is enabled.
 See KVM_ARM_VCPU_INIT for details.

-In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
-accessible until the vcpu's SVE configuration has been finalized
-using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).  See KVM_ARM_VCPU_INIT
-and KVM_ARM_VCPU_FINALIZE for more information about this procedure.
+In addition, except for KVM_REG_ARM64_SVE_VLS and
+KVM_REG_ARM64_SME_VLS, these registers are not accessible until the
+vcpu's SVE and SME configuration has been finalized using
+KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_VEC).  See KVM_ARM_VCPU_INIT and
+KVM_ARM_VCPU_FINALIZE for more information about this procedure.

-KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
-lengths supported by the vcpu to be discovered and configured by
-userspace.  When transferred to or from user memory via KVM_GET_ONE_REG
-or KVM_SET_ONE_REG, the value of this register is of type
-__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as
-follows::
+KVM_REG_ARM64_SVE_VLS and KVM_ARM64_VCPU_SME_VLS are pseudo-registers
+that allows the set of vector lengths supported by the vcpu to be
+discovered and configured by userspace.  When transferred to or from
+user memory via KVM_GET_ONE_REG or KVM_SET_ONE_REG, the value of this
+register is of type __u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the
+set of vector lengths as follows::
__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS];
@@ -2665,19 +2675,25 @@ follows::
 	/* Vector length vq * 16 bytes not supported */

 .. [2] The maximum value vq for which the above condition is true is
-       max_vq.  This is the maximum vector length available to the guest on
-       this vcpu, and determines which register slices are visible through
-       this ioctl interface.
+       max_vq.  This is the maximum vector length currently available to
+       the guest on this vcpu, and determines which register slices are
+       visible through this ioctl interface.
+
+       If SME is supported then the max_vq used for the Z and P registers
+       while SVCR.SM is 1 this vector length will be the maximum SME
+       vector length available for the guest, otherwise it will be the
+       maximum SVE vector length available.
I can't figure out what this paragraph is trying to say, partly because it seems like it might be missing some text between "is 1" and "this vector length".
In any case, the "while SVCR.SM is 1" part seems odd -- I don't think this ABI should care about the runtime vcpu state, only what the vcpu's max vector lengths were configured as. My expectation would be that the max_vq for VMM register access would be the maximum of the SVE and SME vector lengths configured for the vcpu.
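To make that expectation concrete, here is a minimal sketch of how a VMM might derive a single, configuration-only max_vq from the two VLS bitmaps. It is an illustration, not the proposed ABI: the helper names are hypothetical, and the two constants are assumed to mirror KVM_ARM64_SVE_VQ_MIN and KVM_ARM64_SVE_VLS_WORDS from the uapi header.

```c
#include <stdint.h>

#define SKETCH_VQ_MIN    1  /* assumption: mirrors KVM_ARM64_SVE_VQ_MIN */
#define SKETCH_VLS_WORDS 8  /* assumption: mirrors KVM_ARM64_SVE_VLS_WORDS */

/* Highest vq whose bit is set in a VLS bitmap, or 0 if no length is set. */
static unsigned int sketch_max_vq(const uint64_t vls[SKETCH_VLS_WORDS])
{
    for (int w = SKETCH_VLS_WORDS - 1; w >= 0; w--) {
        for (int b = 63; b >= 0; b--) {
            if ((vls[w] >> b) & 1)
                return (unsigned int)(w * 64 + b) + SKETCH_VQ_MIN;
        }
    }
    return 0;
}

/*
 * max_vq for register access as a function of configuration only:
 * the larger of the configured SVE and SME maximums, independent of
 * the current value of SVCR.SM.
 */
static unsigned int sketch_access_max_vq(const uint64_t sve_vls[SKETCH_VLS_WORDS],
                                         const uint64_t sme_vls[SKETCH_VLS_WORDS])
{
    unsigned int sve = sketch_max_vq(sve_vls);
    unsigned int sme = sketch_max_vq(sme_vls);

    return sve > sme ? sve : sme;
}
```

A vcpu with no SVE would pass an all-zero SVE bitmap, in which case the SME maximum alone determines the result.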
thanks
-- PMM