On Mon, Feb 21, 2022 at 07:24:59PM +0000, Catalin Marinas wrote:
On Mon, Feb 21, 2022 at 03:01:03PM +0000, Mark Brown wrote:
There's going to be an ordering/clarity issue whatever way round we do it
- the FA64 feature bit is in a different feature register to the main
SME feature bitfield, and it's not as abundantly clear as might be ideal that it will have been sanitised when we're getting callbacks for the main SME feature. There are an awful lot of sharp edges with this code. Having things this way round felt more idiomatic to me.
You may want to add a comment in the cpu_feature[] array that it should be placed after SME.
Sure.
We do run the kernel in streaming mode - entering the kernel through a syscall or preemption will not change the streaming mode state, and we need to be in streaming mode in order to save or restore the register state for streaming mode. In particular we need FA64 enabled for EL1 in order to context switch FFR when in streaming mode; without it we'll generate an exception when we execute rdffr or wrffr. We don't do any real floating point work in streaming mode, but we absolutely need to run in streaming mode, and we only exit streaming mode when restoring a context where it is disabled, when using floating point in the kernel or when idling the CPU.
So, IIUC, for Linux it is mandatory that FEAT_SME_FA64 is supported, otherwise we won't be able to enable SME. Does the architecture say
The feature is not mandatory and we do not require it for Linux. It is expected that many implementations will choose to not support FA64.
The only impact it has on the kernel is that if it's present then we need to enable it for each EL and then context switch FFR in streaming mode; the code is there to do that conditionally already. We'd also have to take it into account if we were to run streaming mode algorithms in the kernel, but if we ever do so that's just an additional feature check when choosing to run such code.
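To make the conditional FFR handling described above concrete, here is a minimal standalone sketch of the decision. It is an illustration only and not the kernel's code: struct cpu_caps and must_switch_ffr() are hypothetical names, and the sketch assumes the task already has SVE state in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags; not the kernel's real data structures. */
struct cpu_caps {
	bool has_sme;   /* streaming mode is available */
	bool has_fa64;  /* full A64 instruction set is usable in streaming mode */
};

/*
 * Decide whether FFR has to be saved/restored for a task that has SVE
 * state in use (assumption of this sketch).
 *
 * Outside streaming mode FFR is ordinary SVE state, so it is always
 * handled. In streaming mode rdffr/wrffr are only usable when FA64 is
 * present and enabled, so FFR is handled only in that case.
 */
static inline bool must_switch_ffr(const struct cpu_caps *caps, bool streaming)
{
	assert(caps != NULL);

	if (!streaming)
		return true;

	/* Streaming mode cannot be active on a system without SME. */
	assert(caps->has_sme);

	return caps->has_fa64;
}
```

A caller would pass the result down to whatever routine saves or restores the vector state, so that the FFR accesses are skipped in streaming mode on systems without FA64.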
this feature as optional? Which A64 instructions are not available if FA64 is disabled? I hope it's only the SVE ones but I thought we can still do load/store of the state even with FA64 disabled.
There's a rather large subset of mostly FPSIMD and some SVE instructions that are unavailable (including those for accessing FFR, which is why we don't need to context switch it in streaming mode when FA64 is absent). You can see a full list in appendix F1 of the SME specification.
This is actually a bit awkward for not disabling streaming mode when we do a syscall, since the disabled instructions include the FPSIMD "mov vector, vector" instruction which we currently use to zero the high bits of the Z registers. That issue goes away if the optimisations I've got for relaxed flushing of the non-shared SVE state that we discussed in relation to syscall-abi get merged, though it'd still be there if we add a sysctl to force flushing. This is a solvable problem though, even if we have to use a less efficient sequence to flush in streaming mode.
Anyway, if we can't even context switch without FA64 while in streaming mode, I think we should move the check into the main SME .matches function and enable it in sme_kernel_enable(), no need for an additional feature.
Given that it's optional and we need to check for it at runtime in order to context switch it seems sensible to use the cpufeature infrastructure for the detection.
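As a rough illustration of why FA64 ends up as its own detection step, the sketch below decodes the two ID register fields involved from raw register values. The field positions (ID_AA64PFR1_EL1.SME in bits [27:24], ID_AA64SMFR0_EL1.FA64 in bit [63]) are my reading of the architecture and should be checked against the Arm ARM; the macro and function names are made up for this example and are not the cpufeature API, which works on sanitised system-wide values rather than raw per-CPU reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions per the architecture; the macro names are hypothetical. */
#define PFR1_SME_SHIFT    24  /* ID_AA64PFR1_EL1.SME, bits [27:24] */
#define PFR1_SME_WIDTH    4
#define SMFR0_FA64_SHIFT  63  /* ID_AA64SMFR0_EL1.FA64, bit [63] */
#define SMFR0_FA64_WIDTH  1

/* Extract an unsigned field of 'width' bits starting at bit 'shift'. */
static inline uint64_t id_field(uint64_t reg, unsigned int shift,
				unsigned int width)
{
	assert(width >= 1 && width < 64 && shift + width <= 64);
	return (reg >> shift) & ((UINT64_C(1) << width) - 1);
}

/* SME is implemented when the SME field is non-zero. */
static inline bool cpu_has_sme(uint64_t pfr1)
{
	return id_field(pfr1, PFR1_SME_SHIFT, PFR1_SME_WIDTH) != 0;
}

/*
 * FA64 lives in a different register from the main SME field, so it only
 * means anything once SME itself has been detected.
 */
static inline bool cpu_has_fa64(uint64_t pfr1, uint64_t smfr0)
{
	return cpu_has_sme(pfr1) &&
	       id_field(smfr0, SMFR0_FA64_SHIFT, SMFR0_FA64_WIDTH) != 0;
}
```

The dependency on cpu_has_sme() in cpu_has_fa64() mirrors the ordering concern raised earlier in the thread: the FA64 check is only useful after the main SME feature has been evaluated.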
I think we should also update booting.rst to require that the FA64 is enabled at EL2 and EL3.
That's there already since d198c77b7fab13d4 ("arm64: Document boot requirements for FEAT_SME_FA64").