On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote:
> This is gross. I realize this is only used for old CPUs that we don't care about perf-wise
Performance might still be important for embedded systems (the Geode LX seems to be supported "until at least 2021").
> , but this code is nonsense -- it makes absolutely no sense to put this absurd condition here to work around kernel_fpu_begin() bugs.
Yes, it's nonsense. But I think that the kernel might have a silent assumption that the FPU cannot be used too early.
> If we want to use MMX, we should check MMX. And we should also check the correct condition re: irqs. So this code should be:
> 	if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
> 		kernel_fpu_begin_mask(FPU_MASK_XMM);
> 		...
This code does not use SSE (XMM). It uses only MMX and 3DNow!, so an XMM check is not needed here. And this code is built only when a processor with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o" in the Makefile). So:
	if (!irq_fpu_usable())
		fallback_to_non_mmx_code();
is sufficient.
But the original:
	if (in_interrupt())
		fallback_to_non_mmx_code();
is also valid. Changing it to irq_fpu_usable() might allow using the MMX variant in more cases and even improve performance. But it can also introduce some regressions.
> or we could improve code gen by adding a try_kernel_fpu_begin_mask() so we can do a single call instead of two. [...] What do you all think? If you're generally in favor, I can write the code and do a quick audit for other weird cases in the kernel.
I think that the cleanest approach would be introducing fpu_usable(), which returns 0 (or false) if the FPU is not initialized yet. Then irq_fpu_usable() could be changed to:
	bool irq_fpu_usable(void)
	{
		return fpu_usable() &&
			(!in_interrupt() ||
			 interrupted_user_mode() ||
			 interrupted_kernel_fpu_idle());
	}
and in the _mmx_memcpy():
-	if (unlikely(in_interrupt()))
+	if (unlikely(!irq_fpu_usable()))
 		return __memcpy(to, from, len);
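fpu_usable() itself could be something as simple as the sketch below. The fpu_boot_initialized flag is made up here for illustration; the real implementation would have to expose whatever state the FPU init code already tracks and set it once boot-time FPU setup is complete:

	/*
	 * Sketch only: a flag set once during boot, after the FPU core
	 * has been initialized.  The flag name and the place where it
	 * would be set are assumptions, not existing kernel symbols.
	 */
	static bool fpu_boot_initialized __ro_after_init;

	bool fpu_usable(void)
	{
		return fpu_boot_initialized;
	}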
try_kernel_fpu_begin(), even without a mask, is also a good idea. If the FPU is not available yet (not initialized yet), it can fail and the fallback code would be used. However, in some cases fpu_usable() + kernel_fpu_begin() might be better, if a long preparation is required between the fpu_usable() check and kernel_fpu_begin() (kernel_fpu_begin() disables preemption). In the mmx_32.c case try_kernel_fpu_begin() looks good: only two simple lines (the p and i assignments) are added to the section that runs with preemption disabled:
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(try_kernel_fpu_begin()))
 		return __memcpy(to, from, len);
 
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
-
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
 		"   prefetch 64(%0)\n"
And if try_kernel_fpu_begin_mask() with an explicit mask allows for some optimizations, it might also be a good idea.
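Just to illustrate, such helpers could be as simple as the sketch below, built on top of the existing kernel_fpu_begin_mask(); the return value convention (-EBUSY on failure) is only an example:

	/*
	 * Sketch only: enter a kernel FPU section if the FPU is usable
	 * in the current context, otherwise report failure so that the
	 * caller can take a non-FPU fallback path.
	 */
	static inline int try_kernel_fpu_begin_mask(unsigned int kfpu_mask)
	{
		if (!irq_fpu_usable())
			return -EBUSY;

		kernel_fpu_begin_mask(kfpu_mask);
		return 0;
	}

	static inline int try_kernel_fpu_begin(void)
	{
		return try_kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
	}

With the fpu_usable() idea above folded into irq_fpu_usable(), this would also cover the "FPU not initialized yet" case.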
> Or we could flip the OSFXSR bit very early, I suppose.
I think that my original static_branch_likely() workaround might be the simplest and touches only mmx_32.c. Such an approach is already used in the kernel, for instance in the crypto code:
$ git grep static_branch arch/x86/crypto/
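In mmx_32.c it could look roughly like this (only a sketch; the key name and the enable hook are made up for illustration):

	/* Sketch only: keep the MMX path disabled until the FPU is ready. */
	static DEFINE_STATIC_KEY_FALSE(mmx_fpu_ready);

	/* Hypothetical hook, called once FPU initialization is complete. */
	void mmx_mark_fpu_ready(void)
	{
		static_branch_enable(&mmx_fpu_ready);
	}

and in _mmx_memcpy():

	if (!static_branch_likely(&mmx_fpu_ready) || unlikely(in_interrupt()))
		return __memcpy(to, from, len);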
And maybe using the FPU too early should be forbidden.
Best regards,
Krzysiek