On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote:
> This is gross. I realize this is only used for old CPUs that we don't care about perf-wise
Performance might still be important for embedded systems (the Geode LX seems to be supported "until at least 2021").
> , but this code is nonsense -- it makes absolutely no sense to put this absurd condition here to work around kernel_fpu_begin() bugs.
Yes, it's nonsense. But I think that the kernel might have a silent assumption that the FPU cannot be used too early.
> If we want to use MMX, we should check MMX. And we should also check the correct condition re: irqs. So this code should be:
> 	if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
> 		kernel_fpu_begin_mask(FPU_MASK_XMM);
> 		...
This code does not use SSE (XMM). It uses only MMX and 3DNow!, so an XMM check is not needed here. And this code is built only when a processor with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o" in the Makefile). So:
	if (!irq_fpu_usable())
		fallback_to_non_mmx_code();
is sufficient.
But the original:
	if (in_interrupt())
		fallback_to_non_mmx_code();
is also valid. Changing it to irq_fpu_usable() might allow using the MMX variant in more cases and even improve performance. But it can also introduce some regressions.
> or we could improve code gen by adding a try_kernel_fpu_begin_mask() so we can do a single call instead of two. [...] What do you all think? If you're generally in favor, I can write the code and do a quick audit for other weird cases in the kernel.
I think that the cleanest approach would be introducing fpu_usable(), which returns 0 (or false) if the FPU is not initialized yet. Then irq_fpu_usable() could be changed to:
	bool irq_fpu_usable(void)
	{
		return fpu_usable() &&
			(!in_interrupt() ||
			 interrupted_user_mode() ||
			 interrupted_kernel_fpu_idle());
	}
and in the _mmx_memcpy():
-	if (unlikely(in_interrupt()))
+	if (unlikely(!irq_fpu_usable()))
 		return __memcpy(to, from, len);
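fpu_usable() itself could be something as simple as the sketch below. The fpu_boot_initialized flag is made up here for illustration; the real implementation would have to expose whatever state the FPU init code already tracks and set it once boot-time FPU setup is complete:

	/*
	 * Sketch only: a flag set once during boot, after the FPU core
	 * has been initialized.  The flag name and the place where it
	 * would be set are assumptions, not existing kernel symbols.
	 */
	static bool fpu_boot_initialized __ro_after_init;

	bool fpu_usable(void)
	{
		return fpu_boot_initialized;
	}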
try_kernel_fpu_begin(), even without a mask, is also a good idea. If the FPU is not available yet (not initialized yet), it can fail and the fallback code would be used. However, in some cases fpu_usable() + kernel_fpu_begin() might be better, if a long preparation is required between the fpu_usable() check and kernel_fpu_begin() (kernel_fpu_begin() disables preemption). In the mmx_32.c case try_kernel_fpu_begin() looks good: only two simple lines (the p and i assignments) are added to the section that runs with preemption disabled:
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(try_kernel_fpu_begin()))
 		return __memcpy(to, from, len);
 
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
-
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
 		"   prefetch 64(%0)\n"
And if try_kernel_fpu_begin_mask() with an explicit mask allows for some optimizations, it might also be a good idea.
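Just to illustrate, such helpers could be as simple as the sketch below, built on top of the existing kernel_fpu_begin_mask(); the return value convention (-EBUSY on failure) is only an example:

	/*
	 * Sketch only: enter a kernel FPU section if the FPU is usable
	 * in the current context, otherwise report failure so that the
	 * caller can take a non-FPU fallback path.
	 */
	static inline int try_kernel_fpu_begin_mask(unsigned int kfpu_mask)
	{
		if (!irq_fpu_usable())
			return -EBUSY;

		kernel_fpu_begin_mask(kfpu_mask);
		return 0;
	}

	static inline int try_kernel_fpu_begin(void)
	{
		return try_kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR);
	}

With the fpu_usable() idea above folded into irq_fpu_usable(), this would also cover the "FPU not initialized yet" case.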
> Or we could flip the OSFXSR bit very early, I suppose.
I think that my original static_branch_likely() workaround might be the simplest and touches only mmx_32.c. Such an approach is already used in the kernel, for instance in the crypto code:
$ git grep static_branch arch/x86/crypto/
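In mmx_32.c it could look roughly like this (only a sketch; the key name and the enable hook are made up for illustration):

	/* Sketch only: keep the MMX path disabled until the FPU is ready. */
	static DEFINE_STATIC_KEY_FALSE(mmx_fpu_ready);

	/* Hypothetical hook, called once FPU initialization is complete. */
	void mmx_mark_fpu_ready(void)
	{
		static_branch_enable(&mmx_fpu_ready);
	}

and in _mmx_memcpy():

	if (!static_branch_likely(&mmx_fpu_ready) || unlikely(in_interrupt()))
		return __memcpy(to, from, len);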
And maybe using the FPU too early should be forbidden.
Best regards,
Krzysiek