The x86-64 memmove code has two ALTERNATIVE statements in a row, one to handle FSRM ("Fast Short REP MOVSB"), and one to handle ERMS ("Enhanced REP MOVSB"). If either of these features is present, the goal is to jump directly to a REP MOVSB; otherwise, some setup code that handles short lengths is executed. The first comparison of a sequence of specific small sizes is included in the first ALTERNATIVE, so it will be replaced by NOPs if FSRM is set, and then (assuming ERMS is also set) execution will fall through to the JMP to a REP MOVSB in the next ALTERNATIVE.
The two ALTERNATIVE invocations can be combined into a single instance of ALTERNATIVE_2 to simplify and slightly shorten the code. If either FSRM or ERMS is set, the first instruction in the memmove_begin_forward path will be replaced with a jump to the REP MOVSB.
This also prevents a problem when FSRM is set but ERMS is not; in this case, the previous code would have replaced both ALTERNATIVEs with NOPs and skipped the first check for sizes less than 0x20 bytes. This combination of CPU features is arguably a firmware bug, but this patch makes the function robust against this badness.
Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
---
 arch/x86/lib/memmove_64.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 724bbf83eb5b..1fc36dbd3bdc 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -38,8 +38,10 @@ SYM_FUNC_START(__memmove)

 	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "jmp .Lmemmove_erms", X86_FEATURE_ERMS
+	ALTERNATIVE_2 \
+		"cmp $0x20, %rdx; jb 1f", \
+		"jmp .Lmemmove_erms", X86_FEATURE_FSRM, \
+		"jmp .Lmemmove_erms", X86_FEATURE_ERMS

 	/*
 	 * movsq instruction have many startup latency
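For readers unfamiliar with the alternatives mechanism, the runtime behavior before and after the patch can be sketched in C. This is only an illustration: the real kernel patches instructions in place at boot, and the function and flag names below (other than the copy semantics) are hypothetical stand-ins for the assembly paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for X86_FEATURE_FSRM / X86_FEATURE_ERMS. */
static bool cpu_has_fsrm, cpu_has_erms;

/* Stand-ins for the three copy paths in memmove_64.S; all modeled
 * here with memmove() so the sketch is runnable. */
static void rep_movsb_copy(void *d, const void *s, size_t n) { memmove(d, s, n); }
static void small_copy(void *d, const void *s, size_t n)     { memmove(d, s, n); }
static void movsq_copy(void *d, const void *s, size_t n)     { memmove(d, s, n); }

/* Old layout: two independent ALTERNATIVEs. With FSRM set but ERMS
 * clear, both are patched to NOPs, so the len < 0x20 check is skipped
 * and the movsq path can be entered with a too-short length. */
static void memmove_forward_old(void *d, const void *s, size_t n)
{
	if (!cpu_has_fsrm) {                 /* ALTERNATIVE #1 */
		if (n < 0x20)
			return small_copy(d, s, n);
	}
	if (cpu_has_erms)                    /* ALTERNATIVE #2 */
		return rep_movsb_copy(d, s, n);
	movsq_copy(d, s, n);                 /* reached with n < 0x20 if FSRM && !ERMS */
}

/* New layout: one ALTERNATIVE_2 -- either feature selects REP MOVSB,
 * and the length check survives whenever neither feature is set. */
static void memmove_forward_new(void *d, const void *s, size_t n)
{
	if (cpu_has_fsrm || cpu_has_erms)
		return rep_movsb_copy(d, s, n);
	if (n < 0x20)
		return small_copy(d, s, n);
	movsq_copy(d, s, n);
}
```

The sketch makes the FSRM-without-ERMS case easy to see: in the old layout that combination falls through to the bulk path with no short-length guard, while in the new layout it simply takes the REP MOVSB branch.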
On Fri, Jan 13, 2023 at 12:34:27PM -0800, Daniel Verkamp wrote:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
* Daniel Verkamp <dverkamp@chromium.org> wrote:
Could you please extend the changelog with the information about the guest and bare-metal firmware that Jiri Slaby pointed out?
Ie. acknowledging Boris's point that these are technically invalid CPUID combinations, but emphasizing that the x86 memcpy routines should not behave in an undefined fashion while other OSs boot fine under the same firmware environment ...
[ Also, please Cc: lkml and don't Cc: -stable. ]
Thanks,
Ingo