The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup") 2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume") e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001 From: Pawan Gupta pawan.kumar.gupta@linux.intel.com Date: Tue, 15 Nov 2022 11:17:06 -0800 Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need to be saved and restored at hibernate and resume. However, it has zero awareness of CPU support for these MSRs. It mostly works by unconditionally attempting to manipulate these MSRs and relying on rdmsrl_safe() being able to handle a #GP on CPUs where the support is unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR while writes (WRMSR) are not. In this case, msr_build_context() sees a successful read (RDMSR) and marks the MSR as valid. Then, later, a write (WRMSR) fails, producing a nasty (but harmless) error message. This causes restore_processor_state() to try and restore it, but writing this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \ at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20) Call Trace: <TASK> restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kernfs_fop_write_iter vfs_write ksys_write do_syscall_64 ? do_syscall_64 ? up_read ? lock_is_held_type ? asm_exc_page_fault ? lockdep_hardirqs_on entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid trying to manipulate the MSR when the feature bit is clear. This required adding a X86_FEATURE bit for MSRs that do not have one already, but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup") Reported-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.166853973...
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 4cd39f304e20..93ae33248f42 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void) { - u32 spec_msr_id[] = { - MSR_IA32_SPEC_CTRL, - MSR_IA32_TSX_CTRL, - MSR_TSX_FORCE_ABORT, - MSR_IA32_MCU_OPT_CTRL, - MSR_AMD64_LS_CFG, - MSR_AMD64_DE_CFG, + struct msr_enumeration { + u32 msr_no; + u32 feature; + } msr_enum[] = { + { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL }, + { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL }, + { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT }, + { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL }, + { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD }, + { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC }, }; + int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id)); + for (i = 0; i < ARRAY_SIZE(msr_enum); i++) { + if (boot_cpu_has(msr_enum[i].feature)) + msr_build_context(&msr_enum[i].msr_no, 1); + } }
static int pm_check_save_msr(void)
On Wed, Nov 30, 2022 at 03:11:09PM -0800, Pawan Gupta wrote:
On Wed, Nov 30, 2022 at 01:31:56PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.10-stable tree.
Patch is applying cleanly on v5.10.156. Is it not where it should be applied?
Did you try building with the patch applied? :(
Same observation for failed patch on 5.4.
Same on 5.4, breaks the build, right?
I don't have special emails for "this applied but did not build" as that's more rare.
thanks,
greg k-h
commit aaa65d17eec372c6a9756833f3964ba05b05ea14 upstream.
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES. This is different from how other CPU features are enumerated i.e. via CPUID. Currently, a call to tsx_ctrl_is_supported() is required for enumerating the feature. In the absence of a feature bit for TSX control, any code that relies on checking feature bits directly will not work.
In preparation for adding a feature bit check in MSR save/restore during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when MSR_IA32_TSX_CTRL is present.
[ bp: Remove tsx_ctrl_is_supported()]
[Pawan: Resolved conflicts in backport; Removed parts of commit message referring to removed function tsx_ctrl_is_supported()]
Suggested-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.166853973... --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/tsx.c | 33 +++++++++++++----------------- 2 files changed, 15 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index f507ad7c7fd7..1fcda8263554 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -300,6 +300,7 @@ #define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */ #define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */ #define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */ +#define X86_FEATURE_MSR_TSX_CTRL (11*32+18) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c index e2ad30e474f8..da06bbb5d68e 100644 --- a/arch/x86/kernel/cpu/tsx.c +++ b/arch/x86/kernel/cpu/tsx.c @@ -58,24 +58,6 @@ void tsx_enable(void) wrmsrl(MSR_IA32_TSX_CTRL, tsx); }
-static bool __init tsx_ctrl_is_supported(void) -{ - u64 ia32_cap = x86_read_arch_cap_msr(); - - /* - * TSX is controlled via MSR_IA32_TSX_CTRL. However, support for this - * MSR is enumerated by ARCH_CAP_TSX_MSR bit in MSR_IA32_ARCH_CAPABILITIES. - * - * TSX control (aka MSR_IA32_TSX_CTRL) is only available after a - * microcode update on CPUs that have their MSR_IA32_ARCH_CAPABILITIES - * bit MDS_NO=1. CPUs with MDS_NO=0 are not planned to get - * MSR_IA32_TSX_CTRL support even after a microcode update. Thus, - * tsx= cmdline requests will do nothing on CPUs without - * MSR_IA32_TSX_CTRL support. - */ - return !!(ia32_cap & ARCH_CAP_TSX_CTRL_MSR); -} - static enum tsx_ctrl_states x86_get_tsx_auto_mode(void) { if (boot_cpu_has_bug(X86_BUG_TAA)) @@ -89,9 +71,22 @@ void __init tsx_init(void) char arg[5] = {}; int ret;
- if (!tsx_ctrl_is_supported()) + /* + * TSX is controlled via MSR_IA32_TSX_CTRL. However, support for this + * MSR is enumerated by ARCH_CAP_TSX_MSR bit in MSR_IA32_ARCH_CAPABILITIES. + * + * TSX control (aka MSR_IA32_TSX_CTRL) is only available after a + * microcode update on CPUs that have their MSR_IA32_ARCH_CAPABILITIES + * bit MDS_NO=1. CPUs with MDS_NO=0 are not planned to get + * MSR_IA32_TSX_CTRL support even after a microcode update. Thus, + * tsx= cmdline requests will do nothing on CPUs without + * MSR_IA32_TSX_CTRL support. + */ + if (!(x86_read_arch_cap_msr() & ARCH_CAP_TSX_CTRL_MSR)) return;
+ setup_force_cpu_cap(X86_FEATURE_MSR_TSX_CTRL); + ret = cmdline_find_option(boot_command_line, "tsx", arg, sizeof(arg)); if (ret >= 0) { if (!strcmp(arg, "on")) {
base-commit: 6d46ef50b123f2da3871690e619f5169eb97af92
commit 50bcceb7724e471d9b591803889df45dcbb584bc upstream.
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need to be saved and restored at hibernate and resume. However, it has zero awareness of CPU support for these MSRs. It mostly works by unconditionally attempting to manipulate these MSRs and relying on rdmsrl_safe() being able to handle a #GP on CPUs where the support is unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR while writes (WRMSR) are not. In this case, msr_build_context() sees a successful read (RDMSR) and marks the MSR as valid. Then, later, a write (WRMSR) fails, producing a nasty (but harmless) error message. This causes restore_processor_state() to try and restore it, but writing this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \ at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20) Call Trace: <TASK> restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kernfs_fop_write_iter vfs_write ksys_write do_syscall_64 ? do_syscall_64 ? up_read ? lock_is_held_type ? asm_exc_page_fault ? lockdep_hardirqs_on entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid trying to manipulate the MSR when the feature bit is clear. This required adding a X86_FEATURE bit for MSRs that do not have one already, but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ] [Pawan: Resolve build issue in backport]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup") Reported-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.166853973... --- arch/x86/power/cpu.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 61581c45788e..4e4e76ecd3ec 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -516,16 +516,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void) { - u32 spec_msr_id[] = { - MSR_IA32_SPEC_CTRL, - MSR_IA32_TSX_CTRL, - MSR_TSX_FORCE_ABORT, - MSR_IA32_MCU_OPT_CTRL, - MSR_AMD64_LS_CFG, - MSR_AMD64_DE_CFG, + struct msr_enumeration { + u32 msr_no; + u32 feature; + } msr_enum[] = { + { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL }, + { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL }, + { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT }, + { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL }, + { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD }, + { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC }, }; + int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id)); + for (i = 0; i < ARRAY_SIZE(msr_enum); i++) { + if (boot_cpu_has(msr_enum[i].feature)) + msr_build_context(&msr_enum[i].msr_no, 1); + } }
static int pm_check_save_msr(void)
On Thu, Dec 01, 2022 at 06:58:52AM +0100, Greg KH wrote:
On Wed, Nov 30, 2022 at 03:11:09PM -0800, Pawan Gupta wrote:
On Wed, Nov 30, 2022 at 01:31:56PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.10-stable tree.
Patch is applying cleanly on v5.10.156. Is it not where it should be applied?
Did you try building with the patch applied? :(
No because the message said "does not apply" and I wanted to make sure I am applying at the right place. I see the build issue, its because the dependent patch aaa65d17eec3 ("x86/tsx: Add a feature bit for TSX control MSR support") was missing. Strangely the dependent patch got picked up for 5.15:
https://lore.kernel.org/stable/20221130180537.128827808@linuxfoundation.org/
but not for 5.10 and older. I sent the 5.10 backport for both the patches. They were compile and boot tested.
Same observation for failed patch on 5.4.
Same on 5.4, breaks the build, right?
Yes, fixing it.
I don't have special emails for "this applied but did not build" as that's more rare.
Ohk, will keep that in mind.
linux-stable-mirror@lists.linaro.org