Today PAT can't be used without MTRR being available, unless MTRR is at least configured via CONFIG_MTRR and the system is running as a Xen PV guest. In that case PAT is automatically available via the hypervisor, but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complexity, the availability of PAT can't be queried via pat_enabled() in the Xen PV case, as the lack of MTRR support causes PAT to be reported as disabled. This leads some drivers to believe that not all cache modes are available, resulting in failures or degraded functionality.
The same applies to a kernel built without MTRR support: it won't allow the PAT MSR to be used, even though there is no technical reason for that other than that setting up PAT the same way on all CPUs (a requirement of the processor's cache management) currently relies on some MTRR-specific code.
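To make the driver-visible impact concrete, here is a minimal, hypothetical sketch (not taken from any real driver) of the pattern that regresses: a mapping helper keys write-combining off pat_enabled(), much like the i915 change named in the Fixes: tags of this series, and on a Xen PV guest it needlessly falls back to uncached access because pat_enabled() reports false.

#include <linux/io.h>
#include <asm/memtype.h>	/* pat_enabled() */

/* Use a write-combining mapping only when the OS reports a working PAT. */
static void __iomem *example_map_fb(resource_size_t base, unsigned long size)
{
	if (pat_enabled())
		return ioremap_wc(base, size);	/* WC needs PAT */

	return ioremap(base, size);		/* uncached fallback */
}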
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level up
- adding a PAT indirection layer supporting the 3 cases "no or disabled PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR
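As a condensed illustration of the indirection layer mentioned above (the full implementation is in patch 3 of this series), pat_init() ends up dispatching through a function pointer that each environment can override; this is a sketch, not a verbatim copy of the patch:

/* The three environment-specific initializers (bodies condensed to comments). */
static void pat_init_native(void) { /* kernel programs the PAT MSR itself */ }
static void pat_init_noset(void)  { /* Xen PV: consume the hypervisor's PAT setting */ }
static void pat_init_nopat(void)  { /* no or disabled PAT: keep boot-time cache modes */ }

/* Default to "PAT under kernel control"; Xen PV switches this very early. */
static void (*pat_init_func)(void) = pat_init_native;

void pat_init_set(void (*func)(void))
{
	pat_init_func = func;	/* e.g. pat_init_set(pat_init_noset) in xen_start_kernel() */
}

void pat_init(void)
{
	pat_init_func();
}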
Juergen Gross (3):
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86: add wrapper functions for mtrr functions handling also pat
  x86: decouple pat and mtrr handling
 arch/x86/include/asm/memtype.h     |  13 ++-
 arch/x86/include/asm/mtrr.h        |  27 ++++--
 arch/x86/include/asm/processor.h   |  10 +++
 arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
 arch/x86/kernel/setup.c            |  12 +--
 arch/x86/kernel/smpboot.c          |   8 +-
 arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
 arch/x86/power/cpu.c               |   2 +-
 arch/x86/xen/enlighten_pv.c        |   4 +
 12 files changed, 289 insertions(+), 186 deletions(-)
Prepare for making PAT and MTRR support independent of each other by moving some code needed by both out of the MTRR-specific sources.
Cc: stable@vger.kernel.org # 5.17
Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/mtrr.h        |  4 ++
 arch/x86/include/asm/processor.h   |  3 ++
 arch/x86/kernel/cpu/common.c       | 76 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mtrr/generic.c | 80 +++---------------------------
 4 files changed, 91 insertions(+), 72 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index 76d726074c16..12a16caed395 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -48,6 +48,8 @@ extern void mtrr_aps_init(void); extern void mtrr_bp_restore(void); extern int mtrr_trim_uncached_memory(unsigned long end_pfn); extern int amd_special_default_mtrr(void); +void mtrr_disable(void); +void mtrr_enable(void); # else static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform) { @@ -87,6 +89,8 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi) #define set_mtrr_aps_delayed_init() do {} while (0) #define mtrr_aps_init() do {} while (0) #define mtrr_bp_restore() do {} while (0) +#define mtrr_disable() do {} while (0) +#define mtrr_enable() do {} while (0) # endif
#ifdef CONFIG_COMPAT diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 356308c73951..5c934b922450 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -865,4 +865,7 @@ bool arch_is_platform_page(u64 paddr); #define arch_is_platform_page arch_is_platform_page #endif
+void cache_disable(void); +void cache_enable(void); + #endif /* _ASM_X86_PROCESSOR_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 736262a76a12..e43322f8a4ef 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -61,6 +61,7 @@ #include <asm/sigframe.h> #include <asm/traps.h> #include <asm/sev.h> +#include <asm/mtrr.h>
#include "cpu.h"
@@ -2327,3 +2328,78 @@ void arch_smt_update(void) /* Check whether IPI broadcasting can be enabled */ apic_smt_update(); } + +/* + * Disable and enable caches. Needed for changing MTRRs and the PAT MSR. + * + * Since we are disabling the cache don't allow any interrupts, + * they would run extremely slow and would only increase the pain. + * + * The caller must ensure that local interrupts are disabled and + * are reenabled after cache_enable() has been called. + */ +static unsigned long saved_cr4; +static DEFINE_RAW_SPINLOCK(cache_disable_lock); + +void cache_disable(void) __acquires(cache_disable_lock) +{ + unsigned long cr0; + + /* + * Note that this is not ideal + * since the cache is only flushed/disabled for this CPU while the + * MTRRs are changed, but changing this requires more invasive + * changes to the way the kernel boots + */ + + raw_spin_lock(&cache_disable_lock); + + /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */ + cr0 = read_cr0() | X86_CR0_CD; + write_cr0(cr0); + + /* + * Cache flushing is the most time-consuming step when programming + * the MTRRs. Fortunately, as per the Intel Software Development + * Manual, we can skip it if the processor supports cache self- + * snooping. + */ + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) + wbinvd(); + + /* Save value of CR4 and clear Page Global Enable (bit 7) */ + if (boot_cpu_has(X86_FEATURE_PGE)) { + saved_cr4 = __read_cr4(); + __write_cr4(saved_cr4 & ~X86_CR4_PGE); + } + + /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */ + count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL); + flush_tlb_local(); + + if (boot_cpu_has(X86_FEATURE_MTRR)) + mtrr_disable(); + + /* Again, only flush caches if we have to. */ + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) + wbinvd(); +} + +void cache_enable(void) __releases(cache_disable_lock) +{ + /* Flush TLBs (no need to flush caches - they are disabled) */ + count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL); + flush_tlb_local(); + + if (boot_cpu_has(X86_FEATURE_MTRR)) + mtrr_enable(); + + /* Enable caches */ + write_cr0(read_cr0() & ~X86_CR0_CD); + + /* Restore value of CR4 */ + if (boot_cpu_has(X86_FEATURE_PGE)) + __write_cr4(saved_cr4); + + raw_spin_unlock(&cache_disable_lock); +} diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c index 558108296f3c..84732215b61d 100644 --- a/arch/x86/kernel/cpu/mtrr/generic.c +++ b/arch/x86/kernel/cpu/mtrr/generic.c @@ -396,9 +396,6 @@ print_fixed(unsigned base, unsigned step, const mtrr_type *types) } }
-static void prepare_set(void); -static void post_set(void); - static void __init print_mtrr_state(void) { unsigned int i; @@ -450,11 +447,11 @@ void __init mtrr_bp_pat_init(void) unsigned long flags;
local_irq_save(flags); - prepare_set(); + cache_disable();
pat_init();
- post_set(); + cache_enable(); local_irq_restore(flags); }
@@ -715,80 +712,19 @@ static unsigned long set_mtrr_state(void) return change_mask; }
- -static unsigned long cr4; -static DEFINE_RAW_SPINLOCK(set_atomicity_lock); - -/* - * Since we are disabling the cache don't allow any interrupts, - * they would run extremely slow and would only increase the pain. - * - * The caller must ensure that local interrupts are disabled and - * are reenabled after post_set() has been called. - */ -static void prepare_set(void) __acquires(set_atomicity_lock) +void mtrr_disable(void) { - unsigned long cr0; - - /* - * Note that this is not ideal - * since the cache is only flushed/disabled for this CPU while the - * MTRRs are changed, but changing this requires more invasive - * changes to the way the kernel boots - */ - - raw_spin_lock(&set_atomicity_lock); - - /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */ - cr0 = read_cr0() | X86_CR0_CD; - write_cr0(cr0); - - /* - * Cache flushing is the most time-consuming step when programming - * the MTRRs. Fortunately, as per the Intel Software Development - * Manual, we can skip it if the processor supports cache self- - * snooping. - */ - if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) - wbinvd(); - - /* Save value of CR4 and clear Page Global Enable (bit 7) */ - if (boot_cpu_has(X86_FEATURE_PGE)) { - cr4 = __read_cr4(); - __write_cr4(cr4 & ~X86_CR4_PGE); - } - - /* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */ - count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL); - flush_tlb_local(); - /* Save MTRR state */ rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
/* Disable MTRRs, and set the default type to uncached */ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi); - - /* Again, only flush caches if we have to. */ - if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) - wbinvd(); }
-static void post_set(void) __releases(set_atomicity_lock) +void mtrr_enable(void) { - /* Flush TLBs (no need to flush caches - they are disabled) */ - count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL); - flush_tlb_local(); - /* Intel (P6) standard MTRRs */ mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi); - - /* Enable caches */ - write_cr0(read_cr0() & ~X86_CR0_CD); - - /* Restore value of CR4 */ - if (boot_cpu_has(X86_FEATURE_PGE)) - __write_cr4(cr4); - raw_spin_unlock(&set_atomicity_lock); }
static void generic_set_all(void) @@ -797,7 +733,7 @@ static void generic_set_all(void) unsigned long flags;
local_irq_save(flags); - prepare_set(); + cache_disable();
/* Actually set the state */ mask = set_mtrr_state(); @@ -805,7 +741,7 @@ static void generic_set_all(void) /* also set PAT */ pat_init();
- post_set(); + cache_enable(); local_irq_restore(flags);
/* Use the atomic bitops to update the global mask */ @@ -836,7 +772,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base, vr = &mtrr_state.var_ranges[reg];
local_irq_save(flags); - prepare_set(); + cache_disable();
if (size == 0) { /* @@ -855,7 +791,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base, mtrr_wrmsr(MTRRphysMask_MSR(reg), vr->mask_lo, vr->mask_hi); }
- post_set(); + cache_enable(); local_irq_restore(flags); }
On Fri, Jul 15, 2022 at 04:25:47PM +0200, Juergen Gross wrote:
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 736262a76a12..e43322f8a4ef 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c
I guess the move's ok but not into cpu/common.c pls. That thing is huuuge and is a dumping ground for everything.
arch/x86/kernel/cpu/cacheinfo.c looks like a more viable candidate for all things cache.
Rest looks trivial.
Thx.
There are several MTRR functions which also do PAT handling. In order to support PAT handling without MTRR in the future, add some wrappers for those functions.
Cc: stable@vger.kernel.org # 5.17
Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/mtrr.h      |  2 --
 arch/x86/include/asm/processor.h |  7 +++++
 arch/x86/kernel/cpu/common.c     | 44 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/mtrr.c  | 25 +++----------------
 arch/x86/kernel/setup.c          |  5 +---
 arch/x86/kernel/smpboot.c        |  8 +++---
 arch/x86/power/cpu.c             |  2 +-
 7 files changed, 59 insertions(+), 34 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index 12a16caed395..900083ac9f60 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned long size); extern int mtrr_del_page(int reg, unsigned long base, unsigned long size); extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi); extern void mtrr_ap_init(void); -extern void set_mtrr_aps_delayed_init(void); extern void mtrr_aps_init(void); extern void mtrr_bp_restore(void); extern int mtrr_trim_uncached_memory(unsigned long end_pfn); @@ -86,7 +85,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi) { } #define mtrr_ap_init() do {} while (0) -#define set_mtrr_aps_delayed_init() do {} while (0) #define mtrr_aps_init() do {} while (0) #define mtrr_bp_restore() do {} while (0) #define mtrr_disable() do {} while (0) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 5c934b922450..e2140204fb7e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -865,7 +865,14 @@ bool arch_is_platform_page(u64 paddr); #define arch_is_platform_page arch_is_platform_page #endif
+extern bool cache_aps_delayed_init; + void cache_disable(void); void cache_enable(void); +void cache_bp_init(void); +void cache_ap_init(void); +void cache_set_aps_delayed_init(void); +void cache_aps_init(void); +void cache_bp_restore(void);
#endif /* _ASM_X86_PROCESSOR_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index e43322f8a4ef..0a1bd14f7966 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1929,7 +1929,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c) #ifdef CONFIG_X86_32 enable_sep_cpu(); #endif - mtrr_ap_init(); + cache_ap_init(); validate_apic_and_package_id(c); x86_spec_ctrl_setup_ap(); update_srbds_msr(); @@ -2403,3 +2403,45 @@ void cache_enable(void) __releases(cache_disable_lock)
raw_spin_unlock(&cache_disable_lock); } + +void __init cache_bp_init(void) +{ + if (IS_ENABLED(CONFIG_MTRR)) + mtrr_bp_init(); + else + pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel."); +} + +void cache_ap_init(void) +{ + if (cache_aps_delayed_init) + return; + + mtrr_ap_init(); +} + +bool cache_aps_delayed_init; + +void cache_set_aps_delayed_init(void) +{ + cache_aps_delayed_init = true; +} + +void cache_aps_init(void) +{ + /* + * Check if someone has requested the delay of AP cache initialization, + * by doing cache_set_aps_delayed_init(), prior to this point. If not, + * then we are done. + */ + if (!cache_aps_delayed_init) + return; + + mtrr_aps_init(); + cache_aps_delayed_init = false; +} + +void cache_bp_restore(void) +{ + mtrr_bp_restore(); +} diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c index 2746cac9d8a9..c1593cfae641 100644 --- a/arch/x86/kernel/cpu/mtrr/mtrr.c +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c @@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES]; static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask; -static bool mtrr_aps_delayed_init;
static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
@@ -176,7 +175,8 @@ static int mtrr_rendezvous_handler(void *info) if (data->smp_reg != ~0U) { mtrr_if->set(data->smp_reg, data->smp_base, data->smp_size, data->smp_type); - } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) { + } else if ((use_intel() && cache_aps_delayed_init) || + !cpu_online(smp_processor_id())) { mtrr_if->set_all(); } return 0; @@ -789,7 +789,7 @@ void mtrr_ap_init(void) if (!mtrr_enabled()) return;
- if (!use_intel() || mtrr_aps_delayed_init) + if (!use_intel()) return;
/* @@ -823,16 +823,6 @@ void mtrr_save_state(void) smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1); }
-void set_mtrr_aps_delayed_init(void) -{ - if (!mtrr_enabled()) - return; - if (!use_intel()) - return; - - mtrr_aps_delayed_init = true; -} - /* * Delayed MTRR initialization for all AP's */ @@ -841,16 +831,7 @@ void mtrr_aps_init(void) if (!use_intel() || !mtrr_enabled()) return;
- /* - * Check if someone has requested the delay of AP MTRR initialization, - * by doing set_mtrr_aps_delayed_init(), prior to this point. If not, - * then we are done. - */ - if (!mtrr_aps_delayed_init) - return; - set_mtrr(~0U, 0, 0, 0); - mtrr_aps_delayed_init = false; }
void mtrr_bp_restore(void) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index bd6c6fd373ae..27d61f73c68a 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1001,10 +1001,7 @@ void __init setup_arch(char **cmdline_p) max_pfn = e820__end_of_ram_pfn();
/* update e820 for memory not covered by WB MTRRs */ - if (IS_ENABLED(CONFIG_MTRR)) - mtrr_bp_init(); - else - pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel."); + cache_bp_init();
if (mtrr_trim_uncached_memory(max_pfn)) max_pfn = e820__end_of_ram_pfn(); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 5e7f9532a10d..535d73a47062 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1432,7 +1432,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();
- set_mtrr_aps_delayed_init(); + cache_set_aps_delayed_init();
smp_quirk_init_udelay();
@@ -1443,12 +1443,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
void arch_thaw_secondary_cpus_begin(void) { - set_mtrr_aps_delayed_init(); + cache_set_aps_delayed_init(); }
void arch_thaw_secondary_cpus_end(void) { - mtrr_aps_init(); + cache_aps_init(); }
/* @@ -1491,7 +1491,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
nmi_selftest(); impress_friends(); - mtrr_aps_init(); + cache_aps_init(); }
static int __initdata setup_possible_cpus = -1; diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index bb176c72891c..21e014715322 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -261,7 +261,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) do_fpu_end(); tsc_verify_tsc_adjust(true); x86_platform.restore_sched_clock_state(); - mtrr_bp_restore(); + cache_bp_restore(); perf_restore_debug_store();
c = &cpu_data(smp_processor_id());
On Fri, Jul 15, 2022 at 4:25 PM Juergen Gross <jgross@suse.com> wrote:
There are several MTRR functions which also do PAT handling. In order to support PAT handling without MTRR in the future, add some wrappers for those functions.
Cc: stable@vger.kernel.org # 5.17
Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Do I understand correctly that this particular patch doesn't change the behavior?
If so, it would be good to mention that in the changelog.
Today PAT is usable only with MTRR being active, with some nasty tweaks to make PAT usable when running as a Xen PV guest, which doesn't support MTRR.
The reason for this coupling is that both PAT MSR changes and MTRR changes require a similar sequence, so full PAT support was added using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume the PAT MSR setting done by the hypervisor, without the ability or need to change it. The coupling has in turn resulted in a convoluted initialization sequence and in wrong decisions regarding cache mode availability due to misleading PAT availability flags.
Fix all of that by allowing PAT to be used without MTRR and by adding an environment-dependent PAT init function.
Cc: stable@vger.kernel.org # 5.17
Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/memtype.h     |  13 ++-
 arch/x86/include/asm/mtrr.h        |  21 +++--
 arch/x86/kernel/cpu/common.c       |  13 +--
 arch/x86/kernel/cpu/mtrr/generic.c |  14 ----
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  33 ++++----
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
 arch/x86/kernel/setup.c            |   7 --
 arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
 arch/x86/xen/enlighten_pv.c        |   4 +
 9 files changed, 146 insertions(+), 87 deletions(-)
diff --git a/arch/x86/include/asm/memtype.h b/arch/x86/include/asm/memtype.h index 9ca760e430b9..93ad980631dc 100644 --- a/arch/x86/include/asm/memtype.h +++ b/arch/x86/include/asm/memtype.h @@ -8,7 +8,18 @@ extern bool pat_enabled(void); extern void pat_disable(const char *reason); extern void pat_init(void); -extern void init_cache_modes(void); +#ifdef CONFIG_X86_PAT +void pat_init_set(void (*func)(void)); +void pat_init_noset(void); +void pat_cpu_init(void); +void pat_ap_init_nomtrr(void); +void pat_aps_init_nomtrr(void); +#else +#define pat_init_set(f) do { } while (0) +#define pat_cpu_init(f) do { } while (0) +#define pat_ap_init_nomtrr(f) do { } while (0) +#define pat_aps_init_nomtrr(f) do { } while (0) +#endif
extern int memtype_reserve(u64 start, u64 end, enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm); diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index 900083ac9f60..bb76e5c6e21d 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -42,9 +42,9 @@ extern int mtrr_add_page(unsigned long base, unsigned long size, extern int mtrr_del(int reg, unsigned long base, unsigned long size); extern int mtrr_del_page(int reg, unsigned long base, unsigned long size); extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi); -extern void mtrr_ap_init(void); -extern void mtrr_aps_init(void); -extern void mtrr_bp_restore(void); +extern bool mtrr_ap_init(void); +extern bool mtrr_aps_init(void); +extern bool mtrr_bp_restore(void); extern int mtrr_trim_uncached_memory(unsigned long end_pfn); extern int amd_special_default_mtrr(void); void mtrr_disable(void); @@ -84,9 +84,18 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn) static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi) { } -#define mtrr_ap_init() do {} while (0) -#define mtrr_aps_init() do {} while (0) -#define mtrr_bp_restore() do {} while (0) +static inline bool mtrr_ap_init(void) +{ + return false; +} +static inline bool mtrr_aps_init(void) +{ + return false; +} +static inline bool mtrr_bp_restore(void) +{ + return false; +} #define mtrr_disable() do {} while (0) #define mtrr_enable() do {} while (0) # endif diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0a1bd14f7966..3edfb779dab5 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2408,8 +2408,8 @@ void __init cache_bp_init(void) { if (IS_ENABLED(CONFIG_MTRR)) mtrr_bp_init(); - else - pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel."); + + pat_cpu_init(); }
void cache_ap_init(void) @@ -2417,7 +2417,8 @@ void cache_ap_init(void) if (cache_aps_delayed_init) return;
- mtrr_ap_init(); + if (!mtrr_ap_init()) + pat_ap_init_nomtrr(); }
bool cache_aps_delayed_init; @@ -2437,11 +2438,13 @@ void cache_aps_init(void) if (!cache_aps_delayed_init) return;
- mtrr_aps_init(); + if (!mtrr_aps_init()) + pat_aps_init_nomtrr(); cache_aps_delayed_init = false; }
void cache_bp_restore(void) { - mtrr_bp_restore(); + if (!mtrr_bp_restore()) + pat_cpu_init(); } diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c index 84732215b61d..bb6dd96923a4 100644 --- a/arch/x86/kernel/cpu/mtrr/generic.c +++ b/arch/x86/kernel/cpu/mtrr/generic.c @@ -441,20 +441,6 @@ static void __init print_mtrr_state(void) pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20); }
-/* PAT setup for BP. We need to go through sync steps here */ -void __init mtrr_bp_pat_init(void) -{ - unsigned long flags; - - local_irq_save(flags); - cache_disable(); - - pat_init(); - - cache_enable(); - local_irq_restore(flags); -} - /* Grab all of the MTRR state for this CPU into *state */ bool __init get_mtrr_state(void) { diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c index c1593cfae641..76b1553d90f9 100644 --- a/arch/x86/kernel/cpu/mtrr/mtrr.c +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c @@ -762,9 +762,6 @@ void __init mtrr_bp_init(void) /* BIOS may override */ __mtrr_enabled = get_mtrr_state();
- if (mtrr_enabled()) - mtrr_bp_pat_init(); - if (mtrr_cleanup(phys_addr)) { changed_by_mtrr_cleanup = 1; mtrr_if->set_all(); @@ -772,25 +769,17 @@ void __init mtrr_bp_init(void) } }
- if (!mtrr_enabled()) { + if (!mtrr_enabled()) pr_info("Disabled\n"); - - /* - * PAT initialization relies on MTRR's rendezvous handler. - * Skip PAT init until the handler can initialize both - * features independently. - */ - pat_disable("MTRRs disabled, skipping PAT initialization too."); - } }
-void mtrr_ap_init(void) +bool mtrr_ap_init(void) { if (!mtrr_enabled()) - return; + return false;
if (!use_intel()) - return; + return false;
/* * Ideally we should hold mtrr_mutex here to avoid mtrr entries @@ -806,6 +795,8 @@ void mtrr_ap_init(void) * lock to prevent mtrr entry changes */ set_mtrr_from_inactive_cpu(~0U, 0, 0, 0); + + return true; }
/** @@ -826,20 +817,24 @@ void mtrr_save_state(void) /* * Delayed MTRR initialization for all AP's */ -void mtrr_aps_init(void) +bool mtrr_aps_init(void) { if (!use_intel() || !mtrr_enabled()) - return; + return false;
set_mtrr(~0U, 0, 0, 0); + + return true; }
-void mtrr_bp_restore(void) +bool mtrr_bp_restore(void) { if (!use_intel() || !mtrr_enabled()) - return; + return false;
mtrr_if->set_all(); + + return true; }
static int __init mtrr_init_finialize(void) diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h index 2ac99e561181..f6135a539073 100644 --- a/arch/x86/kernel/cpu/mtrr/mtrr.h +++ b/arch/x86/kernel/cpu/mtrr/mtrr.h @@ -53,7 +53,6 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt); void fill_mtrr_var_range(unsigned int index, u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi); bool get_mtrr_state(void); -void mtrr_bp_pat_init(void);
extern void __init set_mtrr_ops(const struct mtrr_ops *ops);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 27d61f73c68a..14bb40cd22c6 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1008,13 +1008,6 @@ void __init setup_arch(char **cmdline_p)
max_possible_pfn = max_pfn;
- /* - * This call is required when the CPU does not support PAT. If - * mtrr_bp_init() invoked it already via pat_init() the call has no - * effect. - */ - init_cache_modes(); - /* * Define random base addresses for memory sections after max_pfn is * defined and before each memory section base is used. diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..3d4bc27ffebb 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -41,6 +41,7 @@ #include <linux/mm.h> #include <linux/fs.h> #include <linux/rbtree.h> +#include <linux/stop_machine.h>
#include <asm/cacheflush.h> #include <asm/processor.h> @@ -65,30 +66,23 @@ static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized;
-/* - * PAT support is enabled by default, but can be disabled for - * various user-requested or hardware-forced reasons: - */ -void pat_disable(const char *msg_reason) -{ - if (pat_disabled) - return; +static void init_cache_modes(void);
- if (pat_bp_initialized) { - WARN_ONCE(1, "x86/PAT: PAT cannot be disabled after initialization\n"); - return; - } - - pat_disabled = true; - pr_info("x86/PAT: %s\n", msg_reason); +#ifdef CONFIG_X86_PAT +static void pat_init_nopat(void) +{ + init_cache_modes(); }
static int __init nopat(char *str) { - pat_disable("PAT support disabled via boot option."); + pat_disabled = true; + pr_info("PAT support disabled via boot option."); + pat_init_set(pat_init_nopat); return 0; } early_param("nopat", nopat); +#endif
bool pat_enabled(void) { @@ -243,13 +237,17 @@ static void pat_bp_init(u64 pat) u64 tmp_pat;
if (!boot_cpu_has(X86_FEATURE_PAT)) { - pat_disable("PAT not supported by the CPU."); + pr_info("PAT not supported by the CPU."); + pat_disabled = true; + pat_init_set(pat_init_nopat); return; }
rdmsrl(MSR_IA32_CR_PAT, tmp_pat); if (!tmp_pat) { - pat_disable("PAT support disabled by the firmware."); + pr_info("PAT support disabled by the firmware."); + pat_disabled = true; + pat_init_set(pat_init_nopat); return; }
@@ -272,7 +270,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); }
-void init_cache_modes(void) +static void init_cache_modes(void) { u64 pat = 0;
@@ -318,25 +316,12 @@ void init_cache_modes(void) __init_cache_modes(pat); }
-/** - * pat_init - Initialize the PAT MSR and PAT table on the current CPU - * - * This function initializes PAT MSR and PAT table with an OS-defined value - * to enable additional cache attributes, WC, WT and WP. - * - * This function must be called on all CPUs using the specific sequence of - * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this - * procedure for PAT. - */ -void pat_init(void) +#ifdef CONFIG_X86_PAT +static void pat_init_native(void) { u64 pat; struct cpuinfo_x86 *c = &boot_cpu_data;
-#ifndef CONFIG_X86_PAT - pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n"); -#endif - if (pat_disabled) return;
@@ -406,6 +391,80 @@ void pat_init(void)
#undef PAT
+void pat_init_noset(void) +{ + pat_bp_enabled = true; + init_cache_modes(); +} + +static void (*pat_init_func)(void) = pat_init_native; + +void __init pat_init_set(void (*func)(void)) +{ + pat_init_func = func; +} + +/** + * pat_init - Initialize the PAT MSR and PAT table on the current CPU + * + * This function initializes PAT MSR and PAT table with an OS-defined value + * to enable additional cache attributes, WC, WT and WP. + * + * This function must be called on all CPUs using the specific sequence of + * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this + * procedure for PAT. + */ +void pat_init(void) +{ + pat_init_func(); +} + +static int __pat_cpu_init(void *data) +{ + unsigned long flags; + + local_irq_save(flags); + cache_disable(); + + pat_init(); + + cache_enable(); + local_irq_restore(flags); + + return 0; +} + +void pat_cpu_init(void) +{ + if (pat_init_func != pat_init_native) + pat_init(); + else + __pat_cpu_init(NULL); +} + +void pat_ap_init_nomtrr(void) +{ + if (pat_init_func != pat_init_native) + return; + + stop_machine_from_inactive_cpu(__pat_cpu_init, NULL, cpu_callout_mask); +} + +void pat_aps_init_nomtrr(void) +{ + if (pat_init_func != pat_init_native) + return; + + stop_machine(__pat_cpu_init, NULL, cpu_online_mask); +} +#else +void pat_init(void) +{ + pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n"); + init_cache_modes(); +} +#endif /* CONFIG_X86_PAT */ + static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */
/* diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 70fb2ea85e90..97831d822872 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -69,6 +69,7 @@ #include <asm/mwait.h> #include <asm/pci_x86.h> #include <asm/cpu.h> +#include <asm/memtype.h> #ifdef CONFIG_X86_IOPL_IOPERM #include <asm/io_bitmap.h> #endif @@ -1317,6 +1318,9 @@ asmlinkage __visible void __init xen_start_kernel(struct start_info *si) initrd_start = __pa(xen_start_info->mod_start); }
+ /* Don't try to modify PAT MSR. */ + pat_init_set(pat_init_noset); + /* Poke various useful things into boot_params */ boot_params.hdr.type_of_loader = (9 << 4) | 0; boot_params.hdr.ramdisk_image = initrd_start;
On Fri, Jul 15, 2022 at 04:25:49PM +0200, Juergen Gross wrote:
Today PAT is usable only with MTRR being active, with some nasty tweaks to make PAT usable when running as Xen PV guest, which doesn't support MTRR.
The reason for this coupling is, that both, PAT MSR changes and MTRR changes, require a similar sequence and so full PAT support was added using the already available MTRR handling.
Xen PV PAT handling can work without MTRR, as it just needs to consume the PAT MSR setting done by the hypervisor without the ability and need to change it. This in turn has resulted in a convoluted initialization sequence and wrong decisions regarding cache mode availability due to misguiding PAT availability flags.
Fix all of that by allowing to use PAT without MTRR and by adding an environment dependent PAT init function.
Aha, there's the explanation I was looking for.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0a1bd14f7966..3edfb779dab5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2408,8 +2408,8 @@ void __init cache_bp_init(void)
 {
 	if (IS_ENABLED(CONFIG_MTRR))
 		mtrr_bp_init();
-	else
-		pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
+
+	pat_cpu_init();
 }

 void cache_ap_init(void)
@@ -2417,7 +2417,8 @@ void cache_ap_init(void)
 	if (cache_aps_delayed_init)
 		return;

-	mtrr_ap_init();
+	if (!mtrr_ap_init())
+		pat_ap_init_nomtrr();
 }
So I'm reading this as: if it couldn't init AP's MTRRs, init its PAT.
But currently, the code sets the MTRRs for the delayed case or when the CPU is not online by doing ->set_all and in there it sets first MTRRs and then PAT.
I think the code above should simply try the two things, one after the other, independently from one another.
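A minimal sketch of that reading, assuming a hypothetical PAT-only helper named pat_ap_init() here (the series calls its counterpart pat_ap_init_nomtrr()); this is not code from the series:

#include <asm/cpufeature.h>
#include <asm/mtrr.h>			/* mtrr_ap_init() */

extern bool cache_aps_delayed_init;	/* from this series */
void pat_ap_init(void);			/* hypothetical PAT-only counterpart */

/* Each step keys off its own feature flag; MTRR and PAT no longer gate each other. */
void cache_ap_init(void)
{
	if (cache_aps_delayed_init)
		return;

	if (boot_cpu_has(X86_FEATURE_MTRR))
		mtrr_ap_init();
	if (boot_cpu_has(X86_FEATURE_PAT))
		pat_ap_init();
}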
And I see you've added another stomp machine call for PAT only.
Now, what I think the design of all this should be, is:
you have a bunch of things you need to do at each point:
* cache_ap_init
* cache_aps_init
* ...
Now, in each those, you look at whether PAT or MTRR is supported and you do only those which are supported.
Also, the rendezvous handler should do:
if MTRR: do MTRR specific stuff
if PAT: do PAT specific stuff
This way you have clean definitions of what needs to happen when and you also do *only* the things that the platform supports, by keeping the proper order of operations - I believe MTRRs first and then PAT.
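A rough sketch of the handler shape being proposed; cache_disable(), cache_enable() and pat_init() exist in this series, while the MTRR helper name below is made up for the illustration:

#include <linux/irqflags.h>
#include <asm/cpufeature.h>
#include <asm/processor.h>	/* cache_disable()/cache_enable() from this series */
#include <asm/memtype.h>	/* pat_init() */

void mtrr_generic_set_state(void);	/* hypothetical MTRR programming step */

/*
 * Program MTRRs first, then the PAT MSR, inside one cache-off section,
 * and run each step only if the platform supports it.
 */
static int cache_rendezvous_handler(void *unused)
{
	unsigned long flags;

	local_irq_save(flags);
	cache_disable();

	if (boot_cpu_has(X86_FEATURE_MTRR))
		mtrr_generic_set_state();
	if (boot_cpu_has(X86_FEATURE_PAT))
		pat_init();

	cache_enable();
	local_irq_restore(flags);

	return 0;
}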
This way we'll get rid of that crazy maze of who calls what and when.
But first we need to define those points where stuff needs to happen and then for each point define what stuff needs to happen.
How does that sound?
Thx.
On 19.07.22 17:15, Borislav Petkov wrote:
Now, in each those, you look at whether PAT or MTRR is supported and you do only those which are supported.
Also, the rendezvous handler should do:
if MTRR: do MTRR specific stuff
if PAT: do PAT specific stuff
This way you have clean definitions of what needs to happen when and you also do *only* the things that the platform supports, by keeping the proper order of operations - I believe MTRRs first and then PAT.
This way we'll get rid of that crazy maze of who calls what and when.
But first we need to define those points where stuff needs to happen and then for each point define what stuff needs to happen.
How does that sound?
This asks for some more cleanup in the MTRR code:
mtrr_if->set_all() is the relevant callback, and it will only ever be called for the generic case (use_intel() == true), so I think we want to:
- remove the cyrix specific set_all() function
- split the set_all() callback case from mtrr_rendezvous_handler() into a dedicated rendezvous handler
- remove the set_all() member from struct mtrr_ops and directly call generic_set_all() from the new rendezvous handler
- optional: rename use_intel() to use_generic(), or even introduce just a static bool for that purpose
Then the new rendezvous handler can be modified as you suggested.
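For concreteness, a minimal sketch of the dedicated handler after that cleanup (the handler name is made up; generic_set_all() is the existing function in arch/x86/kernel/cpu/mtrr/generic.c that programs MTRRs and then PAT, and the sketch assumes the handler lives in the same file):

static int generic_rendezvous_handler(void *unused)
{
	generic_set_all();	/* programs MTRRs, then PAT */
	return 0;
}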
Are you okay with that route?
Juergen
On 7/15/2022 10:25 AM, Juergen Gross wrote:
Today PAT can't be used without MTRR being available, unless MTRR is at least configured via CONFIG_MTRR and the system is running as Xen PV guest. In this case PAT is automatically available via the hypervisor, but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complexity the availability of PAT can't be queried via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT to be disabled. This leads to some drivers believing that not all cache modes are available, resulting in failures or degraded functionality.
The same applies to a kernel built with no MTRR support: it won't allow to use the PAT MSR, even if there is no technical reason for that, other than setting up PAT on all cpus the same way (which is a requirement of the processor's cache management) is relying on some MTRR specific code.
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level up
- adding a PAT indirection layer supporting the 3 cases "no or disabled PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR
Juergen Gross (3):
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86: add wrapper functions for mtrr functions handling also pat
  x86: decouple pat and mtrr handling

 arch/x86/include/asm/memtype.h     |  13 ++-
 arch/x86/include/asm/mtrr.h        |  27 ++++--
 arch/x86/include/asm/processor.h   |  10 +++
 arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
 arch/x86/kernel/setup.c            |  12 +--
 arch/x86/kernel/smpboot.c          |   8 +-
 arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
 arch/x86/power/cpu.c               |   2 +-
 arch/x86/xen/enlighten_pv.c        |   4 +
 12 files changed, 289 insertions(+), 186 deletions(-)
This patch series seems related to the regression reported here on May 5, 2022:
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
I am experiencing that regression and could test this patch on my system.
Can you confirm that with this patch series you are trying to fix that regression?
Chuck
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
Can you confirm that with this patch series you are trying to fix that regression?
Yes, this patchset is aimed to fix the whole situation but please don't do anything yet - I need to find time and look at the whole approach before you can test it. Just be patient and we'll ping you when the time comes.
Thx.
On 7/16/2022 7:42 AM, Borislav Petkov wrote:
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
Can you confirm that with this patch series you are trying to fix that regression?
Yes, this patchset is aimed to fix the whole situation but please don't do anything yet - I need to find time and look at the whole approach before you can test it. Just be patient and we'll ping you when the time comes.
Thx.
I will wait until I get the ping before trying it.
Thanks,
Chuck
On 7/16/2022 7:32 AM, Chuck Zmudzinski wrote:
On 7/15/2022 10:25 AM, Juergen Gross wrote:
Today PAT can't be used without MTRR being available, unless MTRR is at least configured via CONFIG_MTRR and the system is running as Xen PV guest. In this case PAT is automatically available via the hypervisor, but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complexity the availability of PAT can't be queried via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT to be disabled. This leads to some drivers believing that not all cache modes are available, resulting in failures or degraded functionality.
The same applies to a kernel built with no MTRR support: it won't allow to use the PAT MSR, even if there is no technical reason for that, other than setting up PAT on all cpus the same way (which is a requirement of the processor's cache management) is relying on some MTRR specific code.
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level up
- adding a PAT indirection layer supporting the 3 cases "no or disabled PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR
Juergen Gross (3):
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86: add wrapper functions for mtrr functions handling also pat
  x86: decouple pat and mtrr handling

 arch/x86/include/asm/memtype.h     |  13 ++-
 arch/x86/include/asm/mtrr.h        |  27 ++++--
 arch/x86/include/asm/processor.h   |  10 +++
 arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
 arch/x86/kernel/setup.c            |  12 +--
 arch/x86/kernel/smpboot.c          |   8 +-
 arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
 arch/x86/power/cpu.c               |   2 +-
 arch/x86/xen/enlighten_pv.c        |   4 +
 12 files changed, 289 insertions(+), 186 deletions(-)
This patch series seems related to the regression reported here on May 5, 2022:
I'm sorry, the date of that report was May 4, 2022, not May 5, 2022 - just to avoid any doubt about which regression I am referring to.
Chuck
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
I am experiencing that regression
or a very similar regression that is caused by the same commit:
bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
and could test this patch on my system.
Can you confirm that with this patch series you are trying to fix that regression?
Chuck
Chuck
Hi Juergen!
On 15.07.22 16:25, Juergen Gross wrote:
Today PAT can't be used without MTRR being available, unless MTRR is at least configured via CONFIG_MTRR and the system is running as Xen PV guest. In this case PAT is automatically available via the hypervisor, but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complexity the availability of PAT can't be queried via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT to be disabled. This leads to some drivers believing that not all cache modes are available, resulting in failures or degraded functionality.
The same applies to a kernel built with no MTRR support: it won't allow to use the PAT MSR, even if there is no technical reason for that, other than setting up PAT on all cpus the same way (which is a requirement of the processor's cache management) is relying on some MTRR specific code.
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level up
- adding a PAT indirection layer supporting the 3 cases "no or disabled PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR
Thx for working on this. If you need to respin these patches for one reason or another, could you do me a favor and add proper 'Link:' tags pointing to all reports about this issue? e.g. like this:
Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
These tags are considered important by Linus[1] and others, as they allow anyone to look into the backstory weeks or years from now. That is why they should be placed in cases like this, as Documentation/process/submitting-patches.rst and Documentation/process/5.Posting.rst explain in more detail. I care personally, because these tags make my regression tracking efforts a whole lot easier, as they allow my tracking bot 'regzbot' to automatically connect reports with patches posted or committed to fix tracked regressions.
[1] see for example: https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-Dgvd... https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD... https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_D...
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
BTW, let me tell regzbot to monitor this thread:
#regzbot ^backmonitor: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
I echo Thorsten's thx for starting on this now instead of waiting until September which I think is when Juergen said he could start working on this last week. I agree with Thorsten that Link tags are needed. Since multiple patches have been proposed to fix this regression, perhaps a Link to each proposed patch, and a note that the original report identified a specific commit which when reverted also fixes it. IMO, this is all part of the backstory Thorsten refers to.
It looks like with this approach, a fix will not be coming real soon, and Borislav Petkov also discouraged me from testing this patch set until I receive a ping telling me it is ready for testing, which seems to confirm that this regression will not be fixed very soon. Please correct me if I am wrong about how long it will take to fix it with this approach.
Also, is there any guarantee this approach is endorsed by all the maintainers who will need to sign-off, especially Linus? I say this because some of the discussion on the earlier proposed patches makes me doubt this. I am especially referring to this discussion:
https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.i...
and also, here:
https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
where Borislav Petkov argues that Linux should not be patched at all to fix this regression but instead the fix should come by patching the Xen hypervisor.
So I have several questions, presuming at least the fix is going to be delayed for some time, and also presuming this approach is not yet an approach that has the blessing of the maintainers who will need to sign-off:
1. Can you estimate when the patch series will be ready for testing and suitable for a prepatch or RC release?
2. Can you estimate when the patch series will be ready to be merged into the mainline release? Is there any hope it will be fixed before the next longterm release hosted on kernel.org?
3. Since a fix is likely not coming soon, can you explain why the commit that was mentioned in the original report cannot be reverted as a temporary solution while we wait for the full fix to come later? I can say that reverting that commit (It was a commit affecting drm/i915) does fix the issue on my system with no negative side effects at all. In such a case, it seems contrary to Linus' regression rule to not revert the offending commit, even if reverting the offending commit is not going to be the final solution. IOW, I am trying to argue that an important corollary to the Linus regression rule is that we revert commits that introduce regressions, especially when there are no negative effects when reverting the offending commit. Why are we not doing that in this case?
4. Can you explain why this patch series is superior to the other proposed patches that are much more simple and have been reported to fix the regression?
5. This approach seems way too aggressive for backporting to the stable releases. Is that correct? Or, will the patches be backported to the stable releases? I was told that backports to the stable releases are needed to keep things consistent across all the supported versions when I submitted a patch to fix this regression that identified a specific five year old commit that my proposed patch would fix.
Remember, this is a regression that is really bothering people now. For example, I currently cannot install the Linux kernel updates that Debian pushes out to me; instead I have to run my own private build carrying one of the known fixes that have already been identified as ways to work around this regression while we wait for the full solution that will hopefully come later.
Chuck
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
BTW, let me tell regzbot to monitor this thread:
#regzbot ^backmonitor: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
OK, the comments Boris made on the individual patches of this patch set answer most of my questions. Thx, Boris.
Chuck