From: Kunwu Chan <chentao@kylinos.cn>
[ Upstream commit f46c8a75263f97bda13c739ba1c90aced0d3b071 ]
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Ensure the allocation was successful by checking the pointer validity.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231204023223.2447523-1-chentao@kylinos.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/mm/init-common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index a84da92920f7..e7b9cc90fd9e 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -104,7 +104,7 @@ void pgtable_cache_add(unsigned int shift)
 	 * as to leave enough 0 bits in the address to contain it. */
 	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
 				     HUGEPD_SHIFT_MASK + 1);
-	struct kmem_cache *new;
+	struct kmem_cache *new = NULL;
 
 	/* It would be nice if this was a BUILD_BUG_ON(), but at the
 	 * moment, gcc doesn't seem to recognize is_power_of_2 as a
@@ -117,7 +117,8 @@ void pgtable_cache_add(unsigned int shift)
 
 	align = max_t(unsigned long, align, minalign);
 	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
-	new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
+	if (name)
+		new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
 	if (!new)
 		panic("Could not allocate pgtable cache for order %d", shift);
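As background, kasprintf() allocates the formatted string with the GFP flags supplied by the caller and returns NULL when that allocation fails, so the result must be checked before use and freed by the caller. A minimal sketch of the usual pattern (the function below is hypothetical, not part of this patch):

  static int example_report(unsigned int shift)
  {
  	char *name;

  	/* kasprintf() returns NULL if the allocation fails */
  	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
  	if (!name)
  		return -ENOMEM;

  	pr_info("created %s\n", name);

  	kfree(name);	/* the caller owns the returned buffer */
  	return 0;
  }

In pgtable_cache_add() a plain error return is not an option, hence the pre-initialised NULL: if kasprintf() fails, kmem_cache_create() is skipped, new stays NULL, and the existing panic() path handles both failures.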
From: Michael Ellerman <mpe@ellerman.id.au>
[ Upstream commit f8d3555355653848082c351fa90775214fb8a4fa ]
With CONFIG_GENERIC_BUG=n the build fails with:
arch/powerpc/kernel/traps.c:1442:5: error: no previous prototype for ‘is_valid_bugaddr’ [-Werror=missing-prototypes]
 1442 | int is_valid_bugaddr(unsigned long addr)
      |     ^~~~~~~~~~~~~~~~
The prototype is only defined, and the function is only needed, when CONFIG_GENERIC_BUG=y, so move the implementation under that.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130114433.3053544-2-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 70b99246dec4..402a05f3a484 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1424,10 +1424,12 @@ static int emulate_instruction(struct pt_regs *regs)
 	return -EINVAL;
 }
 
+#ifdef CONFIG_GENERIC_BUG
 int is_valid_bugaddr(unsigned long addr)
 {
 	return is_kernel_addr(addr);
 }
+#endif
 
 #ifdef CONFIG_MATH_EMULATION
 static int emulate_math(struct pt_regs *regs)
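The general rule at work here: if a prototype is only visible under a Kconfig option, the definition should sit under the same guard, otherwise configs that drop the prototype trip -Wmissing-prototypes. Roughly, paraphrasing the relevant declarations (not verbatim from include/linux/bug.h):

  /* include/linux/bug.h (paraphrased) */
  #ifdef CONFIG_GENERIC_BUG
  int is_valid_bugaddr(unsigned long addr);	/* only declared with GENERIC_BUG=y */
  #endif

  /* arch/powerpc/kernel/traps.c: keep the definition under the same guard */
  #ifdef CONFIG_GENERIC_BUG
  int is_valid_bugaddr(unsigned long addr)
  {
  	return is_kernel_addr(addr);
  }
  #endif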
From: Michael Ellerman <mpe@ellerman.id.au>
[ Upstream commit d8c3f243d4db24675b653f0568bb65dae34e6455 ]
With NUMA=n and FA_DUMP=y or PRESERVE_FA_DUMP=y the build fails with:
arch/powerpc/kernel/fadump.c:1739:22: error: no previous prototype for ‘arch_reserved_kernel_pages’ [-Werror=missing-prototypes]
 1739 | unsigned long __init arch_reserved_kernel_pages(void)
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~
The prototype for arch_reserved_kernel_pages() is in include/linux/mm.h, but it's guarded by __HAVE_ARCH_RESERVED_KERNEL_PAGES. The powerpc headers define __HAVE_ARCH_RESERVED_KERNEL_PAGES in asm/mmzone.h, which is not included into the generic headers when NUMA=n.
Move the definition of __HAVE_ARCH_RESERVED_KERNEL_PAGES into asm/mmu.h, which is included regardless of whether NUMA is enabled.
Additionally the ifdef around __HAVE_ARCH_RESERVED_KERNEL_PAGES needs to also check for CONFIG_PRESERVE_FA_DUMP.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130114433.3053544-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/mmu.h    | 4 ++++
 arch/powerpc/include/asm/mmzone.h | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 0699cfeeb8c9..2141ae7441a9 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -375,5 +375,9 @@ extern void *abatron_pteptrs[2];
 #include <asm/nohash/mmu.h>
 #endif
 
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
+#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index 91c69ff53a8a..50c2198c2c76 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -42,9 +42,6 @@ u64 memory_hotplug_max(void);
 #else
 #define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
-#ifdef CONFIG_FA_DUMP
-#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
-#endif
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_MMZONE_H_ */
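For reference, the generic header only declares the hook when an architecture opts in by defining the macro, roughly like this (paraphrased from include/linux/mm.h, not an exact quote):

  /* include/linux/mm.h (paraphrased) */
  #ifdef __HAVE_ARCH_RESERVED_KERNEL_PAGES
  extern unsigned long arch_reserved_kernel_pages(void);
  #endif

  /* powerpc: define the opt-in macro in a header that is always pulled in */
  #if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
  #define __HAVE_ARCH_RESERVED_KERNEL_PAGES
  #endif

With the macro living in asm/mmu.h rather than asm/mmzone.h, the declaration is visible to fadump.c even when NUMA=n, and the prototype once again matches the definition.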
From: Stephen Rothwell <sfr@canb.auug.org.au>
[ Upstream commit 0d555b57ee660d8a871781c0eebf006e855e918d ]
The linux-next build of powerpc64 allnoconfig fails with:
arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 'pmd_move_must_withdraw'
  557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
      |     ^~~~~~~~~~~~~~~~~~~~~~
Caused by commit:
c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
Fix it by moving the function definition under CONFIG_TRANSPARENT_HUGEPAGE like the prototype. The function is only called when CONFIG_TRANSPARENT_HUGEPAGE=y.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Flesh out change log from linux-next patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231127132809.45c2b398@canb.auug.org.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/mm/book3s64/pgtable.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 2bf7e1b4fd82..c4890f4b0b6c 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -442,6 +442,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 	set_pte_at(vma->vm_mm, addr, ptep, pte);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * For hash translation mode, we use the deposited table to store hash slot
  * information and they are stored at PTRS_PER_PMD offset from related pmd
@@ -463,6 +464,7 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 
 	return true;
 }
+#endif
 
 /*
  * Does the CPU support tlbie?
From: Naveen N Rao <naveen@kernel.org>
[ Upstream commit 8f9abaa6d7de0a70fc68acaedce290c1f96e2e59 ]
Some of the fp/vmx code in sstep.c assumes a certain maximum size for the instructions being emulated. The size of those operations, however, is determined separately in analyse_instr().
Add a check to validate the assumption on the maximum size of the operations, so as to prevent any unintended kernel stack corruption.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231123071705.397625-1-naveen@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/lib/sstep.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 27650cd5857f..490162fb0893 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -469,6 +469,8 @@ static int do_fp_load(struct instruction_op *op, unsigned long ea,
 	} u;
 
 	nb = GETSIZE(op->type);
+	if (nb > sizeof(u))
+		return -EINVAL;
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	rn = op->reg;
@@ -519,6 +521,8 @@ static int do_fp_store(struct instruction_op *op, unsigned long ea,
 	} u;
 
 	nb = GETSIZE(op->type);
+	if (nb > sizeof(u))
+		return -EINVAL;
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	rn = op->reg;
@@ -563,6 +567,9 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
 		u8 b[sizeof(__vector128)];
 	} u = {};
 
+	if (size > sizeof(u))
+		return -EINVAL;
+
 	if (!address_ok(regs, ea & ~0xfUL, 16))
 		return -EFAULT;
 	/* align to multiple of size */
@@ -590,6 +597,9 @@ static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
 		u8 b[sizeof(__vector128)];
 	} u;
 
+	if (size > sizeof(u))
+		return -EINVAL;
+
 	if (!address_ok(regs, ea & ~0xfUL, 16))
 		return -EFAULT;
 	/* align to multiple of size */
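This is the standard defensive pattern when a copy size is computed in one place (analyse_instr() here) and consumed in another: re-validate it against the destination buffer before writing. A stand-alone sketch of the idea (hypothetical helper, not from sstep.c):

  /* Copy nb bytes into a fixed-size buffer, rejecting oversized requests. */
  static int copy_into_buf(void *buf, size_t buf_size, const void *src, size_t nb)
  {
  	if (nb > buf_size)	/* size was decided elsewhere; don't trust it blindly */
  		return -EINVAL;

  	memcpy(buf, src, nb);
  	return 0;
  }

In the patched functions the destination is the on-stack union u, so exceeding sizeof(u) would have corrupted the kernel stack; returning -EINVAL instead makes the emulation fail safely.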
From: "Borislav Petkov (AMD)" bp@alien8.de
[ Upstream commit 04c3024560d3a14acd18d0a51a1d0a89d29b7eb5 ]
AMD does not have the requirement for a synchronization barrier when accessing a certain group of MSRs. Do not incur that unnecessary penalty there.

There will be a CPUID bit which explicitly states that an MFENCE is not needed. Once that bit is added to the APM, this will be extended with it.
While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day.
Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:
On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows that the x2AVIC IPI rate is 3% to 4% lower than the AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires pinning each vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the threads of a single CCX to avoid interrupt IPI latency.
In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done:
1) Disable Power States in BIOS (to prevent the system from going to a lower power state)

2) Run the system at a fixed frequency of 2500MHz (to prevent the system from increasing the frequency when the load is higher)
With the above configuration:
*) Performance measured using ipi-bench for AVIC:
   Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
   Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]
From the above, x2AVIC latency is ~4% higher than AVIC. However, x2AVIC performance is expected to be better than or equivalent to AVIC. Analysis of the perf captures shows that significant time is spent in weak_wrmsr_fence(), invoked by x2apic_send_IPI().
With the fix to skip weak_wrmsr_fence():

*) Performance measured using ipi-bench for x2AVIC:
   Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]
Comparing the performance of x2AVIC with and without the fix, it can be seen that performance improves by ~4%.
Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU.
Average throughput (10 iterations) with weak_wrmsr_fence(), Cumulative throughput: 4933374 IPI/s
Average throughput (10 iterations) without weak_wrmsr_fence(), Cumulative throughput: 6355156 IPI/s
[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/include/asm/barrier.h     | 18 ------------------
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/processor.h   | 18 ++++++++++++++++++
 arch/x86/kernel/cpu/amd.c          |  3 +++
 arch/x86/kernel/cpu/common.c       |  7 +++++++
 arch/x86/kernel/cpu/hygon.c        |  3 +++
 6 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 4819d5e5a335..7f828fe49797 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -84,22 +84,4 @@ do { \
 
 #include <asm-generic/barrier.h>
 
-/*
- * Make previous memory operations globally visible before
- * a WRMSR.
- *
- * MFENCE makes writes visible, but only affects load/store
- * instructions. WRMSR is unfortunately not a load/store
- * instruction and is unaffected by MFENCE. The LFENCE ensures
- * that the WRMSR is not reordered.
- *
- * Most WRMSRs are full serializing instructions themselves and
- * do not require this barrier. This is only required for the
- * IA32_TSC_DEADLINE and X2APIC MSRs.
- */
-static inline void weak_wrmsr_fence(void)
-{
-	asm volatile("mfence; lfence" : : : "memory");
-}
-
 #endif /* _ASM_X86_BARRIER_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f42286e9a2b1..16ee2bd3ef47 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -291,6 +291,7 @@
 #define X86_FEATURE_RETPOLINE_LFENCE	(11*32+13) /* "" Use LFENCE for Spectre variant 2 */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+18) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_APIC_MSRS_FENCE	(11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9cbd86cf0deb..4c73655cd460 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -988,4 +988,22 @@ enum taa_mitigations {
 
 extern bool gds_ucode_mitigated(void);
 
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions. WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MFENCE. The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier. This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+	alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
+}
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b2cdf1c07e56..3a9a44afa773 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1125,6 +1125,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
 	    cpu_has_amd_erratum(c, amd_erratum_1485))
 		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
+
+	/* AMD CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+	clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1592f309c3c1..dc531a1ac34a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1639,6 +1639,13 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 #endif
 
+
+	/*
+	 * Set default APIC and TSC_DEADLINE MSR fencing flag. AMD and
+	 * Hygon will clear it in ->c_init() below.
+	 */
+	set_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
+
 	/*
 	 * Vendor-specific initialization. In this section we
 	 * canonicalize the feature flags, meaning if there are
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index 43586b7586c0..530c4d112dc0 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -356,6 +356,9 @@ static void init_hygon(struct cpuinfo_x86 *c)
 		set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
 
 	check_null_seg_clears_base(c);
+
+	/* Hygon CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+	clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 
 static void cpu_detect_tlb_hygon(struct cpuinfo_x86 *c)
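To see where the fence sits on the hot path: the x2APIC IPI send routine issues weak_wrmsr_fence() immediately before writing the ICR MSR, roughly as below (a simplified sketch of arch/x86/kernel/apic/x2apic_phys.c from memory, not a verbatim quote):

  static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
  				     unsigned int dest)
  {
  	unsigned long cfg = __prepare_ICR(0, vector, dest);

  	/* x2APIC MSRs are non-serializing; order earlier stores before the WRMSR */
  	weak_wrmsr_fence();
  	native_x2apic_icr_write(cfg, apicid);
  }

Since identify_cpu() now sets X86_FEATURE_APIC_MSRS_FENCE by default and init_amd()/init_hygon() clear it, the alternative() in weak_wrmsr_fence() patches the MFENCE; LFENCE pair out at boot on AMD and Hygon, which is where the ~4% IPI throughput gain quoted above comes from.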
From: Zhiquan Li <zhiquan1.li@intel.com>
[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]
Memory errors don't happen very often, especially fatal ones. However, in large-scale scenarios such as data centers, that probability increases with the number of machines present.
When a fatal machine check happens, mce_panic() is called based on the severity grading of that error. The page containing the error is not marked as poison.
However, when kexec is enabled, tools like makedumpfile understand when pages are marked as poison and do not touch them so as not to cause a fatal machine check exception again while dumping the previous kernel's memory.
Therefore, mark the page containing the error as poisoned so that the kexec'ed kernel can avoid accessing the page.
[ bp: Rewrite commit message and comment. ]
Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/cpu/mce/core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 9b98a7d8ac60..84c0e5c2518c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -42,6 +42,7 @@
 #include <linux/export.h>
 #include <linux/jump_label.h>
 #include <linux/set_memory.h>
+#include <linux/kexec.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -315,6 +316,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -370,6 +372,20 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+
+		/*
+		 * Kdump skips the poisoned page in order to avoid
+		 * touching the error bits again. Poison the page even
+		 * if the error is fatal and the machine is about to
+		 * panic.
+		 */
+		if (kexec_crash_loaded()) {
+			if (final && (final->status & MCI_STATUS_ADDRV)) {
+				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				if (p)
+					SetPageHWPoison(p);
+			}
+		}
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
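The new block only acts when the machine-check record carries a usable address: MCI_STATUS_ADDRV indicates that mce->addr holds the faulting physical address, and shifting it by PAGE_SHIFT yields the PFN to poison. Condensed into a stand-alone sketch (hypothetical helper name, mirroring the hunk above):

  static void example_poison_mce_page(struct mce *m)
  {
  	struct page *p;

  	if (!(m->status & MCI_STATUS_ADDRV))
  		return;		/* no valid address recorded with this error */

  	/* physical address -> PFN -> struct page (NULL if the page isn't online) */
  	p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
  	if (p)
  		SetPageHWPoison(p);	/* dump tools that honour HWPoison will skip it */
  }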