The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 409f45387c937145adeeeebc6d6032c2ec232b35
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021856-heave-blade-985e@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 409f45387c937145adeeeebc6d6032c2ec232b35 Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra(a)amd.com>
Date: Mon, 10 Feb 2025 22:54:18 +0000
Subject: [PATCH] x86/sev: Fix broken SNP support with KVM module built-in
Fix issues with enabling SNP host support, and effectively SNP support as a
whole, which is broken when the KVM module is built-in.
SNP host support is enabled in snp_rmptable_init(), which is invoked as a
device_initcall(). The SNP check on the IOMMU is done during IOMMU PCI init
(the IOMMU_PCI_INIT stage), which is why snp_rmptable_init() is currently
invoked via device_initcall() and cannot be moved to subsys_initcall(): the
core IOMMU subsystem itself is initialized via subsys_initcall().
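(For context, a built-in kvm_amd's module_init() also collapses to a
device_initcall(), so the two initcalls run at the same level and ordering
within that level falls to link order; from include/linux/module.h and
include/linux/init.h in the !MODULE case:)
#define module_init(x)	__initcall(x);
#define __initcall(fn)	device_initcall(fn)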
Now, if the kvm_amd module is built-in, it gets initialized before SNP host
support is enabled in snp_rmptable_init():
[ 10.131811] kvm_amd: TSC scaling supported
[ 10.136384] kvm_amd: Nested Virtualization enabled
[ 10.141734] kvm_amd: Nested Paging enabled
[ 10.146304] kvm_amd: LBR virtualization supported
[ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509)
[ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99)
[ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99)
[ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported
[ 10.177052] kvm_amd: Virtual GIF supported
...
...
[ 10.201648] kvm_amd: in svm_enable_virtualization_cpu
And then svm_x86_ops->enable_virtualization_cpu()
(svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as follows:
wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs.
snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu()
as follows:
...
[ 11.256138] kvm_amd: in svm_enable_virtualization_cpu
...
[ 11.264918] SEV-SNP: in snp_rmptable_init
This triggers a #GP exception in snp_rmptable_init() when snp_enable()
is invoked to set SNP_EN in SYSCFG MSR:
[ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30)
...
[ 11.294404] Call Trace:
[ 11.294482] <IRQ>
[ 11.294513] ? show_stack_regs+0x26/0x30
[ 11.294522] ? ex_handler_msr+0x10f/0x180
[ 11.294529] ? search_extable+0x2b/0x40
[ 11.294538] ? fixup_exception+0x2dd/0x340
[ 11.294542] ? exc_general_protection+0x14f/0x440
[ 11.294550] ? asm_exc_general_protection+0x2b/0x30
[ 11.294557] ? __pfx_snp_enable+0x10/0x10
[ 11.294567] ? native_write_msr+0x8/0x30
[ 11.294570] ? __snp_enable+0x5d/0x70
[ 11.294575] snp_enable+0x19/0x20
[ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0
[ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20
[ 11.294589] __sysvec_call_function+0x20/0x90
[ 11.294596] sysvec_call_function+0x80/0xb0
[ 11.294601] </IRQ>
[ 11.294603] <TASK>
[ 11.294605] asm_sysvec_call_function+0x1f/0x30
...
[ 11.294631] arch_cpu_idle+0xd/0x20
[ 11.294633] default_idle_call+0x34/0xd0
[ 11.294636] do_idle+0x1f1/0x230
[ 11.294643] ? complete+0x71/0x80
[ 11.294649] cpu_startup_entry+0x30/0x40
[ 11.294652] start_secondary+0x12d/0x160
[ 11.294655] common_startup_64+0x13e/0x141
[ 11.294662] </TASK>
This #GP exception is getting triggered due to the following errata for
AMD family 19h Models 10h-1Fh Processors:
Processor may generate spurious #GP(0) Exception on WRMSR instruction:
Description:
The Processor will generate a spurious #GP(0) Exception on a WRMSR
instruction if the following conditions are all met:
- the target of the WRMSR is a SYSCFG register.
- the write changes the value of SYSCFG.SNPEn from 0 to 1.
- One of the threads that share the physical core has a non-zero
value in the VM_HSAVE_PA MSR.
The document being referred to above:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revisi…
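In kernel terms, the erratum's trigger sequence reduces to the following
ordering (an illustrative sketch using the standard msr-index.h names, not
code from the patch):
u64 val;
/* kvm_amd (built-in) has already programmed a non-zero host save area */
wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
/* later, snp_rmptable_init() -> snp_enable() flips SNPEn from 0 to 1 */
rdmsrl(MSR_AMD64_SYSCFG, val);
wrmsrl(MSR_AMD64_SYSCFG, val | MSR_AMD64_SYSCFG_SNP_EN); /* spurious #GP */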
To summarize: with the kvm_amd module built-in, KVM/SVM initialization
happens before host SNP is enabled, and that initialization sets
VM_HSAVE_PA to a non-zero value. This triggers a #GP when SYSCFG.SNPEn is
being set, and subsequently causes SNP_INIT(_EX) to fail with an
INVALID_CONFIG error, as SYSCFG[SnpEn] is not set on all CPUs.
Essentially, the SNP host enabling code must run before KVM
initialization, which is currently not the case when KVM is built-in.
Fix this by calling snp_rmptable_init() directly from iommu_snp_enable()
instead of via device_initcall(), which enables SNP host support before
KVM initialization even with the kvm_amd module built-in.
Add additional handling for `iommu=off` or `amd_iommu=off` options.
Note that IOMMUs need to be enabled for SNP initialization; therefore, if
host SNP support is enabled but late IOMMU initialization fails, the PSP
driver's SNP_INIT will fail, as the IOMMU SNP sanity checks in the SNP
firmware fail with an invalid-configuration error, as below:
[ 9.723114] ccp 0000:23:00.1: sev enabled
[ 9.727602] ccp 0000:23:00.1: psp enabled
[ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002)
[ 9.739098] ccp 0000:a2:00.1: no command queues available
[ 9.745167] ccp 0000:a2:00.1: psp enabled
[ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3
[ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Co-developed-by: Vasant Hegde <vasant.hegde(a)amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde(a)amd.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
Acked-by: Joerg Roedel <jroedel(a)suse.de>
Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5d9685f92e5c..1581246491b5 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -531,6 +531,7 @@ static inline void __init snp_secure_tsc_init(void) { }
#ifdef CONFIG_KVM_AMD_SEV
bool snp_probe_rmptable_info(void);
+int snp_rmptable_init(void);
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
void snp_dump_hva_rmpentry(unsigned long address);
int psmash(u64 pfn);
@@ -541,6 +542,7 @@ void kdump_sev_callback(void);
void snp_fixup_e820_tables(void);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
+static inline int snp_rmptable_init(void) { return -ENOSYS; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
static inline void snp_dump_hva_rmpentry(unsigned long address) {}
static inline int psmash(u64 pfn) { return -ENODEV; }
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 1dcc027ec77e..42e74a5a7d78 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -505,19 +505,19 @@ static bool __init setup_rmptable(void)
* described in the SNP_INIT_EX firmware command description in the SNP
* firmware ABI spec.
*/
-static int __init snp_rmptable_init(void)
+int __init snp_rmptable_init(void)
{
unsigned int i;
u64 val;
- if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
- return 0;
+ if (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP)))
+ return -ENOSYS;
- if (!amd_iommu_snp_en)
- goto nosnp;
+ if (WARN_ON_ONCE(!amd_iommu_snp_en))
+ return -ENOSYS;
if (!setup_rmptable())
- goto nosnp;
+ return -ENOSYS;
/*
* Check if SEV-SNP is already enabled, this can happen in case of
@@ -530,7 +530,7 @@ static int __init snp_rmptable_init(void)
/* Zero out the RMP bookkeeping area */
if (!clear_rmptable_bookkeeping()) {
free_rmp_segment_table();
- goto nosnp;
+ return -ENOSYS;
}
/* Zero out the RMP entries */
@@ -562,17 +562,8 @@ static int __init snp_rmptable_init(void)
crash_kexec_post_notifiers = true;
return 0;
-
-nosnp:
- cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
- return -ENOSYS;
}
-/*
- * This must be called after the IOMMU has been initialized.
- */
-device_initcall(snp_rmptable_init);
-
static void set_rmp_segment_info(unsigned int segment_shift)
{
rmp_segment_shift = segment_shift;
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c5cd92edada0..2fecfed75e54 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3194,7 +3194,7 @@ static bool __init detect_ivrs(void)
return true;
}
-static void iommu_snp_enable(void)
+static __init void iommu_snp_enable(void)
{
#ifdef CONFIG_KVM_AMD_SEV
if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
@@ -3219,6 +3219,14 @@ static void iommu_snp_enable(void)
goto disable_snp;
}
+ /*
+ * Enable host SNP support once SNP support is checked on IOMMU.
+ */
+ if (snp_rmptable_init()) {
+ pr_warn("SNP: RMP initialization failed, SNP cannot be supported.\n");
+ goto disable_snp;
+ }
+
pr_info("IOMMU SNP support enabled.\n");
return;
@@ -3318,6 +3326,19 @@ static int __init iommu_go_to_state(enum iommu_init_state state)
ret = state_next();
}
+ /*
+ * SNP platform initialization requires IOMMUs to be fully configured.
+ * If the SNP support on IOMMUs has NOT been checked, simply mark SNP
+ * as unsupported. If the SNP support on IOMMUs has been checked and
+ * host SNP support enabled but RMP enforcement has not been enabled
+ * in IOMMUs, then the system is in a half-baked state, but can limp
+ * along as all memory should be Hypervisor-Owned in the RMP. WARN,
+ * but leave SNP as "supported" to avoid confusing the kernel.
+ */
+ if (ret && cc_platform_has(CC_ATTR_HOST_SEV_SNP) &&
+ !WARN_ON_ONCE(amd_iommu_snp_en))
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
+
return ret;
}
@@ -3426,18 +3447,23 @@ void __init amd_iommu_detect(void)
int ret;
if (no_iommu || (iommu_detected && !gart_iommu_aperture))
- return;
+ goto disable_snp;
if (!amd_iommu_sme_check())
- return;
+ goto disable_snp;
ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
if (ret)
- return;
+ goto disable_snp;
amd_iommu_detected = true;
iommu_detected = 1;
x86_init.iommu.iommu_init = amd_iommu_init;
+ return;
+
+disable_snp:
+ if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
}
/****************************************************************************
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021837-volumes-twig-2514@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 29 Jan 2025 17:08:25 -0800
Subject: [PATCH] KVM: nSVM: Enter guest mode before initializing nested NPT
MMU
When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
mode prior to initializing the MMU for nested NPT so that guest_mode is
set in the MMU's role. KVM's model is that all L2 MMUs are tagged with
guest_mode, as the behavior of hypervisor MMUs tends to be significantly
different than kernel MMUs.
Practically speaking, the bug is relatively benign, as KVM only directly
queries role.guest_mode in kvm_mmu_free_guest_mode_roots() and
kvm_mmu_page_ad_need_write_protect(), which SVM doesn't use, and in paths
that are optimizations (mmu_page_zap_pte() and
shadow_mmu_try_split_huge_pages()).
And while the role is incorporated into shadow page usage, because nested
NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
will have direct=0.
Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
of the flow instead of forcing guest_mode in the MMU, as nothing in
nested_vmcb02_prepare_control() between the old and new locations touches
TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
inconsistent vCPU state to the MMU.
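Concretely, the role bit in question is derived from the vCPU's hflags
(sketch of the relevant lines; see enter_guest_mode() and
kvm_calc_cpu_role()):
vcpu->arch.hflags |= HF_GUEST_MASK;          /* enter_guest_mode() */
/* ... */
role.base.guest_mode = is_guest_mode(vcpu);  /* now 1 for the L2 MMU */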
Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
Cc: stable(a)vger.kernel.org
Reported-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://lore.kernel.org/r/20250130010825.220346-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74c20dbb92da..d4ac4a1f8b81 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5540,7 +5540,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
union kvm_mmu_page_role root_role;
/* NPT requires CR0.PG=1. */
- WARN_ON_ONCE(cpu_role.base.direct);
+ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d77b094d9a4d..04c375bf1ac2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -646,6 +646,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
u32 pause_count12;
u32 pause_thresh12;
+ nested_svm_transition_tlb_flush(vcpu);
+
+ /* Enter Guest-Mode */
+ enter_guest_mode(vcpu);
+
/*
* Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -762,11 +767,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
}
- nested_svm_transition_tlb_flush(vcpu);
-
- /* Enter Guest-Mode */
- enter_guest_mode(vcpu);
-
/*
* Merge guest and host intercepts - must be called with vcpu in
* guest-mode to take effect.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021824-strict-motivator-41ae@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 29 Jan 2025 17:08:25 -0800
Subject: [PATCH] KVM: nSVM: Enter guest mode before initializing nested NPT
MMU
When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
mode prior to initializing the MMU for nested NPT so that guest_mode is
set in the MMU's role. KVM's model is that all L2 MMUs are tagged with
guest_mode, as the behavior of hypervisor MMUs tends to be significantly
different than kernel MMUs.
Practically speaking, the bug is relatively benign, as KVM only directly
queries role.guest_mode in kvm_mmu_free_guest_mode_roots() and
kvm_mmu_page_ad_need_write_protect(), which SVM doesn't use, and in paths
that are optimizations (mmu_page_zap_pte() and
shadow_mmu_try_split_huge_pages()).
And while the role is incorporated into shadow page usage, because nested
NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
will have direct=0.
Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
of the flow instead of forcing guest_mode in the MMU, as nothing in
nested_vmcb02_prepare_control() between the old and new locations touches
TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
inconsistent vCPU state to the MMU.
Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
Cc: stable(a)vger.kernel.org
Reported-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://lore.kernel.org/r/20250130010825.220346-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74c20dbb92da..d4ac4a1f8b81 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5540,7 +5540,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
union kvm_mmu_page_role root_role;
/* NPT requires CR0.PG=1. */
- WARN_ON_ONCE(cpu_role.base.direct);
+ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d77b094d9a4d..04c375bf1ac2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -646,6 +646,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
u32 pause_count12;
u32 pause_thresh12;
+ nested_svm_transition_tlb_flush(vcpu);
+
+ /* Enter Guest-Mode */
+ enter_guest_mode(vcpu);
+
/*
* Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -762,11 +767,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
}
- nested_svm_transition_tlb_flush(vcpu);
-
- /* Enter Guest-Mode */
- enter_guest_mode(vcpu);
-
/*
* Merge guest and host intercepts - must be called with vcpu in
* guest-mode to take effect.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a8de7f100bb5989d9c3627d3a223ee1c863f3b69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021851-nautical-buffoon-d8b1@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a8de7f100bb5989d9c3627d3a223ee1c863f3b69 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 17 Jan 2025 16:34:51 -0800
Subject: [PATCH] KVM: x86: Reject Hyper-V's SEND_IPI hypercalls if local APIC
isn't in-kernel
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local APIC is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscal1_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
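(Sketch of why the note holds: the irqchip-creating ioctls bail out once
any vCPU exists, paraphrased from kvm_arch_vm_ioctl():)
if (kvm->created_vcpus)
        return -EINVAL; /* irqchip mode is frozen after the first vCPU */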
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6a6dd5a84f22..6ebeb6cea6c0 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2226,6 +2226,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (hc->code == HVCALL_SEND_IPI) {
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
@@ -2852,7 +2855,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a8de7f100bb5989d9c3627d3a223ee1c863f3b69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021850-exes-cabana-e868@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a8de7f100bb5989d9c3627d3a223ee1c863f3b69 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 17 Jan 2025 16:34:51 -0800
Subject: [PATCH] KVM: x86: Reject Hyper-V's SEND_IPI hypercalls if local APIC
isn't in-kernel
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local APIC is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscal1_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6a6dd5a84f22..6ebeb6cea6c0 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2226,6 +2226,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (hc->code == HVCALL_SEND_IPI) {
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
@@ -2852,7 +2855,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e977499820782ab1c69f354d9f41b6d9ad1f43d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021815-sublease-unneeded-0077@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e977499820782ab1c69f354d9f41b6d9ad1f43d9 Mon Sep 17 00:00:00 2001
From: Nirmoy Das <nirmoy.das(a)intel.com>
Date: Mon, 10 Feb 2025 15:36:54 +0100
Subject: [PATCH] drm/xe: Carve out wopcm portion from the stolen memory
The top of stolen memory is WOPCM, which shouldn't be accessed. Remove
this portion from the stolen memory region for discrete platforms.
This was already done for integrated, but was missing for discrete
platforms.
This also moves get_wopcm_size() so detect_bar2_dgfx() and
detect_bar2_integrated can use the same function.
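As a worked illustration with hypothetical sizes (the real values come
from DSMBASE and get_wopcm_size(); see the hunks below):
u64 tile_size   = SZ_16G;                  /* tile->mem.vram.actual_physical_size */
u64 stolen_base = SZ_16G - SZ_512M;        /* DSMBASE & BDSM_MASK, tile-relative */
u64 wopcm_size  = SZ_4M;                   /* get_wopcm_size() */
u64 stolen_size = tile_size - stolen_base; /* 512M, previously overlapped WOPCM */
stolen_size -= wopcm_size;                 /* 508M usable after the carve-out */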
v2: Improve commit message and add a suitable stable version tag (Lucas)
Fixes: d8b52a02cb40 ("drm/xe: Implement stolen memory.")
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: stable(a)vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210143654.2076747-1-nirm…
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
(cherry picked from commit 2c7f45cc7e197a792ce5c693e56ea48f60b312da)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index 423856cc18d4..d414421f8c13 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -57,38 +57,6 @@ bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe)
return GRAPHICS_VERx100(xe) < 1270 && !IS_DGFX(xe);
}
-static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
-{
- struct xe_tile *tile = xe_device_get_root_tile(xe);
- struct xe_mmio *mmio = xe_root_tile_mmio(xe);
- struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
- u64 stolen_size;
- u64 tile_offset;
- u64 tile_size;
-
- tile_offset = tile->mem.vram.io_start - xe->mem.vram.io_start;
- tile_size = tile->mem.vram.actual_physical_size;
-
- /* Use DSM base address instead for stolen memory */
- mgr->stolen_base = (xe_mmio_read64_2x32(mmio, DSMBASE) & BDSM_MASK) - tile_offset;
- if (drm_WARN_ON(&xe->drm, tile_size < mgr->stolen_base))
- return 0;
-
- stolen_size = tile_size - mgr->stolen_base;
-
- /* Verify usage fits in the actual resource available */
- if (mgr->stolen_base + stolen_size <= pci_resource_len(pdev, LMEM_BAR))
- mgr->io_base = tile->mem.vram.io_start + mgr->stolen_base;
-
- /*
- * There may be few KB of platform dependent reserved memory at the end
- * of vram which is not part of the DSM. Such reserved memory portion is
- * always less then DSM granularity so align down the stolen_size to DSM
- * granularity to accommodate such reserve vram portion.
- */
- return ALIGN_DOWN(stolen_size, SZ_1M);
-}
-
static u32 get_wopcm_size(struct xe_device *xe)
{
u32 wopcm_size;
@@ -112,6 +80,44 @@ static u32 get_wopcm_size(struct xe_device *xe)
return wopcm_size;
}
+static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
+{
+ struct xe_tile *tile = xe_device_get_root_tile(xe);
+ struct xe_mmio *mmio = xe_root_tile_mmio(xe);
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ u64 stolen_size, wopcm_size;
+ u64 tile_offset;
+ u64 tile_size;
+
+ tile_offset = tile->mem.vram.io_start - xe->mem.vram.io_start;
+ tile_size = tile->mem.vram.actual_physical_size;
+
+ /* Use DSM base address instead for stolen memory */
+ mgr->stolen_base = (xe_mmio_read64_2x32(mmio, DSMBASE) & BDSM_MASK) - tile_offset;
+ if (drm_WARN_ON(&xe->drm, tile_size < mgr->stolen_base))
+ return 0;
+
+ /* Carve out the top of DSM as it contains the reserved WOPCM region */
+ wopcm_size = get_wopcm_size(xe);
+ if (drm_WARN_ON(&xe->drm, !wopcm_size))
+ return 0;
+
+ stolen_size = tile_size - mgr->stolen_base;
+ stolen_size -= wopcm_size;
+
+ /* Verify usage fits in the actual resource available */
+ if (mgr->stolen_base + stolen_size <= pci_resource_len(pdev, LMEM_BAR))
+ mgr->io_base = tile->mem.vram.io_start + mgr->stolen_base;
+
+ /*
+ * There may be few KB of platform dependent reserved memory at the end
+ * of vram which is not part of the DSM. Such reserved memory portion is
+ * always less then DSM granularity so align down the stolen_size to DSM
+ * granularity to accommodate such reserve vram portion.
+ */
+ return ALIGN_DOWN(stolen_size, SZ_1M);
+}
+
static u32 detect_bar2_integrated(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
There are some issues with the enetc driver, some of which are specific
to the LS1028A platform, and some of which were introduced recently when
i.MX95 ENETC support was added, so this patch set aims to clean up those
issues.
Wei Fang (8):
net: enetc: fix the off-by-one issue in enetc_map_tx_buffs()
net: enetc: correct the tx_swbd statistics
net: enetc: correct the xdp_tx statistics
net: enetc: VFs do not support HWTSTAMP_TX_ONESTEP_SYNC
net: enetc: update UDP checksum when updating originTimestamp field
net: enetc: add missing enetc4_link_deinit()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: correct the EMDIO base offset for ENETC v4
drivers/net/ethernet/freescale/enetc/enetc.c | 53 +++++++++++++++----
.../net/ethernet/freescale/enetc/enetc4_hw.h | 3 ++
.../net/ethernet/freescale/enetc/enetc4_pf.c | 2 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 7 ++-
.../freescale/enetc/enetc_pf_common.c | 10 +++-
5 files changed, 60 insertions(+), 15 deletions(-)
--
2.34.1
From: Ge Yang <yangge1116(a)126.com>
Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
the allocation of contiguous memory through cma_alloc() may fail
probabilistically.
In the CMA allocation process, if it is found that the CMA area is occupied
by in-use hugepage folios, these in-use hugepage folios need to be migrated
to another location. When there are no available hugepage folios in the
free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
are allocated from the buddy system. A temporary state is set on the newly
allocated folio. Upon completion of the hugepage folio migration, the
temporary state is transferred from the new folios to the old folios.
Normally, when the old folios with the temporary state are freed, they
are directly released back to the buddy system. However, due to the deferred
freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugepage
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_hugepage_folios_freed(). This function ensures that the hugepage
folios are properly released back to the buddy system after their migration
is completed. By invoking wait_for_hugepage_folios_freed() following the
migration process, we guarantee that when test_pages_isolated() is
executed, it will successfully pass.
Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
---
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 7 +++++++
mm/migrate.c | 16 ++++++++++++++--
3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6c6546b..c39e0d5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_hugepage_folios_freed(struct hstate *h);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30bc34d..64cae39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2955,6 +2955,13 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+ WARN_ON(!h);
+
+ flush_free_hpage_work(h);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18..5dd1851 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1448,6 +1448,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
+ unsigned long size;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1533,9 +1534,20 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
out_unlock:
folio_unlock(src);
out:
- if (rc == MIGRATEPAGE_SUCCESS)
+ if (rc == MIGRATEPAGE_SUCCESS) {
+ size = folio_size(src);
folio_putback_hugetlb(src);
- else if (rc != -EAGAIN)
+
+ /*
+ * Due to the deferred freeing of HugeTLB folios, the hugepage 'src' may
+ * not immediately release to the buddy system. This can lead to failure
+ * in allocating memory through the cma_alloc() function. To ensure that
+ * the hugepage folios are properly released back to the buddy system,
+ * we invoke the wait_for_hugepage_folios_freed() function to wait for
+ * the release to complete.
+ */
+ wait_for_hugepage_folios_freed(size_to_hstate(size));
+ } else if (rc != -EAGAIN)
list_move_tail(&src->lru, ret);
/*
--
2.7.4
From: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
The current solution for powering off the Apalis iMX6 is not functioning
as intended. To resolve this, it is necessary to power off the
vgen2_reg, which will also set the POWER_ENABLE_MOCI signal to a low
state. This ensures the carrier board is properly informed to initiate
its power-off sequence.
The new solution uses the regulator-poweroff driver, which will power
off the regulator during a system shutdown.
CC: stable(a)vger.kernel.org
Fixes: 4eb56e26f92e ("ARM: dts: imx6q-apalis: Command pmic to standby for poweroff")
Signed-off-by: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
---
arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
index 1c72da417011..614b65821995 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
@@ -108,6 +108,11 @@ lvds_panel_in: endpoint {
};
};
+ poweroff {
+ compatible = "regulator-poweroff";
+ cpu-supply = <&vgen2_reg>;
+ };
+
reg_module_3v3: regulator-module-3v3 {
compatible = "regulator-fixed";
regulator-always-on;
@@ -236,10 +241,6 @@ &can2 {
status = "disabled";
};
-&clks {
- fsl,pmic-stby-poweroff;
-};
-
/* Apalis SPI1 */
&ecspi1 {
cs-gpios = <&gpio5 25 GPIO_ACTIVE_LOW>;
@@ -527,7 +528,6 @@ &i2c2 {
pmic: pmic@8 {
compatible = "fsl,pfuze100";
- fsl,pmic-stby-poweroff;
reg = <0x08>;
regulators {
--
2.45.2
The desired clock frequency was correctly set to 400MHz in the device tree
but was lowered by the driver to 300MHz, breaking 4K 60Hz content playback.
Fix the issue by removing the driver call to clk_set_rate(), which reduces
the amount of board-specific code.
Fixes: 003afda97c65 ("media: verisilicon: Enable AV1 decoder on rk3588")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
---
This patch fixes a user report of the AV1 4K60 decoder not being fast
enough on RK3588-based SoCs. This is a break from the Hantro driver's
original authors' habit of hard-coding the frequencies in the driver
instead of specifying them in the device tree. The other calls to
clk_set_rate() are left in place, since changing them would require
modifying many dtsi files, which would then be unsuitable for backporting.
---
Changes in v2:
- Completely remove the unused init function, the driver is null-safe
- Link to v1: https://lore.kernel.org/r/20250217-b4-hantro-av1-clock-rate-v1-1-65b91132c5…
---
drivers/media/platform/verisilicon/rockchip_vpu_hw.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/drivers/media/platform/verisilicon/rockchip_vpu_hw.c b/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
index 964122e7c355934cd80eb442219f6ba51bba8b71..842040f713c15e6ff295771bc9fa5a7b66e584b2 100644
--- a/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
+++ b/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
@@ -17,7 +17,6 @@
#define RK3066_ACLK_MAX_FREQ (300 * 1000 * 1000)
#define RK3288_ACLK_MAX_FREQ (400 * 1000 * 1000)
-#define RK3588_ACLK_MAX_FREQ (300 * 1000 * 1000)
#define ROCKCHIP_VPU981_MIN_SIZE 64
@@ -440,13 +439,6 @@ static int rk3066_vpu_hw_init(struct hantro_dev *vpu)
return 0;
}
-static int rk3588_vpu981_hw_init(struct hantro_dev *vpu)
-{
- /* Bump ACLKs to max. possible freq. to improve performance. */
- clk_set_rate(vpu->clocks[0].clk, RK3588_ACLK_MAX_FREQ);
- return 0;
-}
-
static int rockchip_vpu_hw_init(struct hantro_dev *vpu)
{
/* Bump ACLK to max. possible freq. to improve performance. */
@@ -807,7 +799,6 @@ const struct hantro_variant rk3588_vpu981_variant = {
.codec_ops = rk3588_vpu981_codec_ops,
.irqs = rk3588_vpu981_irqs,
.num_irqs = ARRAY_SIZE(rk3588_vpu981_irqs),
- .init = rk3588_vpu981_hw_init,
.clk_names = rk3588_vpu981_vpu_clk_names,
.num_clocks = ARRAY_SIZE(rk3588_vpu981_vpu_clk_names)
};
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250217-b4-hantro-av1-clock-rate-e5497f1499df
Best regards,
--
Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Our CI built linux-stable-rc.git queue/6.1 at commit eef4a8a45ba1
("btrfs: output the reason for open_ctree() failure").
(built for arm and arm64, albeit I don't think it matters.)
Building perf produced two build errors, which I wanted to report
even before the RC1 is out.
| ...tools/perf/util/namespaces.c:247:27: error: invalid type argument of '->' (have 'int')
| 247 | RC_CHK_ACCESS(nsi)->in_pidns = true;
| | ^~
introduced by commit 93520bacf784 ("perf namespaces: Introduce
nsinfo__set_in_pidns()"). The RC_CHK_ACCESS macro was introduced in 6.4.
Removing the macro made this go away (nsi->in_pidns = true;).
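That is, a 6.1 backport of the setter from 93520bacf784 would presumably
drop the wrapper, something like (untested sketch):
void nsinfo__set_in_pidns(struct nsinfo *nsi)
{
        nsi->in_pidns = true;
}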
Second perf build error:
| ld: ...tools/perf/util/machine.c:1176: undefined reference to `kallsyms__get_symbol_start'
Introduced by commits:
710c2e913aa9 perf machine: Don't ignore _etext when not a text symbol
69a87a32f5cd perf machine: Include data symbols in the kernel map
These two use the function kallsyms__get_symbol_start added with:
f9dd531c5b82 perf symbols: Add kallsyms__get_symbol_start()
So f9dd531c5b82 would additionally be needed.
The kernel itself built fine; due to the perf error we don't have runtime
test results.
Best regards
Max
I'm announcing the release of the 6.12.15 kernel.
This is ONLY needed for users of the XFS filesystem. If you do not use
XFS, no need to upgrade. If you do use XFS, please upgrade.
The updated 6.12.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.12.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/xfs/xfs_quota.h | 5 +++--
fs/xfs/xfs_trans.c | 10 +++-------
fs/xfs/xfs_trans_dquot.c | 31 ++++++++++++++++++++++++++-----
4 files changed, 33 insertions(+), 15 deletions(-)
Darrick J. Wong (1):
xfs: don't lose solo dquot update transactions
Greg Kroah-Hartman (1):
Linux 6.12.15
The quilt patch titled
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
has been removed from the -mm tree. Its filename was
mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
Date: Tue, 11 Feb 2025 15:26:25 +0800
In zap_pte_range(), if the pte lock was released midway, the pte entries
may be refilled with physical pages by another thread, which may cause a
non-empty PTE page to be reclaimed and eventually cause the system to
crash.
To fix it, fall back to the slow path in this case to recheck if all pte
entries are still none.
Link: https://lkml.kernel.org/r/20250211072625.89188-1-zhengqi.arch@bytedance.com
Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Christian Brauner <brauner(a)kernel.org>
Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@braune…
Reported-by: Qu Wenruo <quwenruo.btrfs(a)gmx.com>
Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/
Tested-by: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/mm/memory.c~mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages
+++ a/mm/memory.c
@@ -1719,7 +1719,7 @@ static unsigned long zap_pte_range(struc
pmd_t pmdval;
unsigned long start = addr;
bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
- bool direct_reclaim = false;
+ bool direct_reclaim = true;
int nr;
retry:
@@ -1734,8 +1734,10 @@ retry:
do {
bool any_skipped = false;
- if (need_resched())
+ if (need_resched()) {
+ direct_reclaim = false;
break;
+ }
nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
&force_flush, &force_break, &any_skipped);
@@ -1743,11 +1745,20 @@ retry:
can_reclaim_pt = false;
if (unlikely(force_break)) {
addr += nr * PAGE_SIZE;
+ direct_reclaim = false;
break;
}
} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
- if (can_reclaim_pt && addr == end)
+ /*
+ * Fast path: try to hold the pmd lock and unmap the PTE page.
+ *
+ * If the pte lock was released midway (retry case), or if the attempt
+ * to hold the pmd lock failed, then we need to recheck all pte entries
+ * to ensure they are still none, thereby preventing the pte entries
+ * from being repopulated by another thread.
+ */
+ if (can_reclaim_pt && direct_reclaim && addr == end)
direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
add_mm_rss_vec(mm, rss);
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
arm-pgtable-fix-null-pointer-dereference-issue.patch
The quilt patch titled
Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment
has been removed from the -mm tree. Its filename was
mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment
Date: Mon, 3 Feb 2025 08:52:06 +0100
Add a sanity check to madvise_dontneed_free() to address a corner case in
madvise where a race condition causes the vma currently being processed to
become backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with a
userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event. During this time, the vma covering the
current address range may change due to an explicit mmap done concurrently
by another thread.
If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down to
a hugepage boundary to avoid data loss (see "Fixes" below). This rounding
may cause the end address to be truncated to the same address as the
start.
Make this corner case follow the same semantics as in other similar cases
where the requested region has zero length (i.e., return 0).
This will make madvise_walk_vmas() continue to the next vma in the range
(this time holding the process mm lock) which, due to the prev pointer
becoming stale because of the vma change, will be the same hugepage-backed
vma that was just checked before. The next time madvise_dontneed_free()
runs for this vma, if the start address isn't aligned to a hugepage
boundary, it'll return -EINVAL, which is also in line with the madvise
api.
From userspace perspective, madvise() will return EINVAL because the start
address isn't aligned according to the new vma alignment requirements
(hugepage), even though it was correctly page-aligned when the call was
issued.
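From userspace, the race is observable only as that error return
(hypothetical snippet; start and len are the caller's):
#include <sys/mman.h>
#include <errno.h>
if (madvise(start, len, MADV_DONTNEED) == -1 && errno == EINVAL) {
        /* another thread may have remapped the range as hugetlb,
         * changing the alignment requirement under us */
}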
Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com
Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Florent Revest <revest(a)google.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment
+++ a/mm/madvise.c
@@ -933,7 +933,16 @@ static long madvise_dontneed_free(struct
*/
end = vma->vm_end;
}
- VM_WARN_ON(start >= end);
+ /*
+ * If the memory region between start and end was
+ * originally backed by 4kB pages and then remapped to
+ * be backed by hugepages while mmap_lock was dropped,
+ * the adjustment for hugetlb vma above may have rounded
+ * end down to the start address.
+ */
+ if (start == end)
+ return 0;
+ VM_WARN_ON(start > end);
}
if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
_
Patches currently in -mm which might be from rcn(a)igalia.com are
The quilt patch titled
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
has been removed from the -mm tree. Its filename was
mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
Date: Wed, 29 Jan 2025 19:08:44 +0900
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
skips charging any zswap entries when it fails to zswap the entire folio.
However, when some base pages are zswapped but zswapping the entire folio
fails, the zswap operation is rolled back. When freeing the zswap entries
for those pages, zswap_entry_free() uncharges zswap entries that were
never charged, causing zswap charging to become inconsistent.
This inconsistency triggers two warnings with the following steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
The two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
zswap_stored_pages also becomes inconsistent in the same way.
As suggested by Kanchana, increment zswap_stored_pages and charge zswap
entries within zswap_store_page() when it succeeds. This way,
zswap_entry_free() will decrement the counter and uncharge the entries
when zswap fails to store the entire folio.
While this could potentially be optimized by batching objcg charging and
incrementing the counter, let's focus on fixing the bug this time and
leave the optimization for later after some evaluation.
After resolving the inconsistency, the warnings disappear.
[42.hyeyoo(a)gmail.com: refactor zswap_store_page()]
Link: https://lkml.kernel.org/r/20250131082037.2426-1-42.hyeyoo@gmail.com
Link: https://lkml.kernel.org/r/20250129100844.2935-1-42.hyeyoo@gmail.com
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Acked-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Acked-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-inconsistency-when-zswap_store_page-fails
+++ a/mm/zswap.c
@@ -1445,9 +1445,9 @@ resched:
* main API
**********************************/
-static ssize_t zswap_store_page(struct page *page,
- struct obj_cgroup *objcg,
- struct zswap_pool *pool)
+static bool zswap_store_page(struct page *page,
+ struct obj_cgroup *objcg,
+ struct zswap_pool *pool)
{
swp_entry_t page_swpentry = page_swap_entry(page);
struct zswap_entry *entry, *old;
@@ -1456,7 +1456,7 @@ static ssize_t zswap_store_page(struct p
entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
if (!entry) {
zswap_reject_kmemcache_fail++;
- return -EINVAL;
+ return false;
}
if (!zswap_compress(page, entry, pool))
@@ -1483,13 +1483,17 @@ static ssize_t zswap_store_page(struct p
/*
* The entry is successfully compressed and stored in the tree, there is
- * no further possibility of failure. Grab refs to the pool and objcg.
- * These refs will be dropped by zswap_entry_free() when the entry is
- * removed from the tree.
+ * no further possibility of failure. Grab refs to the pool and objcg,
+ * charge zswap memory, and increment zswap_stored_pages.
+ * The opposite actions will be performed by zswap_entry_free()
+ * when the entry is removed from the tree.
*/
zswap_pool_get(pool);
- if (objcg)
+ if (objcg) {
obj_cgroup_get(objcg);
+ obj_cgroup_charge_zswap(objcg, entry->length);
+ }
+ atomic_long_inc(&zswap_stored_pages);
/*
* We finish initializing the entry while it's already in xarray.
@@ -1510,13 +1514,13 @@ static ssize_t zswap_store_page(struct p
zswap_lru_add(&zswap_list_lru, entry);
}
- return entry->length;
+ return true;
store_failed:
zpool_free(pool->zpool, entry->handle);
compress_failed:
zswap_entry_cache_free(entry);
- return -EINVAL;
+ return false;
}
bool zswap_store(struct folio *folio)
@@ -1526,7 +1530,6 @@ bool zswap_store(struct folio *folio)
struct obj_cgroup *objcg = NULL;
struct mem_cgroup *memcg = NULL;
struct zswap_pool *pool;
- size_t compressed_bytes = 0;
bool ret = false;
long index;
@@ -1564,20 +1567,14 @@ bool zswap_store(struct folio *folio)
for (index = 0; index < nr_pages; ++index) {
struct page *page = folio_page(folio, index);
- ssize_t bytes;
- bytes = zswap_store_page(page, objcg, pool);
- if (bytes < 0)
+ if (!zswap_store_page(page, objcg, pool))
goto put_pool;
- compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
- atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
_
Patches currently in -mm which might be from 42.hyeyoo(a)gmail.com are
The quilt patch titled
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
has been removed from the -mm tree. Its filename was
lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pavel Begunkov <asml.silence(a)gmail.com>
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
Date: Fri, 31 Jan 2025 14:13:15 +0000
import_iovec() says that it should always be fine to kfree the iovec
returned in @iovp regardless of the error code. __import_iovec_ubuf()
never reallocates it and thus should clear the pointer even in cases when
copy_iovec_*() fail.
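The caller-side contract this restores, sketched (hypothetical caller;
kfree() must be safe on every path):
struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
struct iov_iter iter;
ssize_t ret;

ret = import_iovec(ITER_SOURCE, uvec, 1, ARRAY_SIZE(iovstack), &iov, &iter);
/* ... use iter on success ... */
kfree(iov); /* iov is NULL or a real allocation, never the stack array */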
Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.17383324…
Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/iov_iter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/iov_iter.c~lib-iov_iter-fix-import_iovec_ubuf-iovec-management
+++ a/lib/iov_iter.c
@@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int t
struct iovec *iov = *iovp;
ssize_t ret;
+ *iovp = NULL;
+
if (compat)
ret = copy_compat_iovec_from_user(iov, uvec, 1);
else
@@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int t
ret = import_ubuf(type, iov->iov_base, iov->iov_len, i);
if (unlikely(ret))
return ret;
- *iovp = NULL;
return i->count;
}
_
Patches currently in -mm which might be from asml.silence(a)gmail.com are
The patch titled
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Date: Mon, 17 Feb 2025 09:43:29 +0800
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range() in order to
make offlining hwpoisoned pages possible, by introducing isolate_lru_page()
and try_to_unmap() for hwpoisoned pages. However, the folio lock must be
held before calling try_to_unmap(). Take the lock to fix this problem.
The following warning is produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio
+++ a/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned lo
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
_
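For context, the fixed fragment of do_migrate_range(), reconstructed from
the hunk above with comments added (not a verbatim kernel excerpt):

	if (folio_test_hwpoison(folio) ||
	    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
		if (WARN_ON(folio_test_lru(folio)))
			folio_isolate_lru(folio);
		if (folio_mapped(folio)) {
			/*
			 * try_to_unmap(), reached via
			 * unmap_poisoned_folio(), requires the folio lock
			 * to be held; taking it here avoids the BUG shown
			 * in the trace above.
			 */
			folio_lock(folio);
			unmap_poisoned_folio(folio, pfn, false);
			folio_unlock(folio);
		}
		goto put_folio;
	}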
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch
The patch titled
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
Date: Mon, 17 Feb 2025 09:43:28 +0800
If a folio has an elevated reference count, folio_try_get() will acquire
it, the necessary operations are performed, and the reference is then
released. A poisoned folio without an elevated reference count (which is
unlikely for memory-failure) is simply bypassed, because folio_try_get()
fails on it. Therefore, relocate the folio_try_get() call, which checks
for and acquires this reference count, so that it is done first.
Link: https://lkml.kernel.org/r/20250217014329.3610326-3-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range
+++ a/mm/memory_hotplug.c
@@ -1822,12 +1822,12 @@ static void do_migrate_range(unsigned lo
if (folio_test_large(folio))
pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
- /*
- * HWPoison pages have elevated reference counts so the migration would
- * fail on them. It also doesn't make any sense to migrate them in the
- * first place. Still try to unmap such a page in case it is still mapped
- * (keep the unmap as the catch all safety net).
- */
+ if (!folio_try_get(folio))
+ continue;
+
+ if (unlikely(page_folio(page) != folio))
+ goto put_folio;
+
if (folio_test_hwpoison(folio) ||
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
@@ -1835,14 +1835,8 @@ static void do_migrate_range(unsigned lo
if (folio_mapped(folio))
unmap_poisoned_folio(folio, pfn, false);
- continue;
- }
-
- if (!folio_try_get(folio))
- continue;
-
- if (unlikely(page_folio(page) != folio))
goto put_folio;
+ }
if (!isolate_folio_to_list(folio, &source)) {
if (__ratelimit(&migrate_rs)) {
_
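For context, the reordered flow of do_migrate_range() after this patch,
reconstructed from the hunks above with comments added (not a verbatim
kernel excerpt):

	/*
	 * Take the reference first: folios we cannot pin are skipped,
	 * including the unlikely poisoned folio with no elevated refcount.
	 */
	if (!folio_try_get(folio))
		continue;

	if (unlikely(page_folio(page) != folio))
		goto put_folio;	/* raced with a folio change; drop the ref */

	if (folio_test_hwpoison(folio) ||
	    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
		if (WARN_ON(folio_test_lru(folio)))
			folio_isolate_lru(folio);
		if (folio_mapped(folio))
			unmap_poisoned_folio(folio, pfn, false);
		/*
		 * Hwpoisoned folios are unmapped but not migrated; the
		 * reference taken above is dropped at put_folio.
		 */
		goto put_folio;
	}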
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch