The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 409f45387c937145adeeeebc6d6032c2ec232b35
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021856-heave-blade-985e@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 409f45387c937145adeeeebc6d6032c2ec232b35 Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra(a)amd.com>
Date: Mon, 10 Feb 2025 22:54:18 +0000
Subject: [PATCH] x86/sev: Fix broken SNP support with KVM module built-in
Fix issues with enabling SNP host support, and effectively SNP support as a
whole, which is broken when the KVM module is built-in.
SNP host support is enabled in snp_rmptable_init(), which is invoked as a
device_initcall(). The SNP check on the IOMMU is done during IOMMU PCI init
(the IOMMU_PCI_INIT stage), which is why snp_rmptable_init() is currently
invoked via device_initcall() and cannot be moved to subsys_initcall(): the
core IOMMU subsystem itself is initialized via subsys_initcall().
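(For context, a built-in kvm_amd's module_init() also collapses to a
device_initcall(), so the two initcalls run at the same level and ordering
within that level falls to link order; from include/linux/module.h and
include/linux/init.h in the !MODULE case:)
#define module_init(x)	__initcall(x);
#define __initcall(fn)	device_initcall(fn)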
Now, if the kvm_amd module is built-in, it gets initialized before SNP host
support is enabled in snp_rmptable_init():
[ 10.131811] kvm_amd: TSC scaling supported
[ 10.136384] kvm_amd: Nested Virtualization enabled
[ 10.141734] kvm_amd: Nested Paging enabled
[ 10.146304] kvm_amd: LBR virtualization supported
[ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509)
[ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99)
[ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99)
[ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported
[ 10.177052] kvm_amd: Virtual GIF supported
...
...
[ 10.201648] kvm_amd: in svm_enable_virtualization_cpu
And then svm_x86_ops->enable_virtualization_cpu()
(svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as follows:
wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs.
snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu()
as follows:
...
[ 11.256138] kvm_amd: in svm_enable_virtualization_cpu
...
[ 11.264918] SEV-SNP: in snp_rmptable_init
This triggers a #GP exception in snp_rmptable_init() when snp_enable()
is invoked to set SNP_EN in SYSCFG MSR:
[ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30)
...
[ 11.294404] Call Trace:
[ 11.294482] <IRQ>
[ 11.294513] ? show_stack_regs+0x26/0x30
[ 11.294522] ? ex_handler_msr+0x10f/0x180
[ 11.294529] ? search_extable+0x2b/0x40
[ 11.294538] ? fixup_exception+0x2dd/0x340
[ 11.294542] ? exc_general_protection+0x14f/0x440
[ 11.294550] ? asm_exc_general_protection+0x2b/0x30
[ 11.294557] ? __pfx_snp_enable+0x10/0x10
[ 11.294567] ? native_write_msr+0x8/0x30
[ 11.294570] ? __snp_enable+0x5d/0x70
[ 11.294575] snp_enable+0x19/0x20
[ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0
[ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20
[ 11.294589] __sysvec_call_function+0x20/0x90
[ 11.294596] sysvec_call_function+0x80/0xb0
[ 11.294601] </IRQ>
[ 11.294603] <TASK>
[ 11.294605] asm_sysvec_call_function+0x1f/0x30
...
[ 11.294631] arch_cpu_idle+0xd/0x20
[ 11.294633] default_idle_call+0x34/0xd0
[ 11.294636] do_idle+0x1f1/0x230
[ 11.294643] ? complete+0x71/0x80
[ 11.294649] cpu_startup_entry+0x30/0x40
[ 11.294652] start_secondary+0x12d/0x160
[ 11.294655] common_startup_64+0x13e/0x141
[ 11.294662] </TASK>
This #GP exception is getting triggered due to the following errata for
AMD family 19h Models 10h-1Fh Processors:
Processor may generate spurious #GP(0) Exception on WRMSR instruction:
Description:
The Processor will generate a spurious #GP(0) Exception on a WRMSR
instruction if the following conditions are all met:
- the target of the WRMSR is a SYSCFG register.
- the write changes the value of SYSCFG.SNPEn from 0 to 1.
- One of the threads that share the physical core has a non-zero
value in the VM_HSAVE_PA MSR.
The document being referred to above:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revisi…
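In kernel terms, the erratum's trigger sequence reduces to the following
ordering (an illustrative sketch using the standard msr-index.h names, not
code from the patch):
u64 val;
/* kvm_amd (built-in) has already programmed a non-zero host save area */
wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
/* later, snp_rmptable_init() -> snp_enable() flips SNPEn from 0 to 1 */
rdmsrl(MSR_AMD64_SYSCFG, val);
wrmsrl(MSR_AMD64_SYSCFG, val | MSR_AMD64_SYSCFG_SNP_EN); /* spurious #GP */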
To summarize: with the kvm_amd module built-in, KVM/SVM initialization
happens before host SNP is enabled, and that initialization sets
VM_HSAVE_PA to a non-zero value. This triggers a #GP when SYSCFG.SNPEn is
being set, and subsequently causes SNP_INIT(_EX) to fail with an
INVALID_CONFIG error, as SYSCFG[SnpEn] is not set on all CPUs.
Essentially, the SNP host enabling code must run before KVM
initialization, which is currently not the case when KVM is built-in.
Fix this by calling snp_rmptable_init() directly from iommu_snp_enable()
instead of via device_initcall(), which enables SNP host support before
KVM initialization even with the kvm_amd module built-in.
Add additional handling for `iommu=off` or `amd_iommu=off` options.
Note that IOMMUs need to be enabled for SNP initialization; therefore, if
host SNP support is enabled but late IOMMU initialization fails, the PSP
driver's SNP_INIT will fail, as the IOMMU SNP sanity checks in the SNP
firmware fail with an invalid-configuration error, as below:
[ 9.723114] ccp 0000:23:00.1: sev enabled
[ 9.727602] ccp 0000:23:00.1: psp enabled
[ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002)
[ 9.739098] ccp 0000:a2:00.1: no command queues available
[ 9.745167] ccp 0000:a2:00.1: psp enabled
[ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3
[ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Co-developed-by: Vasant Hegde <vasant.hegde(a)amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde(a)amd.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
Acked-by: Joerg Roedel <jroedel(a)suse.de>
Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5d9685f92e5c..1581246491b5 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -531,6 +531,7 @@ static inline void __init snp_secure_tsc_init(void) { }
#ifdef CONFIG_KVM_AMD_SEV
bool snp_probe_rmptable_info(void);
+int snp_rmptable_init(void);
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
void snp_dump_hva_rmpentry(unsigned long address);
int psmash(u64 pfn);
@@ -541,6 +542,7 @@ void kdump_sev_callback(void);
void snp_fixup_e820_tables(void);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
+static inline int snp_rmptable_init(void) { return -ENOSYS; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
static inline void snp_dump_hva_rmpentry(unsigned long address) {}
static inline int psmash(u64 pfn) { return -ENODEV; }
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 1dcc027ec77e..42e74a5a7d78 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -505,19 +505,19 @@ static bool __init setup_rmptable(void)
* described in the SNP_INIT_EX firmware command description in the SNP
* firmware ABI spec.
*/
-static int __init snp_rmptable_init(void)
+int __init snp_rmptable_init(void)
{
unsigned int i;
u64 val;
- if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
- return 0;
+ if (WARN_ON_ONCE(!cc_platform_has(CC_ATTR_HOST_SEV_SNP)))
+ return -ENOSYS;
- if (!amd_iommu_snp_en)
- goto nosnp;
+ if (WARN_ON_ONCE(!amd_iommu_snp_en))
+ return -ENOSYS;
if (!setup_rmptable())
- goto nosnp;
+ return -ENOSYS;
/*
* Check if SEV-SNP is already enabled, this can happen in case of
@@ -530,7 +530,7 @@ static int __init snp_rmptable_init(void)
/* Zero out the RMP bookkeeping area */
if (!clear_rmptable_bookkeeping()) {
free_rmp_segment_table();
- goto nosnp;
+ return -ENOSYS;
}
/* Zero out the RMP entries */
@@ -562,17 +562,8 @@ static int __init snp_rmptable_init(void)
crash_kexec_post_notifiers = true;
return 0;
-
-nosnp:
- cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
- return -ENOSYS;
}
-/*
- * This must be called after the IOMMU has been initialized.
- */
-device_initcall(snp_rmptable_init);
-
static void set_rmp_segment_info(unsigned int segment_shift)
{
rmp_segment_shift = segment_shift;
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c5cd92edada0..2fecfed75e54 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3194,7 +3194,7 @@ static bool __init detect_ivrs(void)
return true;
}
-static void iommu_snp_enable(void)
+static __init void iommu_snp_enable(void)
{
#ifdef CONFIG_KVM_AMD_SEV
if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
@@ -3219,6 +3219,14 @@ static void iommu_snp_enable(void)
goto disable_snp;
}
+ /*
+ * Enable host SNP support once SNP support is checked on IOMMU.
+ */
+ if (snp_rmptable_init()) {
+ pr_warn("SNP: RMP initialization failed, SNP cannot be supported.\n");
+ goto disable_snp;
+ }
+
pr_info("IOMMU SNP support enabled.\n");
return;
@@ -3318,6 +3326,19 @@ static int __init iommu_go_to_state(enum iommu_init_state state)
ret = state_next();
}
+ /*
+ * SNP platform initialization requires IOMMUs to be fully configured.
+ * If the SNP support on IOMMUs has NOT been checked, simply mark SNP
+ * as unsupported. If the SNP support on IOMMUs has been checked and
+ * host SNP support enabled but RMP enforcement has not been enabled
+ * in IOMMUs, then the system is in a half-baked state, but can limp
+ * along as all memory should be Hypervisor-Owned in the RMP. WARN,
+ * but leave SNP as "supported" to avoid confusing the kernel.
+ */
+ if (ret && cc_platform_has(CC_ATTR_HOST_SEV_SNP) &&
+ !WARN_ON_ONCE(amd_iommu_snp_en))
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
+
return ret;
}
@@ -3426,18 +3447,23 @@ void __init amd_iommu_detect(void)
int ret;
if (no_iommu || (iommu_detected && !gart_iommu_aperture))
- return;
+ goto disable_snp;
if (!amd_iommu_sme_check())
- return;
+ goto disable_snp;
ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
if (ret)
- return;
+ goto disable_snp;
amd_iommu_detected = true;
iommu_detected = 1;
x86_init.iommu.iommu_init = amd_iommu_init;
+ return;
+
+disable_snp:
+ if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
}
/****************************************************************************
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021837-volumes-twig-2514@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 29 Jan 2025 17:08:25 -0800
Subject: [PATCH] KVM: nSVM: Enter guest mode before initializing nested NPT
MMU
When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
mode prior to initializing the MMU for nested NPT so that guest_mode is
set in the MMU's role. KVM's model is that all L2 MMUs are tagged with
guest_mode, as the behavior of hypervisor MMUs tends to be significantly
different than kernel MMUs.
Practically speaking, the bug is relatively benign, as KVM only directly
queries role.guest_mode in kvm_mmu_free_guest_mode_roots() and
kvm_mmu_page_ad_need_write_protect(), which SVM doesn't use, and in paths
that are optimizations (mmu_page_zap_pte() and
shadow_mmu_try_split_huge_pages()).
And while the role is incorporated into shadow page usage, because nested
NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
will have direct=0.
Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
of the flow instead of forcing guest_mode in the MMU, as nothing in
nested_vmcb02_prepare_control() between the old and new locations touches
TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
inconsistent vCPU state to the MMU.
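Concretely, the role bit in question is derived from the vCPU's hflags
(sketch of the relevant lines; see enter_guest_mode() and
kvm_calc_cpu_role()):
vcpu->arch.hflags |= HF_GUEST_MASK;          /* enter_guest_mode() */
/* ... */
role.base.guest_mode = is_guest_mode(vcpu);  /* now 1 for the L2 MMU */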
Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
Cc: stable(a)vger.kernel.org
Reported-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://lore.kernel.org/r/20250130010825.220346-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74c20dbb92da..d4ac4a1f8b81 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5540,7 +5540,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
union kvm_mmu_page_role root_role;
/* NPT requires CR0.PG=1. */
- WARN_ON_ONCE(cpu_role.base.direct);
+ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d77b094d9a4d..04c375bf1ac2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -646,6 +646,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
u32 pause_count12;
u32 pause_thresh12;
+ nested_svm_transition_tlb_flush(vcpu);
+
+ /* Enter Guest-Mode */
+ enter_guest_mode(vcpu);
+
/*
* Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -762,11 +767,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
}
- nested_svm_transition_tlb_flush(vcpu);
-
- /* Enter Guest-Mode */
- enter_guest_mode(vcpu);
-
/*
* Merge guest and host intercepts - must be called with vcpu in
* guest-mode to take effect.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021824-strict-motivator-41ae@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46d6c6f3ef0eaff71c2db6d77d4e2ebb7adac34f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 29 Jan 2025 17:08:25 -0800
Subject: [PATCH] KVM: nSVM: Enter guest mode before initializing nested NPT
MMU
When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
mode prior to initializing the MMU for nested NPT so that guest_mode is
set in the MMU's role. KVM's model is that all L2 MMUs are tagged with
guest_mode, as the behavior of hypervisor MMUs tends to be significantly
different than kernel MMUs.
Practically speaking, the bug is relatively benign, as KVM only directly
queries role.guest_mode in kvm_mmu_free_guest_mode_roots() and
kvm_mmu_page_ad_need_write_protect(), which SVM doesn't use, and in paths
that are optimizations (mmu_page_zap_pte() and
shadow_mmu_try_split_huge_pages()).
And while the role is incorporated into shadow page usage, because nested
NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
will have direct=0.
Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
of the flow instead of forcing guest_mode in the MMU, as nothing in
nested_vmcb02_prepare_control() between the old and new locations touches
TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
inconsistent vCPU state to the MMU.
Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
Cc: stable(a)vger.kernel.org
Reported-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://lore.kernel.org/r/20250130010825.220346-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74c20dbb92da..d4ac4a1f8b81 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5540,7 +5540,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
union kvm_mmu_page_role root_role;
/* NPT requires CR0.PG=1. */
- WARN_ON_ONCE(cpu_role.base.direct);
+ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d77b094d9a4d..04c375bf1ac2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -646,6 +646,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
u32 pause_count12;
u32 pause_thresh12;
+ nested_svm_transition_tlb_flush(vcpu);
+
+ /* Enter Guest-Mode */
+ enter_guest_mode(vcpu);
+
/*
* Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -762,11 +767,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
}
}
- nested_svm_transition_tlb_flush(vcpu);
-
- /* Enter Guest-Mode */
- enter_guest_mode(vcpu);
-
/*
* Merge guest and host intercepts - must be called with vcpu in
* guest-mode to take effect.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a8de7f100bb5989d9c3627d3a223ee1c863f3b69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021851-nautical-buffoon-d8b1@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a8de7f100bb5989d9c3627d3a223ee1c863f3b69 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 17 Jan 2025 16:34:51 -0800
Subject: [PATCH] KVM: x86: Reject Hyper-V's SEND_IPI hypercalls if local APIC
isn't in-kernel
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local APIC is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscal1_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
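(Sketch of why the note holds: the irqchip-creating ioctls bail out once
any vCPU exists, paraphrased from kvm_arch_vm_ioctl():)
if (kvm->created_vcpus)
        return -EINVAL; /* irqchip mode is frozen after the first vCPU */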
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6a6dd5a84f22..6ebeb6cea6c0 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2226,6 +2226,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (hc->code == HVCALL_SEND_IPI) {
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
@@ -2852,7 +2855,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a8de7f100bb5989d9c3627d3a223ee1c863f3b69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021850-exes-cabana-e868@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a8de7f100bb5989d9c3627d3a223ee1c863f3b69 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 17 Jan 2025 16:34:51 -0800
Subject: [PATCH] KVM: x86: Reject Hyper-V's SEND_IPI hypercalls if local APIC
isn't in-kernel
Advertise support for Hyper-V's SEND_IPI and SEND_IPI_EX hypercalls if and
only if the local APIC is emulated/virtualized by KVM, and explicitly reject
said hypercalls if the local APIC is emulated in userspace, i.e. don't rely
on userspace to opt-in to KVM_CAP_HYPERV_ENFORCE_CPUID.
Rejecting SEND_IPI and SEND_IPI_EX fixes a NULL-pointer dereference if
Hyper-V enlightenments are exposed to the guest without an in-kernel local
APIC:
dump_stack+0xbe/0xfd
__kasan_report.cold+0x34/0x84
kasan_report+0x3a/0x50
__apic_accept_irq+0x3a/0x5c0
kvm_hv_send_ipi.isra.0+0x34e/0x820
kvm_hv_hypercall+0x8d9/0x9d0
kvm_emulate_hypercall+0x506/0x7e0
__vmx_handle_exit+0x283/0xb60
vmx_handle_exit+0x1d/0xd0
vcpu_enter_guest+0x16b0/0x24c0
vcpu_run+0xc0/0x550
kvm_arch_vcpu_ioctl_run+0x170/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscal1_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Note, checking the sending vCPU is sufficient, as the per-VM irqchip_mode
can't be modified after vCPUs are created, i.e. if one vCPU has an
in-kernel local APIC, then all vCPUs have an in-kernel local APIC.
Reported-by: Dongjie Zou <zoudongjie(a)huawei.com>
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6a6dd5a84f22..6ebeb6cea6c0 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2226,6 +2226,9 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
u32 vector;
bool all_cpus;
+ if (!lapic_in_kernel(vcpu))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
if (hc->code == HVCALL_SEND_IPI) {
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
@@ -2852,7 +2855,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
ent->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
ent->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
- ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ if (!vcpu || lapic_in_kernel(vcpu))
+ ent->eax |= HV_X64_CLUSTER_IPI_RECOMMENDED;
ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
if (evmcs_ver)
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e977499820782ab1c69f354d9f41b6d9ad1f43d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021815-sublease-unneeded-0077@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e977499820782ab1c69f354d9f41b6d9ad1f43d9 Mon Sep 17 00:00:00 2001
From: Nirmoy Das <nirmoy.das(a)intel.com>
Date: Mon, 10 Feb 2025 15:36:54 +0100
Subject: [PATCH] drm/xe: Carve out wopcm portion from the stolen memory
The top of stolen memory is WOPCM, which shouldn't be accessed. Remove
this portion from the stolen memory region for discrete platforms.
This was already done for integrated, but was missing for discrete
platforms.
This also moves get_wopcm_size() so detect_bar2_dgfx() and
detect_bar2_integrated can use the same function.
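As a worked illustration with hypothetical sizes (the real values come
from DSMBASE and get_wopcm_size(); see the hunks below):
u64 tile_size   = SZ_16G;                  /* tile->mem.vram.actual_physical_size */
u64 stolen_base = SZ_16G - SZ_512M;        /* DSMBASE & BDSM_MASK, tile-relative */
u64 wopcm_size  = SZ_4M;                   /* get_wopcm_size() */
u64 stolen_size = tile_size - stolen_base; /* 512M, previously overlapped WOPCM */
stolen_size -= wopcm_size;                 /* 508M usable after the carve-out */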
v2: Improve commit message and add a suitable stable version tag (Lucas)
Fixes: d8b52a02cb40 ("drm/xe: Implement stolen memory.")
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: stable(a)vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210143654.2076747-1-nirm…
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
(cherry picked from commit 2c7f45cc7e197a792ce5c693e56ea48f60b312da)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index 423856cc18d4..d414421f8c13 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -57,38 +57,6 @@ bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe)
return GRAPHICS_VERx100(xe) < 1270 && !IS_DGFX(xe);
}
-static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
-{
- struct xe_tile *tile = xe_device_get_root_tile(xe);
- struct xe_mmio *mmio = xe_root_tile_mmio(xe);
- struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
- u64 stolen_size;
- u64 tile_offset;
- u64 tile_size;
-
- tile_offset = tile->mem.vram.io_start - xe->mem.vram.io_start;
- tile_size = tile->mem.vram.actual_physical_size;
-
- /* Use DSM base address instead for stolen memory */
- mgr->stolen_base = (xe_mmio_read64_2x32(mmio, DSMBASE) & BDSM_MASK) - tile_offset;
- if (drm_WARN_ON(&xe->drm, tile_size < mgr->stolen_base))
- return 0;
-
- stolen_size = tile_size - mgr->stolen_base;
-
- /* Verify usage fits in the actual resource available */
- if (mgr->stolen_base + stolen_size <= pci_resource_len(pdev, LMEM_BAR))
- mgr->io_base = tile->mem.vram.io_start + mgr->stolen_base;
-
- /*
- * There may be few KB of platform dependent reserved memory at the end
- * of vram which is not part of the DSM. Such reserved memory portion is
- * always less then DSM granularity so align down the stolen_size to DSM
- * granularity to accommodate such reserve vram portion.
- */
- return ALIGN_DOWN(stolen_size, SZ_1M);
-}
-
static u32 get_wopcm_size(struct xe_device *xe)
{
u32 wopcm_size;
@@ -112,6 +80,44 @@ static u32 get_wopcm_size(struct xe_device *xe)
return wopcm_size;
}
+static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
+{
+ struct xe_tile *tile = xe_device_get_root_tile(xe);
+ struct xe_mmio *mmio = xe_root_tile_mmio(xe);
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ u64 stolen_size, wopcm_size;
+ u64 tile_offset;
+ u64 tile_size;
+
+ tile_offset = tile->mem.vram.io_start - xe->mem.vram.io_start;
+ tile_size = tile->mem.vram.actual_physical_size;
+
+ /* Use DSM base address instead for stolen memory */
+ mgr->stolen_base = (xe_mmio_read64_2x32(mmio, DSMBASE) & BDSM_MASK) - tile_offset;
+ if (drm_WARN_ON(&xe->drm, tile_size < mgr->stolen_base))
+ return 0;
+
+ /* Carve out the top of DSM as it contains the reserved WOPCM region */
+ wopcm_size = get_wopcm_size(xe);
+ if (drm_WARN_ON(&xe->drm, !wopcm_size))
+ return 0;
+
+ stolen_size = tile_size - mgr->stolen_base;
+ stolen_size -= wopcm_size;
+
+ /* Verify usage fits in the actual resource available */
+ if (mgr->stolen_base + stolen_size <= pci_resource_len(pdev, LMEM_BAR))
+ mgr->io_base = tile->mem.vram.io_start + mgr->stolen_base;
+
+ /*
+ * There may be few KB of platform dependent reserved memory at the end
+ * of vram which is not part of the DSM. Such reserved memory portion is
+ * always less then DSM granularity so align down the stolen_size to DSM
+ * granularity to accommodate such reserve vram portion.
+ */
+ return ALIGN_DOWN(stolen_size, SZ_1M);
+}
+
static u32 detect_bar2_integrated(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
There are some issues with the enetc driver, some of which are specific
to the LS1028A platform, and some of which were introduced recently when
i.MX95 ENETC support was added, so this patch set aims to clean up those
issues.
Wei Fang (8):
net: enetc: fix the off-by-one issue in enetc_map_tx_buffs()
net: enetc: correct the tx_swbd statistics
net: enetc: correct the xdp_tx statistics
net: enetc: VFs do not support HWTSTAMP_TX_ONESTEP_SYNC
net: enetc: update UDP checksum when updating originTimestamp field
net: enetc: add missing enetc4_link_deinit()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: correct the EMDIO base offset for ENETC v4
drivers/net/ethernet/freescale/enetc/enetc.c | 53 +++++++++++++++----
.../net/ethernet/freescale/enetc/enetc4_hw.h | 3 ++
.../net/ethernet/freescale/enetc/enetc4_pf.c | 2 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 7 ++-
.../freescale/enetc/enetc_pf_common.c | 10 +++-
5 files changed, 60 insertions(+), 15 deletions(-)
--
2.34.1
From: Ge Yang <yangge1116(a)126.com>
Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
the allocation of contiguous memory through cma_alloc() may fail
probabilistically.
In the CMA allocation process, if it is found that the CMA area is occupied
by in-use hugepage folios, these in-use hugepage folios need to be migrated
to another location. When there are no available hugepage folios in the
free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
are allocated from the buddy system. A temporary state is set on the newly
allocated folio. Upon completion of the hugepage folio migration, the
temporary state is transferred from the new folios to the old folios.
Normally, when the old folios with the temporary state are freed, they
are directly released back to the buddy system. However, due to the deferred
freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugepage
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_hugepage_folios_freed(). This function ensures that the hugepage
folios are properly released back to the buddy system after their migration
is completed. By invoking wait_for_hugepage_folios_freed() following the
migration process, we guarantee that when test_pages_isolated() is
executed, it will successfully pass.
Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
---
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 7 +++++++
mm/migrate.c | 16 ++++++++++++++--
3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6c6546b..c39e0d5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_hugepage_folios_freed(struct hstate *h);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30bc34d..64cae39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2955,6 +2955,13 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+ WARN_ON(!h);
+
+ flush_free_hpage_work(h);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18..5dd1851 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1448,6 +1448,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
+ unsigned long size;
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
@@ -1533,9 +1534,20 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
out_unlock:
folio_unlock(src);
out:
- if (rc == MIGRATEPAGE_SUCCESS)
+ if (rc == MIGRATEPAGE_SUCCESS) {
+ size = folio_size(src);
folio_putback_hugetlb(src);
- else if (rc != -EAGAIN)
+
+ /*
+ * Due to the deferred freeing of HugeTLB folios, the hugepage 'src' may
+ * not immediately release to the buddy system. This can lead to failure
+ * in allocating memory through the cma_alloc() function. To ensure that
+ * the hugepage folios are properly released back to the buddy system,
+ * we invoke the wait_for_hugepage_folios_freed() function to wait for
+ * the release to complete.
+ */
+ wait_for_hugepage_folios_freed(size_to_hstate(size));
+ } else if (rc != -EAGAIN)
list_move_tail(&src->lru, ret);
/*
--
2.7.4
From: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
The current solution for powering off the Apalis iMX6 is not functioning
as intended. To resolve this, it is necessary to power off the
vgen2_reg, which will also set the POWER_ENABLE_MOCI signal to a low
state. This ensures the carrier board is properly informed to initiate
its power-off sequence.
The new solution uses the regulator-poweroff driver, which will power
off the regulator during a system shutdown.
CC: stable(a)vger.kernel.org
Fixes: 4eb56e26f92e ("ARM: dts: imx6q-apalis: Command pmic to standby for poweroff")
Signed-off-by: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
---
arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
index 1c72da417011..614b65821995 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
@@ -108,6 +108,11 @@ lvds_panel_in: endpoint {
};
};
+ poweroff {
+ compatible = "regulator-poweroff";
+ cpu-supply = <&vgen2_reg>;
+ };
+
reg_module_3v3: regulator-module-3v3 {
compatible = "regulator-fixed";
regulator-always-on;
@@ -236,10 +241,6 @@ &can2 {
status = "disabled";
};
-&clks {
- fsl,pmic-stby-poweroff;
-};
-
/* Apalis SPI1 */
&ecspi1 {
cs-gpios = <&gpio5 25 GPIO_ACTIVE_LOW>;
@@ -527,7 +528,6 @@ &i2c2 {
pmic: pmic@8 {
compatible = "fsl,pfuze100";
- fsl,pmic-stby-poweroff;
reg = <0x08>;
regulators {
--
2.45.2
The desired clock frequency was correctly set to 400MHz in the device tree
but was lowered by the driver to 300MHz, breaking 4K 60Hz content playback.
Fix the issue by removing the driver call to clk_set_rate(), which reduces
the amount of board-specific code.
Fixes: 003afda97c65 ("media: verisilicon: Enable AV1 decoder on rk3588")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
---
This patch fixes a user report of the AV1 4K60 decoder not being fast
enough on RK3588-based SoCs. This is a break from the Hantro driver's
original authors' habit of hard-coding the frequencies in the driver
instead of specifying them in the device tree. The other calls to
clk_set_rate() are left in place, since changing them would require
modifying many dtsi files, which would then be unsuitable for backporting.
---
Changes in v2:
- Completely remove the unused init function, the driver is null-safe
- Link to v1: https://lore.kernel.org/r/20250217-b4-hantro-av1-clock-rate-v1-1-65b91132c5…
---
drivers/media/platform/verisilicon/rockchip_vpu_hw.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/drivers/media/platform/verisilicon/rockchip_vpu_hw.c b/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
index 964122e7c355934cd80eb442219f6ba51bba8b71..842040f713c15e6ff295771bc9fa5a7b66e584b2 100644
--- a/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
+++ b/drivers/media/platform/verisilicon/rockchip_vpu_hw.c
@@ -17,7 +17,6 @@
#define RK3066_ACLK_MAX_FREQ (300 * 1000 * 1000)
#define RK3288_ACLK_MAX_FREQ (400 * 1000 * 1000)
-#define RK3588_ACLK_MAX_FREQ (300 * 1000 * 1000)
#define ROCKCHIP_VPU981_MIN_SIZE 64
@@ -440,13 +439,6 @@ static int rk3066_vpu_hw_init(struct hantro_dev *vpu)
return 0;
}
-static int rk3588_vpu981_hw_init(struct hantro_dev *vpu)
-{
- /* Bump ACLKs to max. possible freq. to improve performance. */
- clk_set_rate(vpu->clocks[0].clk, RK3588_ACLK_MAX_FREQ);
- return 0;
-}
-
static int rockchip_vpu_hw_init(struct hantro_dev *vpu)
{
/* Bump ACLK to max. possible freq. to improve performance. */
@@ -807,7 +799,6 @@ const struct hantro_variant rk3588_vpu981_variant = {
.codec_ops = rk3588_vpu981_codec_ops,
.irqs = rk3588_vpu981_irqs,
.num_irqs = ARRAY_SIZE(rk3588_vpu981_irqs),
- .init = rk3588_vpu981_hw_init,
.clk_names = rk3588_vpu981_vpu_clk_names,
.num_clocks = ARRAY_SIZE(rk3588_vpu981_vpu_clk_names)
};
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250217-b4-hantro-av1-clock-rate-e5497f1499df
Best regards,
--
Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Our CI built linux-stable-rc.git queue/6.1 at commit eef4a8a45ba1
("btrfs: output the reason for open_ctree() failure").
(built for arm and arm64, albeit I don't think it matters.)
Building perf produced two build errors, which I wanted to report
even before the RC1 is out.
| ...tools/perf/util/namespaces.c:247:27: error: invalid type argument of '->' (have 'int')
| 247 | RC_CHK_ACCESS(nsi)->in_pidns = true;
| | ^~
introduced by commit 93520bacf784 ("perf namespaces: Introduce
nsinfo__set_in_pidns()"). The RC_CHK_ACCESS macro was introduced in 6.4.
Removing the macro made this go away (nsi->in_pidns = true;).
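That is, a 6.1 backport of the setter from 93520bacf784 would presumably
drop the wrapper, something like (untested sketch):
void nsinfo__set_in_pidns(struct nsinfo *nsi)
{
        nsi->in_pidns = true;
}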
Second perf build error:
| ld: ...tools/perf/util/machine.c:1176: undefined reference to `kallsyms__get_symbol_start'
Introduced by commits:
710c2e913aa9 perf machine: Don't ignore _etext when not a text symbol
69a87a32f5cd perf machine: Include data symbols in the kernel map
These two use the function kallsyms__get_symbol_start added with:
f9dd531c5b82 perf symbols: Add kallsyms__get_symbol_start()
So f9dd531c5b82 would additionally be needed.
The kernel itself built fine; due to the perf error we don't have runtime
test results.
Best regards
Max
I'm announcing the release of the 6.12.15 kernel.
This is ONLY needed for users of the XFS filesystem. If you do not use
XFS, no need to upgrade. If you do use XFS, please upgrade.
The updated 6.12.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.12.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/xfs/xfs_quota.h | 5 +++--
fs/xfs/xfs_trans.c | 10 +++-------
fs/xfs/xfs_trans_dquot.c | 31 ++++++++++++++++++++++++++-----
4 files changed, 33 insertions(+), 15 deletions(-)
Darrick J. Wong (1):
xfs: don't lose solo dquot update transactions
Greg Kroah-Hartman (1):
Linux 6.12.15
The quilt patch titled
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
has been removed from the -mm tree. Its filename was
mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
Date: Tue, 11 Feb 2025 15:26:25 +0800
In zap_pte_range(), if the pte lock was released midway, the pte entries
may be refilled with physical pages by another thread, which may cause a
non-empty PTE page to be reclaimed and eventually cause the system to
crash.
To fix it, fall back to the slow path in this case to recheck if all pte
entries are still none.
Link: https://lkml.kernel.org/r/20250211072625.89188-1-zhengqi.arch@bytedance.com
Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Christian Brauner <brauner(a)kernel.org>
Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@braune…
Reported-by: Qu Wenruo <quwenruo.btrfs(a)gmx.com>
Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/
Tested-by: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/mm/memory.c~mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages
+++ a/mm/memory.c
@@ -1719,7 +1719,7 @@ static unsigned long zap_pte_range(struc
pmd_t pmdval;
unsigned long start = addr;
bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
- bool direct_reclaim = false;
+ bool direct_reclaim = true;
int nr;
retry:
@@ -1734,8 +1734,10 @@ retry:
do {
bool any_skipped = false;
- if (need_resched())
+ if (need_resched()) {
+ direct_reclaim = false;
break;
+ }
nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
&force_flush, &force_break, &any_skipped);
@@ -1743,11 +1745,20 @@ retry:
can_reclaim_pt = false;
if (unlikely(force_break)) {
addr += nr * PAGE_SIZE;
+ direct_reclaim = false;
break;
}
} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
- if (can_reclaim_pt && addr == end)
+ /*
+ * Fast path: try to hold the pmd lock and unmap the PTE page.
+ *
+ * If the pte lock was released midway (retry case), or if the attempt
+ * to hold the pmd lock failed, then we need to recheck all pte entries
+ * to ensure they are still none, thereby preventing the pte entries
+ * from being repopulated by another thread.
+ */
+ if (can_reclaim_pt && direct_reclaim && addr == end)
direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
add_mm_rss_vec(mm, rss);
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
arm-pgtable-fix-null-pointer-dereference-issue.patch
The quilt patch titled
Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment
has been removed from the -mm tree. Its filename was
mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
Subject: mm,madvise,hugetlb: check for 0-length range after end address adjustment
Date: Mon, 3 Feb 2025 08:52:06 +0100
Add a sanity check to madvise_dontneed_free() to address a corner case in
madvise where a race condition causes the vma currently being processed to
become backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with a
userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event. During this time, the vma covering the
current address range may change due to an explicit mmap done concurrently
by another thread.
If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down to
a hugepage boundary to avoid data loss (see "Fixes" below). This rounding
may cause the end address to be truncated to the same address as the
start.
Make this corner case follow the same semantics as in other similar cases
where the requested region has zero length (i.e., return 0).
This will make madvise_walk_vmas() continue to the next vma in the range
(this time holding the process mm lock) which, due to the prev pointer
becoming stale because of the vma change, will be the same hugepage-backed
vma that was just checked before. The next time madvise_dontneed_free()
runs for this vma, if the start address isn't aligned to a hugepage
boundary, it'll return -EINVAL, which is also in line with the madvise
api.
From userspace perspective, madvise() will return EINVAL because the start
address isn't aligned according to the new vma alignment requirements
(hugepage), even though it was correctly page-aligned when the call was
issued.
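From userspace, the race is observable only as that error return
(hypothetical snippet; start and len are the caller's):
#include <sys/mman.h>
#include <errno.h>
if (madvise(start, len, MADV_DONTNEED) == -1 && errno == EINVAL) {
        /* another thread may have remapped the range as hugetlb,
         * changing the alignment requirement under us */
}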
Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com
Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Florent Revest <revest(a)google.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mmmadvisehugetlb-check-for-0-length-range-after-end-address-adjustment
+++ a/mm/madvise.c
@@ -933,7 +933,16 @@ static long madvise_dontneed_free(struct
*/
end = vma->vm_end;
}
- VM_WARN_ON(start >= end);
+ /*
+ * If the memory region between start and end was
+ * originally backed by 4kB pages and then remapped to
+ * be backed by hugepages while mmap_lock was dropped,
+ * the adjustment for hugetlb vma above may have rounded
+ * end down to the start address.
+ */
+ if (start == end)
+ return 0;
+ VM_WARN_ON(start > end);
}
if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
_
Patches currently in -mm which might be from rcn(a)igalia.com are
The quilt patch titled
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
has been removed from the -mm tree. Its filename was
mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
Date: Wed, 29 Jan 2025 19:08:44 +0900
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
skips charging any zswap entries when it fails to zswap the entire folio.
However, when some base pages are zswapped but zswapping the entire folio
fails, the zswap operation is rolled back. When freeing the zswap entries
for those pages, zswap_entry_free() uncharges zswap entries that were
never charged, causing zswap charging to become inconsistent.
This inconsistency triggers two warnings with the following steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
The two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
zswap_stored_pages also becomes inconsistent in the same way.
As suggested by Kanchana, increment zswap_stored_pages and charge zswap
entries within zswap_store_page() when it succeeds. This way,
zswap_entry_free() will decrement the counter and uncharge the entries
when zswap fails to store the entire folio.
While this could potentially be optimized by batching objcg charging and
incrementing the counter, let's focus on fixing the bug this time and
leave the optimization for later after some evaluation.
After resolving the inconsistency, the warnings disappear.
[42.hyeyoo(a)gmail.com: refactor zswap_store_page()]
Link: https://lkml.kernel.org/r/20250131082037.2426-1-42.hyeyoo@gmail.com
Link: https://lkml.kernel.org/r/20250129100844.2935-1-42.hyeyoo@gmail.com
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Acked-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Acked-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-inconsistency-when-zswap_store_page-fails
+++ a/mm/zswap.c
@@ -1445,9 +1445,9 @@ resched:
* main API
**********************************/
-static ssize_t zswap_store_page(struct page *page,
- struct obj_cgroup *objcg,
- struct zswap_pool *pool)
+static bool zswap_store_page(struct page *page,
+ struct obj_cgroup *objcg,
+ struct zswap_pool *pool)
{
swp_entry_t page_swpentry = page_swap_entry(page);
struct zswap_entry *entry, *old;
@@ -1456,7 +1456,7 @@ static ssize_t zswap_store_page(struct p
entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
if (!entry) {
zswap_reject_kmemcache_fail++;
- return -EINVAL;
+ return false;
}
if (!zswap_compress(page, entry, pool))
@@ -1483,13 +1483,17 @@ static ssize_t zswap_store_page(struct p
/*
* The entry is successfully compressed and stored in the tree, there is
- * no further possibility of failure. Grab refs to the pool and objcg.
- * These refs will be dropped by zswap_entry_free() when the entry is
- * removed from the tree.
+ * no further possibility of failure. Grab refs to the pool and objcg,
+ * charge zswap memory, and increment zswap_stored_pages.
+ * The opposite actions will be performed by zswap_entry_free()
+ * when the entry is removed from the tree.
*/
zswap_pool_get(pool);
- if (objcg)
+ if (objcg) {
obj_cgroup_get(objcg);
+ obj_cgroup_charge_zswap(objcg, entry->length);
+ }
+ atomic_long_inc(&zswap_stored_pages);
/*
* We finish initializing the entry while it's already in xarray.
@@ -1510,13 +1514,13 @@ static ssize_t zswap_store_page(struct p
zswap_lru_add(&zswap_list_lru, entry);
}
- return entry->length;
+ return true;
store_failed:
zpool_free(pool->zpool, entry->handle);
compress_failed:
zswap_entry_cache_free(entry);
- return -EINVAL;
+ return false;
}
bool zswap_store(struct folio *folio)
@@ -1526,7 +1530,6 @@ bool zswap_store(struct folio *folio)
struct obj_cgroup *objcg = NULL;
struct mem_cgroup *memcg = NULL;
struct zswap_pool *pool;
- size_t compressed_bytes = 0;
bool ret = false;
long index;
@@ -1564,20 +1567,14 @@ bool zswap_store(struct folio *folio)
for (index = 0; index < nr_pages; ++index) {
struct page *page = folio_page(folio, index);
- ssize_t bytes;
- bytes = zswap_store_page(page, objcg, pool);
- if (bytes < 0)
+ if (!zswap_store_page(page, objcg, pool))
goto put_pool;
- compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
- atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
_
Patches currently in -mm which might be from 42.hyeyoo(a)gmail.com are
The quilt patch titled
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
has been removed from the -mm tree. Its filename was
lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pavel Begunkov <asml.silence(a)gmail.com>
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
Date: Fri, 31 Jan 2025 14:13:15 +0000
import_iovec() says that it should always be fine to kfree the iovec
returned in @iovp regardless of the error code. __import_iovec_ubuf()
never reallocates it and thus should clear the pointer even in cases when
copy_iovec_*() fail.
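The caller-side contract this restores, sketched (hypothetical caller;
kfree() must be safe on every path):
struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
struct iov_iter iter;
ssize_t ret;

ret = import_iovec(ITER_SOURCE, uvec, 1, ARRAY_SIZE(iovstack), &iov, &iter);
/* ... use iter on success ... */
kfree(iov); /* iov is NULL or a real allocation, never the stack array */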
Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.17383324…
Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/iov_iter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/iov_iter.c~lib-iov_iter-fix-import_iovec_ubuf-iovec-management
+++ a/lib/iov_iter.c
@@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int t
struct iovec *iov = *iovp;
ssize_t ret;
+ *iovp = NULL;
+
if (compat)
ret = copy_compat_iovec_from_user(iov, uvec, 1);
else
@@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int t
ret = import_ubuf(type, iov->iov_base, iov->iov_len, i);
if (unlikely(ret))
return ret;
- *iovp = NULL;
return i->count;
}
_
Patches currently in -mm which might be from asml.silence(a)gmail.com are
The patch titled
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Date: Mon, 17 Feb 2025 09:43:29 +0800
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range() in order to
make offlining hwpoisoned pages possible, by introducing isolate_lru_page()
and try_to_unmap() for hwpoisoned pages. However, the folio lock must be
held before calling try_to_unmap(). Take the lock to fix this problem.
The following warning is produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio
+++ a/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned lo
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
_
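For context, the fixed fragment of do_migrate_range(), reconstructed from
the hunk above with comments added (not a verbatim kernel excerpt):

	if (folio_test_hwpoison(folio) ||
	    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
		if (WARN_ON(folio_test_lru(folio)))
			folio_isolate_lru(folio);
		if (folio_mapped(folio)) {
			/*
			 * try_to_unmap(), reached via
			 * unmap_poisoned_folio(), requires the folio lock
			 * to be held; taking it here avoids the BUG shown
			 * in the trace above.
			 */
			folio_lock(folio);
			unmap_poisoned_folio(folio, pfn, false);
			folio_unlock(folio);
		}
		goto put_folio;
	}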
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch
The patch titled
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
Date: Mon, 17 Feb 2025 09:43:28 +0800
If a folio has an elevated reference count, folio_try_get() will acquire
it, the necessary operations are performed, and the reference is then
released. A poisoned folio without an elevated reference count (which is
unlikely for memory-failure) is simply bypassed, because folio_try_get()
fails on it. Therefore, relocate the folio_try_get() call, which checks
for and acquires this reference count, so that it is done first.
Link: https://lkml.kernel.org/r/20250217014329.3610326-3-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range
+++ a/mm/memory_hotplug.c
@@ -1822,12 +1822,12 @@ static void do_migrate_range(unsigned lo
if (folio_test_large(folio))
pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
- /*
- * HWPoison pages have elevated reference counts so the migration would
- * fail on them. It also doesn't make any sense to migrate them in the
- * first place. Still try to unmap such a page in case it is still mapped
- * (keep the unmap as the catch all safety net).
- */
+ if (!folio_try_get(folio))
+ continue;
+
+ if (unlikely(page_folio(page) != folio))
+ goto put_folio;
+
if (folio_test_hwpoison(folio) ||
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
@@ -1835,14 +1835,8 @@ static void do_migrate_range(unsigned lo
if (folio_mapped(folio))
unmap_poisoned_folio(folio, pfn, false);
- continue;
- }
-
- if (!folio_try_get(folio))
- continue;
-
- if (unlikely(page_folio(page) != folio))
goto put_folio;
+ }
if (!isolate_folio_to_list(folio, &source)) {
if (__ratelimit(&migrate_rs)) {
_
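For context, the reordered flow of do_migrate_range() after this patch,
reconstructed from the hunks above with comments added (not a verbatim
kernel excerpt):

	/*
	 * Take the reference first: folios we cannot pin are skipped,
	 * including the unlikely poisoned folio with no elevated refcount.
	 */
	if (!folio_try_get(folio))
		continue;

	if (unlikely(page_folio(page) != folio))
		goto put_folio;	/* raced with a folio change; drop the ref */

	if (folio_test_hwpoison(folio) ||
	    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
		if (WARN_ON(folio_test_lru(folio)))
			folio_isolate_lru(folio);
		if (folio_mapped(folio))
			unmap_poisoned_folio(folio, pfn, false);
		/*
		 * Hwpoisoned folios are unmapped but not migrated; the
		 * reference taken above is dropped at put_folio.
		 */
		goto put_folio;
	}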
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch