The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 650768c512faba8070bf4cfbb28c95eb5cd203f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062304-oyster-overhang-6204@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 650768c512faba8070bf4cfbb28c95eb5cd203f3 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 27 May 2025 13:56:33 +0530
Subject: [PATCH] arm64: Restrict pagetable teardown to avoid false warning
Commit 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from
pXd_free_pYd_table()") removes the pxd_present() checks because the
caller checks pxd_present(). But in the case of vmap_try_huge_pud(), the
caller only checks pud_present(); pud_free_pmd_page() then calls
pmd_free_pte_page() on each pmd, which may be none. It is therefore
possible to hit a warning in the latter, since pmd_none => !pmd_table().
Add a pmd_present() check in pud_free_pmd_page().
This problem was found by code inspection.
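To make the failure mode concrete, here is a rough sketch of the check
that fires; this is a hedged approximation of pmd_free_pte_page() in
arch/arm64/mm/mmu.c, not the verbatim upstream code:

	int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
	{
		pmd_t pmd = READ_ONCE(*pmdp);

		/* A none pmd is not a table, so this path warns. */
		if (!pmd_table(pmd)) {
			VM_WARN_ON(1);
			return 1;
		}
		/* ... clear the pmd and free the PTE table ... */
		return 1;
	}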
Fixes: 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()")
Cc: stable(a)vger.kernel.org
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Link: https://lore.kernel.org/r/20250527082633.61073-1-dev.jain@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db..00ab1d648db6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1305,7 +1305,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
next = addr;
end = addr + PUD_SIZE;
do {
- pmd_free_pte_page(pmdp, next);
+ if (pmd_present(pmdp_get(pmdp)))
+ pmd_free_pte_page(pmdp, next);
} while (pmdp++, next += PMD_SIZE, next != end);
pud_clear(pudp);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 650768c512faba8070bf4cfbb28c95eb5cd203f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062304-prune-getup-2943@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 650768c512faba8070bf4cfbb28c95eb5cd203f3 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 27 May 2025 13:56:33 +0530
Subject: [PATCH] arm64: Restrict pagetable teardown to avoid false warning
Commit 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from
pXd_free_pYd_table()") removes the pxd_present() checks because the
caller checks pxd_present(). But in the case of vmap_try_huge_pud(), the
caller only checks pud_present(); pud_free_pmd_page() then calls
pmd_free_pte_page() on each pmd, which may be none. It is therefore
possible to hit a warning in the latter, since pmd_none => !pmd_table().
Add a pmd_present() check in pud_free_pmd_page().
This problem was found by code inspection.
Fixes: 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()")
Cc: stable(a)vger.kernel.org
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Link: https://lore.kernel.org/r/20250527082633.61073-1-dev.jain@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db..00ab1d648db6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1305,7 +1305,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
next = addr;
end = addr + PUD_SIZE;
do {
- pmd_free_pte_page(pmdp, next);
+ if (pmd_present(pmdp_get(pmdp)))
+ pmd_free_pte_page(pmdp, next);
} while (pmdp++, next += PMD_SIZE, next != end);
pud_clear(pudp);
Some libcs, like musl, don't provide execinfo.h since it's not part of
POSIX. To fix compilation on musl, only include execinfo.h if it is
available (HAVE_BACKTRACE_SUPPORT).
This was discovered with c104c16073b7 ("Kunit to check the longest symbol length"),
which started including linux/kallsyms.h with Alpine Linux's configs.
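For reference, a minimal standalone user-space example of the execinfo.h
API this header wraps (glibc-specific, shown only to illustrate what musl
does not provide; it is not part of the patch):

	#include <execinfo.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		void *addrs[16];
		int n = backtrace(addrs, 16);              /* capture return addresses */
		char **syms = backtrace_symbols(addrs, n); /* malloc()ed strings */

		for (int i = 0; i < n; i++)
			printf("%s\n", syms[i]);
		free(syms);
		return 0;
	}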
Signed-off-by: Achill Gilgenast <fossdd(a)pwned.life>
Cc: stable(a)vger.kernel.org
---
tools/include/linux/kallsyms.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/include/linux/kallsyms.h b/tools/include/linux/kallsyms.h
index 5a37ccbec54f..f61a01dd7eb7 100644
--- a/tools/include/linux/kallsyms.h
+++ b/tools/include/linux/kallsyms.h
@@ -18,6 +18,7 @@ static inline const char *kallsyms_lookup(unsigned long addr,
return NULL;
}
+#ifdef HAVE_BACKTRACE_SUPPORT
#include <execinfo.h>
#include <stdlib.h>
static inline void print_ip_sym(const char *loglvl, unsigned long ip)
@@ -30,5 +31,8 @@ static inline void print_ip_sym(const char *loglvl, unsigned long ip)
free(name);
}
+#else
+static inline void print_ip_sym(const char *loglvl, unsigned long ip) {}
+#endif
#endif
--
2.50.0
From: Lance Yang <lance.yang(a)linux.dev>
As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
may read past the end of a PTE table when a large folio's PTE mappings
are not fully contained within a single page table.
While this scenario might be rare, an issue triggerable from userspace must
be fixed regardless of its likelihood. This patch fixes the out-of-bounds
access by refactoring the logic into a new helper, folio_unmap_pte_batch().
The new helper correctly calculates the safe batch size by capping the scan
at both the VMA and PMD boundaries. To simplify the code, it also supports
partial batching (i.e., any number of pages from 1 up to the calculated
safe maximum), as there is no strong reason to special-case for fully
mapped folios.
[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redha…
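To make the capping concrete, a hedged worked example (hypothetical
addresses, assuming 4K pages and 2M PMD regions):

	address     = 0x7f00001f8000  /* 8 pages below the 0x7f0000200000 PMD boundary */
	vma->vm_end = 0x7f0000400000

	end_addr = pmd_addr_end(address, vma->vm_end) = 0x7f0000200000
	max_nr   = (end_addr - address) >> PAGE_SHIFT  = 8

Even for a 32-page folio, only 8 PTEs are batched here; the remaining
PTEs are unmapped when the rmap walk reaches the next page table.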
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc: <stable(a)vger.kernel.org>
Acked-by: Barry Song <baohua(a)kernel.org>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Barry Song <baohua(a)kernel.org>
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
---
v2 -> v3:
- Tweak changelog (per Barry and David)
- Pick AB from Barry - thanks!
- https://lore.kernel.org/linux-mm/20250627062319.84936-1-lance.yang@linux.dev
v1 -> v2:
- Update subject and changelog (per Barry)
- https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev
mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
#endif
}
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
- struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+ struct page_vma_mapped_walk *pvmw,
+ enum ttu_flags flags, pte_t pte)
{
const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
- int max_nr = folio_nr_pages(folio);
- pte_t pte = ptep_get(ptep);
+ unsigned long end_addr, addr = pvmw->address;
+ struct vm_area_struct *vma = pvmw->vma;
+ unsigned int max_nr;
+
+ if (flags & TTU_HWPOISON)
+ return 1;
+ if (!folio_test_large(folio))
+ return 1;
+ /* We may only batch within a single VMA and a single page table. */
+ end_addr = pmd_addr_end(addr, vma->vm_end);
+ max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+ /* We only support lazyfree batching for now ... */
if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
- return false;
+ return 1;
if (pte_unused(pte))
- return false;
- if (pte_pfn(pte) != folio_pfn(folio))
- return false;
+ return 1;
- return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
- NULL, NULL) == max_nr;
+ return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+ NULL, NULL, NULL);
}
/*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (pte_dirty(pteval))
folio_mark_dirty(folio);
} else if (likely(pte_present(pteval))) {
- if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
- can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
- nr_pages = folio_nr_pages(folio);
+ nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
end_addr = address + nr_pages * PAGE_SIZE;
flush_cache_range(vma, address, end_addr);
@@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
hugetlb_remove_rmap(folio);
} else {
folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
- folio_ref_sub(folio, nr_pages - 1);
}
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
- folio_put(folio);
- /* We have already batched the entire folio */
- if (nr_pages > 1)
+ folio_put_refs(folio, nr_pages);
+
+ /*
+ * If we are sure that we batched the entire folio and cleared
+ * all PTEs, we can just optimize and stop right here.
+ */
+ if (nr_pages == folio_nr_pages(folio))
goto walk_done;
continue;
walk_abort:
--
2.49.0
Zero is a valid value for "preempt_dynamic_mode", namely
"preempt_dynamic_none".
Fix the off-by-one in preempt_model_str(), so that "preempt_dynamic_none"
is correctly formatted as PREEMPT(none) instead of PREEMPT(undef).
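A standalone illustration of the indexing bug (the table below is an
assumption mirroring the kernel's preempt_modes[] ordering, not copied
from it):

	#include <stdio.h>

	static const char *preempt_modes[] = { "none", "voluntary", "full" };

	static const char *mode_str(int preempt_dynamic_mode)
	{
		/* "> 0" wrongly rejects mode 0 ("none"); ">= 0" accepts it. */
		return preempt_dynamic_mode >= 0 ?
		       preempt_modes[preempt_dynamic_mode] : "undef";
	}

	int main(void)
	{
		printf("PREEMPT(%s)\n", mode_str(0));  /* PREEMPT(none)  */
		printf("PREEMPT(%s)\n", mode_str(-1)); /* PREEMPT(undef) */
		return 0;
	}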
Fixes: 8bdc5daaa01e ("sched: Add a generic function to return the preemption string")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Tested-by: Shrikanth Hegde <sshegde(a)linux.ibm.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
Changes in v2:
- Pick up Reviewed-by and Tested-by
- Rebase on v6.16-rc1
- Link to v1: https://lore.kernel.org/r/20250603-preempt-str-none-v1-1-f0e9916dcf44@linut…
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dce50fa57471dffc4311b9d393ae300a43d38d20..021b0a703d094b3386c5ba50e0e111e3a7c2b3df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7663,7 +7663,7 @@ const char *preempt_model_str(void)
if (IS_ENABLED(CONFIG_PREEMPT_DYNAMIC)) {
seq_buf_printf(&s, "(%s)%s",
- preempt_dynamic_mode > 0 ?
+ preempt_dynamic_mode >= 0 ?
preempt_modes[preempt_dynamic_mode] : "undef",
brace ? "}" : "");
return seq_buf_str(&s);
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250603-preempt-str-none-d21231cc2238
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
From: Dan Carpenter <dan.carpenter(a)linaro.org>
commit fa332f5dc6fc662ad7d3200048772c96b861cf6b upstream
The "intf" list iterator is an invalid pointer if the correct
"intf->intf_num" is not found. Calling atomic_dec(&intf->nr_users) on
and invalid pointer will lead to memory corruption.
We don't really need to call atomic_dec() if we haven't called
atomic_add_return() so update the if (intf->in_shutdown) path as well.
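For background, a hedged sketch of the underlying list-iterator pitfall
(generic pattern, not the exact ipmi_create_user() code):

	list_for_each_entry(intf, &ipmi_interfaces, link) {
		if (intf->intf_num == if_num)
			goto found;
	}
	/*
	 * No match: "intf" now points into the list head rather than a
	 * real interface, so touching intf->nr_users corrupts memory.
	 * Bail out without the atomic_dec().
	 */
	rv = -EINVAL;
	goto out_unlock;

 found:
	if (atomic_add_return(1, &intf->nr_users) > max_users) {
		/* ... */
	}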
Fixes: 8e76741c3d8b ("ipmi: Add a limit on the number of users that may use IPMI")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Message-ID: <aBjMZ8RYrOt6NOgi(a)stanley.mountain>
Signed-off-by: Corey Minyard <corey(a)minyard.net>
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
I have tested this in 6.12 with Google's platform drivers added to
reproduce the bug. The bug causes the panic notifier chain to get
corrupted leading to a crash. With the fix this goes away.
Applies to 6.6 too but I haven't tested it there.
Backport changes:
- Dropped change to the `if (intf->in_shutdown)` block since that logic
doesn't exist yet.
- Modified out_unlock to release the srcu lock instead of the mutex
since we don't have the mutex here yet.
---
drivers/char/ipmi/ipmi_msghandler.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index e12b531f5c2f338008a42dc2c35b0a62395c9f3c..6a4a8ecd0edd02793eda70f9f9ae578e37da477f 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -1241,7 +1241,7 @@ int ipmi_create_user(unsigned int if_num,
}
/* Not found, return an error */
rv = -EINVAL;
- goto out_kfree;
+ goto out_unlock;
found:
if (atomic_add_return(1, &intf->nr_users) > max_users) {
@@ -1283,6 +1283,7 @@ int ipmi_create_user(unsigned int if_num,
out_kfree:
atomic_dec(&intf->nr_users);
+out_unlock:
srcu_read_unlock(&ipmi_interfaces_srcu, index);
vfree(new_user);
return rv;
---
base-commit: 783cd2c3dca8b6c434e955b84c20c8940588dc68
change-id: 20250630-ipmi-fix-c565f7098afd
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
Only select ARCH_WANT_HUGE_PMD_SHARE if hugetlb page table sharing is
actually possible; page table sharing requires at least three levels,
because it involves shared references to PMD tables.
Having ARCH_WANT_HUGE_PMD_SHARE enabled on non-PAE 32-bit X86 (which
has 2-level paging) became particularly problematic after commit
59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count"),
since that changes `struct ptdesc` such that the `pt_mm` (for PGDs) and
the `pt_share_count` (for PMDs) share the same union storage - and with
2-level paging, PMDs are PGDs.
(For comparison, arm64 also gates ARCH_WANT_HUGE_PMD_SHARE on the
configuration of page tables such that it is never enabled with 2-level
paging.)
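A heavily simplified sketch of the overlap being described (the real
struct ptdesc has more fields and unions; this reduction only shows why
2-level paging is a problem):

	struct ptdesc {
		union {
			struct mm_struct *pt_mm;   /* used when the table is a PGD */
			atomic_t pt_share_count;   /* used for shared PMD tables   */
		};
		/* ... */
	};

With only two paging levels the PMD is the PGD, so bumping pt_share_count
for hugetlb PMD sharing would clobber pt_mm of the very same table.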
Reported-by: Vitaly Chikunov <vt(a)altlinux.org>
Closes: https://lore.kernel.org/r/srhpjxlqfna67blvma5frmy3aa@altlinux.org
Fixes: cfe28c5d63d8 ("x86: mm: Remove x86 version of huge_pmd_share.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 71019b3b54ea..917f523b994b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -147,7 +147,7 @@ config X86
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANTS_NO_INSTR
select ARCH_WANT_GENERAL_HUGETLB
- select ARCH_WANT_HUGE_PMD_SHARE
+ select ARCH_WANT_HUGE_PMD_SHARE if PGTABLE_LEVELS > 2
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if X86_64
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
---
base-commit: d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af
change-id: 20250630-x86-2level-hugetlb-b1d8feb255ce
--
Jann Horn <jannh(a)google.com>
When sysctl_nr_open is set to a very high value (for example, 1073741816
as set by systemd), processes attempting to use file descriptors near
the limit can trigger massive memory allocation attempts that exceed
INT_MAX, resulting in a WARNING in mm/slub.c:
WARNING: CPU: 0 PID: 44 at mm/slub.c:5027 __kvmalloc_node_noprof+0x21a/0x288
This happens because kvmalloc_array() and kvmalloc() check if the
requested size exceeds INT_MAX and emit a warning when the allocation is
not flagged with __GFP_NOWARN.
Specifically, when nr_open is set to 1073741816 (0x3ffffff8) and a
process calls dup2(oldfd, 1073741880), the kernel attempts to allocate:
- File descriptor array: 1073741880 * 8 bytes = 8,589,935,040 bytes
- Multiple bitmaps: ~400MB
- Total allocation size: > 8GB (exceeding INT_MAX = 2,147,483,647)
Reproducer:
1. Set /proc/sys/fs/nr_open to 1073741816:
   # echo 1073741816 > /proc/sys/fs/nr_open
2. Run a program that uses a high file descriptor:
   #include <unistd.h>
   #include <sys/resource.h>
   int main() {
           struct rlimit rlim = {1073741824, 1073741824};
           setrlimit(RLIMIT_NOFILE, &rlim);
           dup2(2, 1073741880); // Triggers the warning
           return 0;
   }
3. Observe WARNING in dmesg at mm/slub.c:5027
systemd commit a8b627a introduced automatic bumping of fs.nr_open to the
maximum possible value. The rationale was that systems with memory
control groups (memcg) no longer need separate file descriptor limits
since memory is properly accounted. However, this change overlooked
that:
1. The kernel's allocation functions still enforce INT_MAX as a maximum
size regardless of memcg accounting
2. Programs and tests that legitimately test file descriptor limits can
inadvertently trigger massive allocations
3. The resulting allocations (>8GB) are impractical and will always fail
systemd's algorithm starts with INT_MAX and keeps halving the value
until the kernel accepts it. On most systems, this results in nr_open
being set to 1073741816 (0x3ffffff8), which is just under 1GB of file
descriptors.
While processes rarely use file descriptors near this limit in normal
operation, certain selftests (like
tools/testing/selftests/core/unshare_test.c) and programs that test file
descriptor limits can trigger this issue.
Fix this by adding a check in alloc_fdtable() to ensure the requested
allocation size does not exceed INT_MAX. This causes the operation to
fail with -EMFILE instead of triggering a kernel warning and avoids the
impractical >8GB memory allocation request.
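For scale (assuming 8-byte pointers): INT_MAX / sizeof(struct file *) is
2147483647 / 8 = 268435455, so a table sized for roughly 1073741880 slots
is about four times over the new bound and now fails with -EMFILE instead
of attempting the ~8.6GB allocation.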
Fixes: 9cfe015aa424 ("get rid of NR_OPEN and introduce a sysctl_nr_open")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/file.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/file.c b/fs/file.c
index b6db031545e65..6d2275c3be9c6 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -197,6 +197,21 @@ static struct fdtable *alloc_fdtable(unsigned int slots_wanted)
return ERR_PTR(-EMFILE);
}
+ /*
+ * Check if the allocation size would exceed INT_MAX. kvmalloc_array()
+ * and kvmalloc() will warn if the allocation size is greater than
+ * INT_MAX, as these allocations are not flagged __GFP_NOWARN.
+ *
+ * This can happen when sysctl_nr_open is set to a very high value and
+ * a process tries to use a file descriptor near that limit. For example,
+ * if sysctl_nr_open is set to 1073741816 (0x3ffffff8) - which is what
+ * systemd typically sets it to - then trying to use a file descriptor
+ * close to that value will require allocating a file descriptor table
+ * that exceeds 8GB in size.
+ */
+ if (unlikely(nr > INT_MAX / sizeof(struct file *)))
+ return ERR_PTR(-EMFILE);
+
fdt = kmalloc(sizeof(struct fdtable), GFP_KERNEL_ACCOUNT);
if (!fdt)
goto out;
--
2.39.5