The patch titled
Subject: mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vma-fix-anon_vma-uaf-on-mremap-faulted-unfaulted-merge.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
Date: Fri, 2 Jan 2026 20:55:20 +0000
Commit 879bca0a2c4f ("mm/vma: fix incorrectly disallowed anonymous VMA
merges") introduced the ability to merge previously unavailable VMA merge
scenarios.
The key piece of logic introduced was the ability to merge a faulted VMA
immediately next to an unfaulted VMA, which relies upon dup_anon_vma() to
correctly handle anon_vma state.
In the case of the merge of an existing VMA (that is changing properties
of a VMA and then merging if those properties are shared by adjacent
VMAs), dup_anon_vma() is invoked correctly.
However in the case of the merge of a new VMA, a corner case peculiar to
mremap() was missed.
The issue is that vma_expand() only performs dup_anon_vma() if the target
(the VMA that will ultimately become the merged VMA): is not the next VMA,
i.e. the one that appears after the range in which the new VMA is to be
established.
A key insight here is that in all other cases other than mremap(), a new
VMA merge either expands an existing VMA, meaning that the target VMA will
be that VMA, or would have anon_vma be NULL.
Specifically:
* __mmap_region() - no anon_vma in place, initial mapping.
* do_brk_flags() - expanding an existing VMA.
* vma_merge_extend() - expanding an existing VMA.
* relocate_vma_down() - no anon_vma in place, initial mapping.
In addition, we are in the unique situation of needing to duplicate
anon_vma state from a VMA that is neither the previous or next VMA being
merged with.
To account for this, introduce a new field in struct vma_merge_struct
specifically for the mremap() case, and update vma_expand() to explicitly
check for this case and invoke dup_anon_vma() to ensure anon_vma state is
correctly propagated.
This issue can be observed most directly by invoked mremap() to move
around a VMA and cause this kind of merge with the MREMAP_DONTUNMAP flag
specified.
This will result in unlink_anon_vmas() being called after failing to
duplicate anon_vma state to the target VMA, which results in the anon_vma
itself being freed with folios still possessing dangling pointers to the
anon_vma and thus a use-after-free bug.
This bug was discovered via a syzbot report, which this patch resolves.
The following program reproduces the issue (and is fixed by this patch):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#define RESERVED_PGS (100)
#define VMA_A_PGS (10)
#define VMA_B_PGS (10)
#define NUM_ITERS (1000)
static void trigger_bug(void)
{
unsigned long page_size = sysconf(_SC_PAGE_SIZE);
char *reserved, *ptr_a, *ptr_b;
/*
* The goal here is to achieve:
*
* mremap() with MREMAP_DONTUNMAP such that A and B merge:
*
* |-------------------------|
* | |
* | |-----------| |---------|
* v | unfaulted | | faulted |
* |-----------| |---------|
* B A
*
* Then unmap VMA A to trigger the bug.
*/
/* Reserve a region of memory to operate in. */
reserved = mmap(NULL, RESERVED_PGS * page_size, PROT_NONE,
MAP_PRIVATE | MAP_ANON, -1, 0);
if (reserved == MAP_FAILED) {
perror("mmap reserved");
exit(EXIT_FAILURE);
}
/* Map VMA A into place. */
ptr_a = mmap(&reserved[page_size], VMA_A_PGS * page_size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
if (ptr_a == MAP_FAILED) {
perror("mmap VMA A");
exit(EXIT_FAILURE);
}
/* Fault it in. */
ptr_a[0] = 'x';
/*
* Now move it out of the way so we can place VMA B in position,
* unfaulted.
*/
ptr_a = mremap(ptr_a, VMA_A_PGS * page_size, VMA_A_PGS * page_size,
MREMAP_FIXED | MREMAP_MAYMOVE, &reserved[50 * page_size]);
if (ptr_a == MAP_FAILED) {
perror("mremap VMA A out of the way");
exit(EXIT_FAILURE);
}
/* Map VMA B into place. */
ptr_b = mmap(&reserved[page_size + VMA_A_PGS * page_size],
VMA_B_PGS * page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
if (ptr_b == MAP_FAILED) {
perror("mmap VMA B");
exit(EXIT_FAILURE);
}
/* Now move VMA A into position w/MREMAP_DONTUNMAP + free anon_vma. */
ptr_a = mremap(ptr_a, VMA_A_PGS * page_size, VMA_A_PGS * page_size,
MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP,
&reserved[page_size]);
if (ptr_a == MAP_FAILED) {
perror("mremap VMA A with MREMAP_DONTUNMAP");
exit(EXIT_FAILURE);
}
/* Finally, unmap VMA A which should trigger the bug. */
munmap(ptr_a, VMA_A_PGS * page_size);
/* Cleanup in case bug didn't trigger sufficiently visibly... */
munmap(reserved, RESERVED_PGS * page_size);
}
int main(void)
{
int i;
for (i = 0; i < NUM_ITERS; i++)
trigger_bug();
return EXIT_SUCCESS;
}
Link: https://lkml.kernel.org/r/20260102205520.986725-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Fixes: 879bca0a2c4f ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
Reported-by: syzbot+b165fc2e11771c66d8ba(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/694a2745.050a0220.19928e.0017.GAE@google.com/
Cc: David Hildenbrand (Red Hat) <david(a)kernel.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jeongjun Park <aha310510(a)gmail.com>
Cc: levi.yun <yeoreum.yun(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Pedro Falcato <pfalcato(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vma.c | 58 ++++++++++++++++++++++++++++++++++++++++-------------
mm/vma.h | 3 ++
2 files changed, 47 insertions(+), 14 deletions(-)
--- a/mm/vma.c~mm-vma-fix-anon_vma-uaf-on-mremap-faulted-unfaulted-merge
+++ a/mm/vma.c
@@ -1130,26 +1130,50 @@ int vma_expand(struct vma_merge_struct *
mmap_assert_write_locked(vmg->mm);
vma_start_write(target);
- if (next && (target != next) && (vmg->end == next->vm_end)) {
+ if (next && vmg->end == next->vm_end) {
+ struct vm_area_struct *copied_from = vmg->copied_from;
int ret;
- sticky_flags |= next->vm_flags & VM_STICKY;
- remove_next = true;
- /* This should already have been checked by this point. */
- VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
- vma_start_write(next);
- /*
- * In this case we don't report OOM, so vmg->give_up_on_mm is
- * safe.
- */
- ret = dup_anon_vma(target, next, &anon_dup);
- if (ret)
- return ret;
+ if (target != next) {
+ sticky_flags |= next->vm_flags & VM_STICKY;
+ remove_next = true;
+ /* This should already have been checked by this point. */
+ VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
+ vma_start_write(next);
+ /*
+ * In this case we don't report OOM, so vmg->give_up_on_mm is
+ * safe.
+ */
+ ret = dup_anon_vma(target, next, &anon_dup);
+ if (ret)
+ return ret;
+ } else if (copied_from) {
+ vma_start_write(next);
+
+ /*
+ * We are copying from a VMA (i.e. mremap()'ing) to
+ * next, and thus must ensure that either anon_vma's are
+ * already compatible (in which case this call is a nop)
+ * or all anon_vma state is propagated to next
+ */
+ ret = dup_anon_vma(next, copied_from, &anon_dup);
+ if (ret)
+ return ret;
+ } else {
+ /* In no other case may the anon_vma differ. */
+ VM_WARN_ON_VMG(target->anon_vma != next->anon_vma, vmg);
+ }
}
/* Not merging but overwriting any part of next is not handled. */
VM_WARN_ON_VMG(next && !remove_next &&
next != target && vmg->end > next->vm_start, vmg);
+ /*
+ * We should only see a copy with next as the target on a new merge
+ * which sets the end to the next of next.
+ */
+ VM_WARN_ON_VMG(target == next && vmg->copied_from &&
+ vmg->end != next->vm_end, vmg);
/* Only handles expanding */
VM_WARN_ON_VMG(target->vm_start < vmg->start ||
target->vm_end > vmg->end, vmg);
@@ -1808,6 +1832,13 @@ struct vm_area_struct *copy_vma(struct v
VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
/*
+ * VMG_VMA_STATE() installs vma in middle, but this is a new VMA, inform
+ * merging logic correctly.
+ */
+ vmg.copied_from = vma;
+ vmg.middle = NULL;
+
+ /*
* If anonymous vma has not yet been faulted, update new pgoff
* to match new location, to increase its chance of merging.
*/
@@ -1828,7 +1859,6 @@ struct vm_area_struct *copy_vma(struct v
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */
- vmg.middle = NULL; /* New VMA range. */
vmg.pgoff = pgoff;
vmg.next = vma_iter_next_rewind(&vmi, NULL);
new_vma = vma_merge_new_range(&vmg);
--- a/mm/vma.h~mm-vma-fix-anon_vma-uaf-on-mremap-faulted-unfaulted-merge
+++ a/mm/vma.h
@@ -106,6 +106,9 @@ struct vma_merge_struct {
struct anon_vma_name *anon_name;
enum vma_merge_state state;
+ /* If we are copying a VMA, which VMA are we copying from? */
+ struct vm_area_struct *copied_from;
+
/* Flags which callers can use to modify merge behaviour: */
/*
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-vma-fix-anon_vma-uaf-on-mremap-faulted-unfaulted-merge.patch
The patch below does not apply to the 6.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.18.y
git checkout FETCH_HEAD
git cherry-pick -x 8a0e4bdddd1c998b894d879a1d22f1e745606215
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122925-victory-numeral-2346@gregkh' --subject-prefix 'PATCH 6.18.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a0e4bdddd1c998b894d879a1d22f1e745606215 Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang(a)gmail.com>
Date: Thu, 6 Nov 2025 03:41:55 +0000
Subject: [PATCH] mm/huge_memory: merge uniform_split_supported() and
non_uniform_split_supported()
uniform_split_supported() and non_uniform_split_supported() share
significantly similar logic.
The only functional difference is that uniform_split_supported() includes
an additional check on the requested @new_order.
The reason for this check comes from the following two aspects:
* some file system or swap cache just supports order-0 folio
* the behavioral difference between uniform/non-uniform split
The behavioral difference between uniform split and non-uniform:
* uniform split splits folio directly to @new_order
* non-uniform split creates after-split folios with orders from
folio_order(folio) - 1 to new_order.
This means for non-uniform split or !new_order split we should check the
file system and swap cache respectively.
This commit unifies the logic and merge the two functions into a single
combined helper, removing redundant code and simplifying the split
support checking mechanism.
Link: https://lkml.kernel.org/r/20251106034155.21398-3-richard.weiyang@gmail.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Nico Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b74708dc5b5f..19d4a5f52ca2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -374,10 +374,8 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
unsigned int new_order, bool unmapped);
int min_order_for_split(struct folio *folio);
int split_folio_to_list(struct folio *folio, struct list_head *list);
-bool uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns);
-bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns);
+bool folio_split_supported(struct folio *folio, unsigned int new_order,
+ enum split_type split_type, bool warns);
int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
struct list_head *list);
@@ -408,7 +406,7 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
static inline int try_folio_split_to_order(struct folio *folio,
struct page *page, unsigned int new_order)
{
- if (!non_uniform_split_supported(folio, new_order, /* warns= */ false))
+ if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
return split_huge_page_to_order(&folio->page, new_order);
return folio_split(folio, new_order, page, NULL);
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4118f330c55e..d79a4bb363de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3593,8 +3593,8 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
return 0;
}
-bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns)
+bool folio_split_supported(struct folio *folio, unsigned int new_order,
+ enum split_type split_type, bool warns)
{
if (folio_test_anon(folio)) {
/* order-1 is not supported for anonymous THP. */
@@ -3602,48 +3602,41 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
"Cannot split to order-1 folio");
if (new_order == 1)
return false;
- } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- !mapping_large_folio_support(folio->mapping)) {
- /*
- * No split if the file system does not support large folio.
- * Note that we might still have THPs in such mappings due to
- * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
- * does not actually support large folios properly.
- */
- VM_WARN_ONCE(warns,
- "Cannot split file folio to non-0 order");
- return false;
- }
-
- /* Only swapping a whole PMD-mapped folio is supported */
- if (folio_test_swapcache(folio)) {
- VM_WARN_ONCE(warns,
- "Cannot split swapcache folio to non-0 order");
- return false;
- }
-
- return true;
-}
-
-/* See comments in non_uniform_split_supported() */
-bool uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns)
-{
- if (folio_test_anon(folio)) {
- VM_WARN_ONCE(warns && new_order == 1,
- "Cannot split to order-1 folio");
- if (new_order == 1)
- return false;
- } else if (new_order) {
+ } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
!mapping_large_folio_support(folio->mapping)) {
+ /*
+ * We can always split a folio down to a single page
+ * (new_order == 0) uniformly.
+ *
+ * For any other scenario
+ * a) uniform split targeting a large folio
+ * (new_order > 0)
+ * b) any non-uniform split
+ * we must confirm that the file system supports large
+ * folios.
+ *
+ * Note that we might still have THPs in such
+ * mappings, which is created from khugepaged when
+ * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
+ * case, the mapping does not actually support large
+ * folios properly.
+ */
VM_WARN_ONCE(warns,
"Cannot split file folio to non-0 order");
return false;
}
}
- if (new_order && folio_test_swapcache(folio)) {
+ /*
+ * swapcache folio could only be split to order 0
+ *
+ * non-uniform split creates after-split folios with orders from
+ * folio_order(folio) - 1 to new_order, making it not suitable for any
+ * swapcache folio split. Only uniform split to order-0 can be used
+ * here.
+ */
+ if ((split_type == SPLIT_TYPE_NON_UNIFORM || new_order) && folio_test_swapcache(folio)) {
VM_WARN_ONCE(warns,
"Cannot split swapcache folio to non-0 order");
return false;
@@ -3711,11 +3704,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
if (new_order >= old_order)
return -EINVAL;
- if (split_type == SPLIT_TYPE_UNIFORM && !uniform_split_supported(folio, new_order, true))
- return -EINVAL;
-
- if (split_type == SPLIT_TYPE_NON_UNIFORM &&
- !non_uniform_split_supported(folio, new_order, true))
+ if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
return -EINVAL;
is_hzp = is_huge_zero_folio(folio);
Synopsys renamed DWC_usb32 IP to DWC_usb4 as of IP version 1.30. No
functional change except checking for the IP_NAME here. The driver will
treat the new IP_NAME as if it's DWC_usb32. Additional features for USB4
will be introduced and checked separately.
Cc: stable(a)vger.kernel.org
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
drivers/usb/dwc3/core.c | 2 ++
drivers/usb/dwc3/core.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 96f85eada047..f71b75465a60 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -993,6 +993,8 @@ static bool dwc3_core_is_valid(struct dwc3 *dwc)
reg = dwc3_readl(dwc->regs, DWC3_GSNPSID);
dwc->ip = DWC3_GSNPS_ID(reg);
+ if (dwc->ip == DWC4_IP)
+ dwc->ip = DWC32_IP;
/* This should read as U3 followed by revision number */
if (DWC3_IP_IS(DWC3)) {
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index a5fc92c4ffa3..45757169b672 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1265,6 +1265,7 @@ struct dwc3 {
#define DWC3_IP 0x5533
#define DWC31_IP 0x3331
#define DWC32_IP 0x3332
+#define DWC4_IP 0x3430
u32 revision;
base-commit: 18514fd70ea4ca9de137bb3bceeac1bac4bcad75
--
2.28.0
Hi,
syzbot reported a circular locking dependency in the NET/ROM routing
code involving nr_neigh_list_lock, nr_node_list_lock and
nr_node->node_lock when nr_rt_device_down() interacts with the
ioctl path. This series fixes that deadlock and also addresses a
long-standing reference count leak found while auditing the same
code.
Patch 1/2 refactors nr_rt_device_down() to avoid nested locking
between nr_neigh_list_lock and nr_node_list_lock by doing two
separate passes over nodes and neighbours, and adjusts nr_rt_free()
to follow the same lock ordering.
Patch 2/2 fixes a per-route reference count leak by dropping
nr_neigh->count and calling nr_neigh_put() when removing routes
from nr_rt_device_down(), mirroring the behaviour of
nr_dec_obs()/nr_del_node().
[1] https://syzkaller.appspot.com/bug?extid=14afda08dc3484d5db82
Thanks,
Junjie
Hi stable maintainers,
I have tried backporting some fixes to stable kernel 6.12.y which also
have CVE numbers and are fixing commits in 6.12.y.
I am not a subsystem expert and have only done overall testing that we
do for stable release candidate testing and not any patch specific testing.
Note: All these patches are present backports from upstream.
PATCH 1: The broken commit is in 6.12.y, and the fix is a clean
cherry-pick and addresses CVE-2025-68206
PATCH 2: The broken commit is present in 6.12.y and the fix is a clean
cherry-pick and addresses CVE-2025-40325.
PATCH 3: The broken commit is present in 6.12.y and backport needed a
minor conflict resolution due to missing commit fe69a3918084
("drm/panthor: Fix UAF in panthor_gem_create_with_handle() debugfs
in 6.12.y
PATCH 4,5,6: Patch 4 and 5 are pulled in as prerequisites for PATCH 6 which
is a fix for CVE-2025-40170 and needed a minor conflict resolution due to missing
commit: 22d6c9eebf2e ("net: Unexport shared functions for DCCP.") in 6.12.y
PATCH 7: The broken commit in present in 6.12.y and the backport of the
fix needed a minor conflict resolution due to missing commit in 6.12.y.
This is fix for CVE-2025-40164.
Please let me know if there are any comments.
Regards,
Harshit
Andrii Melnychenko (1):
netfilter: nft_ct: add seqadj extension for natted connections
Boris Brezillon (1):
drm/panthor: Flush shmem writes before mapping buffers CPU-uncached
Eric Dumazet (2):
ipv6: adopt dst_dev() helper
net: use dst_dev_rcu() in sk_setup_caps()
Justin Iurman (1):
net: ipv6: ioam6: use consistent dst names
Xiao Ni (1):
md/raid10: wait barrier before returning discard request with
REQ_NOWAIT
Zqiang (1):
usbnet: Fix using smp_processor_id() in preemptible code warnings
drivers/gpu/drm/panthor/panthor_gem.c | 18 +++++++++++++
drivers/md/raid10.c | 3 +--
drivers/net/usb/usbnet.c | 2 ++
include/net/ip.h | 6 +++--
include/net/ip6_route.h | 4 +--
include/net/route.h | 2 +-
net/core/sock.c | 16 +++++++-----
net/ipv6/exthdrs.c | 2 +-
net/ipv6/icmp.c | 4 ++-
net/ipv6/ila/ila_lwt.c | 2 +-
net/ipv6/ioam6_iptunnel.c | 37 ++++++++++++++-------------
net/ipv6/ip6_gre.c | 8 +++---
net/ipv6/ip6_output.c | 19 +++++++-------
net/ipv6/ip6_tunnel.c | 4 +--
net/ipv6/ip6_udp_tunnel.c | 2 +-
net/ipv6/ip6_vti.c | 2 +-
net/ipv6/ndisc.c | 6 +++--
net/ipv6/netfilter/nf_dup_ipv6.c | 2 +-
net/ipv6/output_core.c | 2 +-
net/ipv6/route.c | 20 +++++++++------
net/ipv6/rpl_iptunnel.c | 4 +--
net/ipv6/seg6_iptunnel.c | 20 ++++++++-------
net/ipv6/seg6_local.c | 2 +-
net/netfilter/nft_ct.c | 5 ++++
24 files changed, 118 insertions(+), 74 deletions(-)
--
2.50.1
On x86_64:
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>" we observe a pafe fault that happens.
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) – not-present page
This happens on x86_64 only, as this is already fixed in aarch64 in
commit: cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds")
V1: https://lore.kernel.org/all/20251112193005.3772542-1-harshit.m.mogalapalli@…
V1 attempted to do a similar sanity check in x86_64. Borislav suggested
to add a generic helper ima_validate_range() which could then be used
for both OF based and x86_64.
Testing information:
--------------------
On x86_64: With latest 6.19-rc2 based, we could reproduce the issue, and
patched kernel works fine. (with mem=8G on a 16G memory machine)
Thanks to Yifei for finding enabling IMA_KEXEC is the cause.
Thanks for the reviews on V1.
V1 -> V2:
- Patch 1: Add a generic helper "ima_validate_range()"
- Patch 2: Use this new helper in drivers/of/kexec.c -> No functional
change.
- Patch 3: Fix the page fault by doing sanity check with
"ima_validate_range()"
V2: https://lore.kernel.org/all/20251229081523.622515-1-harshit.m.mogalapalli@o…
V2 -> V3:
Update subject of Patch 1 to more appropriate one (Suggested by Mimi
Zohar)
Thanks,
Harshit
Harshit Mogalapalli (3):
ima: verify the previous kernel's IMA buffer lies in addressable RAM
of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
x86/kexec: Add a sanity check on previous kernel's ima kexec buffer
arch/x86/kernel/setup.c | 6 +++++
drivers/of/kexec.c | 15 +++----------
include/linux/ima.h | 1 +
security/integrity/ima/ima_kexec.c | 35 ++++++++++++++++++++++++++++++
4 files changed, 45 insertions(+), 12 deletions(-)
--
2.50.1
When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.
Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding
Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback
Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable(a)vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal <laveeshb(a)laveeshbansal.com>
---
fs/fs-writeback.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..cd21c74cd0e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
wb_wakeup(wb);
}
rcu_read_unlock();
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
}
static int dirtytime_interval_handler(const struct ctl_table *table, int write,
@@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
int ret;
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
- mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ if (ret == 0 && write) {
+ if (dirtytime_expire_interval)
+ mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ else
+ cancel_delayed_work_sync(&dirtytime_work);
+ }
return ret;
}
@@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
static int __init start_dirtytime_writeback(void)
{
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
register_sysctl_init("vm", vm_fs_writeback_table);
return 0;
}
--
2.43.0
From: Laveesh Bansal <laveeshbansal(a)gmail.com>
When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.
Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding
Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback
Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable(a)vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal <laveeshbansal(a)gmail.com>
---
fs/fs-writeback.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..cd21c74cd0e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
wb_wakeup(wb);
}
rcu_read_unlock();
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
}
static int dirtytime_interval_handler(const struct ctl_table *table, int write,
@@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
int ret;
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
- mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ if (ret == 0 && write) {
+ if (dirtytime_expire_interval)
+ mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ else
+ cancel_delayed_work_sync(&dirtytime_work);
+ }
return ret;
}
@@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
static int __init start_dirtytime_writeback(void)
{
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
register_sysctl_init("vm", vm_fs_writeback_table);
return 0;
}
--
2.43.0