The quilt patch titled
Subject: mm: fix uprobe pte be overwritten when expanding vma
has been removed from the -mm tree. Its filename was
mm-fix-uprobe-pte-be-overwritten-when-expanding-vma.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pu Lehui <pulehui(a)huawei.com>
Subject: mm: fix uprobe pte be overwritten when expanding vma
Date: Thu, 29 May 2025 15:56:47 +0000
Patch series "Fix uprobe pte be overwritten when expanding vma".
This patch (of 4):
We encountered a BUG alert triggered by Syzkaller as follows:
BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
And we can reproduce it with the following steps:
1. register uprobe on file at zero offset
2. mmap the file at zero offset:
addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
3. mremap part of vma1 to new vma2:
addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
4. mremap back to orig addr1:
mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
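Taken together, the four steps amount to roughly the following userspace
sequence (a sketch: it assumes a uprobe is already registered at offset 0
of a hypothetical "testfile", e.g. via /sys/kernel/tracing/uprobe_events,
and omits all error handling):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
	int fd = open("testfile", O_RDONLY);

	/* step 2: map two pages of the file at offset 0 (vma1) */
	char *addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);

	/* step 3: move the first page of vma1 into a larger vma2 */
	char *addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);

	/* step 4: move one page of vma2 back to addr1; the file page
	 * still mapped at addr1 + 4096 makes the moved range merge */
	mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);

	return 0;
}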
In step 3, the vma1 range [addr1, addr1 + 4096] is remapped to the new
vma2 with range [addr2, addr2 + 8192]; the uprobe anon page is remapped
from vma1 to vma2, and the vma1 range [addr1, addr1 + 4096] is then
unmapped.

In step 4, the vma2 range [addr2, addr2 + 4096] is remapped back to the
address range [addr1, addr1 + 4096]. Since the address range [addr1 +
4096, addr1 + 8192] still maps the file, vma_merge_new_range() is taken
to expand the range, which then does uprobe_mmap() in vma_complete().
Since the merged vma's pgoff is also zero, a uprobe anon page is
installed into the merged vma. However, the upcoming move_page_tables()
step, which uses set_pte_at() to remap the vma2 uprobe pte into the
merged vma, overwrites the newly installed uprobe pte there, leaving
that pte orphaned.

Since the uprobe pte will be remapped into the merged vma anyway, we can
skip the unnecessary uprobe_mmap() on the merged vma.
This problem was first found in linux-6.6.y and also shows up in
community syzkaller reports:
https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
Link: https://lkml.kernel.org/r/20250529155650.4017699-1-pulehui@huaweicloud.com
Link: https://lkml.kernel.org/r/20250529155650.4017699-2-pulehui@huaweicloud.com
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Signed-off-by: Pu Lehui <pulehui(a)huawei.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vma.c | 20 +++++++++++++++++---
mm/vma.h | 7 +++++++
2 files changed, 24 insertions(+), 3 deletions(-)
--- a/mm/vma.c~mm-fix-uprobe-pte-be-overwritten-when-expanding-vma
+++ a/mm/vma.c
@@ -169,6 +169,9 @@ static void init_multi_vma_prep(struct v
vp->file = vma->vm_file;
if (vp->file)
vp->mapping = vma->vm_file->f_mapping;
+
+ if (vmg && vmg->skip_vma_uprobe)
+ vp->skip_vma_uprobe = true;
}
/*
@@ -358,10 +361,13 @@ static void vma_complete(struct vma_prep
if (vp->file) {
i_mmap_unlock_write(vp->mapping);
- uprobe_mmap(vp->vma);
- if (vp->adj_next)
- uprobe_mmap(vp->adj_next);
+ if (!vp->skip_vma_uprobe) {
+ uprobe_mmap(vp->vma);
+
+ if (vp->adj_next)
+ uprobe_mmap(vp->adj_next);
+ }
}
if (vp->remove) {
@@ -1823,6 +1829,14 @@ struct vm_area_struct *copy_vma(struct v
faulted_in_anon_vma = false;
}
+ /*
+ * If the VMA we are copying might contain a uprobe PTE, ensure
+ * that we do not establish one upon merge. Otherwise, when mremap()
+ * moves page tables, it will orphan the newly created PTE.
+ */
+ if (vma->vm_file)
+ vmg.skip_vma_uprobe = true;
+
new_vma = find_vma_prev(mm, addr, &vmg.prev);
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */
--- a/mm/vma.h~mm-fix-uprobe-pte-be-overwritten-when-expanding-vma
+++ a/mm/vma.h
@@ -19,6 +19,8 @@ struct vma_prepare {
struct vm_area_struct *insert;
struct vm_area_struct *remove;
struct vm_area_struct *remove2;
+
+ bool skip_vma_uprobe :1;
};
struct unlink_vma_file_batch {
@@ -120,6 +122,11 @@ struct vma_merge_struct {
*/
bool give_up_on_oom :1;
+ /*
+ * If set, skip uprobe_mmap upon merged vma.
+ */
+ bool skip_vma_uprobe :1;
+
/* Internal flags set during merge process: */
/*
_
In the interrupt handler rain_interrupt(), the buffer full check on
rain->buf_len is performed before acquiring rain->buf_lock. This
creates a Time-of-Check to Time-of-Use (TOCTOU) race condition, as
rain->buf_len is concurrently accessed and modified in the work
handler rain_irq_work_handler() under the same lock.
Multiple interrupt invocations can race: each reads buf_len before it
becomes full and then proceeds. This can lead to several interrupts
writing to the buffer at once, incrementing buf_len beyond its capacity
(DATA_SIZE) and overflowing the buffer.
Fix this bug by moving the spin_lock() to before the buffer full check,
so that the check and the subsequent buffer modification are performed
atomically, preventing the race condition. A corresponding
spin_unlock() is added to the overflow path to correctly release the
lock.
This possible bug was found by an experimental static analysis tool
developed by our team.
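With the change applied, the handler reads roughly as follows (a sketch;
the unlock and the deferred-work kick after the buffer write sit outside
the quoted hunk and are unchanged):

static irqreturn_t rain_interrupt(struct serio *serio, unsigned char data,
				  unsigned int flags)
{
	struct rain *rain = serio_get_drvdata(serio);

	/* the check and the update of buf_len now happen under the same
	 * lock, closing the window between concurrent interrupts */
	spin_lock(&rain->buf_lock);
	if (rain->buf_len == DATA_SIZE) {
		spin_unlock(&rain->buf_lock);
		dev_warn_once(rain->dev, "buffer overflow\n");
		return IRQ_HANDLED;
	}
	rain->buf_len++;
	rain->buf[rain->buf_wr_idx] = data;
	rain->buf_wr_idx = (rain->buf_wr_idx + 1) & 0xff;
	spin_unlock(&rain->buf_lock);

	/* unchanged tail: kick rain_irq_work_handler() via the work queue */
	return IRQ_HANDLED;
}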
Fixes: 0f314f6c2e77 ("[media] rainshadow-cec: new RainShadow Tech HDMI CEC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
drivers/media/cec/usb/rainshadow/rainshadow-cec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/cec/usb/rainshadow/rainshadow-cec.c b/drivers/media/cec/usb/rainshadow/rainshadow-cec.c
index ee870ea1a886..6f8d6797c614 100644
--- a/drivers/media/cec/usb/rainshadow/rainshadow-cec.c
+++ b/drivers/media/cec/usb/rainshadow/rainshadow-cec.c
@@ -171,11 +171,12 @@ static irqreturn_t rain_interrupt(struct serio *serio, unsigned char data,
{
struct rain *rain = serio_get_drvdata(serio);
+ spin_lock(&rain->buf_lock);
if (rain->buf_len == DATA_SIZE) {
+ spin_unlock(&rain->buf_lock);
dev_warn_once(rain->dev, "buffer overflow\n");
return IRQ_HANDLED;
}
- spin_lock(&rain->buf_lock);
rain->buf_len++;
rain->buf[rain->buf_wr_idx] = data;
rain->buf_wr_idx = (rain->buf_wr_idx + 1) & 0xff;
--
2.25.1
From: Ajay Agarwal <ajayagarwal(a)google.com>
[ Upstream commit 7447990137bf06b2aeecad9c6081e01a9f47f2aa ]
PCIe r6.2, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM Substates,
both ports must be configured as described in this section while ASPM L1
is disabled.
Previously, pcie_config_aspm_l1ss() assumed that "setting enable bits"
meant "setting them to 1", and it configured L1SS as follows:
- Clear L1SS enable bits
- Disable L1
- Configure L1SS enable bits as required
- Enable L1 if required
With this sequence, when disabling L1SS on an ARM A-core with a Synopsys
DesignWare PCIe core, the CPU occasionally hangs when reading
PCI_L1SS_CTL1, leading to a reboot when the CPU watchdog expires.
Move the L1 disable to the caller (pcie_config_aspm_link(), where L1 was
already enabled) so L1 is always disabled while updating the L1SS bits:
- Disable L1
- Clear L1SS enable bits
- Configure L1SS enable bits as required
- Enable L1 if required
Change pcie_aspm_cap_init() similarly.
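Schematically, the resulting order in pcie_config_aspm_link() is (a
simplified sketch of the diff below, not the literal upstream code):

	/* 1. Disable ASPM: downstream functions first, then upstream */
	list_for_each_entry(child, &linkbus->devices, bus_list)
		pcie_config_aspm_dev(child, 0);
	pcie_config_aspm_dev(parent, 0);

	/* 2. Program the L1SS enable bits while ASPM L1 is disabled */
	if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
		pcie_config_aspm_l1ss(link, state);

	/* 3. Re-enable ASPM: upstream first, then downstream */
	pcie_config_aspm_dev(parent, upstream);
	list_for_each_entry(child, &linkbus->devices, bus_list)
		pcie_config_aspm_dev(child, dwstream);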
Link: https://lore.kernel.org/r/20241007032917.872262-1-ajayagarwal@google.com
Signed-off-by: Ajay Agarwal <ajayagarwal(a)google.com>
[bhelgaas: comments, commit log, compute L1SS setting before config access]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Tested-by: Johnny-CC Chang <Johnny-CC.Chang(a)mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
drivers/pci/pcie/aspm.c | 92 ++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 42 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8..e943691bc931 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -805,6 +805,15 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &parent_lnkctl);
pcie_capability_read_word(child, PCI_EXP_LNKCTL, &child_lnkctl);
+ /* Disable L0s/L1 before updating L1SS config */
+ if (FIELD_GET(PCI_EXP_LNKCTL_ASPMC, child_lnkctl) ||
+ FIELD_GET(PCI_EXP_LNKCTL_ASPMC, parent_lnkctl)) {
+ pcie_capability_write_word(child, PCI_EXP_LNKCTL,
+ child_lnkctl & ~PCI_EXP_LNKCTL_ASPMC);
+ pcie_capability_write_word(parent, PCI_EXP_LNKCTL,
+ parent_lnkctl & ~PCI_EXP_LNKCTL_ASPMC);
+ }
+
/*
* Setup L0s state
*
@@ -829,6 +838,13 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
aspm_l1ss_init(link);
+ /* Restore L0s/L1 if they were enabled */
+ if (FIELD_GET(PCI_EXP_LNKCTL_ASPMC, child_lnkctl) ||
+ FIELD_GET(PCI_EXP_LNKCTL_ASPMC, parent_lnkctl)) {
+ pcie_capability_write_word(parent, PCI_EXP_LNKCTL, parent_lnkctl);
+ pcie_capability_write_word(child, PCI_EXP_LNKCTL, child_lnkctl);
+ }
+
/* Save default state */
link->aspm_default = link->aspm_enabled;
@@ -845,25 +861,28 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
}
}
-/* Configure the ASPM L1 substates */
+/* Configure the ASPM L1 substates. Caller must disable L1 first. */
static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
{
- u32 val, enable_req;
+ u32 val;
struct pci_dev *child = link->downstream, *parent = link->pdev;
- enable_req = (link->aspm_enabled ^ state) & state;
+ val = 0;
+ if (state & PCIE_LINK_STATE_L1_1)
+ val |= PCI_L1SS_CTL1_ASPM_L1_1;
+ if (state & PCIE_LINK_STATE_L1_2)
+ val |= PCI_L1SS_CTL1_ASPM_L1_2;
+ if (state & PCIE_LINK_STATE_L1_1_PCIPM)
+ val |= PCI_L1SS_CTL1_PCIPM_L1_1;
+ if (state & PCIE_LINK_STATE_L1_2_PCIPM)
+ val |= PCI_L1SS_CTL1_PCIPM_L1_2;
/*
- * Here are the rules specified in the PCIe spec for enabling L1SS:
- * - When enabling L1.x, enable bit at parent first, then at child
- * - When disabling L1.x, disable bit at child first, then at parent
- * - When enabling ASPM L1.x, need to disable L1
- * (at child followed by parent).
- * - The ASPM/PCIPM L1.2 must be disabled while programming timing
+ * PCIe r6.2, sec 5.5.4, rules for enabling L1 PM Substates:
+ * - Clear L1.x enable bits at child first, then at parent
+ * - Set L1.x enable bits at parent first, then at child
+ * - ASPM/PCIPM L1.2 must be disabled while programming timing
* parameters
- *
- * To keep it simple, disable all L1SS bits first, and later enable
- * what is needed.
*/
/* Disable all L1 substates */
@@ -871,26 +890,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
PCI_L1SS_CTL1_L1SS_MASK, 0);
pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
PCI_L1SS_CTL1_L1SS_MASK, 0);
- /*
- * If needed, disable L1, and it gets enabled later
- * in pcie_config_aspm_link().
- */
- if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
- pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
- PCI_EXP_LNKCTL_ASPM_L1);
- pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
- PCI_EXP_LNKCTL_ASPM_L1);
- }
-
- val = 0;
- if (state & PCIE_LINK_STATE_L1_1)
- val |= PCI_L1SS_CTL1_ASPM_L1_1;
- if (state & PCIE_LINK_STATE_L1_2)
- val |= PCI_L1SS_CTL1_ASPM_L1_2;
- if (state & PCIE_LINK_STATE_L1_1_PCIPM)
- val |= PCI_L1SS_CTL1_PCIPM_L1_1;
- if (state & PCIE_LINK_STATE_L1_2_PCIPM)
- val |= PCI_L1SS_CTL1_PCIPM_L1_2;
/* Enable what we need to enable */
pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
@@ -937,21 +936,30 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
}
+ /*
+ * Per PCIe r6.2, sec 5.5.4, setting either or both of the enable
+ * bits for ASPM L1 PM Substates must be done while ASPM L1 is
+ * disabled. Disable L1 here and apply new configuration after L1SS
+ * configuration has been completed.
+ *
+ * Per sec 7.5.3.7, when disabling ASPM L1, software must disable
+ * it in the Downstream component prior to disabling it in the
+ * Upstream component, and ASPM L1 must be enabled in the Upstream
+ * component prior to enabling it in the Downstream component.
+ *
+ * Sec 7.5.3.7 also recommends programming the same ASPM Control
+ * value for all functions of a multi-function device.
+ */
+ list_for_each_entry(child, &linkbus->devices, bus_list)
+ pcie_config_aspm_dev(child, 0);
+ pcie_config_aspm_dev(parent, 0);
+
if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
pcie_config_aspm_l1ss(link, state);
- /*
- * Spec 2.0 suggests all functions should be configured the
- * same setting for ASPM. Enabling ASPM L1 should be done in
- * upstream component first and then downstream, and vice
- * versa for disabling ASPM L1. Spec doesn't mention L0S.
- */
- if (state & PCIE_LINK_STATE_L1)
- pcie_config_aspm_dev(parent, upstream);
+ pcie_config_aspm_dev(parent, upstream);
list_for_each_entry(child, &linkbus->devices, bus_list)
pcie_config_aspm_dev(child, dwstream);
- if (!(state & PCIE_LINK_STATE_L1))
- pcie_config_aspm_dev(parent, upstream);
link->aspm_enabled = state;
--
2.45.2
From: Ming Lei <ming.lei(a)redhat.com>
[ Upstream commit 26064d3e2b4d9a14df1072980e558c636fb023ea ]
A >4GB folio is possible on some ARCHs: aarch64, for example, supports
16GB hugepages. Then a folio 'offset' can't be held in an 'unsigned
int', which causes a warning in bio_add_folio_nofail() and IO failure.

Fix it by adjusting the 'page' and trimming the 'offset' so that
`->bi_offset` won't overflow and the folio can be added to the bio
successfully.
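Concretely, the fix resolves the folio-relative offset into a (page,
in-page offset) pair, so the offset passed down always fits in an
unsigned int. A sketch with example numbers (4KB pages; a 5 GiB offset
that would overflow a u32):

	size_t off = 5UL << 30;             /* 5 GiB, larger than UINT_MAX */
	unsigned long nr = off / PAGE_SIZE; /* page index within the folio */

	/* advance to the right sub-page and trim the offset to the
	 * in-page remainder, which is always < PAGE_SIZE */
	__bio_add_page(bio, folio_page(folio, nr), len, off % PAGE_SIZE);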
Fixes: ed9832bc08db ("block: introduce folio awareness and add a bigger size from folio")
Cc: Kundan Kumar <kundan.kumar(a)samsung.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Gavin Shan <gshan(a)redhat.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Link: https://lore.kernel.org/r/20250312145136.2891229-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Alva Lan <alvalan9(a)foxmail.com>
---
The follow-up fix fbecd731de05 ("xfs: fix zoned GC data corruption due
to wrong bv_offset") addresses issues in fs/xfs/xfs_zone_gc.c. That
file was first introduced in v6.15-rc1, so don't backport the follow-up
fix to 6.12.y.
---
block/bio.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 20c74696bf23..094a5adf79d2 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1156,9 +1156,10 @@ EXPORT_SYMBOL(bio_add_page);
void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
size_t off)
{
+ unsigned long nr = off / PAGE_SIZE;
+
WARN_ON_ONCE(len > UINT_MAX);
- WARN_ON_ONCE(off > UINT_MAX);
- __bio_add_page(bio, &folio->page, len, off);
+ __bio_add_page(bio, folio_page(folio, nr), len, off % PAGE_SIZE);
}
EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
@@ -1179,9 +1180,11 @@ EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
size_t off)
{
- if (len > UINT_MAX || off > UINT_MAX)
+ unsigned long nr = off / PAGE_SIZE;
+
+ if (len > UINT_MAX)
return false;
- return bio_add_page(bio, &folio->page, len, off) > 0;
+ return bio_add_page(bio, folio_page(folio, nr), len, off % PAGE_SIZE) > 0;
}
EXPORT_SYMBOL(bio_add_folio);
--
2.34.1
Fix possible overflow in the address expression used as the second
argument to iommu_map() and iommu_unmap(). Without an explicit cast,
this expression may overflow when 'r->offset' or 'i' are large. Cast
the result to unsigned long before shifting to ensure correct IOVA
computation and prevent unintended wraparound.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
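To see the wraparound concretely (a hedged illustration; the 12-bit page
shift and the page index are made up, assuming a 64-bit kernel):

	u32 bad = (0xFFFFFu + 1) << 12;
		/* 2^32, truncated to 0 in 32-bit arithmetic */
	unsigned long good = ((unsigned long)0xFFFFF + 1) << 12;
		/* 0x100000000, a valid 64-bit IOVA */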
Cc: stable(a)vger.kernel.org # v4.4+
Signed-off-by: Alexey Nepomnyashih <sdl(a)nppct.ru>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 201022ae9214..17a0e1a46211 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -334,7 +334,7 @@ gk20a_instobj_dtor_iommu(struct nvkm_memory *memory)
/* Unmap pages from GPU address space and free them */
for (i = 0; i < node->base.mn->length; i++) {
iommu_unmap(imem->domain,
- (r->offset + i) << imem->iommu_pgshift, PAGE_SIZE);
+ ((unsigned long)r->offset + i) << imem->iommu_pgshift, PAGE_SIZE);
dma_unmap_page(dev, node->dma_addrs[i], PAGE_SIZE,
DMA_BIDIRECTIONAL);
__free_page(node->pages[i]);
@@ -472,7 +472,7 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 npages, u32 align,
/* Map into GPU address space */
for (i = 0; i < npages; i++) {
- u32 offset = (r->offset + i) << imem->iommu_pgshift;
+ unsigned long offset = ((unsigned long)r->offset + i) << imem->iommu_pgshift;
ret = iommu_map(imem->domain, offset, node->dma_addrs[i],
PAGE_SIZE, IOMMU_READ | IOMMU_WRITE,
--
2.43.0