The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f620d66af3165838bfa845dcf9f5f9b4089bf508
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101648-stimulate-stallion-5b49@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f620d66af3165838bfa845dcf9f5f9b4089bf508 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas(a)arm.com>
Date: Wed, 24 Sep 2025 13:31:22 +0100
Subject: [PATCH] arm64: mte: Do not flag the zero page as PG_mte_tagged
Commit 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the
zero page") attempted to fix ptrace() reading of tags from the zero page
by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit
also changed the ptrace() tag access permission check to the VM_MTE vma
flag while turning the page flag test into a WARN_ON_ONCE().
Attempting to set the PG_mte_tagged flag early with
CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit
d77e59a8fccd "arm64: mte: Lock a page for MTE tag initialisation") or
have the flags cleared later during page_alloc_init_late(). In addition,
pages_identical() -> memcmp_pages() will reject any comparison with the
zero page as it is marked as tagged.
Partially revert the above commit to avoid setting PG_mte_tagged on the
zero page. Update the __access_remote_tags() warning on untagged pages
to ignore the zero page since it is known to have the tags initialised.
Note that all user mapping of the zero page are marked as pte_special().
The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so
PG_mte_tagged will remain cleared.
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Fixes: 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page")
Reported-by: Gergely Kovacs <Gergely.Kovacs2(a)arm.com>
Cc: stable(a)vger.kernel.org # 5.10.x
Cc: Will Deacon <will(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Acked-by: Lance Yang <lance.yang(a)linux.dev>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Lance Yang <lance.yang(a)linux.dev>
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ecb83ab0700e..7345987a50a0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2303,17 +2303,21 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
#ifdef CONFIG_ARM64_MTE
static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
{
+ static bool cleared_zero_page = false;
+
sysreg_clear_set(sctlr_el1, 0, SCTLR_ELx_ATA | SCTLR_EL1_ATA0);
mte_cpu_setup();
/*
* Clear the tags in the zero page. This needs to be done via the
- * linear map which has the Tagged attribute.
+ * linear map which has the Tagged attribute. Since this page is
+ * always mapped as pte_special(), set_pte_at() will not attempt to
+ * clear the tags or set PG_mte_tagged.
*/
- if (try_page_mte_tagging(ZERO_PAGE(0))) {
+ if (!cleared_zero_page) {
+ cleared_zero_page = true;
mte_clear_page_tags(lm_alias(empty_zero_page));
- set_page_mte_tagged(ZERO_PAGE(0));
}
kasan_init_hw_tags_cpu();
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index e5e773844889..63aed49ac181 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -460,7 +460,7 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
if (folio_test_hugetlb(folio))
WARN_ON_ONCE(!folio_test_hugetlb_mte_tagged(folio));
else
- WARN_ON_ONCE(!page_mte_tagged(page));
+ WARN_ON_ONCE(!page_mte_tagged(page) && !is_zero_page(page));
/* limit access to the end of the page */
offset = offset_in_page(addr);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 23b53639a793477326fd57ed103823a8ab63084f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101641-clutter-scruffy-a000@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23b53639a793477326fd57ed103823a8ab63084f Mon Sep 17 00:00:00 2001
From: Thomas Fourier <fourier.thomas(a)gmail.com>
Date: Wed, 9 Jul 2025 13:35:40 +0200
Subject: [PATCH] media: cx18: Add missing check after DMA map
The DMA map functions can fail and should be tested for errors.
If the mapping fails, dealloc buffers, and return.
Fixes: 1c1e45d17b66 ("V4L/DVB (7786): cx18: new driver for the Conexant CX23418 MPEG encoder chip")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Fourier <fourier.thomas(a)gmail.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco(a)kernel.org>
diff --git a/drivers/media/pci/cx18/cx18-queue.c b/drivers/media/pci/cx18/cx18-queue.c
index 013694bfcb1c..7cbb2d586932 100644
--- a/drivers/media/pci/cx18/cx18-queue.c
+++ b/drivers/media/pci/cx18/cx18-queue.c
@@ -379,15 +379,22 @@ int cx18_stream_alloc(struct cx18_stream *s)
break;
}
+ buf->dma_handle = dma_map_single(&s->cx->pci_dev->dev,
+ buf->buf, s->buf_size,
+ s->dma);
+ if (dma_mapping_error(&s->cx->pci_dev->dev, buf->dma_handle)) {
+ kfree(buf->buf);
+ kfree(mdl);
+ kfree(buf);
+ break;
+ }
+
INIT_LIST_HEAD(&mdl->list);
INIT_LIST_HEAD(&mdl->buf_list);
mdl->id = s->mdl_base_idx; /* a somewhat safe value */
cx18_enqueue(s, mdl, &s->q_idle);
INIT_LIST_HEAD(&buf->list);
- buf->dma_handle = dma_map_single(&s->cx->pci_dev->dev,
- buf->buf, s->buf_size,
- s->dma);
cx18_buf_sync_for_cpu(s, buf);
list_add_tail(&buf->list, &s->buf_pool);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 08df2d7dd4ab2db8a172d824cda7872d5eca460a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101626-colony-unbend-f00e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 08df2d7dd4ab2db8a172d824cda7872d5eca460a Mon Sep 17 00:00:00 2001
From: Jason Andryuk <jason.andryuk(a)amd.com>
Date: Wed, 27 Aug 2025 20:36:01 -0400
Subject: [PATCH] xen/events: Cleanup find_virq() return codes
rc is overwritten by the evtchn_status hypercall in each iteration, so
the return value will be whatever the last iteration is. This could
incorrectly return success even if the event channel was not found.
Change to an explicit -ENOENT for an un-found virq and return 0 on a
successful match.
Fixes: 62cc5fc7b2e0 ("xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
Reviewed-by: Jan Beulich <jbeulich(a)suse.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
Message-ID: <20250828003604.8949-2-jason.andryuk(a)amd.com>
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 41309d38f78c..374231d84e4f 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1318,10 +1318,11 @@ static int find_virq(unsigned int virq, unsigned int cpu, evtchn_port_t *evtchn)
{
struct evtchn_status status;
evtchn_port_t port;
- int rc = -ENOENT;
memset(&status, 0, sizeof(status));
for (port = 0; port < xen_evtchn_max_channels(); port++) {
+ int rc;
+
status.dom = DOMID_SELF;
status.port = port;
rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
@@ -1331,10 +1332,10 @@ static int find_virq(unsigned int virq, unsigned int cpu, evtchn_port_t *evtchn)
continue;
if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(cpu)) {
*evtchn = port;
- break;
+ return 0;
}
}
- return rc;
+ return -ENOENT;
}
/**
rtla-timerlat allows a *thread* latency threshold to be set via the
-T/--thread option. However, the timerlat tracer calls this *total*
latency (stop_tracing_total_us), and stops tracing also when the
return-to-user latency is over the threshold.
Change the behavior of the timerlat BPF program to reflect what the
timerlat tracer is doing, to avoid discrepancy between stopping
collecting data in the BPF program and stopping tracing in the timerlat
tracer.
Cc: stable(a)vger.kernel.org
Fixes: e34293ddcebd ("rtla/timerlat: Add BPF skeleton to collect samples")
Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
---
tools/tracing/rtla/src/timerlat.bpf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/tracing/rtla/src/timerlat.bpf.c b/tools/tracing/rtla/src/timerlat.bpf.c
index 084cd10c21fc..e2265b5d6491 100644
--- a/tools/tracing/rtla/src/timerlat.bpf.c
+++ b/tools/tracing/rtla/src/timerlat.bpf.c
@@ -148,6 +148,9 @@ int handle_timerlat_sample(struct trace_event_raw_timerlat_sample *tp_args)
} else {
update_main_hist(&hist_user, bucket);
update_summary(&summary_user, latency, bucket);
+
+ if (thread_threshold != 0 && latency_us >= thread_threshold)
+ set_stop_tracing();
}
return 0;
--
2.51.0
Page cache folios from a file system that support large block size (LBS)
can have minimal folio order greater than 0, thus a high order folio might
not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
folio in minimum folio order chunks") bumps the target order of
split_huge_page*() to the minimum allowed order when splitting a LBS folio.
This causes confusion for some split_huge_page*() callers like memory
failure handling code, since they expect after-split folios all have
order-0 when split succeeds but in reality get min_order_for_split() order
folios and give warnings.
Fix it by failing a split if the folio cannot be split to the target order.
Rename try_folio_split() to try_folio_split_to_order() to reflect the added
new_order parameter. Remove its unused list parameter.
Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
[The test poisons LBS folios, which cannot be split to order-0 folios, and
also tries to poison all memory. The non split LBS folios take more memory
than the test anticipated, leading to OOM. The patch fixed the kernel
warning and the test needs some change to avoid OOM.]
Reported-by: syzbot+e6367ea2fdab6ed46056(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
Cc: stable(a)vger.kernel.org
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof(a)kernel.org>
Reviewed-by: Pankaj Raghav <p.raghav(a)samsung.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
---
From V2[1]:
1. Removed a typo in try_folio_split_to_order() comment.
2. Sent the Fixes patch separately.
[1] https://lore.kernel.org/linux-mm/20251016033452.125479-1-ziy@nvidia.com/
include/linux/huge_mm.h | 55 +++++++++++++++++------------------------
mm/huge_memory.c | 9 +------
mm/truncate.c | 6 +++--
3 files changed, 28 insertions(+), 42 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c4a811958cda..7698b3542c4f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -383,45 +383,30 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
}
/*
- * try_folio_split - try to split a @folio at @page using non uniform split.
+ * try_folio_split_to_order - try to split a @folio at @page to @new_order using
+ * non uniform split.
* @folio: folio to be split
- * @page: split to order-0 at the given page
- * @list: store the after-split folios
+ * @page: split to @new_order at the given page
+ * @new_order: the target split order
*
- * Try to split a @folio at @page using non uniform split to order-0, if
- * non uniform split is not supported, fall back to uniform split.
+ * Try to split a @folio at @page using non uniform split to @new_order, if
+ * non uniform split is not supported, fall back to uniform split. After-split
+ * folios are put back to LRU list. Use min_order_for_split() to get the lower
+ * bound of @new_order.
*
* Return: 0: split is successful, otherwise split failed.
*/
-static inline int try_folio_split(struct folio *folio, struct page *page,
- struct list_head *list)
+static inline int try_folio_split_to_order(struct folio *folio,
+ struct page *page, unsigned int new_order)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- if (!non_uniform_split_supported(folio, 0, false))
- return split_huge_page_to_list_to_order(&folio->page, list,
- ret);
- return folio_split(folio, ret, page, list);
+ if (!non_uniform_split_supported(folio, new_order, /* warns= */ false))
+ return split_huge_page_to_list_to_order(&folio->page, NULL,
+ new_order);
+ return folio_split(folio, new_order, page, NULL);
}
static inline int split_huge_page(struct page *page)
{
- struct folio *folio = page_folio(page);
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- /*
- * split_huge_page() locks the page before splitting and
- * expects the same page that has been split to be locked when
- * returned. split_folio(page_folio(page)) cannot be used here
- * because it converts the page to folio and passes the head
- * page to be split.
- */
- return split_huge_page_to_list_to_order(page, NULL, ret);
+ return split_huge_page_to_list_to_order(page, NULL, 0);
}
void deferred_split_folio(struct folio *folio, bool partially_mapped);
#ifdef CONFIG_MEMCG
@@ -611,14 +596,20 @@ static inline int split_huge_page(struct page *page)
return -EINVAL;
}
+static inline int min_order_for_split(struct folio *folio)
+{
+ VM_WARN_ON_ONCE_FOLIO(1, folio);
+ return -EINVAL;
+}
+
static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
}
-static inline int try_folio_split(struct folio *folio, struct page *page,
- struct list_head *list)
+static inline int try_folio_split_to_order(struct folio *folio,
+ struct page *page, unsigned int new_order)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f14fbef1eefd..fc65ec3393d2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3812,8 +3812,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
min_order = mapping_min_folio_order(folio->mapping);
if (new_order < min_order) {
- VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
- min_order);
ret = -EINVAL;
goto out;
}
@@ -4165,12 +4163,7 @@ int min_order_for_split(struct folio *folio)
int split_folio_to_list(struct folio *folio, struct list_head *list)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- return split_huge_page_to_list_to_order(&folio->page, list, ret);
+ return split_huge_page_to_list_to_order(&folio->page, list, 0);
}
/*
diff --git a/mm/truncate.c b/mm/truncate.c
index 91eb92a5ce4f..9210cf808f5c 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
size_t size = folio_size(folio);
unsigned int offset, length;
struct page *split_at, *split_at2;
+ unsigned int min_order;
if (pos < start)
offset = start - pos;
@@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
if (!folio_test_large(folio))
return true;
+ min_order = mapping_min_folio_order(folio->mapping);
split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
- if (!try_folio_split(folio, split_at, NULL)) {
+ if (!try_folio_split_to_order(folio, split_at, min_order)) {
/*
* try to split at offset + length to make sure folios within
* the range can be dropped, especially to avoid memory waste
@@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
*/
if (folio_test_large(folio2) &&
folio2->mapping == folio->mapping)
- try_folio_split(folio2, split_at2, NULL);
+ try_folio_split_to_order(folio2, split_at2, min_order);
folio_unlock(folio2);
out:
--
2.51.0
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 9658d698a8a83540bf6a6c80d13c9a61590ee985
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101627-shortage-author-7f5b@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9658d698a8a83540bf6a6c80d13c9a61590ee985 Mon Sep 17 00:00:00 2001
From: Lance Yang <lance.yang(a)linux.dev>
Date: Tue, 30 Sep 2025 16:10:40 +0800
Subject: [PATCH] mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping
zero-filled mTHP subpage to shared zeropage
When splitting an mTHP and replacing a zero-filled subpage with the shared
zeropage, try_to_map_unused_to_zeropage() currently drops several
important PTE bits.
For userspace tools like CRIU, which rely on the soft-dirty mechanism for
incremental snapshots, losing the soft-dirty bit means modified pages are
missed, leading to inconsistent memory state after restore.
As pointed out by David, the more critical uffd-wp bit is also dropped.
This breaks the userfaultfd write-protection mechanism, causing writes to
be silently missed by monitoring applications, which can lead to data
corruption.
Preserve both the soft-dirty and uffd-wp bits from the old PTE when
creating the new zeropage mapping to ensure they are correctly tracked.
Link: https://lkml.kernel.org/r/20250930081040.80926-1-lance.yang@linux.dev
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Byungchul Park <byungchul(a)sk.com>
Cc: Gregory Price <gourry(a)gourry.net>
Cc: "Huang, Ying" <ying.huang(a)linux.alibaba.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Joshua Hahn <joshua.hahnjy(a)gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Mathew Brost <matthew.brost(a)intel.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rakie Kim <rakie.kim(a)sk.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Usama Arif <usamaarif642(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index ce83c2c3c287..e3065c9edb55 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -296,8 +296,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list)
}
static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
- struct folio *folio,
- unsigned long idx)
+ struct folio *folio, pte_t old_pte, unsigned long idx)
{
struct page *page = folio_page(folio, idx);
pte_t newpte;
@@ -306,7 +305,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
return false;
VM_BUG_ON_PAGE(!PageAnon(page), page);
VM_BUG_ON_PAGE(!PageLocked(page), page);
- VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
+ VM_BUG_ON_PAGE(pte_present(old_pte), page);
if (folio_test_mlocked(folio) || (pvmw->vma->vm_flags & VM_LOCKED) ||
mm_forbids_zeropage(pvmw->vma->vm_mm))
@@ -322,6 +321,12 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
pvmw->vma->vm_page_prot));
+
+ if (pte_swp_soft_dirty(old_pte))
+ newpte = pte_mksoft_dirty(newpte);
+ if (pte_swp_uffd_wp(old_pte))
+ newpte = pte_mkuffd_wp(newpte);
+
set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
@@ -364,13 +369,13 @@ static bool remove_migration_pte(struct folio *folio,
continue;
}
#endif
+ old_pte = ptep_get(pvmw.pte);
if (rmap_walk_arg->map_unused_to_zeropage &&
- try_to_map_unused_to_zeropage(&pvmw, folio, idx))
+ try_to_map_unused_to_zeropage(&pvmw, folio, old_pte, idx))
continue;
folio_get(folio);
pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
- old_pte = ptep_get(pvmw.pte);
entry = pte_to_swp_entry(old_pte);
if (!is_migration_entry_young(entry))
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4b1ff850e0c1aacc23e923ed22989b827b9808f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101657-platform-jersey-adec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4b1ff850e0c1aacc23e923ed22989b827b9808f9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Thu, 25 Sep 2025 12:32:36 +0200
Subject: [PATCH] mptcp: pm: in-kernel: usable client side with C-flag
When servers set the C-flag in their MP_CAPABLE to tell clients not to
create subflows to the initial address and port, clients will likely not
use their other endpoints. That's because the in-kernel path-manager
uses the 'subflow' endpoints to create subflows only to the initial
address and port.
If the limits have not been modified to accept ADD_ADDR, the client
doesn't try to establish new subflows. If the limits accept ADD_ADDR,
the routing routes will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy
Layer 4 load balancer, or using anycast IP address. Clients having their
different 'subflow' endpoints setup, don't end up creating multiple
subflows as expected, and causing some deployment issues.
A special case is then added here: when servers set the C-flag in the
MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
The 'subflows' endpoints will then be used with this new remote IP and
port. This exception is only allowed when the ADD_ADDR is sent
immediately after the 3WHS, and makes the client switching to the 'fully
established' mode. After that, 'select_local_address()' will not be able
to find any subflows, because 'id_avail_bitmap' will be filled in
mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
established' mode.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable(a)vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126c…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 204e1f61212e..584cab90aa6e 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct sock *ssk,
} else {
__MPTCP_INC_STATS(sock_net((struct sock *)msk), MPTCP_MIB_ADDADDRDROP);
}
- /* id0 should not have a different address */
+ /* - id0 should not have a different address
+ * - special case for C-flag: linked to fill_local_addresses_vec()
+ */
} else if ((addr->id == 0 && !mptcp_pm_is_init_remote_addr(msk, addr)) ||
- (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
+ (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
+ !mptcp_pm_add_addr_c_flag_case(msk))) {
mptcp_pm_announce_addr(msk, addr, true);
mptcp_pm_add_addr_send_ack(msk);
} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index 667803d72b64..8c46493a0835 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -389,10 +389,12 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
struct mptcp_addr_info mpc_addr;
struct pm_nl_pernet *pernet;
unsigned int subflows_max;
+ bool c_flag_case;
int i = 0;
pernet = pm_nl_get_pernet_from_msk(msk);
subflows_max = mptcp_pm_get_subflows_max(msk);
+ c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk);
mptcp_local_address((struct sock_common *)msk, &mpc_addr);
@@ -405,12 +407,27 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
continue;
if (msk->pm.subflows < subflows_max) {
+ bool is_id0;
+
locals[i].addr = entry->addr;
locals[i].flags = entry->flags;
locals[i].ifindex = entry->ifindex;
+ is_id0 = mptcp_addresses_equal(&locals[i].addr,
+ &mpc_addr,
+ locals[i].addr.port);
+
+ if (c_flag_case &&
+ (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) {
+ __clear_bit(locals[i].addr.id,
+ msk->pm.id_avail_bitmap);
+
+ if (!is_id0)
+ msk->pm.local_addr_used++;
+ }
+
/* Special case for ID0: set the correct ID */
- if (mptcp_addresses_equal(&locals[i].addr, &mpc_addr, locals[i].addr.port))
+ if (is_id0)
locals[i].addr.id = 0;
msk->pm.subflows++;
@@ -419,6 +436,37 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
}
rcu_read_unlock();
+ /* Special case: peer sets the C flag, accept one ADD_ADDR if default
+ * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
+ */
+ if (!i && c_flag_case) {
+ unsigned int local_addr_max = mptcp_pm_get_local_addr_max(msk);
+
+ while (msk->pm.local_addr_used < local_addr_max &&
+ msk->pm.subflows < subflows_max) {
+ struct mptcp_pm_local *local = &locals[i];
+
+ if (!select_local_address(pernet, msk, local))
+ break;
+
+ __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+
+ if (!mptcp_pm_addr_families_match(sk, &local->addr,
+ remote))
+ continue;
+
+ if (mptcp_addresses_equal(&local->addr, &mpc_addr,
+ local->addr.port))
+ continue;
+
+ msk->pm.local_addr_used++;
+ msk->pm.subflows++;
+ i++;
+ }
+
+ return i;
+ }
+
/* If the array is empty, fill in the single
* 'IPADDRANY' local address
*/
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a1787a1344ac..cbe54331e5c7 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1199,6 +1199,14 @@ static inline void mptcp_pm_close_subflow(struct mptcp_sock *msk)
spin_unlock_bh(&msk->pm.lock);
}
+static inline bool mptcp_pm_add_addr_c_flag_case(struct mptcp_sock *msk)
+{
+ return READ_ONCE(msk->pm.remote_deny_join_id0) &&
+ msk->pm.local_addr_used == 0 &&
+ mptcp_pm_get_add_addr_accept_max(msk) == 0 &&
+ msk->pm.subflows < mptcp_pm_get_subflows_max(msk);
+}
+
void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk);
static inline struct mptcp_ext *mptcp_get_ext(const struct sk_buff *skb)
From: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Ensure TX descriptor type fields are written in a safe order so the DMA
engine does not begin processing a chain before all descriptors are
fully initialised.
For multi-descriptor transmissions the driver writes DT_FEND into the
last descriptor and DT_FSTART into the first. The DMA engine starts
processing when it sees DT_FSTART. If the compiler or CPU reorders the
writes and publishes DT_FSTART before DT_FEND, the DMA can start early
and process an incomplete chain, leading to corrupted transmissions or
DMA errors.
Fix this by writing DT_FEND before the dma_wmb() barrier, executing
dma_wmb() immediately before DT_FSTART (or DT_FSINGLE in the single
descriptor case), and then adding a wmb() after the type updates to
ensure CPU-side ordering before ringing the hardware doorbell.
On an RZ/G2L platform running an RT kernel, this reordering hazard was
observed as TX stalls and timeouts:
[ 372.968431] NETDEV WATCHDOG: end0 (ravb): transmit queue 0 timed out
[ 372.968494] WARNING: CPU: 0 PID: 10 at net/sched/sch_generic.c:467 dev_watchdog+0x4a4/0x4ac
[ 373.969291] ravb 11c20000.ethernet end0: transmit timed out, status 00000000, resetting...
This change enforces the required ordering and prevents the DMA engine
from observing DT_FSTART before the rest of the descriptor chain is
valid.
Fixes: 2f45d1902acf ("ravb: minimize TX data copying")
Cc: stable(a)vger.kernel.org
Co-developed-by: Fabrizio Castro <fabrizio.castro.jz(a)renesas.com>
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz(a)renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
---
drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a200e205825a..2a995fa9bfff 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2211,15 +2211,19 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
skb_tx_timestamp(skb);
}
- /* Descriptor type must be set after all the above writes */
- dma_wmb();
+
+ /* For multi-descriptors set DT_FEND before calling dma_wmb() */
if (num_tx_desc > 1) {
desc->die_dt = DT_FEND;
desc--;
- desc->die_dt = DT_FSTART;
- } else {
- desc->die_dt = DT_FSINGLE;
}
+
+ /* Descriptor type must be set after all the above writes */
+ dma_wmb();
+ desc->die_dt = (num_tx_desc > 1) ? DT_FSTART : DT_FSINGLE;
+
+ /* Ensure data is written to RAM before initiating DMA transfer */
+ wmb();
ravb_modify(ndev, TCCR, TCCR_TSRQ0 << q, TCCR_TSRQ0 << q);
priv->cur_tx[q] += num_tx_desc;
--
2.43.0
Some NTFS volumes failed to mount because sparse data runs were not
handled correctly during runlist unpacking. The code performed arithmetic
on the special SPARSE_LCN64 marker, leading to invalid LCN values and
mount errors.
Add an explicit check for the case described above, marking the run as
sparse without applying arithmetic.
Fixes: 736fc7bf5f68 ("fs: ntfs3: Fix integer overflow in run_unpack()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
---
fs/ntfs3/run.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/ntfs3/run.c b/fs/ntfs3/run.c
index 88550085f745..5df55e4adbb1 100644
--- a/fs/ntfs3/run.c
+++ b/fs/ntfs3/run.c
@@ -984,8 +984,12 @@ int run_unpack(struct runs_tree *run, struct ntfs_sb_info *sbi, CLST ino,
if (!dlcn)
return -EINVAL;
- if (check_add_overflow(prev_lcn, dlcn, &lcn))
+ /* Check special combination: 0 + SPARSE_LCN64. */
+ if (!prev_lcn && dlcn == SPARSE_LCN64) {
+ lcn = SPARSE_LCN64;
+ } else if (check_add_overflow(prev_lcn, dlcn, &lcn)) {
return -EINVAL;
+ }
prev_lcn = lcn;
} else {
/* The size of 'dlcn' can't be > 8. */
--
2.43.0
Hi Greg,
On 16/10/2025 15:25, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> selftests: mptcp: join: validate C-flag + def limit
>
> to the 5.10-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> selftests-mptcp-join-validate-c-flag-def-limit.patch
> and it can be found in the queue-5.10 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I do! :)
> From 008385efd05e04d8dff299382df2e8be0f91d8a0 Mon Sep 17 00:00:00 2001
> From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
> Date: Thu, 25 Sep 2025 12:32:37 +0200
> Subject: selftests: mptcp: join: validate C-flag + def limit
>
> From: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
>
> commit 008385efd05e04d8dff299382df2e8be0f91d8a0 upstream.
>
> The previous commit adds an exception for the C-flag case. The
> 'mptcp_join.sh' selftest is extended to validate this case.
>
> In this subtest, there is a typical CDN deployment with a client where
> MPTCP endpoints have been 'automatically' configured:
>
> - the server set net.mptcp.allow_join_initial_addr_port=0
>
> - the client has multiple 'subflow' endpoints, and the default limits:
> not accepting ADD_ADDRs.
>
> Without the parent patch, the client is not able to establish new
> subflows using its 'subflow' endpoints. The parent commit fixes that.
>
> The 'Fixes' tag here below is the same as the one from the previous
> commit: this patch here is not fixing anything wrong in the selftests,
> but it validates the previous fix for an issue introduced by this commit
> ID.
>
> Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
This commit has been introduced in v5.14, and not backported before. Do
you mind dropping it from v5.10 please? :)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.