# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 124 +++++++++++++++++-----------------------
5 files changed, 60 insertions(+), 76 deletions(-)
--
2.43.0
# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 107 ++++++++++++++++++++++------------------
5 files changed, 64 insertions(+), 55 deletions(-)
--
2.43.0
On SM8250 (IRIS2) with firmware older than 1.0.087, the firmware could
not handle a dummy device address for EOS buffers, so a NULL device
address is sent instead. The existing check used IS_V6() alongside a
firmware version gate:
if (IS_V6(core) && is_fw_rev_or_older(core, 1, 0, 87))
fdata.device_addr = 0;
else
fdata.device_addr = 0xdeadb000;
However, SC7280 which is also V6, uses a firmware string of the form
"1.0.<commit-hash>", which the version parser translates to 1.0.0. This
unintentionally satisfies the `is_fw_rev_or_older(..., 1, 0, 87)`
condition on SC7280. Combined with IS_V6() matching there as well, the
quirk is incorrectly applied to SC7280, causing VP9 decode failures.
Constrain the check to IRIS2 (SM8250) only, which is the only platform
that needed this quirk, by replacing IS_V6() with IS_IRIS2(). This
restores correct behavior on SC7280 (no forced NULL EOS buffer address).
Fixes: 47f867cb1b63 ("media: venus: fix EOS handling in decoder stop command")
Cc: stable(a)vger.kernel.org
Reported-by: Mecid <notifications(a)github.com>
Closes: https://github.com/qualcomm-linux/kernel-topics/issues/222
Co-developed-by: Renjiang Han <renjiang.han(a)oss.qualcomm.com>
Signed-off-by: Renjiang Han <renjiang.han(a)oss.qualcomm.com>
Signed-off-by: Dikshita Agarwal <dikshita.agarwal(a)oss.qualcomm.com>
---
drivers/media/platform/qcom/venus/vdec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 4a6641fdffcf79705893be58c7ec5cf485e2fab9..dc85a5b8c989eb8339e5de9fea7ab49532e7f15a 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -565,7 +565,7 @@ vdec_decoder_cmd(struct file *file, void *fh, struct v4l2_decoder_cmd *cmd)
fdata.buffer_type = HFI_BUFFER_INPUT;
fdata.flags |= HFI_BUFFERFLAG_EOS;
- if (IS_V6(inst->core) && is_fw_rev_or_older(inst->core, 1, 0, 87))
+ if (IS_IRIS2(inst->core) && is_fw_rev_or_older(inst->core, 1, 0, 87))
fdata.device_addr = 0;
else
fdata.device_addr = 0xdeadb000;
---
base-commit: 1f2353f5a1af995efbf7bea44341aa0d03460b28
change-id: 20251121-venus-vp9-fix-1ff602724c02
Best regards,
--
Dikshita Agarwal <dikshita.agarwal(a)oss.qualcomm.com>
The patch below does not apply to the 6.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.17.y
git checkout FETCH_HEAD
git cherry-pick -x 92e239e36d600002559074994a545fcfac9afd2d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025112435-stray-aflutter-0f77@gregkh' --subject-prefix 'PATCH 6.17.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 92e239e36d600002559074994a545fcfac9afd2d Mon Sep 17 00:00:00 2001
From: Gang Yan <yangang(a)kylinos.cn>
Date: Tue, 18 Nov 2025 08:20:28 +0100
Subject: [PATCH] mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
Fix inverted WARN_ON_ONCE condition that prevented normal address
removal counter updates. The current code only executes decrement
logic when the counter is already 0 (abnormal state), while
normal removals (counter > 0) are ignored.
Signed-off-by: Gang Yan <yangang(a)kylinos.cn>
Fixes: 636113918508 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_rm_addr_received")
Cc: stable(a)vger.kernel.org
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-10-806d3…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index 2ae95476dba3..0a50fd5edc06 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -672,7 +672,7 @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
void mptcp_pm_nl_rm_addr(struct mptcp_sock *msk, u8 rm_id)
{
- if (rm_id && WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) {
+ if (rm_id && !WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) {
u8 limit_add_addr_accepted =
mptcp_pm_get_limit_add_addr_accepted(msk);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1bba3f219c5e8c29e63afa3c1fc24f875ebec119
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025112426-backroom-negate-d125@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bba3f219c5e8c29e63afa3c1fc24f875ebec119 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Tue, 18 Nov 2025 08:20:22 +0100
Subject: [PATCH] mptcp: do not fallback when OoO is present
In case of DSS corruption, the MPTCP protocol tries to avoid the subflow
reset if fallback is possible. Such corruptions happen in the receive
path; to ensure fallback is possible the stack additionally needs to
check for OoO data, otherwise the fallback will break the data stream.
Fixes: e32d262c89e2 ("mptcp: handle consistently DSS corruption")
Cc: stable(a)vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d37…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e30e9043a694..6f0e8f670d83 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -76,6 +76,13 @@ bool __mptcp_try_fallback(struct mptcp_sock *msk, int fb_mib)
if (__mptcp_check_fallback(msk))
return true;
+ /* The caller possibly is not holding the msk socket lock, but
+ * in the fallback case only the current subflow is touching
+ * the OoO queue.
+ */
+ if (!RB_EMPTY_ROOT(&msk->out_of_order_queue))
+ return false;
+
spin_lock_bh(&msk->fallback_lock);
if (!msk->allow_infinite_fallback) {
spin_unlock_bh(&msk->fallback_lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1bba3f219c5e8c29e63afa3c1fc24f875ebec119
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025112425-math-lasso-c3b8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bba3f219c5e8c29e63afa3c1fc24f875ebec119 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Tue, 18 Nov 2025 08:20:22 +0100
Subject: [PATCH] mptcp: do not fallback when OoO is present
In case of DSS corruption, the MPTCP protocol tries to avoid the subflow
reset if fallback is possible. Such corruptions happen in the receive
path; to ensure fallback is possible the stack additionally needs to
check for OoO data, otherwise the fallback will break the data stream.
Fixes: e32d262c89e2 ("mptcp: handle consistently DSS corruption")
Cc: stable(a)vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d37…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e30e9043a694..6f0e8f670d83 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -76,6 +76,13 @@ bool __mptcp_try_fallback(struct mptcp_sock *msk, int fb_mib)
if (__mptcp_check_fallback(msk))
return true;
+ /* The caller possibly is not holding the msk socket lock, but
+ * in the fallback case only the current subflow is touching
+ * the OoO queue.
+ */
+ if (!RB_EMPTY_ROOT(&msk->out_of_order_queue))
+ return false;
+
spin_lock_bh(&msk->fallback_lock);
if (!msk->allow_infinite_fallback) {
spin_unlock_bh(&msk->fallback_lock);
Hello,
New build issue found on stable-rc/linux-6.11.y:
---
./include/net/ip.h:472:14: error: default initialization of an object of type 'typeof (rt->dst.expires)' (aka 'const unsigned long') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe] in security/selinux/avc.o (security/selinux/avc.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:16f2d4d7ea51d0d2cee2101fea4a5d762aaeca6d
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: f6d41443f54856ceece0d5b584f47f681513bde4
- tags: v6.11.11
Log excerpt:
=====================================================
In file included from security/selinux/avc.c:30:
In file included from ./security/selinux/include/avc.h:18:
In file included from ./include/linux/lsm_audit.h:25:
In file included from ./include/rdma/ib_verbs.h:26:
./include/net/ip.h:472:14: error: default initialization of an object of type 'typeof (rt->dst.expires)' (aka 'const unsigned long') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe]
472 | if (mtu && time_before(jiffies, rt->dst.expires))
| ^
./include/linux/jiffies.h:138:26: note: expanded from macro 'time_before'
138 | #define time_before(a,b) time_after(b,a)
| ^
./include/linux/jiffies.h:128:3: note: expanded from macro 'time_after'
128 | (typecheck(unsigned long, a) && \
| ^
./include/linux/typecheck.h:11:12: note: expanded from macro 'typecheck'
11 | typeof(x) __dummy2; \
| ^
1 error generated.
CC security/keys/request_key_auth.o
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-21
- config: https://files.kernelci.org/kbuild-clang-21-i386-allmodconfig-69250a4bf5b874…
- dashboard: https://d.kernelci.org/build/maestro:69250a4bf5b8743b1f5fbe05
## x86_64_defconfig on (x86_64):
- compiler: clang-21
- config: https://files.kernelci.org/kbuild-clang-21-x86-69250a16f5b8743b1f5fbddc/.co…
- dashboard: https://d.kernelci.org/build/maestro:69250a16f5b8743b1f5fbddc
#kernelci issue maestro:16f2d4d7ea51d0d2cee2101fea4a5d762aaeca6d
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.10.y:
---
the frame size of 1192 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] in drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_state.o (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_state.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:2fa2a6ca830cbfb7c388f6da90bf40ae58e3185b
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 47c2f92131c47a37ea0e3d8e1a4e4c82a9b473d4
- tags: v6.10.14
Log excerpt:
=====================================================
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_state.c:219:1: error: the frame size of 1192 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
219 | }
| ^
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_reg.o
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn20.o
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn21.o
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn30.o
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn301.o
CC drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn302.o
cc1: all warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## defconfig+kcidebug+x86-board on (i386):
- compiler: gcc-14
- config: https://files.kernelci.org/kbuild-gcc-14-x86-kcidebug-6924fef4f5b8743b1f5fa…
- dashboard: https://d.kernelci.org/build/maestro:6924fef4f5b8743b1f5fac6f
#kernelci issue maestro:2fa2a6ca830cbfb7c388f6da90bf40ae58e3185b
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org