folio split clears PG_has_hwpoisoned, but the flag should be preserved in
after-split folios containing pages with PG_hwpoisoned flag if the folio is
split to >0 order folios. Scan all pages in a to-be-split folio to
determine which after-split folios need the flag.
An alternatives is to change PG_has_hwpoisoned to PG_maybe_hwpoisoned to
avoid the scan and set it on all after-split folios, but resulting false
positive has undesirable negative impact. To remove false positive, caller
of folio_test_has_hwpoisoned() and folio_contain_hwpoisoned_page() needs to
do the scan. That might be causing a hassle for current and future callers
and more costly than doing the scan in the split code. More details are
discussed in [1].
This issue can be exposed via:
1. splitting a has_hwpoisoned folio to >0 order from debugfs interface;
2. truncating part of a has_hwpoisoned folio in
truncate_inode_partial_folio().
And later accesses to a hwpoisoned page could be possible due to the
missing has_hwpoisoned folio flag. This will lead to MCE errors.
Link: https://lore.kernel.org/all/CAHbLzkoOZm0PXxE9qwtF4gKR=cpRXrSrJ9V9Pm2DJexs98… [1]
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
---
From V3[1]:
1. Separated from the original series;
2. Added Fixes tag and cc'd stable;
3. Simplified page_range_has_hwpoisoned();
4. Renamed check_poisoned_pages to handle_hwpoison, made it const, and
shorten the statement;
5. Removed poisoned_new_folio variable and checked the condition
directly.
[1] https://lore.kernel.org/all/20251022033531.389351-2-ziy@nvidia.com/
mm/huge_memory.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc65ec3393d2..5215bb6aecfc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3455,6 +3455,14 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
caller_pins;
}
+static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
+{
+ for (; nr_pages; page++, nr_pages--)
+ if (PageHWPoison(page))
+ return true;
+ return false;
+}
+
/*
* It splits @folio into @new_order folios and copies the @folio metadata to
* all the resulting folios.
@@ -3462,17 +3470,24 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
static void __split_folio_to_order(struct folio *folio, int old_order,
int new_order)
{
+ /* Scan poisoned pages when split a poisoned folio to large folios */
+ const bool handle_hwpoison = folio_test_has_hwpoisoned(folio) && new_order;
long new_nr_pages = 1 << new_order;
long nr_pages = 1 << old_order;
long i;
+ folio_clear_has_hwpoisoned(folio);
+
+ /* Check first new_nr_pages since the loop below skips them */
+ if (handle_hwpoison &&
+ page_range_has_hwpoisoned(folio_page(folio, 0), new_nr_pages))
+ folio_set_has_hwpoisoned(folio);
/*
* Skip the first new_nr_pages, since the new folio from them have all
* the flags from the original folio.
*/
for (i = new_nr_pages; i < nr_pages; i += new_nr_pages) {
struct page *new_head = &folio->page + i;
-
/*
* Careful: new_folio is not a "real" folio before we cleared PageTail.
* Don't pass it around before clear_compound_head().
@@ -3514,6 +3529,10 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
(1L << PG_dirty) |
LRU_GEN_MASK | LRU_REFS_MASK));
+ if (handle_hwpoison &&
+ page_range_has_hwpoisoned(new_head, new_nr_pages))
+ folio_set_has_hwpoisoned(new_folio);
+
new_folio->mapping = folio->mapping;
new_folio->index = folio->index + i;
@@ -3600,8 +3619,6 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
int start_order = uniform_split ? new_order : old_order - 1;
int split_order;
- folio_clear_has_hwpoisoned(folio);
-
/*
* split to new_order one order at a time. For uniform split,
* folio is split to new_order directly.
--
2.51.0
A kernel memory leak was identified by the 'ioctl_sg01' test from Linux
Test Project (LTP). The following bytes were maily observed: 0x53425355.
When USB storage devices incorrectly skip the data phase with status data,
the code extracts/validates the CSW from the sg buffer, but fails to clear
it afterwards. This leaves status protocol data in srb's transfer buffer,
such as the US_BULK_CS_SIGN 'USBS' signature observed here. Thus, this
leads to USB protocols leaks to user space through SCSI generic (/dev/sg*)
interfaces, such as the one seen here when the LTP test requested 512 KiB.
Fix the leak by zeroing the CSW data in srb's transfer buffer immediately
after the validation of devices that skip data phase.
Note: Differently from CVE-2018-1000204, which fixed a big leak by zero-
ing pages at allocation time, this leak occurs after allocation, when USB
protocol data is written to already-allocated sg pages.
Fixes: a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Desnes Nunes <desnesn(a)redhat.com>
---
drivers/usb/storage/transport.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1aa1bd26c81f..8e9f6459e197 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -1200,7 +1200,17 @@ int usb_stor_Bulk_transport(struct scsi_cmnd *srb, struct us_data *us)
US_BULK_CS_WRAP_LEN &&
bcs->Signature ==
cpu_to_le32(US_BULK_CS_SIGN)) {
+ unsigned char buf[US_BULK_CS_WRAP_LEN];
+
+ sg = NULL;
+ offset = 0;
+ memset(buf, 0, US_BULK_CS_WRAP_LEN);
usb_stor_dbg(us, "Device skipped data phase\n");
+
+ if (usb_stor_access_xfer_buf(buf, US_BULK_CS_WRAP_LEN, srb,
+ &sg, &offset, TO_XFER_BUF) != US_BULK_CS_WRAP_LEN)
+ usb_stor_dbg(us, "Failed to clear CSW data\n");
+
scsi_set_resid(srb, transfer_length);
goto skipped_data_phase;
}
--
2.51.0
From: Al Viro <viro(a)zeniv.linux.org.uk>
commit c28f922c9dcee0e4876a2c095939d77fe7e15116 upstream.
What we want is to verify there is that clone won't expose something
hidden by a mount we wouldn't be able to undo. "Wouldn't be able to undo"
may be a result of MNT_LOCKED on a child, but it may also come from
lacking admin rights in the userns of the namespace mount belongs to.
clone_private_mnt() checks the former, but not the latter.
There's a number of rather confusing CAP_SYS_ADMIN checks in various
userns during the mount, especially with the new mount API; they serve
different purposes and in case of clone_private_mnt() they usually,
but not always end up covering the missing check mentioned above.
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Reported-by: "Orlando, Noah" <Noah.Orlando(a)deshaw.com>
Fixes: 427215d85e8d ("ovl: prevent private clone if bind mount is not allowed")
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
[ merge conflict resolution: clone_private_mount() was reworked in
db04662e2f4f ("fs: allow detached mounts in clone_private_mount()").
Tweak the relevant ns_capable check so that it works on older kernels ]
Signed-off-by: Noah Orlando <Noah.Orlando(a)deshaw.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 36fecd740de2d542d2091d65d36554ee2bcf9c65)
[ kovalev: bp to fix CVE-2025-38499 ]
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
fs/namespace.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/namespace.c b/fs/namespace.c
index e9b8d516f191..b6d9250e1b38 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1965,6 +1965,11 @@ struct vfsmount *clone_private_mount(const struct path *path)
if (!check_mnt(old_mnt))
goto invalid;
+ if (!ns_capable(old_mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)) {
+ up_read(&namespace_sem);
+ return ERR_PTR(-EPERM);
+ }
+
if (has_locked_children(old_mnt, path->dentry))
goto invalid;
--
2.50.1
From: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
commit 16ecdab5446f15a61ec88eb0d23d25d009821db0 upstream.
syzbot triggered a WARN in ieee80211_tdls_oper() by sending
NL80211_TDLS_ENABLE_LINK immediately after NL80211_CMD_CONNECT,
before association completed and without prior TDLS setup.
This left internal state like sdata->u.mgd.tdls_peer uninitialized,
leading to a WARN_ON() in code paths that assumed it was valid.
Reject the operation early if not in station mode or not associated.
Reported-by: syzbot+f73f203f8c9b19037380(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f73f203f8c9b19037380
Fixes: 81dd2b882241 ("mac80211: move TDLS data to mgd private part")
Tested-by: syzbot+f73f203f8c9b19037380(a)syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
Link: https://patch.msgid.link/20250715230904.661092-2-moonhee.lee.ca@gmail.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
[ kovalev: bp to fix CVE-2025-38644; adapted check from !sdata->vif.cfg.assoc
to !sdata->vif.bss_conf.assoc due to the older kernel not having the mac80211
configuration refactoring (see upstream commit f276e20b182d) ]
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
net/mac80211/tdls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/tdls.c b/net/mac80211/tdls.c
index 137be9ec94af..69a67fcf1112 100644
--- a/net/mac80211/tdls.c
+++ b/net/mac80211/tdls.c
@@ -1349,7 +1349,7 @@ int ieee80211_tdls_oper(struct wiphy *wiphy, struct net_device *dev,
if (!(wiphy->flags & WIPHY_FLAG_SUPPORTS_TDLS))
return -ENOTSUPP;
- if (sdata->vif.type != NL80211_IFTYPE_STATION)
+ if (sdata->vif.type != NL80211_IFTYPE_STATION || !sdata->vif.bss_conf.assoc)
return -EINVAL;
switch (oper) {
--
2.50.1
From: Ido Schimmel <idosch(a)nvidia.com>
commit 1f5d2fd1ca04a23c18b1bde9a43ce2fa2ffa1bce upstream.
When the "proxy" option is enabled on a VXLAN device, the device will
suppress ARP requests and IPv6 Neighbor Solicitation messages if it is
able to reply on behalf of the remote host. That is, if a matching and
valid neighbor entry is configured on the VXLAN device whose MAC address
is not behind the "any" remote (0.0.0.0 / ::).
The code currently assumes that the FDB entry for the neighbor's MAC
address points to a valid remote destination, but this is incorrect if
the entry is associated with an FDB nexthop group. This can result in a
NPD [1][3] which can be reproduced using [2][4].
Fix by checking that the remote destination exists before dereferencing
it.
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 4 UID: 0 PID: 365 Comm: arping Not tainted 6.17.0-rc2-virtme-g2a89cb21162c #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_xmit+0xb58/0x15f0
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
packet_sendmsg+0x113a/0x1850
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
#!/bin/bash
ip address add 192.0.2.1/32 dev lo
ip nexthop add id 1 via 192.0.2.2 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 4789 proxy
ip neigh add 192.0.2.3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
arping -b -c 1 -s 192.0.2.1 -I vx0 192.0.2.3
[3]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 13 UID: 0 PID: 372 Comm: ndisc6 Not tainted 6.17.0-rc2-virtmne-g6ee90cb26014 #3 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1v996), BIOS 1.17.0-4.fc41 04/01/2x014
RIP: 0010:vxlan_xmit+0x803/0x1600
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
ip6_finish_output2+0x210/0x6c0
ip6_finish_output+0x1af/0x2b0
ip6_mr_output+0x92/0x3e0
ip6_send_skb+0x30/0x90
rawv6_sendmsg+0xe6e/0x12e0
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f383422ec77
[4]
#!/bin/bash
ip address add 2001:db8:1::1/128 dev lo
ip nexthop add id 1 via 2001:db8:1::1 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 2001:db8:1::1 dstport 4789 proxy
ip neigh add 2001:db8:1::3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
ndisc6 -r 1 -s 2001:db8:1::1 -w 1 2001:db8:1::3 vx0
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Reviewed-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ kovalev: bp to fix CVE-2025-39850 ]
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
drivers/net/vxlan/vxlan_core.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 1b6b6acd3489..57606891e413 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -1883,6 +1883,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
n = neigh_lookup(&arp_tbl, &tip, dev);
if (n) {
+ struct vxlan_rdst *rdst = NULL;
struct vxlan_fdb *f;
struct sk_buff *reply;
@@ -1892,7 +1893,9 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
}
f = vxlan_find_mac(vxlan, n->ha, vni);
- if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
+ if (f)
+ rdst = first_remote_rcu(f);
+ if (rdst && vxlan_addr_any(&rdst->remote_ip)) {
/* bridge-local neighbor */
neigh_release(n);
goto out;
@@ -2047,6 +2050,7 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
n = neigh_lookup(ipv6_stub->nd_tbl, &msg->target, dev);
if (n) {
+ struct vxlan_rdst *rdst = NULL;
struct vxlan_fdb *f;
struct sk_buff *reply;
@@ -2056,7 +2060,9 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
}
f = vxlan_find_mac(vxlan, n->ha, vni);
- if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
+ if (f)
+ rdst = first_remote_rcu(f);
+ if (rdst && vxlan_addr_any(&rdst->remote_ip)) {
/* bridge-local neighbor */
neigh_release(n);
goto out;
--
2.50.1
A malicious user could pass an arbitrarily bad value
to memdup_user_nul(), potentially causing kernel crash.
This follows the same pattern as commit ee76746387f6
("netdevsim: prevent bad user input in nsim_dev_health_break_write()")
and commit 7ef4c19d245f
("smackfs: restrict bytes count in smackfs write functions")
Found via static analysis and code review.
Fixes: d0e6a8064c42 ("bna: use memdup_user to copy userspace buffers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
---
drivers/net/ethernet/brocade/bna/bnad_debugfs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/brocade/bna/bnad_debugfs.c b/drivers/net/ethernet/brocade/bna/bnad_debugfs.c
index 8f0972e6737c..ad33ab1d266d 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_debugfs.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_debugfs.c
@@ -311,6 +311,9 @@ bnad_debugfs_write_regrd(struct file *file, const char __user *buf,
unsigned long flags;
void *kern_buf;
+ if (nbytes == 0 || nbytes > PAGE_SIZE)
+ return -EINVAL;
+
/* Copy the user space buf */
kern_buf = memdup_user_nul(buf, nbytes);
if (IS_ERR(kern_buf))
--
2.39.5 (Apple Git-154)