Hi stable maintainers,
I have tried backporting some fixes to stable kernel 6.12.y which also have CVE numbers and are fixing commits in 6.12.y.
I am not a subsystem expert and have only done overall testing that we do for stable release candidate testing and not any patch specific testing.
Note: All these patches are present backports from upstream.
Patch 1: Fixes a race, by reading the code, I can confirm 6.12.y needs this backport -- Few conflicts resolved. This is fix for CVE-2025-38306
Patch 2,3,4,5 -- So this set corresponds to fixing CVE-2025-38272, commit: 1237c2d4a8db ("net: dsa: b53: do not enable EEE on bcm63xx"). While we can comeup with downstream fix byt writing a new function, I think backporting some prerequisites would help us backporting future fixes smoothly to 6.12.y. Patch 2,3,4 are pulled in as prerequisites. Patch3 has minor conflict resolution due to other missing patches. And Patch 5 is again a clean cherry-pick.
Patch 6,7,8: Patch 6 corresponds to the CVE-2025-22125 fix, Patch 7 is a fix for Patch6 and and Patch 8 fixes Patch 7. patch 6 had minor conflict resolution due to missing atomic writes feature in 6.12.y and Patches 7 and 8 are clean cherrypicks.
Patch 9, 10: Patch 10 is a fix for CVE-2025-22113 and patch 9 is pulled in as a prerequisite. Both are clean cherry-picks.
Patch 11: Had conflict resolution in the header file, this is fix for CVE-2025-38453
Patch 12, 13 : Patch 12 is a clean cherrypick and a fix for CVE-2025-23133. This wan't backported earlier or probably dropped as it showed up a regression which is fixed by Patch 13. [1], so we should be fine.
Patch 14: CVE-2025-22103 fix, clean cherry-pick
Patch 15: CVE-2025-22124 fix and a clean cherry-pick.
Please let me know if there are any comments.
Thanks, Harshit
[1] https://lore.kernel.org/all/2025041740-tableware-flight-b781@gregkh/
Al Viro (1): fs/fhandle.c: fix a race in call of has_locked_children()
Jens Axboe (1): io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU
Jonas Gorski (1): net: dsa: b53: do not enable EEE on bcm63xx
Ojaswin Mujoo (2): ext4: define ext4_journal_destroy wrapper ext4: avoid journaling sb update on error if journal is destroying
Russell King (Oracle) (3): net: dsa: add hook to determine whether EEE is supported net: dsa: provide implementation of .support_eee() net: dsa: b53/bcm_sf2: implement .support_eee() method
Su Yue (1): md/md-bitmap: fix wrong bitmap_limit for clustermd when write sb
Wang Liang (1): net: fix NULL pointer dereference in l3mdev_l3_rcv
Wen Gong (2): wifi: ath11k: update channel list in reg notifier instead reg worker wifi: ath11k: update channel list in worker when wait flag is set
Yu Kuai (2): md/raid1,raid10: don't ignore IO flags md/raid1,raid10: don't handle IO error for REQ_RAHEAD and REQ_NOWAIT
Zheng Qixing (1): md/raid1,raid10: strip REQ_NOWAIT from member bios
drivers/md/md-bitmap.c | 6 +- drivers/md/raid1-10.c | 10 +++ drivers/md/raid1.c | 26 +++--- drivers/md/raid10.c | 20 ++--- drivers/net/dsa/b53/b53_common.c | 16 ++-- drivers/net/dsa/b53/b53_priv.h | 1 + drivers/net/dsa/bcm_sf2.c | 1 + drivers/net/ipvlan/ipvlan_l3s.c | 1 - drivers/net/wireless/ath/ath11k/core.c | 1 + drivers/net/wireless/ath/ath11k/core.h | 5 +- drivers/net/wireless/ath/ath11k/mac.c | 14 ++++ drivers/net/wireless/ath/ath11k/reg.c | 107 +++++++++++++++++-------- drivers/net/wireless/ath/ath11k/reg.h | 3 +- drivers/net/wireless/ath/ath11k/wmi.h | 1 + fs/ext4/ext4.h | 3 +- fs/ext4/ext4_jbd2.h | 29 +++++++ fs/ext4/super.c | 32 ++++---- fs/namespace.c | 18 ++++- include/linux/io_uring_types.h | 2 + include/net/dsa.h | 2 + io_uring/msg_ring.c | 4 +- net/dsa/port.c | 16 ++++ net/dsa/user.c | 8 ++ 23 files changed, 230 insertions(+), 96 deletions(-)
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit 1f282cdc1d219c4a557f7009e81bc792820d9d9a ]
may_decode_fh() is calling has_locked_children() while holding no locks. That's an oopsable race...
The rest of the callers are safe since they are holding namespace_sem and are guaranteed a positive refcount on the mount in question.
Rename the current has_locked_children() to __has_locked_children(), make it static and switch the fs/namespace.c users to it.
Make has_locked_children() a wrapper for __has_locked_children(), calling the latter under read_seqlock_excl(&mount_lock).
Reviewed-by: Christian Brauner brauner@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks") Signed-off-by: Al Viro viro@zeniv.linux.org.uk (cherry picked from commit 1f282cdc1d219c4a557f7009e81bc792820d9d9a) [Harshit:Resolved conflicts due to missing commit: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") in linux-6.12.y Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- fs/namespace.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c index 962fda4fa246..c8519302f582 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2227,7 +2227,7 @@ void drop_collected_mounts(struct vfsmount *mnt) namespace_unlock(); }
-bool has_locked_children(struct mount *mnt, struct dentry *dentry) +static bool __has_locked_children(struct mount *mnt, struct dentry *dentry) { struct mount *child;
@@ -2241,6 +2241,16 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry) return false; }
+bool has_locked_children(struct mount *mnt, struct dentry *dentry) +{ + bool res; + + read_seqlock_excl(&mount_lock); + res = __has_locked_children(mnt, dentry); + read_sequnlock_excl(&mount_lock); + return res; +} + /** * clone_private_mount - create a private clone of a path * @path: path to clone @@ -2268,7 +2278,7 @@ struct vfsmount *clone_private_mount(const struct path *path) return ERR_PTR(-EPERM); }
- if (has_locked_children(old_mnt, path->dentry)) + if (__has_locked_children(old_mnt, path->dentry)) goto invalid;
new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE); @@ -2762,7 +2772,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse) if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations) return mnt;
- if (!recurse && has_locked_children(old, old_path->dentry)) + if (!recurse && __has_locked_children(old, old_path->dentry)) return mnt;
if (recurse) @@ -3152,7 +3162,7 @@ static int do_set_group(struct path *from_path, struct path *to_path) goto out;
/* From mount should not have locked children in place of To's root */ - if (has_locked_children(from, to->mnt.mnt_root)) + if (__has_locked_children(from, to->mnt.mnt_root)) goto out;
/* Setting sharing groups is only allowed on private mounts */
From: "Russell King (Oracle)" rmk+kernel@armlinux.org.uk
[ Upstream commit 9723a77318b7c0cfd06ea207e52a042f8c815318 ]
Add a hook to determine whether the switch supports EEE. This will return false if the switch does not, or true if it does. If the method is not implemented, we assume (currently) that the switch supports EEE.
Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://patch.msgid.link/E1tL144-006cZD-El@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 9723a77318b7c0cfd06ea207e52a042f8c815318) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- include/net/dsa.h | 1 + net/dsa/user.c | 8 ++++++++ 2 files changed, 9 insertions(+)
diff --git a/include/net/dsa.h b/include/net/dsa.h index d7a6c2930277..fa99fc5249e9 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -1003,6 +1003,7 @@ struct dsa_switch_ops { /* * Port's MAC EEE settings */ + bool (*support_eee)(struct dsa_switch *ds, int port); int (*set_mac_eee)(struct dsa_switch *ds, int port, struct ethtool_keee *e); int (*get_mac_eee)(struct dsa_switch *ds, int port, diff --git a/net/dsa/user.c b/net/dsa/user.c index 64f660d2334b..06267c526dc4 100644 --- a/net/dsa/user.c +++ b/net/dsa/user.c @@ -1231,6 +1231,10 @@ static int dsa_user_set_eee(struct net_device *dev, struct ethtool_keee *e) struct dsa_switch *ds = dp->ds; int ret;
+ /* Check whether the switch supports EEE */ + if (ds->ops->support_eee && !ds->ops->support_eee(ds, dp->index)) + return -EOPNOTSUPP; + /* Port's PHY and MAC both need to be EEE capable */ if (!dev->phydev || !dp->pl) return -ENODEV; @@ -1251,6 +1255,10 @@ static int dsa_user_get_eee(struct net_device *dev, struct ethtool_keee *e) struct dsa_switch *ds = dp->ds; int ret;
+ /* Check whether the switch supports EEE */ + if (ds->ops->support_eee && !ds->ops->support_eee(ds, dp->index)) + return -EOPNOTSUPP; + /* Port's PHY and MAC both need to be EEE capable */ if (!dev->phydev || !dp->pl) return -ENODEV;
From: "Russell King (Oracle)" rmk+kernel@armlinux.org.uk
[ Upstream commit 99379f587278c818777cb4778e2c79c6c1440c65 ]
Provide a trivial implementation for the .support_eee() method which switch drivers can use to simply indicate that they support EEE on all their user ports.
Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://patch.msgid.link/E1tL149-006cZJ-JJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 99379f587278c818777cb4778e2c79c6c1440c65) [Harshit: Resolve contextual conflicts due to missing commit: 539770616521 ("net: dsa: remove obsolete phylink dsa_switch operations") and commit: ecb595ebba0e ("net: dsa: remove dsa_port_phylink_mac_select_pcs()") in 6.12.y] Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- include/net/dsa.h | 1 + net/dsa/port.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+)
diff --git a/include/net/dsa.h b/include/net/dsa.h index fa99fc5249e9..877f9b270cf6 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -1399,5 +1399,6 @@ static inline bool dsa_user_dev_check(const struct net_device *dev)
netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev); void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up); +bool dsa_supports_eee(struct dsa_switch *ds, int port);
#endif diff --git a/net/dsa/port.c b/net/dsa/port.c index 25258b33e59e..9c77c80e8fe9 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -1589,6 +1589,22 @@ dsa_port_phylink_mac_select_pcs(struct phylink_config *config, return pcs; }
+/* dsa_supports_eee - indicate that EEE is supported + * @ds: pointer to &struct dsa_switch + * @port: port index + * + * A default implementation for the .support_eee() DSA operations member, + * which drivers can use to indicate that they support EEE on all of their + * user ports. + * + * Returns: true + */ +bool dsa_supports_eee(struct dsa_switch *ds, int port) +{ + return true; +} +EXPORT_SYMBOL_GPL(dsa_supports_eee); + static void dsa_port_phylink_mac_config(struct phylink_config *config, unsigned int mode, const struct phylink_link_state *state)
From: "Russell King (Oracle)" rmk+kernel@armlinux.org.uk
[ Upstream commit c86692fc2cb77d94dd8c166c2b9017f196d02a84 ]
Implement the .support_eee() method to indicate that EEE is not supported by two switch variants, rather than making these checks in the .set_mac_eee() and .get_mac_eee() methods.
Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://patch.msgid.link/E1tL14E-006cZU-Nc@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit c86692fc2cb77d94dd8c166c2b9017f196d02a84) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/net/dsa/b53/b53_common.c | 13 +++++++------ drivers/net/dsa/b53/b53_priv.h | 1 + drivers/net/dsa/bcm_sf2.c | 1 + 3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index 844cf2b8f727..f9c1cf71b059 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -2388,13 +2388,16 @@ int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy) } EXPORT_SYMBOL(b53_eee_init);
-int b53_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e) +bool b53_support_eee(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv;
- if (is5325(dev) || is5365(dev)) - return -EOPNOTSUPP; + return !is5325(dev) && !is5365(dev); +} +EXPORT_SYMBOL(b53_support_eee);
+int b53_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e) +{ return 0; } EXPORT_SYMBOL(b53_get_mac_eee); @@ -2404,9 +2407,6 @@ int b53_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e) struct b53_device *dev = ds->priv; struct ethtool_keee *p = &dev->ports[port].eee;
- if (is5325(dev) || is5365(dev)) - return -EOPNOTSUPP; - p->eee_enabled = e->eee_enabled; b53_eee_enable_set(ds, port, e->eee_enabled);
@@ -2463,6 +2463,7 @@ static const struct dsa_switch_ops b53_switch_ops = { .port_setup = b53_setup_port, .port_enable = b53_enable_port, .port_disable = b53_disable_port, + .support_eee = b53_support_eee, .get_mac_eee = b53_get_mac_eee, .set_mac_eee = b53_set_mac_eee, .port_bridge_join = b53_br_join, diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index 4f8c97098d2a..e908397e8b9a 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ b/drivers/net/dsa/b53/b53_priv.h @@ -387,6 +387,7 @@ int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy); void b53_disable_port(struct dsa_switch *ds, int port); void b53_brcm_hdr_setup(struct dsa_switch *ds, int port); int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy); +bool b53_support_eee(struct dsa_switch *ds, int port); int b53_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e); int b53_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_keee *e);
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index c4771a07878e..f1372830d5fa 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -1233,6 +1233,7 @@ static const struct dsa_switch_ops bcm_sf2_ops = { .port_setup = b53_setup_port, .port_enable = bcm_sf2_port_setup, .port_disable = bcm_sf2_port_disable, + .support_eee = b53_support_eee, .get_mac_eee = b53_get_mac_eee, .set_mac_eee = b53_set_mac_eee, .port_bridge_join = b53_br_join,
From: Jonas Gorski jonas.gorski@gmail.com
[ Upstream commit 1237c2d4a8db79dfd4369bff6930b0e385ed7d5c ]
BCM63xx internal switches do not support EEE, but provide multiple RGMII ports where external PHYs may be connected. If one of these PHYs are EEE capable, we may try to enable EEE for the MACs, which then hangs the system on access of the (non-existent) EEE registers.
Fix this by checking if the switch actually supports EEE before attempting to configure it.
Fixes: 22256b0afb12 ("net: dsa: b53: Move EEE functions to b53") Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Tested-by: Álvaro Fernández Rojas noltari@gmail.com Signed-off-by: Jonas Gorski jonas.gorski@gmail.com Link: https://patch.msgid.link/20250602193953.1010487-2-jonas.gorski@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com
(cherry picked from commit 1237c2d4a8db79dfd4369bff6930b0e385ed7d5c) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/net/dsa/b53/b53_common.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index f9c1cf71b059..c903c6fcc666 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -2378,6 +2378,9 @@ int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy) { int ret;
+ if (!b53_support_eee(ds, port)) + return 0; + ret = phy_init_eee(phy, false); if (ret) return 0; @@ -2392,7 +2395,7 @@ bool b53_support_eee(struct dsa_switch *ds, int port) { struct b53_device *dev = ds->priv;
- return !is5325(dev) && !is5365(dev); + return !is5325(dev) && !is5365(dev) && !is63xx(dev); } EXPORT_SYMBOL(b53_support_eee);
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit e879a0d9cb086c8e52ce6c04e5bfa63825a6213c ]
If blk-wbt is enabled by default, it's found that raid write performance is quite bad because all IO are throttled by wbt of underlying disks, due to flag REQ_IDLE is ignored. And turns out this behaviour exist since blk-wbt is introduced.
Other than REQ_IDLE, other flags should not be ignored as well, for example REQ_META can be set for filesystems, clearing it can cause priority reverse problems; And REQ_NOWAIT should not be cleared as well, because io will wait instead of failing directly in underlying disks.
Fix those problems by keep IO flags from master bio.
Fises: f51d46d0e7cb ("md: add support for REQ_NOWAIT") Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism") Fixes: 5404bc7a87b9 ("[PATCH] Allow file systems to differentiate between data and meta reads") Link: https://lore.kernel.org/linux-raid/20250227121657.832356-1-yukuai1@huaweiclo... Signed-off-by: Yu Kuai yukuai3@huawei.com (cherry picked from commit e879a0d9cb086c8e52ce6c04e5bfa63825a6213c) [Harshit: Resolve conflicts due to missing commit: f2a38abf5f1c ("md/raid1: Atomic write support") and commit: a1d9b4fd42d9 ("md/raid10: Atomic write support") in 6.12.y, we don't have Atomic writes feature in 6.12.y] Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/md/raid1.c | 4 ---- drivers/md/raid10.c | 7 ------- 2 files changed, 11 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index fe1599db69c8..6e93e3b6bd8c 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1315,8 +1315,6 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio, struct r1conf *conf = mddev->private; struct raid1_info *mirror; struct bio *read_bio; - const enum req_op op = bio_op(bio); - const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC; int max_sectors; int rdisk; bool r1bio_existed = !!r1_bio; @@ -1399,7 +1397,6 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio, read_bio->bi_iter.bi_sector = r1_bio->sector + mirror->rdev->data_offset; read_bio->bi_end_io = raid1_end_read_request; - read_bio->bi_opf = op | do_sync; if (test_bit(FailFast, &mirror->rdev->flags) && test_bit(R1BIO_FailFast, &r1_bio->state)) read_bio->bi_opf |= MD_FAILFAST; @@ -1619,7 +1616,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
mbio->bi_iter.bi_sector = (r1_bio->sector + rdev->data_offset); mbio->bi_end_io = raid1_end_write_request; - mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA)); if (test_bit(FailFast, &rdev->flags) && !test_bit(WriteMostly, &rdev->flags) && conf->raid_disks - mddev->degraded > 1) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 7515a98001ca..8bb9ec39dc4f 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1146,8 +1146,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, { struct r10conf *conf = mddev->private; struct bio *read_bio; - const enum req_op op = bio_op(bio); - const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC; int max_sectors; struct md_rdev *rdev; char b[BDEVNAME_SIZE]; @@ -1226,7 +1224,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, read_bio->bi_iter.bi_sector = r10_bio->devs[slot].addr + choose_data_offset(r10_bio, rdev); read_bio->bi_end_io = raid10_end_read_request; - read_bio->bi_opf = op | do_sync; if (test_bit(FailFast, &rdev->flags) && test_bit(R10BIO_FailFast, &r10_bio->state)) read_bio->bi_opf |= MD_FAILFAST; @@ -1240,9 +1237,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio, struct bio *bio, bool replacement, int n_copy) { - const enum req_op op = bio_op(bio); - const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC; - const blk_opf_t do_fua = bio->bi_opf & REQ_FUA; unsigned long flags; struct r10conf *conf = mddev->private; struct md_rdev *rdev; @@ -1261,7 +1255,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio, mbio->bi_iter.bi_sector = (r10_bio->devs[n_copy].addr + choose_data_offset(r10_bio, rdev)); mbio->bi_end_io = raid10_end_write_request; - mbio->bi_opf = op | do_sync | do_fua; if (!replacement && test_bit(FailFast, &conf->mirrors[devnum].rdev->flags) && enough(conf, devnum))
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 9f346f7d4ea73692b82f5102ca8698e4040469ea ]
IO with REQ_RAHEAD or REQ_NOWAIT can fail early, even if the storage medium is fine, hence record badblocks or remove the disk from array does not make sense.
This problem if found by lvm2 test lvcreate-large-raid, where dm-zero will fail read ahead IO directly.
Fixes: e879a0d9cb08 ("md/raid1,raid10: don't ignore IO flags") Reported-and-tested-by: Mikulas Patocka mpatocka@redhat.com Closes: https://lore.kernel.org/all/34fa755d-62c8-4588-8ee1-33cb1249bdf2@redhat.com/ Link: https://lore.kernel.org/linux-raid/20250527081407.3004055-1-yukuai1@huaweicl... Signed-off-by: Yu Kuai yukuai3@huawei.com (cherry picked from commit 9f346f7d4ea73692b82f5102ca8698e4040469ea) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/md/raid1-10.c | 10 ++++++++++ drivers/md/raid1.c | 19 ++++++++++--------- drivers/md/raid10.c | 11 ++++++----- 3 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c index 4378d3250bd7..f3750ceaa582 100644 --- a/drivers/md/raid1-10.c +++ b/drivers/md/raid1-10.c @@ -293,3 +293,13 @@ static inline bool raid1_should_read_first(struct mddev *mddev,
return false; } + +/* + * bio with REQ_RAHEAD or REQ_NOWAIT can fail at anytime, before such IO is + * submitted to the underlying disks, hence don't record badblocks or retry + * in this case. + */ +static inline bool raid1_should_handle_error(struct bio *bio) +{ + return !(bio->bi_opf & (REQ_RAHEAD | REQ_NOWAIT)); +} diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6e93e3b6bd8c..9581c94450a4 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -371,14 +371,16 @@ static void raid1_end_read_request(struct bio *bio) */ update_head_pos(r1_bio->read_disk, r1_bio);
- if (uptodate) + if (uptodate) { set_bit(R1BIO_Uptodate, &r1_bio->state); - else if (test_bit(FailFast, &rdev->flags) && - test_bit(R1BIO_FailFast, &r1_bio->state)) + } else if (test_bit(FailFast, &rdev->flags) && + test_bit(R1BIO_FailFast, &r1_bio->state)) { /* This was a fail-fast read so we definitely * want to retry */ ; - else { + } else if (!raid1_should_handle_error(bio)) { + uptodate = 1; + } else { /* If all other devices have failed, we want to return * the error upwards rather than fail the last device. * Here we redefine "uptodate" to mean "Don't want to retry" @@ -449,16 +451,15 @@ static void raid1_end_write_request(struct bio *bio) struct bio *to_put = NULL; int mirror = find_bio_disk(r1_bio, bio); struct md_rdev *rdev = conf->mirrors[mirror].rdev; - bool discard_error; sector_t lo = r1_bio->sector; sector_t hi = r1_bio->sector + r1_bio->sectors; - - discard_error = bio->bi_status && bio_op(bio) == REQ_OP_DISCARD; + bool ignore_error = !raid1_should_handle_error(bio) || + (bio->bi_status && bio_op(bio) == REQ_OP_DISCARD);
/* * 'one mirror IO has finished' event handler: */ - if (bio->bi_status && !discard_error) { + if (bio->bi_status && !ignore_error) { set_bit(WriteErrorSeen, &rdev->flags); if (!test_and_set_bit(WantReplacement, &rdev->flags)) set_bit(MD_RECOVERY_NEEDED, & @@ -509,7 +510,7 @@ static void raid1_end_write_request(struct bio *bio)
/* Maybe we can clear some bad blocks. */ if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) && - !discard_error) { + !ignore_error) { r1_bio->bios[mirror] = IO_MADE_GOOD; set_bit(R1BIO_MadeGood, &r1_bio->state); } diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 8bb9ec39dc4f..0d475bb2a732 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -398,6 +398,8 @@ static void raid10_end_read_request(struct bio *bio) * wait for the 'master' bio. */ set_bit(R10BIO_Uptodate, &r10_bio->state); + } else if (!raid1_should_handle_error(bio)) { + uptodate = 1; } else { /* If all other devices that store this block have * failed, we want to return the error upwards rather @@ -455,9 +457,8 @@ static void raid10_end_write_request(struct bio *bio) int slot, repl; struct md_rdev *rdev = NULL; struct bio *to_put = NULL; - bool discard_error; - - discard_error = bio->bi_status && bio_op(bio) == REQ_OP_DISCARD; + bool ignore_error = !raid1_should_handle_error(bio) || + (bio->bi_status && bio_op(bio) == REQ_OP_DISCARD);
dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl);
@@ -471,7 +472,7 @@ static void raid10_end_write_request(struct bio *bio) /* * this branch is our 'one mirror IO has finished' event handler: */ - if (bio->bi_status && !discard_error) { + if (bio->bi_status && !ignore_error) { if (repl) /* Never record new bad blocks to replacement, * just fail it. @@ -526,7 +527,7 @@ static void raid10_end_write_request(struct bio *bio) /* Maybe we can clear some bad blocks. */ if (rdev_has_badblock(rdev, r10_bio->devs[slot].addr, r10_bio->sectors) && - !discard_error) { + !ignore_error) { bio_put(bio); if (repl) r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
From: Zheng Qixing zhengqixing@huawei.com
[ Upstream commit 5fa31c49928139fa948f078b094d80f12ed83f5f ]
RAID layers don't implement proper non-blocking semantics for REQ_NOWAIT, making the flag potentially misleading when propagated to member disks.
This patch clear REQ_NOWAIT from cloned bios in raid1/raid10. Retain original bio's REQ_NOWAIT flag for upper layer error handling.
Maybe we can implement non-blocking I/O handling mechanisms within RAID in future work.
Fixes: 9f346f7d4ea7 ("md/raid1,raid10: don't handle IO error for REQ_RAHEAD and REQ_NOWAIT") Signed-off-by: Zheng Qixing zhengqixing@huawei.com Link: https://lore.kernel.org/linux-raid/20250702102341.1969154-1-zhengqixing@huaw... Signed-off-by: Yu Kuai yukuai3@huawei.com (cherry picked from commit 5fa31c49928139fa948f078b094d80f12ed83f5f) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/md/raid1.c | 3 ++- drivers/md/raid10.c | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 9581c94450a4..772486d70718 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1392,7 +1392,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio, } read_bio = bio_alloc_clone(mirror->rdev->bdev, bio, gfp, &mddev->bio_set); - + read_bio->bi_opf &= ~REQ_NOWAIT; r1_bio->bios[rdisk] = read_bio;
read_bio->bi_iter.bi_sector = r1_bio->sector + @@ -1613,6 +1613,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, wait_for_serialization(rdev, r1_bio); }
+ mbio->bi_opf &= ~REQ_NOWAIT; r1_bio->bios[i] = mbio;
mbio->bi_iter.bi_sector = (r1_bio->sector + rdev->data_offset); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 0d475bb2a732..6579bbb6a39a 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1218,6 +1218,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, r10_bio->master_bio = bio; } read_bio = bio_alloc_clone(rdev->bdev, bio, gfp, &mddev->bio_set); + read_bio->bi_opf &= ~REQ_NOWAIT;
r10_bio->devs[slot].bio = read_bio; r10_bio->devs[slot].rdev = rdev; @@ -1248,6 +1249,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio, conf->mirrors[devnum].rdev;
mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO, &mddev->bio_set); + mbio->bi_opf &= ~REQ_NOWAIT; if (replacement) r10_bio->devs[n_copy].repl_bio = mbio; else
From: Ojaswin Mujoo ojaswin@linux.ibm.com
[ Upstream commit 5a02a6204ca37e7c22fbb55a789c503f05e8e89a ]
Define an ext4 wrapper over jbd2_journal_destroy to make sure we have consistent behavior during journal destruction. This will also come useful in the next patch where we add some ext4 specific logic in the destroy path.
Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Baokun Li libaokun1@huawei.com Signed-off-by: Ojaswin Mujoo ojaswin@linux.ibm.com Link: https://patch.msgid.link/c3ba78c5c419757e6d5f2d8ebb4a8ce9d21da86a.1742279837... Signed-off-by: Theodore Ts'o tytso@mit.edu (cherry picked from commit 5a02a6204ca37e7c22fbb55a789c503f05e8e89a) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- fs/ext4/ext4_jbd2.h | 14 ++++++++++++++ fs/ext4/super.c | 16 ++++++---------- 2 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 0c77697d5e90..930778e507cc 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -513,4 +513,18 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) return 1; }
+/* + * Pass journal explicitly as it may not be cached in the sbi->s_journal in some + * cases + */ +static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal) +{ + int err = 0; + + err = jbd2_journal_destroy(journal); + sbi->s_journal = NULL; + + return err; +} + #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 722ac723f49b..d0500f19bf51 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1312,8 +1312,7 @@ static void ext4_put_super(struct super_block *sb)
if (sbi->s_journal) { aborted = is_journal_aborted(sbi->s_journal); - err = jbd2_journal_destroy(sbi->s_journal); - sbi->s_journal = NULL; + err = ext4_journal_destroy(sbi, sbi->s_journal); if ((err < 0) && !aborted) { ext4_abort(sb, -err, "Couldn't clean up the journal"); } @@ -4957,8 +4956,7 @@ static int ext4_load_and_init_journal(struct super_block *sb, out: /* flush s_sb_upd_work before destroying the journal. */ flush_work(&sbi->s_sb_upd_work); - jbd2_journal_destroy(sbi->s_journal); - sbi->s_journal = NULL; + ext4_journal_destroy(sbi, sbi->s_journal); return -EINVAL; }
@@ -5649,8 +5647,7 @@ failed_mount8: __maybe_unused if (sbi->s_journal) { /* flush s_sb_upd_work before journal destroy. */ flush_work(&sbi->s_sb_upd_work); - jbd2_journal_destroy(sbi->s_journal); - sbi->s_journal = NULL; + ext4_journal_destroy(sbi, sbi->s_journal); } failed_mount3a: ext4_es_unregister_shrinker(sbi); @@ -5958,7 +5955,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb, return journal;
out_journal: - jbd2_journal_destroy(journal); + ext4_journal_destroy(EXT4_SB(sb), journal); out_bdev: bdev_fput(bdev_file); return ERR_PTR(errno); @@ -6075,8 +6072,7 @@ static int ext4_load_journal(struct super_block *sb, EXT4_SB(sb)->s_journal = journal; err = ext4_clear_journal_err(sb, es); if (err) { - EXT4_SB(sb)->s_journal = NULL; - jbd2_journal_destroy(journal); + ext4_journal_destroy(EXT4_SB(sb), journal); return err; }
@@ -6094,7 +6090,7 @@ static int ext4_load_journal(struct super_block *sb, return 0;
err_out: - jbd2_journal_destroy(journal); + ext4_journal_destroy(EXT4_SB(sb), journal); return err; }
From: Ojaswin Mujoo ojaswin@linux.ibm.com
[ Upstream commit ce2f26e73783b4a7c46a86e3af5b5c8de0971790 ]
Presently we always BUG_ON if trying to start a transaction on a journal marked with JBD2_UNMOUNT, since this should never happen. However, while ltp running stress tests, it was observed that in case of some error handling paths, it is possible for update_super_work to start a transaction after the journal is destroyed eg:
(umount) ext4_kill_sb kill_block_super generic_shutdown_super sync_filesystem /* commits all txns */ evict_inodes /* might start a new txn */ ext4_put_super flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */ jbd2_journal_destroy journal_kill_thread journal->j_flags |= JBD2_UNMOUNT; jbd2_journal_commit_transaction jbd2_journal_get_descriptor_buffer jbd2_journal_bmap ext4_journal_bmap ext4_map_blocks ... ext4_inode_error ext4_handle_error schedule_work(&sbi->s_sb_upd_work)
/* work queue kicks in */ update_super_work jbd2_journal_start start_this_handle BUG_ON(journal->j_flags & JBD2_UNMOUNT)
Hence, introduce a new mount flag to indicate journal is destroying and only do a journaled (and deferred) update of sb if this flag is not set. Otherwise, just fallback to an un-journaled commit.
Further, in the journal destroy path, we have the following sequence:
1. Set mount flag indicating journal is destroying 2. force a commit and wait for it 3. flush pending sb updates
This sequence is important as it ensures that, after this point, there is no sb update that might be journaled so it is safe to update the sb outside the journal. (To avoid race discussed in 2d01ddc86606)
Also, we don't need a similar check in ext4_grp_locked_error since it is only called from mballoc and AFAICT it would be always valid to schedule work here.
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available") Reported-by: Mahesh Kumar maheshkumar657g@gmail.com Signed-off-by: Ojaswin Mujoo ojaswin@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Link: https://patch.msgid.link/9613c465d6ff00cd315602f99283d5f24018c3f7.1742279837... Signed-off-by: Theodore Ts'o tytso@mit.edu (cherry picked from commit ce2f26e73783b4a7c46a86e3af5b5c8de0971790) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- fs/ext4/ext4.h | 3 ++- fs/ext4/ext4_jbd2.h | 15 +++++++++++++++ fs/ext4/super.c | 16 ++++++++-------- 3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index a95525bfb99c..d8120b88fa00 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1823,7 +1823,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) */ enum { EXT4_MF_MNTDIR_SAMPLED, - EXT4_MF_FC_INELIGIBLE /* Fast commit ineligible */ + EXT4_MF_FC_INELIGIBLE, /* Fast commit ineligible */ + EXT4_MF_JOURNAL_DESTROY /* Journal is in process of destroying */ };
static inline void ext4_set_mount_flag(struct super_block *sb, int bit) diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 930778e507cc..ada46189b086 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -521,6 +521,21 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour { int err = 0;
+ /* + * At this point only two things can be operating on the journal. + * JBD2 thread performing transaction commit and s_sb_upd_work + * issuing sb update through the journal. Once we set + * EXT4_JOURNAL_DESTROY, new ext4_handle_error() calls will not + * queue s_sb_upd_work and ext4_force_commit() makes sure any + * ext4_handle_error() calls from the running transaction commit are + * finished. Hence no new s_sb_upd_work can be queued after we + * flush it here. + */ + ext4_set_mount_flag(sbi->s_sb, EXT4_MF_JOURNAL_DESTROY); + + ext4_force_commit(sbi->s_sb); + flush_work(&sbi->s_sb_upd_work); + err = jbd2_journal_destroy(journal); sbi->s_journal = NULL;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index d0500f19bf51..58d125ad2371 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -719,9 +719,13 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error, * In case the fs should keep running, we need to writeout * superblock through the journal. Due to lock ordering * constraints, it may not be safe to do it right here so we - * defer superblock flushing to a workqueue. + * defer superblock flushing to a workqueue. We just need to be + * careful when the journal is already shutting down. If we get + * here in that case, just update the sb directly as the last + * transaction won't commit anyway. */ - if (continue_fs && journal) + if (continue_fs && journal && + !ext4_test_mount_flag(sb, EXT4_MF_JOURNAL_DESTROY)) schedule_work(&EXT4_SB(sb)->s_sb_upd_work); else ext4_commit_super(sb); @@ -1306,7 +1310,6 @@ static void ext4_put_super(struct super_block *sb) ext4_unregister_li_request(sb); ext4_quotas_off(sb, EXT4_MAXQUOTAS);
- flush_work(&sbi->s_sb_upd_work); destroy_workqueue(sbi->rsv_conversion_wq); ext4_release_orphan_info(sb);
@@ -1316,7 +1319,8 @@ static void ext4_put_super(struct super_block *sb) if ((err < 0) && !aborted) { ext4_abort(sb, -err, "Couldn't clean up the journal"); } - } + } else + flush_work(&sbi->s_sb_upd_work);
ext4_es_unregister_shrinker(sbi); timer_shutdown_sync(&sbi->s_err_report); @@ -4954,8 +4958,6 @@ static int ext4_load_and_init_journal(struct super_block *sb, return 0;
out: - /* flush s_sb_upd_work before destroying the journal. */ - flush_work(&sbi->s_sb_upd_work); ext4_journal_destroy(sbi, sbi->s_journal); return -EINVAL; } @@ -5645,8 +5647,6 @@ failed_mount8: __maybe_unused sbi->s_ea_block_cache = NULL;
if (sbi->s_journal) { - /* flush s_sb_upd_work before journal destroy. */ - flush_work(&sbi->s_sb_upd_work); ext4_journal_destroy(sbi, sbi->s_journal); } failed_mount3a:
From: Jens Axboe axboe@kernel.dk
[ Upstream commit fc582cd26e888b0652bc1494f252329453fd3b23 ]
syzbot reports that defer/local task_work adding via msg_ring can hit a request that has been freed:
CPU: 1 UID: 0 PID: 19356 Comm: iou-wrk-19354 Not tainted 6.16.0-rc4-syzkaller-00108-g17bbde2e1716 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xd2/0x2b0 mm/kasan/report.c:521 kasan_report+0x118/0x150 mm/kasan/report.c:634 io_req_local_work_add io_uring/io_uring.c:1184 [inline] __io_req_task_work_add+0x589/0x950 io_uring/io_uring.c:1252 io_msg_remote_post io_uring/msg_ring.c:103 [inline] io_msg_data_remote io_uring/msg_ring.c:133 [inline] __io_msg_ring_data+0x820/0xaa0 io_uring/msg_ring.c:151 io_msg_ring_data io_uring/msg_ring.c:173 [inline] io_msg_ring+0x134/0xa00 io_uring/msg_ring.c:314 __io_issue_sqe+0x17e/0x4b0 io_uring/io_uring.c:1739 io_issue_sqe+0x165/0xfd0 io_uring/io_uring.c:1762 io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1874 io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:642 io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:696 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK>
which is supposed to be safe with how requests are allocated. But msg ring requests alloc and free on their own, and hence must defer freeing to a sane time.
Add an rcu_head and use kfree_rcu() in both spots where requests are freed. Only the one in io_msg_tw_complete() is strictly required as it has been visible on the other ring, but use it consistently in the other spot as well.
This should not cause any other issues outside of KASAN rightfully complaining about it.
Link: https://lore.kernel.org/io-uring/686cd2ea.a00a0220.338033.0007.GAE@google.co... Reported-by: syzbot+54cbbfb4db9145d26fc2@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Fixes: 0617bb500bfa ("io_uring/msg_ring: improve handling of target CQE posting") Signed-off-by: Jens Axboe axboe@kernel.dk (cherry picked from commit fc582cd26e888b0652bc1494f252329453fd3b23) [Harshit: Resolve conflicts due to missing commit: 01ee194d1aba ("io_uring: add support for hybrid IOPOLL") and commit: 69a62e03f896 ("io_uring/msg_ring: don't leave potentially dangling ->tctx pointer") in 6.12.y] Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- include/linux/io_uring_types.h | 2 ++ io_uring/msg_ring.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 5ce332fc6ff5..3b27d9bcf298 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -648,6 +648,8 @@ struct io_kiocb { struct io_task_work io_task_work; /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ struct hlist_node hash_node; + /* for private io_kiocb freeing */ + struct rcu_head rcu_head; /* internal polling, see IORING_FEAT_FAST_POLL */ struct async_poll *apoll; /* opcode allocated if it needs to store data for async defer */ diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c index 35b1b585e9cb..b68e009bce21 100644 --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -82,7 +82,7 @@ static void io_msg_tw_complete(struct io_kiocb *req, struct io_tw_state *ts) spin_unlock(&ctx->msg_lock); } if (req) - kmem_cache_free(req_cachep, req); + kfree_rcu(req, rcu_head); percpu_ref_put(&ctx->refs); }
@@ -91,7 +91,7 @@ static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req, { req->task = READ_ONCE(ctx->submitter_task); if (!req->task) { - kmem_cache_free(req_cachep, req); + kfree_rcu(req, rcu_head); return -EOWNERDEAD; } req->opcode = IORING_OP_NOP;
On 9/5/25 5:04 AM, Harshit Mogalapalli wrote:
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 5ce332fc6ff5..3b27d9bcf298 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -648,6 +648,8 @@ struct io_kiocb { struct io_task_work io_task_work; /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ struct hlist_node hash_node;
- /* for private io_kiocb freeing */
- struct rcu_head rcu_head; /* internal polling, see IORING_FEAT_FAST_POLL */ struct async_poll *apoll; /* opcode allocated if it needs to store data for async defer */
This should go into a union with hash_node, rather than bloat the struct. That's how it was done upstream, not sure why this one is different?
From: Wen Gong quic_wgong@quicinc.com
[ Upstream commit 933ab187e679e6fbdeea1835ae39efcc59c022d2 ]
Currently when ath11k gets a new channel list, it will be processed according to the following steps: 1. update new channel list to cfg80211 and queue reg_work. 2. cfg80211 handles new channel list during reg_work. 3. update cfg80211's handled channel list to firmware by ath11k_reg_update_chan_list().
But ath11k will immediately execute step 3 after reg_work is just queued. Since step 2 is asynchronous, cfg80211 may not have completed handling the new channel list, which may leading to an out-of-bounds write error: BUG: KASAN: slab-out-of-bounds in ath11k_reg_update_chan_list Call Trace: ath11k_reg_update_chan_list+0xbfe/0xfe0 [ath11k] kfree+0x109/0x3a0 ath11k_regd_update+0x1cf/0x350 [ath11k] ath11k_regd_update_work+0x14/0x20 [ath11k] process_one_work+0xe35/0x14c0
Should ensure step 2 is completely done before executing step 3. Thus Wen raised patch[1]. When flag NL80211_REGDOM_SET_BY_DRIVER is set, cfg80211 will notify ath11k after step 2 is done.
So enable the flag NL80211_REGDOM_SET_BY_DRIVER then cfg80211 will notify ath11k after step 2 is done. At this time, there will be no KASAN bug during the execution of the step 3.
[1] https://patchwork.kernel.org/project/linux-wireless/patch/20230201065313.272...
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: f45cb6b29cd3 ("wifi: ath11k: avoid deadlock during regulatory update in ath11k_regd_update()") Signed-off-by: Wen Gong quic_wgong@quicinc.com Signed-off-by: Kang Yang quic_kangyang@quicinc.com Reviewed-by: Aditya Kumar Singh quic_adisi@quicinc.com Link: https://patch.msgid.link/20250117061737.1921-2-quic_kangyang@quicinc.com Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com (cherry picked from commit 933ab187e679e6fbdeea1835ae39efcc59c022d2) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/net/wireless/ath/ath11k/reg.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/reg.c b/drivers/net/wireless/ath/ath11k/reg.c index b0f289784dd3..7bfe47ad62a0 100644 --- a/drivers/net/wireless/ath/ath11k/reg.c +++ b/drivers/net/wireless/ath/ath11k/reg.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: BSD-3-Clause-Clear /* * Copyright (c) 2018-2019 The Linux Foundation. All rights reserved. - * Copyright (c) 2021-2024 Qualcomm Innovation Center, Inc. All rights reserved. + * Copyright (c) 2021-2025 Qualcomm Innovation Center, Inc. All rights reserved. */ #include <linux/rtnetlink.h>
@@ -55,6 +55,19 @@ ath11k_reg_notifier(struct wiphy *wiphy, struct regulatory_request *request) ath11k_dbg(ar->ab, ATH11K_DBG_REG, "Regulatory Notification received for %s\n", wiphy_name(wiphy));
+ if (request->initiator == NL80211_REGDOM_SET_BY_DRIVER) { + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "driver initiated regd update\n"); + if (ar->state != ATH11K_STATE_ON) + return; + + ret = ath11k_reg_update_chan_list(ar, true); + if (ret) + ath11k_warn(ar->ab, "failed to update channel list: %d\n", ret); + + return; + } + /* Currently supporting only General User Hints. Cell base user * hints to be handled later. * Hints from other sources like Core, Beacons are not expected for @@ -293,12 +306,6 @@ int ath11k_regd_update(struct ath11k *ar) if (ret) goto err;
- if (ar->state == ATH11K_STATE_ON) { - ret = ath11k_reg_update_chan_list(ar, true); - if (ret) - goto err; - } - return 0; err: ath11k_warn(ab, "failed to perform regd update : %d\n", ret); @@ -977,6 +984,7 @@ void ath11k_regd_update_work(struct work_struct *work) void ath11k_reg_init(struct ath11k *ar) { ar->hw->wiphy->regulatory_flags = REGULATORY_WIPHY_SELF_MANAGED; + ar->hw->wiphy->flags |= WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER; ar->hw->wiphy->reg_notifier = ath11k_reg_notifier; }
From: Wen Gong quic_wgong@quicinc.com
[ Upstream commit 02aae8e2f957adc1b15b6b8055316f8a154ac3f5 ]
With previous patch "wifi: ath11k: move update channel list from update reg worker to reg notifier", ath11k_reg_update_chan_list() will be called during reg_process_self_managed_hint().
reg_process_self_managed_hint() will hold rtnl_lock all the time. But ath11k_reg_update_chan_list() may increase the occupation time of rtnl_lock, because when wait flag is set, wait_for_completion_timeout() will be called during 11d/hw scan.
Should minimize the occupation time of rtnl_lock as much as possible to avoid interfering with rest of the system. So move the update channel list operation to a new worker, so that wait_for_completion_timeout() won't be called and will not increase the occupation time of rtnl_lock.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Signed-off-by: Wen Gong quic_wgong@quicinc.com Co-developed-by: Kang Yang quic_kangyang@quicinc.com Signed-off-by: Kang Yang quic_kangyang@quicinc.com Reviewed-by: Aditya Kumar Singh quic_adisi@quicinc.com Link: https://patch.msgid.link/20250117061737.1921-3-quic_kangyang@quicinc.com Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com (cherry picked from commit 02aae8e2f957adc1b15b6b8055316f8a154ac3f5) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/net/wireless/ath/ath11k/core.c | 1 + drivers/net/wireless/ath/ath11k/core.h | 5 +- drivers/net/wireless/ath/ath11k/mac.c | 14 +++++ drivers/net/wireless/ath/ath11k/reg.c | 85 ++++++++++++++++++-------- drivers/net/wireless/ath/ath11k/reg.h | 3 +- drivers/net/wireless/ath/ath11k/wmi.h | 1 + 6 files changed, 81 insertions(+), 28 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c index 2ec1771262fd..bb46ef986b2a 100644 --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -1972,6 +1972,7 @@ void ath11k_core_halt(struct ath11k *ar) ath11k_mac_scan_finish(ar); ath11k_mac_peer_cleanup_all(ar); cancel_delayed_work_sync(&ar->scan.timeout); + cancel_work_sync(&ar->channel_update_work); cancel_work_sync(&ar->regd_update_work); cancel_work_sync(&ab->update_11d_work);
diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h index 09fdb7be0e19..0cf49754653a 100644 --- a/drivers/net/wireless/ath/ath11k/core.h +++ b/drivers/net/wireless/ath/ath11k/core.h @@ -689,7 +689,7 @@ struct ath11k { struct mutex conf_mutex; /* protects the radio specific data like debug stats, ppdu_stats_info stats, * vdev_stop_status info, scan data, ath11k_sta info, ath11k_vif info, - * channel context data, survey info, test mode data. + * channel context data, survey info, test mode data, channel_update_queue. */ spinlock_t data_lock;
@@ -747,6 +747,9 @@ struct ath11k { struct completion bss_survey_done;
struct work_struct regd_update_work; + struct work_struct channel_update_work; + /* protected with data_lock */ + struct list_head channel_update_queue;
struct work_struct wmi_mgmt_tx_work; struct sk_buff_head wmi_mgmt_tx_queue; diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c index ddf4ec6b244b..6313c45b3d2a 100644 --- a/drivers/net/wireless/ath/ath11k/mac.c +++ b/drivers/net/wireless/ath/ath11k/mac.c @@ -6289,6 +6289,7 @@ static void ath11k_mac_op_stop(struct ieee80211_hw *hw, bool suspend) { struct ath11k *ar = hw->priv; struct htt_ppdu_stats_info *ppdu_stats, *tmp; + struct scan_chan_list_params *params; int ret;
ath11k_mac_drain_tx(ar); @@ -6304,6 +6305,7 @@ static void ath11k_mac_op_stop(struct ieee80211_hw *hw, bool suspend) mutex_unlock(&ar->conf_mutex);
cancel_delayed_work_sync(&ar->scan.timeout); + cancel_work_sync(&ar->channel_update_work); cancel_work_sync(&ar->regd_update_work); cancel_work_sync(&ar->ab->update_11d_work);
@@ -6313,10 +6315,19 @@ static void ath11k_mac_op_stop(struct ieee80211_hw *hw, bool suspend) }
spin_lock_bh(&ar->data_lock); + list_for_each_entry_safe(ppdu_stats, tmp, &ar->ppdu_stats_info, list) { list_del(&ppdu_stats->list); kfree(ppdu_stats); } + + while ((params = list_first_entry_or_null(&ar->channel_update_queue, + struct scan_chan_list_params, + list))) { + list_del(¶ms->list); + kfree(params); + } + spin_unlock_bh(&ar->data_lock);
rcu_assign_pointer(ar->ab->pdevs_active[ar->pdev_idx], NULL); @@ -10101,6 +10112,7 @@ static const struct wiphy_iftype_ext_capab ath11k_iftypes_ext_capa[] = {
static void __ath11k_mac_unregister(struct ath11k *ar) { + cancel_work_sync(&ar->channel_update_work); cancel_work_sync(&ar->regd_update_work);
ieee80211_unregister_hw(ar->hw); @@ -10500,6 +10512,8 @@ int ath11k_mac_allocate(struct ath11k_base *ab) init_completion(&ar->thermal.wmi_sync);
INIT_DELAYED_WORK(&ar->scan.timeout, ath11k_scan_timeout_work); + INIT_WORK(&ar->channel_update_work, ath11k_regd_update_chan_list_work); + INIT_LIST_HEAD(&ar->channel_update_queue); INIT_WORK(&ar->regd_update_work, ath11k_regd_update_work);
INIT_WORK(&ar->wmi_mgmt_tx_work, ath11k_mgmt_over_wmi_tx_work); diff --git a/drivers/net/wireless/ath/ath11k/reg.c b/drivers/net/wireless/ath/ath11k/reg.c index 7bfe47ad62a0..d62a2014315a 100644 --- a/drivers/net/wireless/ath/ath11k/reg.c +++ b/drivers/net/wireless/ath/ath11k/reg.c @@ -124,32 +124,7 @@ int ath11k_reg_update_chan_list(struct ath11k *ar, bool wait) struct channel_param *ch; enum nl80211_band band; int num_channels = 0; - int i, ret, left; - - if (wait && ar->state_11d != ATH11K_11D_IDLE) { - left = wait_for_completion_timeout(&ar->completed_11d_scan, - ATH11K_SCAN_TIMEOUT_HZ); - if (!left) { - ath11k_dbg(ar->ab, ATH11K_DBG_REG, - "failed to receive 11d scan complete: timed out\n"); - ar->state_11d = ATH11K_11D_IDLE; - } - ath11k_dbg(ar->ab, ATH11K_DBG_REG, - "11d scan wait left time %d\n", left); - } - - if (wait && - (ar->scan.state == ATH11K_SCAN_STARTING || - ar->scan.state == ATH11K_SCAN_RUNNING)) { - left = wait_for_completion_timeout(&ar->scan.completed, - ATH11K_SCAN_TIMEOUT_HZ); - if (!left) - ath11k_dbg(ar->ab, ATH11K_DBG_REG, - "failed to receive hw scan complete: timed out\n"); - - ath11k_dbg(ar->ab, ATH11K_DBG_REG, - "hw scan wait left time %d\n", left); - } + int i, ret = 0;
if (ar->state == ATH11K_STATE_RESTARTING) return 0; @@ -231,6 +206,16 @@ int ath11k_reg_update_chan_list(struct ath11k *ar, bool wait) } }
+ if (wait) { + spin_lock_bh(&ar->data_lock); + list_add_tail(¶ms->list, &ar->channel_update_queue); + spin_unlock_bh(&ar->data_lock); + + queue_work(ar->ab->workqueue, &ar->channel_update_work); + + return 0; + } + ret = ath11k_wmi_send_scan_chan_list_cmd(ar, params); kfree(params);
@@ -811,6 +796,54 @@ ath11k_reg_build_regd(struct ath11k_base *ab, return new_regd; }
+void ath11k_regd_update_chan_list_work(struct work_struct *work) +{ + struct ath11k *ar = container_of(work, struct ath11k, + channel_update_work); + struct scan_chan_list_params *params; + struct list_head local_update_list; + int left; + + INIT_LIST_HEAD(&local_update_list); + + spin_lock_bh(&ar->data_lock); + list_splice_tail_init(&ar->channel_update_queue, &local_update_list); + spin_unlock_bh(&ar->data_lock); + + while ((params = list_first_entry_or_null(&local_update_list, + struct scan_chan_list_params, + list))) { + if (ar->state_11d != ATH11K_11D_IDLE) { + left = wait_for_completion_timeout(&ar->completed_11d_scan, + ATH11K_SCAN_TIMEOUT_HZ); + if (!left) { + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "failed to receive 11d scan complete: timed out\n"); + ar->state_11d = ATH11K_11D_IDLE; + } + + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "reg 11d scan wait left time %d\n", left); + } + + if ((ar->scan.state == ATH11K_SCAN_STARTING || + ar->scan.state == ATH11K_SCAN_RUNNING)) { + left = wait_for_completion_timeout(&ar->scan.completed, + ATH11K_SCAN_TIMEOUT_HZ); + if (!left) + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "failed to receive hw scan complete: timed out\n"); + + ath11k_dbg(ar->ab, ATH11K_DBG_REG, + "reg hw scan wait left time %d\n", left); + } + + ath11k_wmi_send_scan_chan_list_cmd(ar, params); + list_del(¶ms->list); + kfree(params); + } +} + static bool ath11k_reg_is_world_alpha(char *alpha) { if (alpha[0] == '0' && alpha[1] == '0') diff --git a/drivers/net/wireless/ath/ath11k/reg.h b/drivers/net/wireless/ath/ath11k/reg.h index 263ea9061948..72b483594015 100644 --- a/drivers/net/wireless/ath/ath11k/reg.h +++ b/drivers/net/wireless/ath/ath11k/reg.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: BSD-3-Clause-Clear */ /* * Copyright (c) 2019 The Linux Foundation. All rights reserved. - * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved. + * Copyright (c) 2022-2025 Qualcomm Innovation Center, Inc. All rights reserved. */
#ifndef ATH11K_REG_H @@ -33,6 +33,7 @@ void ath11k_reg_init(struct ath11k *ar); void ath11k_reg_reset_info(struct cur_regulatory_info *reg_info); void ath11k_reg_free(struct ath11k_base *ab); void ath11k_regd_update_work(struct work_struct *work); +void ath11k_regd_update_chan_list_work(struct work_struct *work); struct ieee80211_regdomain * ath11k_reg_build_regd(struct ath11k_base *ab, struct cur_regulatory_info *reg_info, bool intersect, diff --git a/drivers/net/wireless/ath/ath11k/wmi.h b/drivers/net/wireless/ath/ath11k/wmi.h index 8982b909c821..30b4b0c17682 100644 --- a/drivers/net/wireless/ath/ath11k/wmi.h +++ b/drivers/net/wireless/ath/ath11k/wmi.h @@ -3817,6 +3817,7 @@ struct wmi_stop_scan_cmd { };
struct scan_chan_list_params { + struct list_head list; u32 pdev_id; u16 nallchans; struct channel_param ch_param[];
From: Wang Liang wangliang74@huawei.com
[ Upstream commit 0032c99e83b9ce6d5995d65900aa4b6ffb501cce ]
When delete l3s ipvlan:
ip link del link eth0 ipvlan1 type ipvlan mode l3s
This may cause a null pointer dereference:
Call trace: ip_rcv_finish+0x48/0xd0 ip_rcv+0x5c/0x100 __netif_receive_skb_one_core+0x64/0xb0 __netif_receive_skb+0x20/0x80 process_backlog+0xb4/0x204 napi_poll+0xe8/0x294 net_rx_action+0xd8/0x22c __do_softirq+0x12c/0x354
This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process like this:
(CPU1) | (CPU2) l3mdev_l3_rcv() | check dev->priv_flags: | master = skb->dev; | | | ipvlan_l3s_unregister() | set dev->priv_flags | dev->l3mdev_ops = NULL; | visit master->l3mdev_ops |
To avoid this by do not set dev->l3mdev_ops when unregister l3s ipvlan.
Suggested-by: David Ahern dsahern@kernel.org Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes") Signed-off-by: Wang Liang wangliang74@huawei.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250321090353.1170545-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 0032c99e83b9ce6d5995d65900aa4b6ffb501cce) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/net/ipvlan/ipvlan_l3s.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c index d5b05e803219..ca35a50bb640 100644 --- a/drivers/net/ipvlan/ipvlan_l3s.c +++ b/drivers/net/ipvlan/ipvlan_l3s.c @@ -224,5 +224,4 @@ void ipvlan_l3s_unregister(struct ipvl_port *port)
dev->priv_flags &= ~IFF_L3MDEV_RX_HANDLER; ipvlan_unregister_nf_hook(read_pnet(&port->pnet)); - dev->l3mdev_ops = NULL; }
From: Su Yue glass.su@suse.com
[ Upstream commit 6130825f34d41718c98a9b1504a79a23e379701e ]
In clustermd, separate write-intent-bitmaps are used for each cluster node:
0 4k 8k 12k ------------------------------------------------------------------- | idle | md super | bm super [0] + bits | | bm bits[0, contd] | bm super[1] + bits | bm bits[1, contd] | | bm super[2] + bits | bm bits [2, contd] | bm super[3] + bits | | bm bits [3, contd] | | |
So in node 1, pg_index in __write_sb_page() could equal to bitmap->storage.file_pages. Then bitmap_limit will be calculated to 0. md_super_write() will be called with 0 size. That means the first 4k sb area of node 1 will never be updated through filemap_write_page(). This bug causes hang of mdadm/clustermd_tests/01r1_Grow_resize.
Here use (pg_index % bitmap->storage.file_pages) to make calculation of bitmap_limit correct.
Fixes: ab99a87542f1 ("md/md-bitmap: fix writing non bitmap pages") Signed-off-by: Su Yue glass.su@suse.com Reviewed-by: Heming Zhao heming.zhao@suse.com Link: https://lore.kernel.org/linux-raid/20250303033918.32136-1-glass.su@suse.com Signed-off-by: Yu Kuai yukuai3@huawei.com (cherry picked from commit 6130825f34d41718c98a9b1504a79a23e379701e) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com --- drivers/md/md-bitmap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 0da1d0723f88..e7e1833ff04b 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -426,8 +426,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, struct block_device *bdev; struct mddev *mddev = bitmap->mddev; struct bitmap_storage *store = &bitmap->storage; - unsigned int bitmap_limit = (bitmap->storage.file_pages - pg_index) << - PAGE_SHIFT; + unsigned long num_pages = bitmap->storage.file_pages; + unsigned int bitmap_limit = (num_pages - pg_index % num_pages) << PAGE_SHIFT; loff_t sboff, offset = mddev->bitmap_info.offset; sector_t ps = pg_index * PAGE_SIZE / SECTOR_SIZE; unsigned int size = PAGE_SIZE; @@ -436,7 +436,7 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; /* we compare length (page numbers), not page offset. */ - if ((pg_index - store->sb_index) == store->file_pages - 1) { + if ((pg_index - store->sb_index) == num_pages - 1) { unsigned int last_page_size = store->bytes & (PAGE_SIZE - 1);
if (last_page_size == 0)
linux-stable-mirror@lists.linaro.org