The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6abfe107894af7e8ce3a2e120c619d81ee764ad5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122901-exciting-strained-363f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6abfe107894af7e8ce3a2e120c619d81ee764ad5 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Mon, 3 Nov 2025 09:01:23 +0800
Subject: [PATCH] jbd2: fix the inconsistency between checksum and data in
memory for journal sb
Copying the file system while it is mounted as read-only results in
a mount failure:
[~]# mkfs.ext4 -F /dev/sdc
[~]# mount /dev/sdc -o ro /mnt/test
[~]# dd if=/dev/sdc of=/dev/sda bs=1M
[~]# mount /dev/sda /mnt/test1
[ 1094.849826] JBD2: journal checksum error
[ 1094.850927] EXT4-fs (sda): Could not load journal inode
mount: mount /dev/sda on /mnt/test1 failed: Bad message
The process described above is just an abstracted way I came up with to
reproduce the issue. In the actual scenario, the file system was mounted
read-only and then copied while it was still mounted. It was found that
the mount operation failed. The user intended to verify the data or use
it as a backup, and this action was performed during a version upgrade.
The above issue may happen as follows:
ext4_fill_super
 set_journal_csum_feature_set(sb)
  if (ext4_has_metadata_csum(sb))
   incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
  if (test_opt(sb, JOURNAL_CHECKSUM))
   jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat);
    lock_buffer(journal->j_sb_buffer);
    sb->s_feature_incompat |= cpu_to_be32(incompat);
    // The data in the journal sb was modified, but the checksum was not
    // updated, so the data remaining in memory has a mismatch between
    // the data and the checksum.
    unlock_buffer(journal->j_sb_buffer);
In this case, the journal sb copied over is in a state where the checksum
and data are inconsistent, so mounting fails.
To solve the above issue, update the checksum in memory after modifying
the journal sb.
Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock")
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Message-ID: <20251103010123.3753631-1-yebin(a)huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2fe1786a8f1b..c973162d5b31 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2354,6 +2354,12 @@ int jbd2_journal_set_features(journal_t *journal, unsigned long compat,
sb->s_feature_compat |= cpu_to_be32(compat);
sb->s_feature_ro_compat |= cpu_to_be32(ro);
sb->s_feature_incompat |= cpu_to_be32(incompat);
+ /*
+ * Update the checksum now so that it is valid even for read-only
+ * filesystems where jbd2_write_superblock() doesn't get called.
+ */
+ if (jbd2_journal_has_csum_v2or3(journal))
+ sb->s_checksum = jbd2_superblock_csum(sb);
unlock_buffer(journal->j_sb_buffer);
jbd2_journal_init_transaction_limits(journal);
@@ -2383,9 +2389,17 @@ void jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
sb = journal->j_superblock;
+ lock_buffer(journal->j_sb_buffer);
sb->s_feature_compat &= ~cpu_to_be32(compat);
sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
sb->s_feature_incompat &= ~cpu_to_be32(incompat);
+ /*
+ * Update the checksum now so that it is valid even for read-only
+ * filesystems where jbd2_write_superblock() doesn't get called.
+ */
+ if (jbd2_journal_has_csum_v2or3(journal))
+ sb->s_checksum = jbd2_superblock_csum(sb);
+ unlock_buffer(journal->j_sb_buffer);
jbd2_journal_init_transaction_limits(journal);
}
EXPORT_SYMBOL(jbd2_journal_clear_features);
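The invariant this fix restores can be illustrated outside the kernel. Below is a minimal user-space C sketch, not jbd2 code: the struct, field names and toy checksum are invented (jbd2 really computes crc32c over the superblock buffer, under the buffer lock), but it shows why deferring the checksum update to write-out leaves a stale in-memory copy that a raw device copy will capture on a read-only mount.

#include <stdint.h>
#include <stdio.h>

struct journal_sb {                /* stand-in for the journal superblock */
	uint32_t feature_compat;
	uint32_t feature_incompat;
	uint32_t checksum;         /* covers every field except itself */
};

static uint32_t sb_csum(const struct journal_sb *sb)
{
	/* toy checksum for the sketch only */
	return (sb->feature_compat * 2654435761u) ^ sb->feature_incompat;
}

/* Pre-fix flow: mutate the in-memory sb and leave the csum to a later
 * jbd2_write_superblock() that never runs on a read-only filesystem. */
static void set_features_buggy(struct journal_sb *sb, uint32_t incompat)
{
	sb->feature_incompat |= incompat;      /* csum is now stale */
}

/* Post-fix flow: recompute immediately, so a raw copy of the buffer
 * (e.g. dd of the block device) always sees a self-consistent sb. */
static void set_features_fixed(struct journal_sb *sb, uint32_t incompat)
{
	sb->feature_incompat |= incompat;
	sb->checksum = sb_csum(sb);
}

int main(void)
{
	struct journal_sb sb = { .feature_compat = 1 };

	sb.checksum = sb_csum(&sb);
	set_features_buggy(&sb, 0x10);
	printf("after buggy update: csum %s\n",
	       sb.checksum == sb_csum(&sb) ? "valid" : "STALE");
	set_features_fixed(&sb, 0x20);
	printf("after fixed update: csum %s\n",
	       sb.checksum == sb_csum(&sb) ? "valid" : "STALE");
	return 0;
}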
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 6abfe107894af7e8ce3a2e120c619d81ee764ad5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122927-germicide-venus-61d2@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6abfe107894af7e8ce3a2e120c619d81ee764ad5 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Mon, 3 Nov 2025 09:01:23 +0800
Subject: [PATCH] jbd2: fix the inconsistency between checksum and data in
memory for journal sb
Copying the file system while it is mounted as read-only results in
a mount failure:
[~]# mkfs.ext4 -F /dev/sdc
[~]# mount /dev/sdc -o ro /mnt/test
[~]# dd if=/dev/sdc of=/dev/sda bs=1M
[~]# mount /dev/sda /mnt/test1
[ 1094.849826] JBD2: journal checksum error
[ 1094.850927] EXT4-fs (sda): Could not load journal inode
mount: mount /dev/sda on /mnt/test1 failed: Bad message
The process described above is just an abstracted way I came up with to
reproduce the issue. In the actual scenario, the file system was mounted
read-only and then copied while it was still mounted. It was found that
the mount operation failed. The user intended to verify the data or use
it as a backup, and this action was performed during a version upgrade.
The above issue may happen as follows:
ext4_fill_super
 set_journal_csum_feature_set(sb)
  if (ext4_has_metadata_csum(sb))
   incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
  if (test_opt(sb, JOURNAL_CHECKSUM))
   jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat);
    lock_buffer(journal->j_sb_buffer);
    sb->s_feature_incompat |= cpu_to_be32(incompat);
    // The data in the journal sb was modified, but the checksum was not
    // updated, so the data remaining in memory has a mismatch between
    // the data and the checksum.
    unlock_buffer(journal->j_sb_buffer);
In this case, the journal sb copied over is in a state where the checksum
and data are inconsistent, so mounting fails.
To solve the above issue, update the checksum in memory after modifying
the journal sb.
Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock")
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Message-ID: <20251103010123.3753631-1-yebin(a)huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2fe1786a8f1b..c973162d5b31 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2354,6 +2354,12 @@ int jbd2_journal_set_features(journal_t *journal, unsigned long compat,
sb->s_feature_compat |= cpu_to_be32(compat);
sb->s_feature_ro_compat |= cpu_to_be32(ro);
sb->s_feature_incompat |= cpu_to_be32(incompat);
+ /*
+ * Update the checksum now so that it is valid even for read-only
+ * filesystems where jbd2_write_superblock() doesn't get called.
+ */
+ if (jbd2_journal_has_csum_v2or3(journal))
+ sb->s_checksum = jbd2_superblock_csum(sb);
unlock_buffer(journal->j_sb_buffer);
jbd2_journal_init_transaction_limits(journal);
@@ -2383,9 +2389,17 @@ void jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
sb = journal->j_superblock;
+ lock_buffer(journal->j_sb_buffer);
sb->s_feature_compat &= ~cpu_to_be32(compat);
sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
sb->s_feature_incompat &= ~cpu_to_be32(incompat);
+ /*
+ * Update the checksum now so that it is valid even for read-only
+ * filesystems where jbd2_write_superblock() doesn't get called.
+ */
+ if (jbd2_journal_has_csum_v2or3(journal))
+ sb->s_checksum = jbd2_superblock_csum(sb);
+ unlock_buffer(journal->j_sb_buffer);
jbd2_journal_init_transaction_limits(journal);
}
EXPORT_SYMBOL(jbd2_journal_clear_features);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4cfc7d5a4a01d2133b278cdbb1371fba1b419174
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122907-defense-blanching-5c39@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cfc7d5a4a01d2133b278cdbb1371fba1b419174 Mon Sep 17 00:00:00 2001
From: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Date: Mon, 17 Nov 2025 12:05:18 +0300
Subject: [PATCH] gfs2: fix freeze error handling
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"),
the freeze error handling is broken because gfs2_do_thaw()
overwrites the 'error' variable, causing incorrect processing
of the original freeze error.
Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean()
fails but ignoring its return value to preserve the original
freeze error for proper reporting.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic")
Cc: stable(a)vger.kernel.org # v6.5+
Signed-off-by: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 644b2d1e7276..54c6f2098f01 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -749,9 +749,7 @@ static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who,
break;
}
- error = gfs2_do_thaw(sdp, who, freeze_owner);
- if (error)
- goto out;
+ (void)gfs2_do_thaw(sdp, who, freeze_owner);
if (error == -EBUSY)
fs_err(sdp, "waiting for recovery before freeze\n");
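The bug pattern here is plain C error handling: the return value of a cleanup call clobbers the error the caller actually needs to report. A freestanding sketch, with invented function names and return values chosen to mimic the gfs2 scenario:

#include <errno.h>
#include <stdio.h>

static int lock_fs_check_clean(void) { return -EBUSY; } /* freeze fails  */
static int do_thaw(void)             { return 0; }      /* thaw succeeds */

static int freeze_buggy(void)
{
	int error = lock_fs_check_clean();
	if (error) {
		error = do_thaw();   /* overwrites -EBUSY with 0 */
		if (error)
			return error;
	}
	if (error == -EBUSY)         /* can never be true any more */
		fprintf(stderr, "waiting for recovery before freeze\n");
	return error;
}

static int freeze_fixed(void)
{
	int error = lock_fs_check_clean();
	if (error)
		(void)do_thaw();     /* best-effort cleanup; keep 'error' */
	if (error == -EBUSY)
		fprintf(stderr, "waiting for recovery before freeze\n");
	return error;
}

int main(void)
{
	printf("buggy freeze returns %d\n", freeze_buggy()); /* 0: lost   */
	printf("fixed freeze returns %d\n", freeze_fixed()); /* -16: kept */
	return 0;
}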
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 4cfc7d5a4a01d2133b278cdbb1371fba1b419174
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122907-recount-stoning-1de1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cfc7d5a4a01d2133b278cdbb1371fba1b419174 Mon Sep 17 00:00:00 2001
From: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Date: Mon, 17 Nov 2025 12:05:18 +0300
Subject: [PATCH] gfs2: fix freeze error handling
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"),
the freeze error handling is broken because gfs2_do_thaw()
overwrites the 'error' variable, causing incorrect processing
of the original freeze error.
Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean()
fails but ignoring its return value to preserve the original
freeze error for proper reporting.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic")
Cc: stable(a)vger.kernel.org # v6.5+
Signed-off-by: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 644b2d1e7276..54c6f2098f01 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -749,9 +749,7 @@ static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who,
break;
}
- error = gfs2_do_thaw(sdp, who, freeze_owner);
- if (error)
- goto out;
+ (void)gfs2_do_thaw(sdp, who, freeze_owner);
if (error == -EBUSY)
fs_err(sdp, "waiting for recovery before freeze\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0185c2292c600993199bc6b1f342ad47a9e8c678
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122956-output-cornmeal-7c59@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0185c2292c600993199bc6b1f342ad47a9e8c678 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 18 Nov 2025 17:08:41 +0100
Subject: [PATCH] btrfs: don't rewrite ret from inode_permission
In our user safe ino resolve ioctl we'll just turn any ret from
inode_permission() into -EACCES. This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.c…)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 59cef7e376a0..a10b60439718 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1910,10 +1910,8 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
ret = inode_permission(idmap, &temp_inode->vfs_inode,
MAY_READ | MAY_EXEC);
iput(&temp_inode->vfs_inode);
- if (ret) {
- ret = -EACCES;
+ if (ret)
goto out_put;
- }
if (key.offset == upper_limit)
break;
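The information loss the patch removes is easiest to see from the caller's side. A freestanding sketch with invented helper names, assuming (as the commit message hypothesizes) that the security layer fails with -ENOMEM rather than a genuine permission denial:

#include <errno.h>
#include <stdio.h>

/* Stand-in for inode_permission() failing in the security layer. */
static int check_permission(void) { return -ENOMEM; }

static int lookup_rewritten(void)
{
	int ret = check_permission();
	if (ret)
		ret = -EACCES;       /* pre-patch: real cause discarded */
	return ret;
}

static int lookup_propagated(void)
{
	return check_permission();   /* post-patch: caller sees -ENOMEM */
}

int main(void)
{
	printf("rewritten:  %d (looks like a permission denial)\n",
	       lookup_rewritten());
	printf("propagated: %d (a retryable allocation failure)\n",
	       lookup_propagated());
	return 0;
}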
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 4cfc7d5a4a01d2133b278cdbb1371fba1b419174
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122906-preshow-nearest-b359@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cfc7d5a4a01d2133b278cdbb1371fba1b419174 Mon Sep 17 00:00:00 2001
From: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Date: Mon, 17 Nov 2025 12:05:18 +0300
Subject: [PATCH] gfs2: fix freeze error handling
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"),
the freeze error handling is broken because gfs2_do_thaw()
overwrites the 'error' variable, causing incorrect processing
of the original freeze error.
Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean()
fails but ignoring its return value to preserve the original
freeze error for proper reporting.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic")
Cc: stable(a)vger.kernel.org # v6.5+
Signed-off-by: Alexey Velichayshiy <a.velichayshiy(a)ispras.ru>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 644b2d1e7276..54c6f2098f01 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -749,9 +749,7 @@ static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who,
break;
}
- error = gfs2_do_thaw(sdp, who, freeze_owner);
- if (error)
- goto out;
+ (void)gfs2_do_thaw(sdp, who, freeze_owner);
if (error == -EBUSY)
fs_err(sdp, "waiting for recovery before freeze\n");
When hfs_bnode_create() finds that a node is already hashed (which should
not happen in normal operation), it currently returns the existing node
without incrementing its reference count. This causes a reference count
inconsistency that leads to a kernel panic when the node is later freed
in hfs_bnode_put():
kernel BUG at fs/hfsplus/bnode.c:676!
BUG_ON(!atomic_read(&node->refcnt))
This scenario can occur when hfs_bmap_alloc() attempts to allocate a node
that is already in use (e.g., when node 0's bitmap bit is incorrectly
unset), or due to filesystem corruption.
Returning an existing node from a create path is not normal operation.
Fix this by returning ERR_PTR(-EEXIST) instead of the node when it's
already hashed. This properly signals the error condition to callers,
which already check for IS_ERR() return values.
Reported-by: syzbot+1c8ff72d0cd8a50dfeaa(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1c8ff72d0cd8a50dfeaa
Link: https://lore.kernel.org/all/784415834694f39902088fa8946850fc1779a318.camel@…
Fixes: 634725a92938 ("[PATCH] hfs: cleanup HFS+ prints")
Signed-off-by: Shardul Bankar <shardul.b(a)mpiricsoftware.com>
---
v3 changes:
- This is posted standalone as discussed in the v2 thread.
v2 changes:
- Implement Slava's suggestion: return ERR_PTR(-EEXIST) for already-hashed nodes.
- Keep the node-0 allocation guard as a minimal, targeted hardening measure.
fs/hfsplus/bnode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 191661af9677..250a226336ea 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -629,7 +629,7 @@ struct hfs_bnode *hfs_bnode_create(struct hfs_btree *tree, u32 num)
if (node) {
pr_crit("new node %u already hashed?\n", num);
WARN_ON(1);
- return node;
+ return ERR_PTR(-EEXIST);
}
node = __hfs_bnode_create(tree, num);
if (!node)
--
2.34.1
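The fix leans on the kernel's ERR_PTR convention: a small negative errno is encoded into an otherwise-invalid pointer value, so one return value can carry either a node or an error that callers test with IS_ERR(). A self-contained user-space sketch of the idiom (the real macros live in include/linux/err.h; the bnode handling below is simplified and illustrative):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* errno values map into the top (invalid) page of the address space */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct bnode { unsigned int num; };

/* Mirrors the patched hfs_bnode_create(): an already-hashed node is an
 * error condition, not a value to hand back with an unbalanced refcount. */
static struct bnode *bnode_create(struct bnode *already_hashed)
{
	static struct bnode fresh = { .num = 7 };

	if (already_hashed)
		return ERR_PTR(-EEXIST);
	return &fresh;
}

int main(void)
{
	struct bnode dup = { .num = 7 };
	struct bnode *n = bnode_create(&dup);

	if (IS_ERR(n))
		printf("create failed, errno %ld\n", -PTR_ERR(n)); /* 17 */
	n = bnode_create(NULL);
	if (!IS_ERR(n))
		printf("created node %u\n", n->num);
	return 0;
}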
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0185c2292c600993199bc6b1f342ad47a9e8c678
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122956-email-disrupt-f2c8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0185c2292c600993199bc6b1f342ad47a9e8c678 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 18 Nov 2025 17:08:41 +0100
Subject: [PATCH] btrfs: don't rewrite ret from inode_permission
In our user safe ino resolve ioctl we'll just turn any ret from
inode_permission() into -EACCES. This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.c…)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 59cef7e376a0..a10b60439718 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1910,10 +1910,8 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
ret = inode_permission(idmap, &temp_inode->vfs_inode,
MAY_READ | MAY_EXEC);
iput(&temp_inode->vfs_inode);
- if (ret) {
- ret = -EACCES;
+ if (ret)
goto out_put;
- }
if (key.offset == upper_limit)
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0185c2292c600993199bc6b1f342ad47a9e8c678
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122955-nephew-flick-11f1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0185c2292c600993199bc6b1f342ad47a9e8c678 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 18 Nov 2025 17:08:41 +0100
Subject: [PATCH] btrfs: don't rewrite ret from inode_permission
In our user safe ino resolve ioctl we'll just turn any ret from
inode_permission() into -EACCES. This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.c…)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 59cef7e376a0..a10b60439718 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1910,10 +1910,8 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
ret = inode_permission(idmap, &temp_inode->vfs_inode,
MAY_READ | MAY_EXEC);
iput(&temp_inode->vfs_inode);
- if (ret) {
- ret = -EACCES;
+ if (ret)
goto out_put;
- }
if (key.offset == upper_limit)
break;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0185c2292c600993199bc6b1f342ad47a9e8c678
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122955-fever-mustiness-049e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0185c2292c600993199bc6b1f342ad47a9e8c678 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 18 Nov 2025 17:08:41 +0100
Subject: [PATCH] btrfs: don't rewrite ret from inode_permission
In our user safe ino resolve ioctl we'll just turn any ret from
inode_permission() into -EACCES. This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.c…)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 59cef7e376a0..a10b60439718 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1910,10 +1910,8 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
ret = inode_permission(idmap, &temp_inode->vfs_inode,
MAY_READ | MAY_EXEC);
iput(&temp_inode->vfs_inode);
- if (ret) {
- ret = -EACCES;
+ if (ret)
goto out_put;
- }
if (key.offset == upper_limit)
break;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 0185c2292c600993199bc6b1f342ad47a9e8c678
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122954-glance-used-2782@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0185c2292c600993199bc6b1f342ad47a9e8c678 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 18 Nov 2025 17:08:41 +0100
Subject: [PATCH] btrfs: don't rewrite ret from inode_permission
In our user safe ino resolve ioctl we'll just turn any ret into -EACCES
from inode_permission(). This is redundant, and could potentially be
wrong if we had an ENOMEM in the security layer or some such other
error, so simply return the actual return value.
Note: The patch was taken from v5 of fscrypt patchset
(https://lore.kernel.org/linux-btrfs/cover.1706116485.git.josef@toxicpanda.c…)
which was handled over time by various people: Omar Sandoval, Sweet Tea
Dorminy, Josef Bacik.
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 59cef7e376a0..a10b60439718 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1910,10 +1910,8 @@ static int btrfs_search_path_in_tree_user(struct mnt_idmap *idmap,
ret = inode_permission(idmap, &temp_inode->vfs_inode,
MAY_READ | MAY_EXEC);
iput(&temp_inode->vfs_inode);
- if (ret) {
- ret = -EACCES;
+ if (ret)
goto out_put;
- }
if (key.offset == upper_limit)
break;
When compiling the device tree for the Integrator/AP with IM-PD1, the
following warning is observed regarding the display controller node:
arch/arm/boot/dts/arm/integratorap-im-pd1.dts:251.3-14: Warning
(dma_ranges_format):
/bus@c0000000/bus@c0000000/display@1000000:dma-ranges: empty
"dma-ranges" property but its #address-cells (2) differs from
/bus@c0000000/bus@c0000000 (1)
The "dma-ranges" property is intended to describe DMA address
translations for child nodes. However, the only child of the display
controller is a "port" node, which does not perform DMA memory
accesses.
Since the property is not required for the child node and triggers a
warning due to default address cell mismatch, remove it.
Fixes: 7bea67a99430 ("ARM: dts: integrator: Fix DMA ranges")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
---
Changes in v2:
- Switch approach to remove the unused "dma-ranges" property instead of
adding "#address-cells" and "#size-cells".
arch/arm/boot/dts/arm/integratorap-im-pd1.dts | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm/boot/dts/arm/integratorap-im-pd1.dts b/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
index db13e09f2fab..0e568baf03b0 100644
--- a/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
+++ b/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
@@ -248,7 +248,6 @@ display@1000000 {
/* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */
max-memory-bandwidth = <40000000>;
memory-region = <&impd1_ram>;
- dma-ranges;
port@0 {
#address-cells = <1>;
--
2.52.0.358.g0dd7633a29-goog
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 79f3f9bedd149ea438aaeb0fb6a083637affe205
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122902-untreated-unbiased-5922@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 79f3f9bedd149ea438aaeb0fb6a083637affe205 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 2 Apr 2025 20:07:34 +0200
Subject: [PATCH] sched/eevdf: Fix min_vruntime vs avg_vruntime
Basically, from the constraint that the sum of lag is zero, you can
infer that the 0-lag point is the weighted average of the individual
vruntime, which is what we're trying to compute:
       \Sum w_i * v_i
 avg = --------------
          \Sum w_i
Now, since vruntime takes the whole u64 (worse, it wraps), this
multiplication term in the numerator is not something we can compute;
instead we do the min_vruntime (v0 henceforth) thing like:
v_i = (v_i - v0) + v0
This does two things:
- it keeps the key: (v_i - v0) 'small';
- it creates a relative 0-point in the modular space.
If you do that substitution and work it all out, you end up with:
       \Sum w_i * (v_i - v0)
 avg = --------------------- + v0
            \Sum w_i
Since you cannot very well track a ratio like that (and not suffer
terrible numerical problems) we simply track the numerator and
denominator individually and only perform the division when strictly
needed.
Notably, the numerator lives in cfs_rq->avg_vruntime and the denominator
lives in cfs_rq->avg_load.
The one extra 'funny' is that these numbers track the entities in the
tree, and current is typically outside of the tree, so avg_vruntime()
adds current when needed before doing the division.
(vruntime_eligible() elides the division by cross-wise multiplication)
Anyway, as mentioned above, we currently use the CFS era min_vruntime
for this purpose. However, this thing can only move forward, while the
above avg can in fact move backward (when a non-eligible task leaves,
the average becomes smaller), this can cause trouble when through
happenstance (or construction) these values drift far enough apart to
wreck the game.
Replace cfs_rq::min_vruntime with cfs_rq::zero_vruntime which is kept
near/at avg_vruntime, following its motion.
The down-side is that this requires computing the avg more often.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Zicheng Qu <quzicheng(a)huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://patch.msgid.link/20251106111741.GC4068168@noisy.programming.kicks-a…
Cc: stable(a)vger.kernel.org
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 02e16b70a790..41caa22e0680 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -796,7 +796,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
- s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+ s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
struct sched_entity *last, *first, *root;
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
@@ -819,15 +819,15 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
last = __pick_last_entity(cfs_rq);
if (last)
right_vruntime = last->vruntime;
- min_vruntime = cfs_rq->min_vruntime;
+ zero_vruntime = cfs_rq->zero_vruntime;
raw_spin_rq_unlock_irqrestore(rq, flags);
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
SPLIT_NS(left_deadline));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_vruntime",
SPLIT_NS(left_vruntime));
- SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime",
- SPLIT_NS(min_vruntime));
+ SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
+ SPLIT_NS(zero_vruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
SPLIT_NS(avg_vruntime(cfs_rq)));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a11a832d63e..8d971d48669f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -554,7 +554,7 @@ static inline bool entity_before(const struct sched_entity *a,
static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- return (s64)(se->vruntime - cfs_rq->min_vruntime);
+ return (s64)(se->vruntime - cfs_rq->zero_vruntime);
}
#define __node_2_se(node) \
@@ -606,13 +606,13 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
*
* Which we track using:
*
- * v0 := cfs_rq->min_vruntime
+ * v0 := cfs_rq->zero_vruntime
* \Sum (v_i - v0) * w_i := cfs_rq->avg_vruntime
* \Sum w_i := cfs_rq->avg_load
*
- * Since min_vruntime is a monotonic increasing variable that closely tracks
- * the per-task service, these deltas: (v_i - v), will be in the order of the
- * maximal (virtual) lag induced in the system due to quantisation.
+ * Since zero_vruntime closely tracks the per-task service, these
+ * deltas: (v_i - v), will be in the order of the maximal (virtual) lag
+ * induced in the system due to quantisation.
*
* Also, we use scale_load_down() to reduce the size.
*
@@ -671,7 +671,7 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
avg = div_s64(avg, load);
}
- return cfs_rq->min_vruntime + avg;
+ return cfs_rq->zero_vruntime + avg;
}
/*
@@ -732,7 +732,7 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
load += weight;
}
- return avg >= (s64)(vruntime - cfs_rq->min_vruntime) * load;
+ return avg >= (s64)(vruntime - cfs_rq->zero_vruntime) * load;
}
int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -740,42 +740,14 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
return vruntime_eligible(cfs_rq, se->vruntime);
}
-static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
+static void update_zero_vruntime(struct cfs_rq *cfs_rq)
{
- u64 min_vruntime = cfs_rq->min_vruntime;
- /*
- * open coded max_vruntime() to allow updating avg_vruntime
- */
- s64 delta = (s64)(vruntime - min_vruntime);
- if (delta > 0) {
- avg_vruntime_update(cfs_rq, delta);
- min_vruntime = vruntime;
- }
- return min_vruntime;
-}
+ u64 vruntime = avg_vruntime(cfs_rq);
+ s64 delta = (s64)(vruntime - cfs_rq->zero_vruntime);
-static void update_min_vruntime(struct cfs_rq *cfs_rq)
-{
- struct sched_entity *se = __pick_root_entity(cfs_rq);
- struct sched_entity *curr = cfs_rq->curr;
- u64 vruntime = cfs_rq->min_vruntime;
+ avg_vruntime_update(cfs_rq, delta);
- if (curr) {
- if (curr->on_rq)
- vruntime = curr->vruntime;
- else
- curr = NULL;
- }
-
- if (se) {
- if (!curr)
- vruntime = se->min_vruntime;
- else
- vruntime = min_vruntime(vruntime, se->min_vruntime);
- }
-
- /* ensure we never gain time by being placed backwards. */
- cfs_rq->min_vruntime = __update_min_vruntime(cfs_rq, vruntime);
+ cfs_rq->zero_vruntime = vruntime;
}
static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq)
@@ -848,6 +820,7 @@ RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
avg_vruntime_add(cfs_rq, se);
+ update_zero_vruntime(cfs_rq);
se->min_vruntime = se->vruntime;
se->min_slice = se->slice;
rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
@@ -859,6 +832,7 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
&min_vruntime_cb);
avg_vruntime_sub(cfs_rq, se);
+ update_zero_vruntime(cfs_rq);
}
struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq)
@@ -1226,7 +1200,6 @@ static void update_curr(struct cfs_rq *cfs_rq)
curr->vruntime += calc_delta_fair(delta_exec, curr);
resched = update_deadline(cfs_rq, curr);
- update_min_vruntime(cfs_rq);
if (entity_is_task(curr)) {
/*
@@ -3808,15 +3781,6 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
if (!curr)
__enqueue_entity(cfs_rq, se);
cfs_rq->nr_queued++;
-
- /*
- * The entity's vruntime has been adjusted, so let's check
- * whether the rq-wide min_vruntime needs updated too. Since
- * the calculations above require stable min_vruntime rather
- * than up-to-date one, we do the update at the end of the
- * reweight process.
- */
- update_min_vruntime(cfs_rq);
}
}
@@ -5429,15 +5393,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
update_cfs_group(se);
- /*
- * Now advance min_vruntime if @se was the entity holding it back,
- * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
- * put back on, and if we advance min_vruntime, we'll be placed back
- * further than we started -- i.e. we'll be penalized.
- */
- if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
- update_min_vruntime(cfs_rq);
-
if (flags & DEQUEUE_DELAYED)
finish_delayed_dequeue_entity(se);
@@ -9015,7 +8970,6 @@ static void yield_task_fair(struct rq *rq)
if (entity_eligible(cfs_rq, se)) {
se->vruntime = se->deadline;
se->deadline += calc_delta_fair(se->slice, se);
- update_min_vruntime(cfs_rq);
}
}
@@ -13078,23 +13032,6 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* Which shows that S and s_i transform alike (which makes perfect sense
* given that S is basically the (weighted) average of s_i).
*
- * Then:
- *
- * x -> s_min := min{s_i} (8)
- *
- * to obtain:
- *
- * \Sum_i w_i (s_i - s_min)
- * S = s_min + ------------------------ (9)
- * \Sum_i w_i
- *
- * Which already looks familiar, and is the basis for our current
- * approximation:
- *
- * S ~= s_min (10)
- *
- * Now, obviously, (10) is absolute crap :-), but it sorta works.
- *
* So the thing to remember is that the above is strictly UP. It is
* possible to generalize to multiple runqueues -- however it gets really
* yuck when you have to add affinity support, as illustrated by our very
@@ -13116,23 +13053,23 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* Let, for our runqueue 'k':
*
* T_k = \Sum_i w_i s_i
- * W_k = \Sum_i w_i ; for all i of k (11)
+ * W_k = \Sum_i w_i ; for all i of k (8)
*
* Then we can write (6) like:
*
* T_k
- * S_k = --- (12)
+ * S_k = --- (9)
* W_k
*
* From which immediately follows that:
*
* T_k + T_l
- * S_k+l = --------- (13)
+ * S_k+l = --------- (10)
* W_k + W_l
*
* On which we can define a combined lag:
*
- * lag_k+l(i) := S_k+l - s_i (14)
+ * lag_k+l(i) := S_k+l - s_i (11)
*
* And that gives us the tools to compare tasks across a combined runqueue.
*
@@ -13143,7 +13080,7 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* using (7); this only requires storing single 'time'-stamps.
*
* b) when comparing tasks between 2 runqueues of which one is forced-idle,
- * compare the combined lag, per (14).
+ * compare the combined lag, per (11).
*
* Now, of course cgroups (I so hate them) make this more interesting in
* that a) seems to suggest we need to iterate all cgroup on a CPU at such
@@ -13191,12 +13128,11 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* every tick. This limits the observed divergence due to the work
* conservancy.
*
- * On top of that, we can improve upon things by moving away from our
- * horrible (10) hack and moving to (9) and employing (13) here.
+ * On top of that, we can improve upon things by employing (10) here.
*/
/*
- * se_fi_update - Update the cfs_rq->min_vruntime_fi in a CFS hierarchy if needed.
+ * se_fi_update - Update the cfs_rq->zero_vruntime_fi in a CFS hierarchy if needed.
*/
static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
bool forceidle)
@@ -13210,7 +13146,7 @@ static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
cfs_rq->forceidle_seq = fi_seq;
}
- cfs_rq->min_vruntime_fi = cfs_rq->min_vruntime;
+ cfs_rq->zero_vruntime_fi = cfs_rq->zero_vruntime;
}
}
@@ -13263,11 +13199,11 @@ bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
/*
* Find delta after normalizing se's vruntime with its cfs_rq's
- * min_vruntime_fi, which would have been updated in prior calls
+ * zero_vruntime_fi, which would have been updated in prior calls
* to se_fi_update().
*/
delta = (s64)(sea->vruntime - seb->vruntime) +
- (s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);
+ (s64)(cfs_rqb->zero_vruntime_fi - cfs_rqa->zero_vruntime_fi);
return delta > 0;
}
@@ -13513,7 +13449,7 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
void init_cfs_rq(struct cfs_rq *cfs_rq)
{
cfs_rq->tasks_timeline = RB_ROOT_CACHED;
- cfs_rq->min_vruntime = (u64)(-(1LL << 20));
+ cfs_rq->zero_vruntime = (u64)(-(1LL << 20));
raw_spin_lock_init(&cfs_rq->removed.lock);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 82e74e8ca2ea..5a3cf81c27be 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -681,10 +681,10 @@ struct cfs_rq {
s64 avg_vruntime;
u64 avg_load;
- u64 min_vruntime;
+ u64 zero_vruntime;
#ifdef CONFIG_SCHED_CORE
unsigned int forceidle_seq;
- u64 min_vruntime_fi;
+ u64 zero_vruntime_fi;
#endif
struct rb_root_cached tasks_timeline;
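The arithmetic described in the commit message can be exercised standalone. The sketch below uses illustrative names (it is not the kernel's cfs_rq code): it tracks the numerator \Sum w_i * (v_i - v0) and the denominator \Sum w_i incrementally, keeps keys small relative to v0, and divides only when the average is requested, so the result stays correct even when vruntime wraps around the u64 range:

#include <stdint.h>
#include <stdio.h>

struct cfs {
	uint64_t zero_vruntime;  /* v0: the relative zero point        */
	int64_t  avg_vruntime;   /* numerator:   \Sum w_i * (v_i - v0) */
	uint64_t avg_load;       /* denominator: \Sum w_i              */
};

static int64_t key(const struct cfs *c, uint64_t v)
{
	/* stays small (and signed) even if v has wrapped past v0 */
	return (int64_t)(v - c->zero_vruntime);
}

static void enqueue(struct cfs *c, uint64_t v, uint64_t w)
{
	c->avg_vruntime += key(c, v) * (int64_t)w;
	c->avg_load += w;
}

static uint64_t avg_vruntime(const struct cfs *c)
{
	int64_t avg = c->avg_vruntime;

	if (c->avg_load)
		avg /= (int64_t)c->avg_load;  /* divide only when needed */
	return c->zero_vruntime + avg;
}

int main(void)
{
	/* v0 near the top of u64; one entity has wrapped past zero. */
	struct cfs c = { .zero_vruntime = UINT64_MAX - 50 };

	enqueue(&c, UINT64_MAX - 20, 2);  /* key = +30 */
	enqueue(&c, 40, 1);               /* wrapped; key = +91 */
	printf("avg = v0 + %lld\n",       /* (2*30 + 1*91) / 3 = 50 */
	       (long long)(avg_vruntime(&c) - c.zero_vruntime));
	return 0;
}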
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 4012d78562193ef5eb613bad4b0c0fa187637cfe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122916-blimp-ladle-4239@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4012d78562193ef5eb613bad4b0c0fa187637cfe Mon Sep 17 00:00:00 2001
From: Junbeom Yeom <junbeom.yeom(a)samsung.com>
Date: Fri, 19 Dec 2025 21:40:31 +0900
Subject: [PATCH] erofs: fix unexpected EIO under memory pressure
erofs readahead could fail with ENOMEM under memory pressure because
it tries to alloc_page with GFP_NOWAIT | GFP_NORETRY, while a regular
read uses GFP_KERNEL. And if readahead fails (with non-uptodate
folios), the original request will then fall back to a synchronous
read, and `.read_folio()` should return appropriate errnos.
However, in scenarios where readahead and read operations compete, the
read operation could return an unintended EIO because of incorrect
error propagation.
To resolve this, this patch modifies the behavior so that, when the
PCL is for read (which means pcl.besteffort is true), it attempts the
actual decompression instead of propagating the previous error, except
for the initial EIO.
- Page size: 4K
- The original size of FileA: 16K
- Compress-ratio per PCL: 50% (Uncompressed 8K -> Compressed 4K)
[page0, page1] [page2, page3]
[PCL0]---------[PCL1]
- function declarations:
. pread(fd, buf, count, offset)
. readahead(fd, offset, count)
- Thread A tries to read the last 4K
- Thread B tries to do readahead 8K from 4K
- RA, besteffort == false
- R, besteffort == true
<process A>                           <process B>
pread(FileA, buf, 4K, 12K)
  do readahead(page3) // failed with ENOMEM
  wait_lock(page3)
  if (!uptodate(page3))
    goto do_read
                                      readahead(FileA, 4K, 8K)
                                      // Here create PCL-chain like below:
                                      // [null, page1] [page2, null]
                                      // [PCL0:RA]-----[PCL1:RA]
  ...
  do read(page3) // found [PCL1:RA] and add page3 into it,
                 // and then, change PCL1 from RA to R
  ...
                                      // Now, PCL-chain is as below:
                                      // [null, page1] [page2, page3]
                                      // [PCL0:RA]-----[PCL1:R]
                                      // try to decompress PCL-chain...
                                      z_erofs_decompress_queue
                                        err = 0;
                                        // failed with ENOMEM, so page 1
                                        // only for RA will not be uptodated.
                                        // it's okay.
                                        err = decompress([PCL0:RA], err)
                                        // However, ENOMEM propagated to next
                                        // PCL, even though PCL is not only
                                        // for RA but also for R. As a result,
                                        // it just failed with ENOMEM without
                                        // trying any decompression, so page2
                                        // and page3 will not be uptodated.
                      ** BUG HERE ** -->  err = decompress([PCL1:R], err)
                                        return err as ENOMEM
  ...
  wait_lock(page3)
  if (!uptodate(page3))
    return EIO  <-- Return an unexpected EIO!
  ...
Fixes: 2349d2fa02db ("erofs: sunset unneeded NOFAILs")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jaewook Kim <jw5454.kim(a)samsung.com>
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Junbeom Yeom <junbeom.yeom(a)samsung.com>
Reviewed-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 65da21504632..3d31f7840ca0 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1262,7 +1262,7 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_backend *be, bool *overlapped)
return err;
}
-static int z_erofs_decompress_pcluster(struct z_erofs_backend *be, int err)
+static int z_erofs_decompress_pcluster(struct z_erofs_backend *be, bool eio)
{
struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
struct z_erofs_pcluster *pcl = be->pcl;
@@ -1270,7 +1270,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_backend *be, int err)
const struct z_erofs_decompressor *alg =
z_erofs_decomp[pcl->algorithmformat];
bool try_free = true;
- int i, j, jtop, err2;
+ int i, j, jtop, err2, err = eio ? -EIO : 0;
struct page *page;
bool overlapped;
const char *reason;
@@ -1413,12 +1413,12 @@ static int z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
.pcl = io->head,
};
struct z_erofs_pcluster *next;
- int err = io->eio ? -EIO : 0;
+ int err = 0;
for (; be.pcl != Z_EROFS_PCLUSTER_TAIL; be.pcl = next) {
DBG_BUGON(!be.pcl);
next = READ_ONCE(be.pcl->next);
- err = z_erofs_decompress_pcluster(&be, err) ?: err;
+ err = z_erofs_decompress_pcluster(&be, io->eio) ?: err;
}
return err;
}
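A freestanding sketch of the propagation bug and of the fix's shape, using toy types and a toy decompressor rather than the erofs code: threading the accumulated error through the loop lets one best-effort (readahead) failure poison a later pcluster that a reader is blocked on, whereas deriving each pcluster's starting error only from the queue's I/O state (eio) does not:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct pcluster { bool besteffort; };  /* true: a reader is waiting */

/* Toy decompressor: gives up if handed a prior error; otherwise the
 * readahead-only pcluster fails its allocation and the read one works. */
static int decompress(const struct pcluster *pcl, int err)
{
	if (err)
		return err;                 /* skips the real work */
	return pcl->besteffort ? 0 : -ENOMEM;
}

int main(void)
{
	struct pcluster chain[2] = { { false }, { true } }; /* [RA], [R] */
	bool eio = false;  /* no bio-level I/O error occurred */
	int i, err;

	/* pre-fix: the accumulated error is threaded through */
	err = eio ? -EIO : 0;
	for (i = 0; i < 2; i++)
		err = decompress(&chain[i], err) ?: err;
	printf("buggy: pcl[R] saw err %d, its pages stay !uptodate\n", err);

	/* post-fix: each pcluster starts from the queue's I/O state only */
	err = 0;
	for (i = 0; i < 2; i++)
		err = decompress(&chain[i], eio ? -EIO : 0) ?: err;
	printf("fixed: overall err %d, but pcl[R] was decompressed\n", err);
	return 0;
}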
Hi,
syzbot reported a warning and crash when mounting a corrupted HFS+ image where
the on-disk B-tree bitmap has node 0 (header node) marked free. In that case
hfs_bmap_alloc() can try to allocate node 0 and reach hfs_bnode_create() with
an already-hashed node number.
Patch 1 prevents allocating the reserved header node (node 0) even if the bitmap
is corrupted.
Patch 2 follows Slava's review suggestion and changes the "already hashed" path
in hfs_bnode_create() to return ERR_PTR(-EEXIST) instead of returning the existing
node pointer, so we don't continue in a non-"business as usual" situation.
v2 changes:
- Implement Slava's suggestion: return ERR_PTR(-EEXIST) for already-hashed nodes.
- Keep the node-0 allocation guard as a minimal, targeted hardening measure.
Reported-by: syzbot+1c8ff72d0cd8a50dfeaa(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1c8ff72d0cd8a50dfeaa
Link: https://lore.kernel.org/all/20251213233215.368558-1-shardul.b@mpiricsoftwar…
Shardul Bankar (2):
hfsplus: skip node 0 in hfs_bmap_alloc
hfsplus: return error when node already exists in hfs_bnode_create
fs/hfsplus/bnode.c | 2 +-
fs/hfsplus/btree.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
--
2.34.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 1dd6c84f1c544e552848a8968599220bd464e338
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122909-remover-casino-d84a@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1dd6c84f1c544e552848a8968599220bd464e338 Mon Sep 17 00:00:00 2001
From: Zqiang <qiang.zhang(a)linux.dev>
Date: Mon, 1 Dec 2025 19:25:40 +0800
Subject: [PATCH] sched_ext: Fix incorrect sched_class settings for per-cpu
migration tasks
When loading the ebpf scheduler, the tasks in the scx_tasks list will
be traversed and invoke __setscheduler_class() to get new sched_class.
however, this would also incorrectly set the per-cpu migration
task's->sched_class to rt_sched_class, even after unload, the per-cpu
migration task's->sched_class remains sched_rt_class.
The log for this issue is as follows:
./scx_rustland --stats 1
[ 199.245639][ T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[ 199.269213][ T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
This commit therefore introduces a new scx_setscheduler_class() helper
that adds a check for stop_sched_class, replacing the direct
__setscheduler_class() calls.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Zqiang <qiang.zhang(a)linux.dev>
Reviewed-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index dca9ca0c1854..b563b8c3fd24 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -248,6 +248,14 @@ static struct scx_dispatch_q *find_user_dsq(struct scx_sched *sch, u64 dsq_id)
return rhashtable_lookup(&sch->dsq_hash, &dsq_id, dsq_hash_params);
}
+static const struct sched_class *scx_setscheduler_class(struct task_struct *p)
+{
+ if (p->sched_class == &stop_sched_class)
+ return &stop_sched_class;
+
+ return __setscheduler_class(p->policy, p->prio);
+}
+
/*
* scx_kf_mask enforcement. Some kfuncs can only be called from specific SCX
* ops. When invoking SCX ops, SCX_CALL_OP[_RET]() should be used to indicate
@@ -4241,8 +4249,7 @@ static void scx_disable_workfn(struct kthread_work *work)
while ((p = scx_task_iter_next_locked(&sti))) {
unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
const struct sched_class *old_class = p->sched_class;
- const struct sched_class *new_class =
- __setscheduler_class(p->policy, p->prio);
+ const struct sched_class *new_class = scx_setscheduler_class(p);
update_rq_clock(task_rq(p));
@@ -5042,8 +5049,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
while ((p = scx_task_iter_next_locked(&sti))) {
unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
const struct sched_class *old_class = p->sched_class;
- const struct sched_class *new_class =
- __setscheduler_class(p->policy, p->prio);
+ const struct sched_class *new_class = scx_setscheduler_class(p);
if (scx_get_task_state(p) != SCX_TASK_READY)
continue;
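The fix boils down to a special-case guard in front of a generic mapping. A self-contained sketch with invented stand-ins for the scheduler classes (in the kernel these are real sched_class objects):

#include <stdio.h>

struct sched_class { const char *name; };

static const struct sched_class stop_sched_class = { "stop" };
static const struct sched_class rt_sched_class   = { "rt"   };

struct task { const char *comm; const struct sched_class *sched_class; };

/* Stand-in for __setscheduler_class(): maps purely by policy/prio,
 * which is what mis-classed the RT-priority migration threads. */
static const struct sched_class *setscheduler_class(const struct task *p)
{
	(void)p;
	return &rt_sched_class;
}

/* The patched helper: tasks already in the stop class keep it. */
static const struct sched_class *scx_setscheduler_class(const struct task *p)
{
	if (p->sched_class == &stop_sched_class)
		return &stop_sched_class;
	return setscheduler_class(p);
}

int main(void)
{
	struct task migration = { "migration/0", &stop_sched_class };

	printf("old: %s -> %s\n", migration.comm,
	       setscheduler_class(&migration)->name);      /* rt: wrong  */
	printf("new: %s -> %s\n", migration.comm,
	       scx_setscheduler_class(&migration)->name);  /* stop: kept */
	return 0;
}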
The patch below does not apply to the 6.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.18.y
git checkout FETCH_HEAD
git cherry-pick -x 1dd6c84f1c544e552848a8968599220bd464e338
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122908-magician-unlaced-bef4@gregkh' --subject-prefix 'PATCH 6.18.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1dd6c84f1c544e552848a8968599220bd464e338 Mon Sep 17 00:00:00 2001
From: Zqiang <qiang.zhang(a)linux.dev>
Date: Mon, 1 Dec 2025 19:25:40 +0800
Subject: [PATCH] sched_ext: Fix incorrect sched_class settings for per-cpu
migration tasks
When loading the eBPF scheduler, the tasks in the scx_tasks list are
traversed and __setscheduler_class() is invoked to get the new
sched_class. However, this also incorrectly sets the per-CPU migration
task's sched_class to rt_sched_class; even after unload, the per-CPU
migration task's sched_class remains rt_sched_class.
The log for this issue is as follows:
./scx_rustland --stats 1
[ 199.245639][ T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[ 199.269213][ T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
This commit therefore introduces a new scx_setscheduler_class() helper
that adds a check for stop_sched_class, replacing the direct
__setscheduler_class() calls.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Zqiang <qiang.zhang(a)linux.dev>
Reviewed-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index dca9ca0c1854..b563b8c3fd24 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -248,6 +248,14 @@ static struct scx_dispatch_q *find_user_dsq(struct scx_sched *sch, u64 dsq_id)
return rhashtable_lookup(&sch->dsq_hash, &dsq_id, dsq_hash_params);
}
+static const struct sched_class *scx_setscheduler_class(struct task_struct *p)
+{
+ if (p->sched_class == &stop_sched_class)
+ return &stop_sched_class;
+
+ return __setscheduler_class(p->policy, p->prio);
+}
+
/*
* scx_kf_mask enforcement. Some kfuncs can only be called from specific SCX
* ops. When invoking SCX ops, SCX_CALL_OP[_RET]() should be used to indicate
@@ -4241,8 +4249,7 @@ static void scx_disable_workfn(struct kthread_work *work)
while ((p = scx_task_iter_next_locked(&sti))) {
unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
const struct sched_class *old_class = p->sched_class;
- const struct sched_class *new_class =
- __setscheduler_class(p->policy, p->prio);
+ const struct sched_class *new_class = scx_setscheduler_class(p);
update_rq_clock(task_rq(p));
@@ -5042,8 +5049,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
while ((p = scx_task_iter_next_locked(&sti))) {
unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
const struct sched_class *old_class = p->sched_class;
- const struct sched_class *new_class =
- __setscheduler_class(p->policy, p->prio);
+ const struct sched_class *new_class = scx_setscheduler_class(p);
if (scx_get_task_state(p) != SCX_TASK_READY)
continue;
The patch below does not apply to the 6.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.18.y
git checkout FETCH_HEAD
git cherry-pick -x 79f3f9bedd149ea438aaeb0fb6a083637affe205
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122900-pug-tartly-0d84@gregkh' --subject-prefix 'PATCH 6.18.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 79f3f9bedd149ea438aaeb0fb6a083637affe205 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 2 Apr 2025 20:07:34 +0200
Subject: [PATCH] sched/eevdf: Fix min_vruntime vs avg_vruntime
Basically, from the constraint that the sum of lag is zero, you can
infer that the 0-lag point is the weighted average of the individual
vruntime, which is what we're trying to compute:
\Sum w_i * v_i
avg = --------------
\Sum w_i
Now, since vruntime takes the whole u64 (worse, it wraps), this
multiplication term in the numerator is not something we can compute;
instead we do the min_vruntime (v0 henceforth) thing like:
v_i = (v_i - v0) + v0
This does two things:
- it keeps the key: (v_i - v0) 'small';
- it creates a relative 0-point in the modular space.
If you do that substitution and work it all out, you end up with:
\Sum w_i * (v_i - v0)
avg = --------------------- + v0
\Sum w_i
Since you cannot very well track a ratio like that (and not suffer
terrible numerical problems) we simply track the numerator and
denominator individually and only perform the division when strictly
needed.
Notably, the numerator lives in cfs_rq->avg_vruntime and the denominator
lives in cfs_rq->avg_load.
The one extra 'funny' is that these numbers track the entities in the
tree, and current is typically outside of the tree, so avg_vruntime()
adds current when needed before doing the division.
(vruntime_eligible() elides the division by cross-wise multiplication)
Anyway, as mentioned above, we currently use the CFS era min_vruntime
for this purpose. However, this thing can only move forward, while the
above avg can in fact move backward (when a non-eligible task leaves,
the average becomes smaller). This can cause trouble when, through
happenstance (or construction), these values drift far enough apart to
wreck the game.
Replace cfs_rq::min_vruntime with cfs_rq::zero_vruntime which is kept
near/at avg_vruntime, following its motion.
The down-side is that this requires computing the avg more often.
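As a rough illustration of the bookkeeping (a minimal userspace sketch,
not kernel code; the field names follow the patch, while the standalone
struct and helpers are assumptions): the average is kept as a
numerator/denominator pair relative to the movable zero point, and
re-anchoring the zero point at the current average keeps the two from
drifting apart.
struct avg_state {
	long long avg_vruntime;			/* \Sum w_i * (v_i - v0) */
	unsigned long long avg_load;		/* \Sum w_i */
	unsigned long long zero_vruntime;	/* v0 */
};

static unsigned long long avg(const struct avg_state *s)
{
	long long num = s->avg_vruntime;

	if (s->avg_load)
		num /= (long long)s->avg_load;	/* the kernel uses div_s64() */
	return s->zero_vruntime + (unsigned long long)num;
}

static void update_zero(struct avg_state *s)
{
	unsigned long long v = avg(s);
	long long delta = (long long)(v - s->zero_vruntime);

	/* moving v0 shifts every key (v_i - v0); fix up the numerator */
	s->avg_vruntime -= delta * (long long)s->avg_load;
	s->zero_vruntime = v;
}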
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Zicheng Qu <quzicheng(a)huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://patch.msgid.link/20251106111741.GC4068168@noisy.programming.kicks-a…
Cc: stable(a)vger.kernel.org
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 02e16b70a790..41caa22e0680 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -796,7 +796,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
- s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+ s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
struct sched_entity *last, *first, *root;
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
@@ -819,15 +819,15 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
last = __pick_last_entity(cfs_rq);
if (last)
right_vruntime = last->vruntime;
- min_vruntime = cfs_rq->min_vruntime;
+ zero_vruntime = cfs_rq->zero_vruntime;
raw_spin_rq_unlock_irqrestore(rq, flags);
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
SPLIT_NS(left_deadline));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_vruntime",
SPLIT_NS(left_vruntime));
- SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime",
- SPLIT_NS(min_vruntime));
+ SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
+ SPLIT_NS(zero_vruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
SPLIT_NS(avg_vruntime(cfs_rq)));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a11a832d63e..8d971d48669f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -554,7 +554,7 @@ static inline bool entity_before(const struct sched_entity *a,
static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- return (s64)(se->vruntime - cfs_rq->min_vruntime);
+ return (s64)(se->vruntime - cfs_rq->zero_vruntime);
}
#define __node_2_se(node) \
@@ -606,13 +606,13 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
*
* Which we track using:
*
- * v0 := cfs_rq->min_vruntime
+ * v0 := cfs_rq->zero_vruntime
* \Sum (v_i - v0) * w_i := cfs_rq->avg_vruntime
* \Sum w_i := cfs_rq->avg_load
*
- * Since min_vruntime is a monotonic increasing variable that closely tracks
- * the per-task service, these deltas: (v_i - v), will be in the order of the
- * maximal (virtual) lag induced in the system due to quantisation.
+ * Since zero_vruntime closely tracks the per-task service, these
+ * deltas: (v_i - v), will be in the order of the maximal (virtual) lag
+ * induced in the system due to quantisation.
*
* Also, we use scale_load_down() to reduce the size.
*
@@ -671,7 +671,7 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
avg = div_s64(avg, load);
}
- return cfs_rq->min_vruntime + avg;
+ return cfs_rq->zero_vruntime + avg;
}
/*
@@ -732,7 +732,7 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
load += weight;
}
- return avg >= (s64)(vruntime - cfs_rq->min_vruntime) * load;
+ return avg >= (s64)(vruntime - cfs_rq->zero_vruntime) * load;
}
int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -740,42 +740,14 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
return vruntime_eligible(cfs_rq, se->vruntime);
}
-static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
+static void update_zero_vruntime(struct cfs_rq *cfs_rq)
{
- u64 min_vruntime = cfs_rq->min_vruntime;
- /*
- * open coded max_vruntime() to allow updating avg_vruntime
- */
- s64 delta = (s64)(vruntime - min_vruntime);
- if (delta > 0) {
- avg_vruntime_update(cfs_rq, delta);
- min_vruntime = vruntime;
- }
- return min_vruntime;
-}
+ u64 vruntime = avg_vruntime(cfs_rq);
+ s64 delta = (s64)(vruntime - cfs_rq->zero_vruntime);
-static void update_min_vruntime(struct cfs_rq *cfs_rq)
-{
- struct sched_entity *se = __pick_root_entity(cfs_rq);
- struct sched_entity *curr = cfs_rq->curr;
- u64 vruntime = cfs_rq->min_vruntime;
+ avg_vruntime_update(cfs_rq, delta);
- if (curr) {
- if (curr->on_rq)
- vruntime = curr->vruntime;
- else
- curr = NULL;
- }
-
- if (se) {
- if (!curr)
- vruntime = se->min_vruntime;
- else
- vruntime = min_vruntime(vruntime, se->min_vruntime);
- }
-
- /* ensure we never gain time by being placed backwards. */
- cfs_rq->min_vruntime = __update_min_vruntime(cfs_rq, vruntime);
+ cfs_rq->zero_vruntime = vruntime;
}
static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq)
@@ -848,6 +820,7 @@ RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
avg_vruntime_add(cfs_rq, se);
+ update_zero_vruntime(cfs_rq);
se->min_vruntime = se->vruntime;
se->min_slice = se->slice;
rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
@@ -859,6 +832,7 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
&min_vruntime_cb);
avg_vruntime_sub(cfs_rq, se);
+ update_zero_vruntime(cfs_rq);
}
struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq)
@@ -1226,7 +1200,6 @@ static void update_curr(struct cfs_rq *cfs_rq)
curr->vruntime += calc_delta_fair(delta_exec, curr);
resched = update_deadline(cfs_rq, curr);
- update_min_vruntime(cfs_rq);
if (entity_is_task(curr)) {
/*
@@ -3808,15 +3781,6 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
if (!curr)
__enqueue_entity(cfs_rq, se);
cfs_rq->nr_queued++;
-
- /*
- * The entity's vruntime has been adjusted, so let's check
- * whether the rq-wide min_vruntime needs updated too. Since
- * the calculations above require stable min_vruntime rather
- * than up-to-date one, we do the update at the end of the
- * reweight process.
- */
- update_min_vruntime(cfs_rq);
}
}
@@ -5429,15 +5393,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
update_cfs_group(se);
- /*
- * Now advance min_vruntime if @se was the entity holding it back,
- * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
- * put back on, and if we advance min_vruntime, we'll be placed back
- * further than we started -- i.e. we'll be penalized.
- */
- if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
- update_min_vruntime(cfs_rq);
-
if (flags & DEQUEUE_DELAYED)
finish_delayed_dequeue_entity(se);
@@ -9015,7 +8970,6 @@ static void yield_task_fair(struct rq *rq)
if (entity_eligible(cfs_rq, se)) {
se->vruntime = se->deadline;
se->deadline += calc_delta_fair(se->slice, se);
- update_min_vruntime(cfs_rq);
}
}
@@ -13078,23 +13032,6 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* Which shows that S and s_i transform alike (which makes perfect sense
* given that S is basically the (weighted) average of s_i).
*
- * Then:
- *
- * x -> s_min := min{s_i} (8)
- *
- * to obtain:
- *
- * \Sum_i w_i (s_i - s_min)
- * S = s_min + ------------------------ (9)
- * \Sum_i w_i
- *
- * Which already looks familiar, and is the basis for our current
- * approximation:
- *
- * S ~= s_min (10)
- *
- * Now, obviously, (10) is absolute crap :-), but it sorta works.
- *
* So the thing to remember is that the above is strictly UP. It is
* possible to generalize to multiple runqueues -- however it gets really
* yuck when you have to add affinity support, as illustrated by our very
@@ -13116,23 +13053,23 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* Let, for our runqueue 'k':
*
* T_k = \Sum_i w_i s_i
- * W_k = \Sum_i w_i ; for all i of k (11)
+ * W_k = \Sum_i w_i ; for all i of k (8)
*
* Then we can write (6) like:
*
* T_k
- * S_k = --- (12)
+ * S_k = --- (9)
* W_k
*
* From which immediately follows that:
*
* T_k + T_l
- * S_k+l = --------- (13)
+ * S_k+l = --------- (10)
* W_k + W_l
*
* On which we can define a combined lag:
*
- * lag_k+l(i) := S_k+l - s_i (14)
+ * lag_k+l(i) := S_k+l - s_i (11)
*
* And that gives us the tools to compare tasks across a combined runqueue.
*
@@ -13143,7 +13080,7 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* using (7); this only requires storing single 'time'-stamps.
*
* b) when comparing tasks between 2 runqueues of which one is forced-idle,
- * compare the combined lag, per (14).
+ * compare the combined lag, per (11).
*
* Now, of course cgroups (I so hate them) make this more interesting in
* that a) seems to suggest we need to iterate all cgroup on a CPU at such
@@ -13191,12 +13128,11 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* every tick. This limits the observed divergence due to the work
* conservancy.
*
- * On top of that, we can improve upon things by moving away from our
- * horrible (10) hack and moving to (9) and employing (13) here.
+ * On top of that, we can improve upon things by employing (10) here.
*/
/*
- * se_fi_update - Update the cfs_rq->min_vruntime_fi in a CFS hierarchy if needed.
+ * se_fi_update - Update the cfs_rq->zero_vruntime_fi in a CFS hierarchy if needed.
*/
static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
bool forceidle)
@@ -13210,7 +13146,7 @@ static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
cfs_rq->forceidle_seq = fi_seq;
}
- cfs_rq->min_vruntime_fi = cfs_rq->min_vruntime;
+ cfs_rq->zero_vruntime_fi = cfs_rq->zero_vruntime;
}
}
@@ -13263,11 +13199,11 @@ bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
/*
* Find delta after normalizing se's vruntime with its cfs_rq's
- * min_vruntime_fi, which would have been updated in prior calls
+ * zero_vruntime_fi, which would have been updated in prior calls
* to se_fi_update().
*/
delta = (s64)(sea->vruntime - seb->vruntime) +
- (s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);
+ (s64)(cfs_rqb->zero_vruntime_fi - cfs_rqa->zero_vruntime_fi);
return delta > 0;
}
@@ -13513,7 +13449,7 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
void init_cfs_rq(struct cfs_rq *cfs_rq)
{
cfs_rq->tasks_timeline = RB_ROOT_CACHED;
- cfs_rq->min_vruntime = (u64)(-(1LL << 20));
+ cfs_rq->zero_vruntime = (u64)(-(1LL << 20));
raw_spin_lock_init(&cfs_rq->removed.lock);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 82e74e8ca2ea..5a3cf81c27be 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -681,10 +681,10 @@ struct cfs_rq {
s64 avg_vruntime;
u64 avg_load;
- u64 min_vruntime;
+ u64 zero_vruntime;
#ifdef CONFIG_SCHED_CORE
unsigned int forceidle_seq;
- u64 min_vruntime_fi;
+ u64 zero_vruntime_fi;
#endif
struct rb_root_cached tasks_timeline;
The issue occurs when gfs2_freeze_lock_shared() fails in
gfs2_fill_super(). If !sb_rdonly(sb), the quotad and logd threads have
already been started; however, in the error path for
gfs2_freeze_lock_shared(), the threads are not stopped by
gfs2_destroy_threads() before jumping to fail_per_node.
This patch introduces a fail_threads label that stops the threads if
they were started.
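The shape of the fix, reduced to a self-contained sketch (every helper
here is a hypothetical stand-in, not a gfs2 function): unwind in
reverse order of setup, and enter the ladder at the deepest step that
actually ran, so started threads are always stopped.
static int start_threads(void) { return 0; }
static void stop_threads(void) { }
static int take_freeze_lock(void) { return -1; /* simulate failure */ }
static void undo_per_node(void) { }

static int fill_super(int rdonly)
{
	int error;

	if (!rdonly) {
		error = start_threads();
		if (error)
			goto fail_per_node;
	}
	error = take_freeze_lock();
	if (error)
		goto fail_threads;	/* threads must be stopped now */
	return 0;

fail_threads:
	if (!rdonly)
		stop_threads();
fail_per_node:
	undo_per_node();
	return error;
}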
Reported-by: syzbot+4cb0d0336db6bc6930e9(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4cb0d0336db6bc6930e9
Fixes: a28dc123fa66 ("gfs2: init system threads before freeze lock")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ryota Sakamoto <sakamo.ryota(a)gmail.com>
---
fs/gfs2/ops_fstype.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e7a88b717991ae3647c1da039636daef7005a7f0..4b5ac1a7050f1fd34e10be4100a2bc381f49c83d 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1269,21 +1269,23 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
error = gfs2_freeze_lock_shared(sdp);
if (error)
- goto fail_per_node;
+ goto fail_threads;
if (!sb_rdonly(sb))
error = gfs2_make_fs_rw(sdp);
if (error) {
gfs2_freeze_unlock(sdp);
- gfs2_destroy_threads(sdp);
fs_err(sdp, "can't make FS RW: %d\n", error);
- goto fail_per_node;
+ goto fail_threads;
}
gfs2_glock_dq_uninit(&mount_gh);
gfs2_online_uevent(sdp);
return 0;
+fail_threads:
+ if (!sb_rdonly(sb))
+ gfs2_destroy_threads(sdp);
fail_per_node:
init_per_node(sdp, UNDO);
fail_inodes:
---
base-commit: 7839932417dd53bb09eb5a585a7a92781dfd7cb2
change-id: 20251230-fix-use-after-free-gfs2-66cfbe23baa8
Best regards,
--
Ryota Sakamoto <sakamo.ryota(a)gmail.com>
Fix a bug where an empty FDA (fd array) object with 0 fds would cause an
out-of-bounds error. The previous implementation used `skip == 0` to
mean "this is a pointer fixup", but 0 is also the correct skip length
for an empty FDA. If the FDA is at the end of the buffer, then this
results in an attempt to write 8 bytes out of bounds. This is caught and
results in an EINVAL error being returned to userspace.
The pattern of using `skip == 0` as a special value originates from the
C-implementation of Binder. As part of fixing this bug, this pattern is
replaced with a Rust enum.
I considered the alternative of not pushing a fixup when the length
is zero, but I think it's cleaner to just get rid of the zero-is-special
stuff.
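The ambiguity in miniature (a generic C sketch of the old encoding, not
the Rust driver code):
#include <stddef.h>
#include <stdint.h>

struct fixup {
	size_t skip;		/* 0 doubled as the "pointer fixup" marker */
	uint64_t pointer_value;
	size_t target_offset;
};

static size_t fixup_len(const struct fixup *f)
{
	/*
	 * An empty fd array also has skip == 0, so it is misread as an
	 * 8-byte pointer write -- out of bounds when the FDA sits at the
	 * end of the buffer.
	 */
	return f->skip == 0 ? sizeof(uint64_t) : f->skip;
}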
The root cause of this bug was diagnosed by Gemini CLI on first try. I
used the following prompt:
> There appears to be a bug in @drivers/android/binder/thread.rs where
> the Fixups oob bug is triggered with 316 304 316 324. This implies
> that we somehow ended up with a fixup where buffer A has a pointer to
> buffer B, but the pointer is located at an index in buffer A that is
> out of bounds. Please investigate the code to find the bug. You may
> compare with @drivers/android/binder.c that implements this correctly.
Cc: stable(a)vger.kernel.org
Reported-by: DeepChirp <DeepChirp(a)outlook.com>
Closes: https://github.com/waydroid/waydroid/issues/2157
Fixes: eafedbc7c050 ("rust_binder: add Rust Binder driver")
Tested-by: DeepChirp <DeepChirp(a)outlook.com>
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
drivers/android/binder/thread.rs | 59 +++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 25 deletions(-)
diff --git a/drivers/android/binder/thread.rs b/drivers/android/binder/thread.rs
index 1a8e6fdc0dc42369ee078e720aa02b2554fb7332..dcd47e10aeb8c748d04320fbbe15ad35201684b9 100644
--- a/drivers/android/binder/thread.rs
+++ b/drivers/android/binder/thread.rs
@@ -69,17 +69,24 @@ struct ScatterGatherEntry {
}
/// This entry specifies that a fixup should happen at `target_offset` of the
-/// buffer. If `skip` is nonzero, then the fixup is a `binder_fd_array_object`
-/// and is applied later. Otherwise if `skip` is zero, then the size of the
-/// fixup is `sizeof::<u64>()` and `pointer_value` is written to the buffer.
-struct PointerFixupEntry {
- /// The number of bytes to skip, or zero for a `binder_buffer_object` fixup.
- skip: usize,
- /// The translated pointer to write when `skip` is zero.
- pointer_value: u64,
- /// The offset at which the value should be written. The offset is relative
- /// to the original buffer.
- target_offset: usize,
+/// buffer.
+enum PointerFixupEntry {
+ /// A fixup for a `binder_buffer_object`.
+ Fixup {
+ /// The translated pointer to write.
+ pointer_value: u64,
+ /// The offset at which the value should be written. The offset is relative
+ /// to the original buffer.
+ target_offset: usize,
+ },
+ /// A skip for a `binder_fd_array_object`.
+ Skip {
+ /// The number of bytes to skip.
+ skip: usize,
+ /// The offset at which the skip should happen. The offset is relative
+ /// to the original buffer.
+ target_offset: usize,
+ },
}
/// Return type of `apply_and_validate_fixup_in_parent`.
@@ -762,8 +769,7 @@ fn translate_object(
parent_entry.fixup_min_offset = info.new_min_offset;
parent_entry.pointer_fixups.push(
- PointerFixupEntry {
- skip: 0,
+ PointerFixupEntry::Fixup {
pointer_value: buffer_ptr_in_user_space,
target_offset: info.target_offset,
},
@@ -807,9 +813,8 @@ fn translate_object(
parent_entry
.pointer_fixups
.push(
- PointerFixupEntry {
+ PointerFixupEntry::Skip {
skip: fds_len,
- pointer_value: 0,
target_offset: info.target_offset,
},
GFP_KERNEL,
@@ -871,17 +876,21 @@ fn apply_sg(&self, alloc: &mut Allocation, sg_state: &mut ScatterGatherState) ->
let mut reader =
UserSlice::new(UserPtr::from_addr(sg_entry.sender_uaddr), sg_entry.length).reader();
for fixup in &mut sg_entry.pointer_fixups {
- let fixup_len = if fixup.skip == 0 {
- size_of::<u64>()
- } else {
- fixup.skip
+ let (fixup_len, fixup_offset) = match fixup {
+ PointerFixupEntry::Fixup { target_offset, .. } => {
+ (size_of::<u64>(), *target_offset)
+ }
+ PointerFixupEntry::Skip {
+ skip,
+ target_offset,
+ } => (*skip, *target_offset),
};
- let target_offset_end = fixup.target_offset.checked_add(fixup_len).ok_or(EINVAL)?;
- if fixup.target_offset < end_of_previous_fixup || offset_end < target_offset_end {
+ let target_offset_end = fixup_offset.checked_add(fixup_len).ok_or(EINVAL)?;
+ if fixup_offset < end_of_previous_fixup || offset_end < target_offset_end {
pr_warn!(
"Fixups oob {} {} {} {}",
- fixup.target_offset,
+ fixup_offset,
end_of_previous_fixup,
offset_end,
target_offset_end
@@ -890,13 +899,13 @@ fn apply_sg(&self, alloc: &mut Allocation, sg_state: &mut ScatterGatherState) ->
}
let copy_off = end_of_previous_fixup;
- let copy_len = fixup.target_offset - end_of_previous_fixup;
+ let copy_len = fixup_offset - end_of_previous_fixup;
if let Err(err) = alloc.copy_into(&mut reader, copy_off, copy_len) {
pr_warn!("Failed copying into alloc: {:?}", err);
return Err(err.into());
}
- if fixup.skip == 0 {
- let res = alloc.write::<u64>(fixup.target_offset, &fixup.pointer_value);
+ if let PointerFixupEntry::Fixup { pointer_value, .. } = fixup {
+ let res = alloc.write::<u64>(fixup_offset, pointer_value);
if let Err(err) = res {
pr_warn!("Failed copying ptr into alloc: {:?}", err);
return Err(err.into());
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251229-fda-zero-4e46e56be58d
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
Hi,
I’m reporting a sequential I/O performance regression of up to 6%,
observed with vdbench on the 6.12.y kernel.
While running performance benchmarks on the v6.12.60 kernel, the
sequential I/O vdbench metrics show a 5-6% performance regression
compared to v6.12.48.
Bisect root cause commit
========================
- commit b39b62075ab4 ("cpuidle: menu: Remove iowait influence")
Things work fine again when the previously removed
performance-multiplier code is added back.
Test details
============
The system is connected to a number of disks in disk array using
multipathing and directio configuration in the vdbench profile.
wd=wd1,sd=sd*,rdpct=0,seekpct=sequential,xfersize=128k
rd=128k64T,wd=wd1,iorate=max,elapsed=600,interval=1,warmup=300,threads=64
Thanks,
Alok
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5654889a94b0de5ad6ceae3793e7f5e0b61b50b6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122905-crawfish-unaware-fb3a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5654889a94b0de5ad6ceae3793e7f5e0b61b50b6 Mon Sep 17 00:00:00 2001
From: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Date: Fri, 14 Nov 2025 11:33:13 +0100
Subject: [PATCH] ARM: dts: microchip: sama7g5: fix uart fifo size to 32
On some flexcom nodes related to UART, the FIFO sizes were wrong: fix
them to 32 data entries.
Fixes: 7540629e2fc7 ("ARM: dts: at91: add sama7g5 SoC DT and sama7g5-ek")
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20251114103313.20220-2-nicolas.ferre@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea(a)tuxon.dev>
diff --git a/arch/arm/boot/dts/microchip/sama7g5.dtsi b/arch/arm/boot/dts/microchip/sama7g5.dtsi
index 381cbcfcb34a..03ef3d9aaeec 100644
--- a/arch/arm/boot/dts/microchip/sama7g5.dtsi
+++ b/arch/arm/boot/dts/microchip/sama7g5.dtsi
@@ -824,7 +824,7 @@ uart4: serial@200 {
dma-names = "tx", "rx";
atmel,use-dma-rx;
atmel,use-dma-tx;
- atmel,fifo-size = <16>;
+ atmel,fifo-size = <32>;
status = "disabled";
};
};
@@ -850,7 +850,7 @@ uart7: serial@200 {
dma-names = "tx", "rx";
atmel,use-dma-rx;
atmel,use-dma-tx;
- atmel,fifo-size = <16>;
+ atmel,fifo-size = <32>;
status = "disabled";
};
};
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 1cd5bb6e9e027bab33aafd58fe8340124869ba62
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122948-abide-broken-c7d3@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cd5bb6e9e027bab33aafd58fe8340124869ba62 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)kernel.org>
Date: Sat, 6 Dec 2025 13:37:50 -0800
Subject: [PATCH] lib/crypto: riscv: Depend on
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.
This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient. (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)
This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet. Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option. The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.
Fixes: eb24af5d7a05 ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853dfa0 ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470ffe ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a08b ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255afa2 ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bbf3 ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable(a)vger.kernel.org
Reported-by: Vivian Wang <wangruikang(a)iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih(a)sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a75d6325607b..14c5acb935e9 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -4,7 +4,8 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
config CRYPTO_AES_RISCV64
tristate "Ciphers: AES, modes: ECB, CBC, CTS, CTR, XTS"
- depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
select CRYPTO_ALGAPI
select CRYPTO_LIB_AES
select CRYPTO_SKCIPHER
@@ -20,7 +21,8 @@ config CRYPTO_AES_RISCV64
config CRYPTO_GHASH_RISCV64
tristate "Hash functions: GHASH"
- depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
select CRYPTO_GCM
help
GCM GHASH function (NIST SP 800-38D)
@@ -30,7 +32,8 @@ config CRYPTO_GHASH_RISCV64
config CRYPTO_SM3_RISCV64
tristate "Hash functions: SM3 (ShangMi 3)"
- depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
select CRYPTO_HASH
select CRYPTO_LIB_SM3
help
@@ -42,7 +45,8 @@ config CRYPTO_SM3_RISCV64
config CRYPTO_SM4_RISCV64
tristate "Ciphers: SM4 (ShangMi 4)"
- depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ depends on 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
select CRYPTO_ALGAPI
select CRYPTO_SM4
help
diff --git a/lib/crypto/Kconfig b/lib/crypto/Kconfig
index a3647352bff6..6871a41e5069 100644
--- a/lib/crypto/Kconfig
+++ b/lib/crypto/Kconfig
@@ -61,7 +61,8 @@ config CRYPTO_LIB_CHACHA_ARCH
default y if ARM64 && KERNEL_MODE_NEON
default y if MIPS && CPU_MIPS32_R2
default y if PPC64 && CPU_LITTLE_ENDIAN && VSX
- default y if RISCV && 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
default y if S390
default y if X86_64
@@ -184,7 +185,8 @@ config CRYPTO_LIB_SHA256_ARCH
default y if ARM64
default y if MIPS && CPU_CAVIUM_OCTEON
default y if PPC && SPE
- default y if RISCV && 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
default y if S390
default y if SPARC64
default y if X86_64
@@ -202,7 +204,8 @@ config CRYPTO_LIB_SHA512_ARCH
default y if ARM && !CPU_V7M
default y if ARM64
default y if MIPS && CPU_CAVIUM_OCTEON
- default y if RISCV && 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ default y if RISCV && 64BIT && TOOLCHAIN_HAS_VECTOR_CRYPTO && \
+ RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
default y if S390
default y if SPARC64
default y if X86_64
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 47ef834209e5981f443240d8a8b45bf680df22aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122943-deprive-faster-f1e2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47ef834209e5981f443240d8a8b45bf680df22aa Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Thu, 4 Dec 2025 15:19:35 -0500
Subject: [PATCH] tracing: Fix fixed array of synthetic event
The commit 4d38328eb442d ("tracing: Fix synth event printk format for str
fields") replaced "%.*s" with "%s" but missed removing the number size of
the dynamic and static strings. The commit e1a453a57bc7 ("tracing: Do not
add length to print format in synthetic events") fixed the dynamic part
but did not fix the static part. That is, with the commands:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
That caused the output of:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
The commit e1a453a57bc7 fixed the part where the synthetic event had
"char[] wakee". But if one were to replace that with a static size string:
# echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
Where "wakee" is defined as "char[16]" and not "char[]" making it a static
size, the code triggered the "(efaul)" again.
Remove the added STR_VAR_LEN_MAX size as the string is still going to be
nul terminated.
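The mechanics of the bug can be reproduced in plain C (a standalone
sketch, not kernel code): "%.*s" consumes a width argument before the
string, while "%s" does not, so a leftover length argument gets
interpreted as the string pointer.
#include <stdio.h>

int main(void)
{
	const char *name = "wakee";
	const char *val = "bash";

	printf("%s=%.*s\n", name, 16, val);	/* old format: width consumed */
	/*
	 * printf("%s=%s\n", name, 16, val); -- undefined behavior: 16 is
	 * read as the char * argument, which is what showed up as
	 * "(efault)" in the trace output above.
	 */
	printf("%s=%s\n", name, val);		/* fixed: length dropped */
	return 0;
}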
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Douglas Raillard <douglas.raillard(a)arm.com>
Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home
Fixes: e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 2f19bbe73d27..4554c458b78c 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -375,7 +375,6 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64++;
} else {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
- STR_VAR_LEN_MAX,
(char *)&entry->fields[n_u64].as_u64,
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 47ef834209e5981f443240d8a8b45bf680df22aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122942-dwelling-spelling-4707@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47ef834209e5981f443240d8a8b45bf680df22aa Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Thu, 4 Dec 2025 15:19:35 -0500
Subject: [PATCH] tracing: Fix fixed array of synthetic event
The commit 4d38328eb442d ("tracing: Fix synth event printk format for str
fields") replaced "%.*s" with "%s" but missed removing the number size of
the dynamic and static strings. The commit e1a453a57bc7 ("tracing: Do not
add length to print format in synthetic events") fixed the dynamic part
but did not fix the static part. That is, with the commands:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
That caused the output of:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
The commit e1a453a57bc7 fixed the part where the synthetic event had
"char[] wakee". But if one were to replace that with a static size string:
# echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
Where "wakee" is defined as "char[16]" and not "char[]" making it a static
size, the code triggered the "(efaul)" again.
Remove the added STR_VAR_LEN_MAX size as the string is still going to be
nul terminated.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Douglas Raillard <douglas.raillard(a)arm.com>
Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home
Fixes: e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 2f19bbe73d27..4554c458b78c 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -375,7 +375,6 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64++;
} else {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
- STR_VAR_LEN_MAX,
(char *)&entry->fields[n_u64].as_u64,
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 47ef834209e5981f443240d8a8b45bf680df22aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122941-citric-sushi-4f65@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47ef834209e5981f443240d8a8b45bf680df22aa Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Thu, 4 Dec 2025 15:19:35 -0500
Subject: [PATCH] tracing: Fix fixed array of synthetic event
The commit 4d38328eb442d ("tracing: Fix synth event printk format for str
fields") replaced "%.*s" with "%s" but missed removing the number size of
the dynamic and static strings. The commit e1a453a57bc7 ("tracing: Do not
add length to print format in synthetic events") fixed the dynamic part
but did not fix the static part. That is, with the commands:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
That caused the output of:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
The commit e1a453a57bc7 fixed the part where the synthetic event had
"char[] wakee". But if one were to replace that with a static size string:
# echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
Where "wakee" is defined as "char[16]" and not "char[]" making it a static
size, the code triggered the "(efaul)" again.
Remove the added STR_VAR_LEN_MAX size as the string is still going to be
nul terminated.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Douglas Raillard <douglas.raillard(a)arm.com>
Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home
Fixes: e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 2f19bbe73d27..4554c458b78c 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -375,7 +375,6 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64++;
} else {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
- STR_VAR_LEN_MAX,
(char *)&entry->fields[n_u64].as_u64,
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> On 19.12.25 20:31, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
>
> to the 6.18-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-
> queue.git;a=summary
>
> The filename of the patch is:
> cpufreq-dt-platdev-add-jh7110s-soc-to-the-allowlist.patch
> and it can be found in the queue-6.18 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree, please let
> <stable(a)vger.kernel.org> know about it.
As series [1] has been accepted, this patch will not be needed and will be reverted in mainline.
[1] https://lore.kernel.org/all/20251212211934.135602-1-e@freeshell.de/
So we should not add it to the stable tree. Thanks.
Best regards,
Hal
>
>
>
> commit f922a9b37397e7db449e3fa61c33c542ad783f87
> Author: Hal Feng <hal.feng(a)starfivetech.com>
> Date: Thu Oct 16 16:00:48 2025 +0800
>
> cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
>
> [ Upstream commit 6e7970cab51d01b8f7c56f120486c571c22e1b80 ]
>
> Add the compatible strings for supporting the generic
> cpufreq driver on the StarFive JH7110S SoC.
>
> Signed-off-by: Hal Feng <hal.feng(a)starfivetech.com>
> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com>
> Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-
> dt-platdev.c
> index cd1816a12bb99..dc11b62399ad5 100644
> --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> @@ -87,6 +87,7 @@ static const struct of_device_id allowlist[] __initconst = {
> { .compatible = "st-ericsson,u9540", },
>
> { .compatible = "starfive,jh7110", },
> + { .compatible = "starfive,jh7110s", },
>
> { .compatible = "ti,omap2", },
> { .compatible = "ti,omap4", },
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b14fad555302a2104948feaff70503b64c80ac01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122931-palm-unfixed-3968@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b14fad555302a2104948feaff70503b64c80ac01 Mon Sep 17 00:00:00 2001
From: Prithvi Tambewagh <activprithvi(a)gmail.com>
Date: Thu, 25 Dec 2025 12:58:29 +0530
Subject: [PATCH] io_uring: fix filename leak in __io_openat_prep()
__io_openat_prep() allocates a struct filename using getname(). However,
when the file is to be installed into the fixed file table and the
O_CLOEXEC flag is set, the function returns early. At that point, the
request doesn't have the REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Reported-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
Fixes: b9445598d8c6 ("io_uring: openat directly into fixed fd table")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..15dde9bd6ff6 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -73,13 +73,13 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
open->filename = NULL;
return ret;
}
+ req->flags |= REQ_F_NEED_CLEANUP;
open->file_slot = READ_ONCE(sqe->file_index);
if (open->file_slot && (open->how.flags & O_CLOEXEC))
return -EINVAL;
open->nofile = rlimit(RLIMIT_NOFILE);
- req->flags |= REQ_F_NEED_CLEANUP;
if (io_openat_force_async(open))
req->flags |= REQ_F_FORCE_ASYNC;
return 0;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e15cb2200b934e507273510ba6bc747d5cde24a3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122915-sensually-wasting-f5f8@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e15cb2200b934e507273510ba6bc747d5cde24a3 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Tue, 9 Dec 2025 13:25:23 -0700
Subject: [PATCH] io_uring: fix min_wait wakeups for SQPOLL
Using min_wait, two timeouts are given:
1) The min_wait timeout, within which up to 'wait_nr' events are
waited for.
2) The overall long timeout, which is entered if no events are generated
in the min_wait window.
If the min_wait has expired, any event being posted must wake the task.
For SQPOLL, that isn't the case: the io_has_work() condition won't
trigger, because the task_work that ran when the event was posted has
already been processed. As a result, an event arriving after min_wait
has expired does not always wake the waiting application; instead, the
wait runs until the overall timeout has expired. This can be
shown in a test case that has a 1 second min_wait, with a 5 second
overall wait, even if an event triggers after 1.5 seconds:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 5000.2ms
where the expected result should be:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 1500.3ms
When the min_wait timeout triggers, reset the number of completions
needed to wake the task. This should ensure that any future events will
wake the task, regardless of how many events it originally wanted to
wait for.
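For reference, this is roughly how userspace exercises the min_wait
path (a hedged sketch: it assumes liburing's
io_uring_submit_and_wait_min_timeout() helper and the
IORING_FEAT_MIN_TIMEOUT feature flag; check your liburing man pages for
the exact signature).
#include <liburing.h>
#include <stdio.h>

static int wait_batched(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	struct __kernel_timespec ts = { .tv_sec = 5 };	/* overall timeout */
	int ret;

	/*
	 * Wait up to 1s (min_wait, in usec) for 4 CQEs; once min_wait has
	 * expired, a single CQE should wake us -- the behavior this fix
	 * restores for SQPOLL rings.
	 */
	ret = io_uring_submit_and_wait_min_timeout(ring, &cqe, 4, &ts,
						   1000000, NULL);
	if (ret < 0)
		fprintf(stderr, "wait failed: %d\n", ret);
	return ret;
}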
Reported-by: Tip ten Brink <tip(a)tenbrinkmeijs.com>
Cc: stable(a)vger.kernel.org
Fixes: 1100c4a2656d ("io_uring: add support for batch wait timeout")
Link: https://github.com/axboe/liburing/issues/1477
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5d130c578435..6cb24cdf8e68 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2536,6 +2536,9 @@ static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer)
goto out_wake;
}
+ /* any generated CQE posted past this time should wake us up */
+ iowq->cq_tail = iowq->cq_min_tail;
+
hrtimer_update_function(&iowq->t, io_cqring_timer_wakeup);
hrtimer_set_expires(timer, iowq->timeout);
return HRTIMER_RESTART;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 84230ad2d2afbf0c44c32967e525c0ad92e26b4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122905-unused-cash-520e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84230ad2d2afbf0c44c32967e525c0ad92e26b4e Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 1 Dec 2025 13:25:22 -0700
Subject: [PATCH] io_uring/poll: correctly handle io_poll_add() return value on
update
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode
with updates got slightly broken. If a POLL_ADD is pending and
POLL_REMOVE is then used to update the events of that request, and the
update causes the POLL_ADD to trigger, that completion is lost and a
CQE is never posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, that the completion value isn't always overwritten with
-ECANCELED. For that case, whatever io_poll_add() set the value to
should just be retained.
Cc: stable(a)vger.kernel.org
Fixes: 97b388d70b53 ("io_uring: handle completions in the core")
Reported-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Tested-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 8aa4e3a31e73..3f1d716dcfab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -937,12 +937,17 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
/* successfully updated, don't complete poll request */
- if (!ret2 || ret2 == -EIOCBQUEUED)
+ if (ret2 == IOU_ISSUE_SKIP_COMPLETE)
goto out;
+ /* request completed as part of the update, complete it */
+ else if (ret2 == IOU_COMPLETE)
+ goto complete;
}
- req_set_fail(preq);
io_req_set_res(preq, -ECANCELED, 0);
+complete:
+ if (preq->cqe.res < 0)
+ req_set_fail(preq);
preq->io_task_work.func = io_req_task_complete;
io_req_task_work_add(preq);
out:
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b14fad555302a2104948feaff70503b64c80ac01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122931-primer-motivate-1780@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b14fad555302a2104948feaff70503b64c80ac01 Mon Sep 17 00:00:00 2001
From: Prithvi Tambewagh <activprithvi(a)gmail.com>
Date: Thu, 25 Dec 2025 12:58:29 +0530
Subject: [PATCH] io_uring: fix filename leak in __io_openat_prep()
__io_openat_prep() allocates a struct filename using getname(). However,
when the file is to be installed into the fixed file table and the
O_CLOEXEC flag is set, the function returns early. At that point, the
request doesn't have the REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Reported-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
Fixes: b9445598d8c6 ("io_uring: openat directly into fixed fd table")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..15dde9bd6ff6 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -73,13 +73,13 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
open->filename = NULL;
return ret;
}
+ req->flags |= REQ_F_NEED_CLEANUP;
open->file_slot = READ_ONCE(sqe->file_index);
if (open->file_slot && (open->how.flags & O_CLOEXEC))
return -EINVAL;
open->nofile = rlimit(RLIMIT_NOFILE);
- req->flags |= REQ_F_NEED_CLEANUP;
if (io_openat_force_async(open))
req->flags |= REQ_F_FORCE_ASYNC;
return 0;
Since commit c6e126de43e7 ("of: Keep track of populated platform
devices"), child devices will not be created by of_platform_populate()
if the devices had previously been deregistered individually, leaving
the OF_POPULATED flag set in the corresponding OF nodes.
Switch to using of_platform_depopulate() instead of open coding it, so
that the child devices are created again if the driver is rebound.
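A minimal sketch of the paired pattern (an illustrative kernel-style
driver, not the patched one): children created through
of_platform_populate() must be removed with of_platform_depopulate() so
the OF_POPULATED flag is cleared and a rebind can recreate them.
#include <linux/of_platform.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	/* registers a child platform device for each child OF node */
	return of_platform_populate(pdev->dev.of_node, NULL, NULL,
				    &pdev->dev);
}

static void foo_remove(struct platform_device *pdev)
{
	/* unregisters the children and clears OF_POPULATED */
	of_platform_depopulate(&pdev->dev);
}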
Fixes: c6e126de43e7 ("of: Keep track of populated platform devices")
Cc: stable(a)vger.kernel.org # 3.16
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/mfd/omap-usb-host.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mfd/omap-usb-host.c b/drivers/mfd/omap-usb-host.c
index a77b6fc790f2..4d29a6e2ed87 100644
--- a/drivers/mfd/omap-usb-host.c
+++ b/drivers/mfd/omap-usb-host.c
@@ -819,8 +819,10 @@ static void usbhs_omap_remove(struct platform_device *pdev)
{
pm_runtime_disable(&pdev->dev);
- /* remove children */
- device_for_each_child(&pdev->dev, NULL, usbhs_omap_remove_child);
+ if (pdev->dev.of_node)
+ of_platform_depopulate(&pdev->dev);
+ else
+ device_for_each_child(&pdev->dev, NULL, usbhs_omap_remove_child);
}
static const struct dev_pm_ops usbhsomap_dev_pm_ops = {
--
2.51.2
Since commit c6e126de43e7 ("of: Keep track of populated platform
devices"), child devices will not be created by of_platform_populate()
if the devices had previously been deregistered individually, leaving
the OF_POPULATED flag set in the corresponding OF nodes.
Switch to using of_platform_depopulate() instead of open coding it, so
that the child devices are created again if the driver is rebound.
Fixes: c6e126de43e7 ("of: Keep track of populated platform devices")
Cc: stable(a)vger.kernel.org # 3.16
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/bus/omap-ocp2scp.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/bus/omap-ocp2scp.c b/drivers/bus/omap-ocp2scp.c
index e4dfda7b3b10..eee5ad191ea9 100644
--- a/drivers/bus/omap-ocp2scp.c
+++ b/drivers/bus/omap-ocp2scp.c
@@ -17,15 +17,6 @@
#define OCP2SCP_TIMING 0x18
#define SYNC2_MASK 0xf
-static int ocp2scp_remove_devices(struct device *dev, void *c)
-{
- struct platform_device *pdev = to_platform_device(dev);
-
- platform_device_unregister(pdev);
-
- return 0;
-}
-
static int omap_ocp2scp_probe(struct platform_device *pdev)
{
int ret;
@@ -79,7 +70,7 @@ static int omap_ocp2scp_probe(struct platform_device *pdev)
pm_runtime_disable(&pdev->dev);
err0:
- device_for_each_child(&pdev->dev, NULL, ocp2scp_remove_devices);
+ of_platform_depopulate(&pdev->dev);
return ret;
}
@@ -87,7 +78,7 @@ static int omap_ocp2scp_probe(struct platform_device *pdev)
static void omap_ocp2scp_remove(struct platform_device *pdev)
{
pm_runtime_disable(&pdev->dev);
- device_for_each_child(&pdev->dev, NULL, ocp2scp_remove_devices);
+ of_platform_depopulate(&pdev->dev);
}
#ifdef CONFIG_OF
--
2.51.2
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.1.y
git checkout FETCH_HEAD
git cherry-pick -x d1bea0ce35b6095544ee82bb54156fc62c067e58
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122919-skirt-skittle-4fab@gregkh' --subject-prefix 'PATCH 5.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1bea0ce35b6095544ee82bb54156fc62c067e58 Mon Sep 17 00:00:00 2001
From: Joshua Rogers <linux(a)joshua.hu>
Date: Fri, 7 Nov 2025 10:09:49 -0500
Subject: [PATCH] svcrdma: bound check rq_pages index in inline path
svc_rdma_copy_inline_range indexed rqstp->rq_pages[rc_curpage] without
verifying rc_curpage stays within the allocated page array. Add guards
before the first use and after advancing to a new page.
Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joshua Rogers <linux(a)joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index e813e5463352..310de7a80be5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -841,6 +841,9 @@ static int svc_rdma_copy_inline_range(struct svc_rqst *rqstp,
for (page_no = 0; page_no < numpages; page_no++) {
unsigned int page_len;
+ if (head->rc_curpage >= rqstp->rq_maxpages)
+ return -EINVAL;
+
page_len = min_t(unsigned int, remaining,
PAGE_SIZE - head->rc_pageoff);
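
A standalone sketch of the bound-check pattern being added (hypothetical
names and a shrunken MAXPAGES; the kernel checks before the first use and
after advancing, which the loop below folds into one check per iteration):

    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define MAXPAGES  4

    static char pages[MAXPAGES][PAGE_SIZE];

    static int copy_inline_range(const char *src, size_t len,
                                 unsigned int *curpage, unsigned int *pageoff)
    {
            while (len) {
                    if (*curpage >= MAXPAGES)
                            return -22;     /* -EINVAL: index out of bounds */

                    size_t chunk = PAGE_SIZE - *pageoff;
                    if (chunk > len)
                            chunk = len;
                    memcpy(&pages[*curpage][*pageoff], src, chunk);
                    src += chunk;
                    len -= chunk;
                    *pageoff += chunk;
                    if (*pageoff == PAGE_SIZE) {    /* advance to new page */
                            (*curpage)++;
                            *pageoff = 0;
                    }
            }
            return 0;
    }

    int main(void)
    {
            unsigned int curpage = 0, pageoff = 0;
            static char buf[3 * PAGE_SIZE];

            printf("small copy: %d\n", copy_inline_range(buf, sizeof(buf),
                                                         &curpage, &pageoff));
            /* A copy that would run past the array now fails cleanly. */
            printf("overflow:   %d\n", copy_inline_range(buf, 2 * PAGE_SIZE,
                                                         &curpage, &pageoff));
            return 0;
    }

Without the guard, the second copy would scribble past pages[] exactly the
way the unchecked rq_pages[rc_curpage] access could.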
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.13.y
git checkout FETCH_HEAD
git cherry-pick -x 7c8b465a1c91f674655ea9cec5083744ec5f796a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122924-outnumber-heroics-0ca8@gregkh' --subject-prefix 'PATCH 5.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c8b465a1c91f674655ea9cec5083744ec5f796a Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Mon, 22 Sep 2025 09:29:23 -0700
Subject: [PATCH] KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN
Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save()
on every nested VMRUN.
If L1 changes the PAT MSR between two VMRUN instructions on the same
L1 vCPU, the g_pat field in the associated vmcb02 will change, and the
VMCB_NPT clean bit should be cleared.
Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 35cea27862c6..83de3456df70 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -613,6 +613,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
struct kvm_vcpu *vcpu = &svm->vcpu;
nested_vmcb02_compute_g_pat(svm);
+ vmcb_mark_dirty(vmcb02, VMCB_NPT);
/* Load the nested guest state */
if (svm->nested.vmcb12_gpa != svm->nested.last_vmcb12_gpa) {
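
For background on why the one-line fix works: VMCB clean bits let the CPU
skip reloading fields it has cached, so any field rewritten by software must
have its clean bit cleared. A standalone C model (illustrative bit position
and values; a sketch, not KVM code):

    #include <stdint.h>
    #include <stdio.h>

    #define VMCB_NPT_BIT (1u << 4)          /* illustrative bit position */

    struct vmcb { uint64_t g_pat; uint32_t clean; };

    static uint64_t cached_g_pat;

    static void vmcb_mark_dirty(struct vmcb *v, uint32_t bit)
    {
            v->clean &= ~bit;
    }

    /* "vmrun": consume the VMCB the way the CPU consumes clean bits. */
    static void vmrun(struct vmcb *v)
    {
            if (!(v->clean & VMCB_NPT_BIT))
                    cached_g_pat = v->g_pat;    /* reload only when dirty */
            v->clean |= VMCB_NPT_BIT;           /* field is clean again */
    }

    int main(void)
    {
            struct vmcb v = { .g_pat = 0x0007040600070406ull, .clean = 0 };

            vmrun(&v);
            v.g_pat = 0x0000000000070406ull;    /* L1 changed the PAT MSR */
            vmrun(&v);                          /* without the fix... */
            printf("stale: %#llx\n", (unsigned long long)cached_g_pat);

            v.g_pat = 0x0101010101010101ull;
            vmcb_mark_dirty(&v, VMCB_NPT_BIT);  /* the fix */
            vmrun(&v);
            printf("fresh: %#llx\n", (unsigned long long)cached_g_pat);
            return 0;
    }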
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 913f7cf77bf14c13cfea70e89bcb6d0b22239562
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122943-shuffling-jailhouse-320b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 913f7cf77bf14c13cfea70e89bcb6d0b22239562 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 18 Nov 2025 19:51:19 -0500
Subject: [PATCH] NFSD: NFSv4 file creation neglects setting ACL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002(a)gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fa46f8b5f132..1dd3ae3ceb3a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -67,7 +67,8 @@ static inline bool nfsd_attrs_valid(struct nfsd_attrs *attrs)
struct iattr *iap = attrs->na_iattr;
return (iap->ia_valid || (attrs->na_seclabel &&
- attrs->na_seclabel->len));
+ attrs->na_seclabel->len) ||
+ attrs->na_pacl || attrs->na_dpacl);
}
__be32 nfserrno (int errno);
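
A standalone sketch of the widened predicate (simplified, hypothetical types;
the real check lives in fs/nfsd/vfs.h as shown above): attrs carrying only a
POSIX ACL must now count as valid so the setattr path runs.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct seclabel { size_t len; };
    struct acl { int nentries; };

    struct nfsd_attrs {
            unsigned int ia_valid;      /* stands in for na_iattr->ia_valid */
            struct seclabel *na_seclabel;
            struct acl *na_pacl;        /* access ACL */
            struct acl *na_dpacl;       /* default ACL */
    };

    static bool nfsd_attrs_valid(const struct nfsd_attrs *attrs)
    {
            return attrs->ia_valid ||
                   (attrs->na_seclabel && attrs->na_seclabel->len) ||
                   attrs->na_pacl || attrs->na_dpacl;   /* the added clause */
    }

    int main(void)
    {
            struct acl acl = { 3 };
            struct nfsd_attrs only_acl = { .na_pacl = &acl };

            /* Before the fix this returned false and the ACL was dropped. */
            printf("ACL-only attrs valid: %d\n", nfsd_attrs_valid(&only_acl));
            return 0;
    }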
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 913f7cf77bf14c13cfea70e89bcb6d0b22239562
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122942-omit-compress-a9fe@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 913f7cf77bf14c13cfea70e89bcb6d0b22239562 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 18 Nov 2025 19:51:19 -0500
Subject: [PATCH] NFSD: NFSv4 file creation neglects setting ACL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002(a)gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fa46f8b5f132..1dd3ae3ceb3a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -67,7 +67,8 @@ static inline bool nfsd_attrs_valid(struct nfsd_attrs *attrs)
struct iattr *iap = attrs->na_iattr;
return (iap->ia_valid || (attrs->na_seclabel &&
- attrs->na_seclabel->len));
+ attrs->na_seclabel->len) ||
+ attrs->na_pacl || attrs->na_dpacl);
}
__be32 nfserrno (int errno);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 913f7cf77bf14c13cfea70e89bcb6d0b22239562
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122941-civic-revered-b250@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 913f7cf77bf14c13cfea70e89bcb6d0b22239562 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 18 Nov 2025 19:51:19 -0500
Subject: [PATCH] NFSD: NFSv4 file creation neglects setting ACL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002(a)gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fa46f8b5f132..1dd3ae3ceb3a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -67,7 +67,8 @@ static inline bool nfsd_attrs_valid(struct nfsd_attrs *attrs)
struct iattr *iap = attrs->na_iattr;
return (iap->ia_valid || (attrs->na_seclabel &&
- attrs->na_seclabel->len));
+ attrs->na_seclabel->len) ||
+ attrs->na_pacl || attrs->na_dpacl);
}
__be32 nfserrno (int errno);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 913f7cf77bf14c13cfea70e89bcb6d0b22239562
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122941-crusher-hamstring-d100@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 913f7cf77bf14c13cfea70e89bcb6d0b22239562 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 18 Nov 2025 19:51:19 -0500
Subject: [PATCH] NFSD: NFSv4 file creation neglects setting ACL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
An NFSv4 client that sets an ACL with a named principal during file
creation retrieves the ACL afterwards, and finds that it is only a
default ACL (based on the mode bits) and not the ACL that was
requested during file creation. This violates RFC 8881 section
6.4.1.3: "the ACL attribute is set as given".
The issue occurs in nfsd_create_setattr(), which calls
nfsd_attrs_valid() to determine whether to call nfsd_setattr().
However, nfsd_attrs_valid() checks only for iattr changes and
security labels, but not POSIX ACLs. When only an ACL is present,
the function returns false, nfsd_setattr() is skipped, and the
POSIX ACL is never applied to the inode.
Subsequently, when the client retrieves the ACL, the server finds
no POSIX ACL on the inode and returns one generated from the file's
mode bits rather than returning the originally-specified ACL.
Reported-by: Aurélien Couderc <aurelien.couderc2002(a)gmail.com>
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fa46f8b5f132..1dd3ae3ceb3a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -67,7 +67,8 @@ static inline bool nfsd_attrs_valid(struct nfsd_attrs *attrs)
struct iattr *iap = attrs->na_iattr;
return (iap->ia_valid || (attrs->na_seclabel &&
- attrs->na_seclabel->len));
+ attrs->na_seclabel->len) ||
+ attrs->na_pacl || attrs->na_dpacl);
}
__be32 nfserrno (int errno);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x dd75c723ef566f7f009c047f47e0eee95fe348ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122945-fender-caviar-a172@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dd75c723ef566f7f009c047f47e0eee95fe348ab Mon Sep 17 00:00:00 2001
From: René Rebe <rene(a)exactco.de>
Date: Tue, 2 Dec 2025 19:41:37 +0100
Subject: [PATCH] r8169: fix RTL8117 Wake-on-Lan in DASH mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Wake-on-LAN currently does not work for r8169 in DASH mode, e.g. on the
ASUS Pro WS X570-ACE with RTL8168fp/RTL8117.
Fix this by not returning early from rtl_prepare_power_down() when
dash_enabled is set. While that alone fixes WoL, it still kills the
out-of-band RTL8117 remote management (BMC) connection, so also avoid
calling rtl8168_driver_stop() if WoL is enabled.
Fixes: 065c27c184d6 ("r8169: phy power ops")
Signed-off-by: René Rebe <rene(a)exactco.de>
Cc: stable(a)vger.kernel.org
Reviewed-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 405e91eb3141..755083852eef 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2655,9 +2655,6 @@ static void rtl_wol_enable_rx(struct rtl8169_private *tp)
static void rtl_prepare_power_down(struct rtl8169_private *tp)
{
- if (tp->dash_enabled)
- return;
-
if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
tp->mac_version == RTL_GIGA_MAC_VER_33)
rtl_ephy_write(tp, 0x19, 0xff64);
@@ -4812,7 +4809,7 @@ static void rtl8169_down(struct rtl8169_private *tp)
rtl_disable_exit_l1(tp);
rtl_prepare_power_down(tp);
- if (tp->dash_type != RTL_DASH_NONE)
+ if (tp->dash_type != RTL_DASH_NONE && !tp->saved_wolopts)
rtl8168_driver_stop(tp);
}
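
A condensed standalone sketch of the resulting shutdown policy (hypothetical
names; a sketch of the two conditions the patch changes, not the driver
code):

    #include <stdbool.h>
    #include <stdio.h>

    struct nic { bool dash; unsigned int saved_wolopts; };

    static void shutdown_policy(const struct nic *tp)
    {
            /* Old code returned here when tp->dash was set, skipping the
             * WoL preparation entirely. */
            printf("prepare power down (WoL config applied)\n");

            if (tp->dash && !tp->saved_wolopts)
                    printf("stop DASH firmware\n");
            else if (tp->dash)
                    printf("keep DASH/BMC link alive (WoL armed)\n");
    }

    int main(void)
    {
            struct nic wol_dash   = { .dash = true, .saved_wolopts = 0x20 };
            struct nic nowol_dash = { .dash = true, .saved_wolopts = 0 };

            shutdown_policy(&wol_dash);     /* WoL works, BMC kept alive */
            shutdown_policy(&nowol_dash);   /* firmware stopped as before */
            return 0;
    }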
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 93c9e107386dbe1243287a5b14ceca894de372b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122950-corroding-irritant-bfbf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 93c9e107386dbe1243287a5b14ceca894de372b9 Mon Sep 17 00:00:00 2001
From: Jim Mattson <jmattson(a)google.com>
Date: Mon, 22 Sep 2025 09:29:22 -0700
Subject: [PATCH] KVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUN
Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control()
on every nested VMRUN.
If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN
instructions on the same L1 vCPU, the msrpm_base_pa in the associated
vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared.
Fixes: 4bb170a5430b ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Reported-by: Matteo Rizzo <matteorizzo(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a6443feab252..35cea27862c6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -752,6 +752,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
+ vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP);
/*
* Stash vmcb02's counter if the guest hasn't moved past the guilty
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3d80f4c93d3d26d0f9a0dd2844961a632eeea634
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122954-waged-spendable-b898@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d80f4c93d3d26d0f9a0dd2844961a632eeea634 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Date: Fri, 24 Oct 2025 19:29:18 +0000
Subject: [PATCH] KVM: nSVM: Avoid incorrect injection of
SVM_EXIT_CR0_SEL_WRITE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. However, it does not check whether L1 enabled
the intercept for SVM_EXIT_WRITE_CR0, which has higher priority
according to the APM (24593—Rev. 3.42—March 2024, Table 15-7):
When both selective and non-selective CR0-write intercepts are active at
the same time, the non-selective intercept takes priority. With respect
to exceptions, the priority of this intercept is the same as the generic
CR0-write intercept.
Make sure L1 does NOT intercept SVM_EXIT_WRITE_CR0 before checking if
SVM_EXIT_CR0_SEL_WRITE needs to be injected.
Opportunistically tweak the "not CR0" logic to explicitly bail early so
that it's more obvious that only CR0 has a selective intercept, and that
modifying icpt_info.exit_code is functionally necessary so that the call
to nested_svm_exit_handled() checks the correct exit code.
Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-4-yosry.ahmed@linux.dev
[sean: isolate non-CR0 write logic, tweak comments accordingly]
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bd8df212a59d..1ae7b3c5a7c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4535,15 +4535,29 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
case SVM_EXIT_WRITE_CR0: {
unsigned long cr0, val;
- if (info->intercept == x86_intercept_cr_write)
+ /*
+ * Adjust the exit code accordingly if a CR other than CR0 is
+ * being written, and skip straight to the common handling as
+ * only CR0 has an additional selective intercept.
+ */
+ if (info->intercept == x86_intercept_cr_write && info->modrm_reg) {
icpt_info.exit_code += info->modrm_reg;
+ break;
+ }
- if (icpt_info.exit_code != SVM_EXIT_WRITE_CR0 ||
- info->intercept == x86_intercept_clts)
+ /*
+ * Convert the exit_code to SVM_EXIT_CR0_SEL_WRITE if a
+ * selective CR0 intercept is triggered (the common logic will
+ * treat the selective intercept as being enabled). Note, the
+ * unconditional intercept has higher priority, i.e. this is
+ * only relevant if *only* the selective intercept is enabled.
+ */
+ if (vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_CR0_WRITE) ||
+ !(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0)))
break;
- if (!(vmcb12_is_intercept(&svm->nested.ctl,
- INTERCEPT_SELECTIVE_CR0)))
+ /* CLTS never triggers INTERCEPT_SELECTIVE_CR0 */
+ if (info->intercept == x86_intercept_clts)
break;
/* LMSW always triggers INTERCEPT_SELECTIVE_CR0 */
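
A standalone sketch of the priority rule the patch encodes (simplified; it
ignores the LMSW/mov-to-CR0 distinction handled later in the real function):

    #include <stdbool.h>
    #include <stdio.h>

    enum exit_code { EXIT_WRITE_CR0, EXIT_CR0_SEL_WRITE };

    /* The unconditional CR0-write intercept outranks the selective one,
     * so the exit code is only rewritten to CR0_SEL_WRITE when *only*
     * the selective intercept is enabled. */
    static enum exit_code classify(bool icpt_cr0_write,
                                   bool icpt_selective_cr0, bool is_clts)
    {
            if (icpt_cr0_write || !icpt_selective_cr0)
                    return EXIT_WRITE_CR0;      /* common handling */
            if (is_clts)
                    return EXIT_WRITE_CR0;      /* CLTS never hits selective */
            return EXIT_CR0_SEL_WRITE;
    }

    int main(void)
    {
            /* Both intercepts enabled: unconditional wins (prints 0). */
            printf("%d\n", classify(true, true, false));
            /* Only selective enabled: synthesize the selective exit (1). */
            printf("%d\n", classify(false, true, false));
            return 0;
    }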
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3d80f4c93d3d26d0f9a0dd2844961a632eeea634
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122954-augmented-paralyses-a384@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d80f4c93d3d26d0f9a0dd2844961a632eeea634 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Date: Fri, 24 Oct 2025 19:29:18 +0000
Subject: [PATCH] KVM: nSVM: Avoid incorrect injection of
SVM_EXIT_CR0_SEL_WRITE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. However, it does not check whether L1 enabled
the intercept for SVM_EXIT_WRITE_CR0, which has higher priority
according to the APM (24593—Rev. 3.42—March 2024, Table 15-7):
When both selective and non-selective CR0-write intercepts are active at
the same time, the non-selective intercept takes priority. With respect
to exceptions, the priority of this intercept is the same as the generic
CR0-write intercept.
Make sure L1 does NOT intercept SVM_EXIT_WRITE_CR0 before checking if
SVM_EXIT_CR0_SEL_WRITE needs to be injected.
Opportunistically tweak the "not CR0" logic to explicitly bail early so
that it's more obvious that only CR0 has a selective intercept, and that
modifying icpt_info.exit_code is functionally necessary so that the call
to nested_svm_exit_handled() checks the correct exit code.
Fixes: cfec82cb7d31 ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-4-yosry.ahmed@linux.dev
[sean: isolate non-CR0 write logic, tweak comments accordingly]
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bd8df212a59d..1ae7b3c5a7c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4535,15 +4535,29 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
case SVM_EXIT_WRITE_CR0: {
unsigned long cr0, val;
- if (info->intercept == x86_intercept_cr_write)
+ /*
+ * Adjust the exit code accordingly if a CR other than CR0 is
+ * being written, and skip straight to the common handling as
+ * only CR0 has an additional selective intercept.
+ */
+ if (info->intercept == x86_intercept_cr_write && info->modrm_reg) {
icpt_info.exit_code += info->modrm_reg;
+ break;
+ }
- if (icpt_info.exit_code != SVM_EXIT_WRITE_CR0 ||
- info->intercept == x86_intercept_clts)
+ /*
+ * Convert the exit_code to SVM_EXIT_CR0_SEL_WRITE if a
+ * selective CR0 intercept is triggered (the common logic will
+ * treat the selective intercept as being enabled). Note, the
+ * unconditional intercept has higher priority, i.e. this is
+ * only relevant if *only* the selective intercept is enabled.
+ */
+ if (vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_CR0_WRITE) ||
+ !(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0)))
break;
- if (!(vmcb12_is_intercept(&svm->nested.ctl,
- INTERCEPT_SELECTIVE_CR0)))
+ /* CLTS never triggers INTERCEPT_SELECTIVE_CR0 */
+ if (info->intercept == x86_intercept_clts)
break;
/* LMSW always triggers INTERCEPT_SELECTIVE_CR0 */
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 43169328c7b4623b54b7713ec68479cebda5465f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122944-stonewall-evoke-f7f0@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43169328c7b4623b54b7713ec68479cebda5465f Mon Sep 17 00:00:00 2001
From: Vivian Wang <wangruikang(a)iscas.ac.cn>
Date: Tue, 2 Dec 2025 13:25:07 +0800
Subject: [PATCH] lib/crypto: riscv/chacha: Avoid s0/fp register
In chacha_zvkb, avoid using the s0 register, which is the frame pointer,
by reallocating KEY0 to t5. This makes stack traces available if e.g. a
crash happens in chacha_zvkb.
No frame pointer maintenance is otherwise required since this is a leaf
function.
Signed-off-by: Vivian Wang <wangruikang(a)iscas.ac.cn>
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@i…
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
diff --git a/lib/crypto/riscv/chacha-riscv64-zvkb.S b/lib/crypto/riscv/chacha-riscv64-zvkb.S
index b777d0b4e379..3d183ec818f5 100644
--- a/lib/crypto/riscv/chacha-riscv64-zvkb.S
+++ b/lib/crypto/riscv/chacha-riscv64-zvkb.S
@@ -60,7 +60,8 @@
#define VL t2
#define STRIDE t3
#define ROUND_CTR t4
-#define KEY0 s0
+#define KEY0 t5
+// Avoid s0/fp to allow for unwinding
#define KEY1 s1
#define KEY2 s2
#define KEY3 s3
@@ -143,7 +144,6 @@
// The updated 32-bit counter is written back to state->x[12] before returning.
SYM_FUNC_START(chacha_zvkb)
addi sp, sp, -96
- sd s0, 0(sp)
sd s1, 8(sp)
sd s2, 16(sp)
sd s3, 24(sp)
@@ -280,7 +280,6 @@ SYM_FUNC_START(chacha_zvkb)
bnez NBLOCKS, .Lblock_loop
sw COUNTER, 48(STATEP)
- ld s0, 0(sp)
ld s1, 8(sp)
ld s2, 16(sp)
ld s3, 24(sp)
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 84230ad2d2afbf0c44c32967e525c0ad92e26b4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122911-bonding-sampling-8ca0@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84230ad2d2afbf0c44c32967e525c0ad92e26b4e Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 1 Dec 2025 13:25:22 -0700
Subject: [PATCH] io_uring/poll: correctly handle io_poll_add() return value on
update
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode with
updates got slightly broken. If a POLL_ADD is pending and POLL_REMOVE is
then used to update the events of that request, and that update causes
the POLL_ADD to trigger, the completion is lost and a CQE is never
posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, the completion value isn't always overwritten with
-ECANCELED. In that case, whatever value io_poll_add() set should just
be retained.
Cc: stable(a)vger.kernel.org
Fixes: 97b388d70b53 ("io_uring: handle completions in the core")
Reported-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Tested-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 8aa4e3a31e73..3f1d716dcfab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -937,12 +937,17 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
/* successfully updated, don't complete poll request */
- if (!ret2 || ret2 == -EIOCBQUEUED)
+ if (ret2 == IOU_ISSUE_SKIP_COMPLETE)
goto out;
+ /* request completed as part of the update, complete it */
+ else if (ret2 == IOU_COMPLETE)
+ goto complete;
}
- req_set_fail(preq);
io_req_set_res(preq, -ECANCELED, 0);
+complete:
+ if (preq->cqe.res < 0)
+ req_set_fail(preq);
preq->io_task_work.func = io_req_task_complete;
io_req_task_work_add(preq);
out:
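
A standalone sketch of the new dispatch on io_poll_add()'s return value (the
constant values here are hypothetical; only the control flow mirrors the hunk
above):

    #include <stdio.h>

    enum { IOU_COMPLETE = 0, IOU_ISSUE_SKIP_COMPLETE = -1 };

    static void finish_update(int ret2, int *res)
    {
            if (ret2 == IOU_ISSUE_SKIP_COMPLETE)
                    return;                 /* re-armed; CQE comes later */
            if (ret2 != IOU_COMPLETE)
                    *res = -125;            /* -ECANCELED for anything else */
            /* else keep whatever result io_poll_add() already stored */
            if (*res < 0)
                    printf("request marked failed, ");
            printf("post CQE with res=%d\n", *res);
    }

    int main(void)
    {
            int res = 1;                    /* result set during the update */

            finish_update(IOU_COMPLETE, &res);  /* res kept, CQE posted */
            res = 0;
            finish_update(42, &res);            /* overwritten: -ECANCELED */
            return 0;
    }

The old code conflated the first two cases, so a POLL_ADD that completed
during the update either lost its CQE or had its result clobbered.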
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 84230ad2d2afbf0c44c32967e525c0ad92e26b4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122926-disarray-agile-9880@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84230ad2d2afbf0c44c32967e525c0ad92e26b4e Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 1 Dec 2025 13:25:22 -0700
Subject: [PATCH] io_uring/poll: correctly handle io_poll_add() return value on
update
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode with
updates got slightly broken. If a POLL_ADD is pending and POLL_REMOVE is
then used to update the events of that request, and that update causes
the POLL_ADD to trigger, the completion is lost and a CQE is never
posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, the completion value isn't always overwritten with
-ECANCELED. In that case, whatever value io_poll_add() set should just
be retained.
Cc: stable(a)vger.kernel.org
Fixes: 97b388d70b53 ("io_uring: handle completions in the core")
Reported-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Tested-by: syzbot+641eec6b7af1f62f2b99(a)syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 8aa4e3a31e73..3f1d716dcfab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -937,12 +937,17 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED);
/* successfully updated, don't complete poll request */
- if (!ret2 || ret2 == -EIOCBQUEUED)
+ if (ret2 == IOU_ISSUE_SKIP_COMPLETE)
goto out;
+ /* request completed as part of the update, complete it */
+ else if (ret2 == IOU_COMPLETE)
+ goto complete;
}
- req_set_fail(preq);
io_req_set_res(preq, -ECANCELED, 0);
+complete:
+ if (preq->cqe.res < 0)
+ req_set_fail(preq);
preq->io_task_work.func = io_req_task_complete;
io_req_task_work_add(preq);
out:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 782be79e4551550d7a82b1957fc0f7347e6d461f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122945-exemplary-politely-9c27@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782be79e4551550d7a82b1957fc0f7347e6d461f Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 18 Dec 2025 16:35:15 +0100
Subject: [PATCH] usb: gadget: lpc32xx_udc: fix clock imbalance in error path
A recent change fixing a device reference leak introduced a clock
imbalance by reusing an error path, so that the clock could be disabled
before it had ever been enabled.
Note that the clock framework allows passing in NULL clocks, so there is
no risk of a NULL pointer dereference.
Also drop the bogus I2C client NULL check added by the offending commit,
as the pointer has already been verified to be non-NULL.
Fixes: c84117912bdd ("USB: lpc32xx_udc: Fix error handling in probe")
Cc: stable(a)vger.kernel.org
Cc: Ma Ke <make24(a)iscas.ac.cn>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Vladimir Zapolskiy <vz(a)mleia.com>
Link: https://patch.msgid.link/20251218153519.19453-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/gadget/udc/lpc32xx_udc.c b/drivers/usb/gadget/udc/lpc32xx_udc.c
index 73c0f28a8585..a962d4294fbe 100644
--- a/drivers/usb/gadget/udc/lpc32xx_udc.c
+++ b/drivers/usb/gadget/udc/lpc32xx_udc.c
@@ -3020,7 +3020,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
pdev->dev.dma_mask = &lpc32xx_usbd_dmamask;
retval = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
if (retval)
- goto i2c_fail;
+ goto err_put_client;
udc->board = &lpc32xx_usbddata;
@@ -3040,7 +3040,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
udc->udp_irq[i] = platform_get_irq(pdev, i);
if (udc->udp_irq[i] < 0) {
retval = udc->udp_irq[i];
- goto i2c_fail;
+ goto err_put_client;
}
}
@@ -3048,7 +3048,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->udp_baseaddr)) {
dev_err(udc->dev, "IO map failure\n");
retval = PTR_ERR(udc->udp_baseaddr);
- goto i2c_fail;
+ goto err_put_client;
}
/* Get USB device clock */
@@ -3056,14 +3056,14 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->usb_slv_clk)) {
dev_err(udc->dev, "failed to acquire USB device clock\n");
retval = PTR_ERR(udc->usb_slv_clk);
- goto i2c_fail;
+ goto err_put_client;
}
/* Enable USB device clock */
retval = clk_prepare_enable(udc->usb_slv_clk);
if (retval < 0) {
dev_err(udc->dev, "failed to start USB device clock\n");
- goto i2c_fail;
+ goto err_put_client;
}
/* Setup deferred workqueue data */
@@ -3165,9 +3165,10 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
i2c_fail:
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
clk_disable_unprepare(udc->usb_slv_clk);
+err_put_client:
+ put_device(&udc->isp1301_i2c_client->dev);
+
dev_err(udc->dev, "%s probe failed, %d\n", driver_name, retval);
return retval;
@@ -3195,10 +3196,9 @@ static void lpc32xx_udc_remove(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
-
clk_disable_unprepare(udc->usb_slv_clk);
+
+ put_device(&udc->isp1301_i2c_client->dev);
}
#ifdef CONFIG_PM
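
The underlying pattern is the classic ordered goto ladder: each label undoes
only what was set up before the failing step. A minimal standalone C sketch
(hypothetical resources):

    #include <stdio.h>
    #include <stdlib.h>

    static int probe(int fail_at)
    {
            int ret = 0;
            char *client = malloc(16);      /* step 1: take a reference */
            if (!client)
                    return -12;             /* -ENOMEM */

            if (fail_at == 1) { ret = -22; goto err_put_client; }

            printf("clock enabled\n");      /* step 2: enable the clock */
            if (fail_at == 2) { ret = -22; goto err_disable_clk; }

            printf("probe ok\n");
            free(client);
            return 0;

    err_disable_clk:
            printf("clock disabled\n");     /* undo step 2 */
    err_put_client:
            free(client);                   /* undo step 1 */
            return ret;
    }

    int main(void)
    {
            probe(1);   /* fails before the clock exists: no bogus disable */
            probe(2);   /* fails after enable: clock properly disabled */
            return 0;
    }

The offending commit jumped to a label below the clock-disable step from code
that ran before the clock was enabled, which is exactly the ordering the
ladder exists to prevent.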
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 782be79e4551550d7a82b1957fc0f7347e6d461f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122944-barcode-venue-6fe6@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782be79e4551550d7a82b1957fc0f7347e6d461f Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 18 Dec 2025 16:35:15 +0100
Subject: [PATCH] usb: gadget: lpc32xx_udc: fix clock imbalance in error path
A recent change fixing a device reference leak introduced a clock
imbalance by reusing an error path, so that the clock could be disabled
before it had ever been enabled.
Note that the clock framework allows passing in NULL clocks, so there is
no risk of a NULL pointer dereference.
Also drop the bogus I2C client NULL check added by the offending commit,
as the pointer has already been verified to be non-NULL.
Fixes: c84117912bdd ("USB: lpc32xx_udc: Fix error handling in probe")
Cc: stable(a)vger.kernel.org
Cc: Ma Ke <make24(a)iscas.ac.cn>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Vladimir Zapolskiy <vz(a)mleia.com>
Link: https://patch.msgid.link/20251218153519.19453-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/gadget/udc/lpc32xx_udc.c b/drivers/usb/gadget/udc/lpc32xx_udc.c
index 73c0f28a8585..a962d4294fbe 100644
--- a/drivers/usb/gadget/udc/lpc32xx_udc.c
+++ b/drivers/usb/gadget/udc/lpc32xx_udc.c
@@ -3020,7 +3020,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
pdev->dev.dma_mask = &lpc32xx_usbd_dmamask;
retval = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
if (retval)
- goto i2c_fail;
+ goto err_put_client;
udc->board = &lpc32xx_usbddata;
@@ -3040,7 +3040,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
udc->udp_irq[i] = platform_get_irq(pdev, i);
if (udc->udp_irq[i] < 0) {
retval = udc->udp_irq[i];
- goto i2c_fail;
+ goto err_put_client;
}
}
@@ -3048,7 +3048,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->udp_baseaddr)) {
dev_err(udc->dev, "IO map failure\n");
retval = PTR_ERR(udc->udp_baseaddr);
- goto i2c_fail;
+ goto err_put_client;
}
/* Get USB device clock */
@@ -3056,14 +3056,14 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->usb_slv_clk)) {
dev_err(udc->dev, "failed to acquire USB device clock\n");
retval = PTR_ERR(udc->usb_slv_clk);
- goto i2c_fail;
+ goto err_put_client;
}
/* Enable USB device clock */
retval = clk_prepare_enable(udc->usb_slv_clk);
if (retval < 0) {
dev_err(udc->dev, "failed to start USB device clock\n");
- goto i2c_fail;
+ goto err_put_client;
}
/* Setup deferred workqueue data */
@@ -3165,9 +3165,10 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
i2c_fail:
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
clk_disable_unprepare(udc->usb_slv_clk);
+err_put_client:
+ put_device(&udc->isp1301_i2c_client->dev);
+
dev_err(udc->dev, "%s probe failed, %d\n", driver_name, retval);
return retval;
@@ -3195,10 +3196,9 @@ static void lpc32xx_udc_remove(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
-
clk_disable_unprepare(udc->usb_slv_clk);
+
+ put_device(&udc->isp1301_i2c_client->dev);
}
#ifdef CONFIG_PM
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 782be79e4551550d7a82b1957fc0f7347e6d461f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122944-county-snowplow-ae8e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782be79e4551550d7a82b1957fc0f7347e6d461f Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 18 Dec 2025 16:35:15 +0100
Subject: [PATCH] usb: gadget: lpc32xx_udc: fix clock imbalance in error path
A recent change fixing a device reference leak introduced a clock
imbalance by reusing an error path, so that the clock could be disabled
before it had ever been enabled.
Note that the clock framework allows passing in NULL clocks, so there is
no risk of a NULL pointer dereference.
Also drop the bogus I2C client NULL check added by the offending commit,
as the pointer has already been verified to be non-NULL.
Fixes: c84117912bdd ("USB: lpc32xx_udc: Fix error handling in probe")
Cc: stable(a)vger.kernel.org
Cc: Ma Ke <make24(a)iscas.ac.cn>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Vladimir Zapolskiy <vz(a)mleia.com>
Link: https://patch.msgid.link/20251218153519.19453-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/gadget/udc/lpc32xx_udc.c b/drivers/usb/gadget/udc/lpc32xx_udc.c
index 73c0f28a8585..a962d4294fbe 100644
--- a/drivers/usb/gadget/udc/lpc32xx_udc.c
+++ b/drivers/usb/gadget/udc/lpc32xx_udc.c
@@ -3020,7 +3020,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
pdev->dev.dma_mask = &lpc32xx_usbd_dmamask;
retval = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
if (retval)
- goto i2c_fail;
+ goto err_put_client;
udc->board = &lpc32xx_usbddata;
@@ -3040,7 +3040,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
udc->udp_irq[i] = platform_get_irq(pdev, i);
if (udc->udp_irq[i] < 0) {
retval = udc->udp_irq[i];
- goto i2c_fail;
+ goto err_put_client;
}
}
@@ -3048,7 +3048,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->udp_baseaddr)) {
dev_err(udc->dev, "IO map failure\n");
retval = PTR_ERR(udc->udp_baseaddr);
- goto i2c_fail;
+ goto err_put_client;
}
/* Get USB device clock */
@@ -3056,14 +3056,14 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->usb_slv_clk)) {
dev_err(udc->dev, "failed to acquire USB device clock\n");
retval = PTR_ERR(udc->usb_slv_clk);
- goto i2c_fail;
+ goto err_put_client;
}
/* Enable USB device clock */
retval = clk_prepare_enable(udc->usb_slv_clk);
if (retval < 0) {
dev_err(udc->dev, "failed to start USB device clock\n");
- goto i2c_fail;
+ goto err_put_client;
}
/* Setup deferred workqueue data */
@@ -3165,9 +3165,10 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
i2c_fail:
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
clk_disable_unprepare(udc->usb_slv_clk);
+err_put_client:
+ put_device(&udc->isp1301_i2c_client->dev);
+
dev_err(udc->dev, "%s probe failed, %d\n", driver_name, retval);
return retval;
@@ -3195,10 +3196,9 @@ static void lpc32xx_udc_remove(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
-
clk_disable_unprepare(udc->usb_slv_clk);
+
+ put_device(&udc->isp1301_i2c_client->dev);
}
#ifdef CONFIG_PM
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 782be79e4551550d7a82b1957fc0f7347e6d461f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122943-coveting-geek-8c94@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782be79e4551550d7a82b1957fc0f7347e6d461f Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 18 Dec 2025 16:35:15 +0100
Subject: [PATCH] usb: gadget: lpc32xx_udc: fix clock imbalance in error path
A recent change fixing a device reference leak introduced a clock
imbalance by reusing an error path, so that the clock could be disabled
before it had ever been enabled.
Note that the clock framework allows passing in NULL clocks, so there is
no risk of a NULL pointer dereference.
Also drop the bogus I2C client NULL check added by the offending commit,
as the pointer has already been verified to be non-NULL.
Fixes: c84117912bdd ("USB: lpc32xx_udc: Fix error handling in probe")
Cc: stable(a)vger.kernel.org
Cc: Ma Ke <make24(a)iscas.ac.cn>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Vladimir Zapolskiy <vz(a)mleia.com>
Link: https://patch.msgid.link/20251218153519.19453-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/gadget/udc/lpc32xx_udc.c b/drivers/usb/gadget/udc/lpc32xx_udc.c
index 73c0f28a8585..a962d4294fbe 100644
--- a/drivers/usb/gadget/udc/lpc32xx_udc.c
+++ b/drivers/usb/gadget/udc/lpc32xx_udc.c
@@ -3020,7 +3020,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
pdev->dev.dma_mask = &lpc32xx_usbd_dmamask;
retval = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
if (retval)
- goto i2c_fail;
+ goto err_put_client;
udc->board = &lpc32xx_usbddata;
@@ -3040,7 +3040,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
udc->udp_irq[i] = platform_get_irq(pdev, i);
if (udc->udp_irq[i] < 0) {
retval = udc->udp_irq[i];
- goto i2c_fail;
+ goto err_put_client;
}
}
@@ -3048,7 +3048,7 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->udp_baseaddr)) {
dev_err(udc->dev, "IO map failure\n");
retval = PTR_ERR(udc->udp_baseaddr);
- goto i2c_fail;
+ goto err_put_client;
}
/* Get USB device clock */
@@ -3056,14 +3056,14 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
if (IS_ERR(udc->usb_slv_clk)) {
dev_err(udc->dev, "failed to acquire USB device clock\n");
retval = PTR_ERR(udc->usb_slv_clk);
- goto i2c_fail;
+ goto err_put_client;
}
/* Enable USB device clock */
retval = clk_prepare_enable(udc->usb_slv_clk);
if (retval < 0) {
dev_err(udc->dev, "failed to start USB device clock\n");
- goto i2c_fail;
+ goto err_put_client;
}
/* Setup deferred workqueue data */
@@ -3165,9 +3165,10 @@ static int lpc32xx_udc_probe(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
i2c_fail:
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
clk_disable_unprepare(udc->usb_slv_clk);
+err_put_client:
+ put_device(&udc->isp1301_i2c_client->dev);
+
dev_err(udc->dev, "%s probe failed, %d\n", driver_name, retval);
return retval;
@@ -3195,10 +3196,9 @@ static void lpc32xx_udc_remove(struct platform_device *pdev)
dma_free_coherent(&pdev->dev, UDCA_BUFF_SIZE,
udc->udca_v_base, udc->udca_p_base);
- if (udc->isp1301_i2c_client)
- put_device(&udc->isp1301_i2c_client->dev);
-
clk_disable_unprepare(udc->usb_slv_clk);
+
+ put_device(&udc->isp1301_i2c_client->dev);
}
#ifdef CONFIG_PM
On Sat, Dec 27, 2025 at 02:36:44PM -0500, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> usb: usb-storage: No additional quirks need to be added to the EL-R12 optical drive.
>
> to the 6.18-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> usb-usb-storage-no-additional-quirks-need-to-be-adde.patch
> and it can be found in the queue-6.18 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This commit will be modified by commit 0831269b5f71 ("usb: usb-storage:
Maintain minimal modifications to the bcdDevice range.") in Greg's
usb-linus branch, which has not yet been merged into the mainline
kernel. The two commits should be added to the -stable kernels at the
same time, if possible, which probably means holding off on this one
until the next round.
Alan Stern
> commit 4846de73fe267ccffe66a5bff7c7def201c34df1
> Author: Chen Changcheng <chenchangcheng(a)kylinos.cn>
> Date: Fri Nov 21 14:40:20 2025 +0800
>
> usb: usb-storage: No additional quirks need to be added to the EL-R12 optical drive.
>
> [ Upstream commit 955a48a5353f4fe009704a9a4272a3adf627cd35 ]
>
> The EL-R12 optical drive has the same VID and PID as the INIC-3069,
> as shown here:
> T: Bus=02 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0
> D: Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1
> P: Vendor=13fd ProdID=3940 Rev= 3.10
> S: Manufacturer=HL-DT-ST
> S: Product= DVD+-RW GT80N
> S: SerialNumber=423349524E4E38303338323439202020
> C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=144mA
> I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=02 Prot=50 Driver=usb-storage
> E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> E: Ad=0a(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
>
> As a result, the optical drive also picks up the US_FL_NO_ATA_1X
> quirk. Erase operations then fail, with the following errors:
> [ 388.967742] sr 5:0:0:0: [sr0] tag#0 Send: scmd 0x00000000d20c33a7
> [ 388.967742] sr 5:0:0:0: [sr0] tag#0 CDB: ATA command pass through(12)/Blank a1 11 00 00 00 00 00 00 00 00 00 00
> [ 388.967773] sr 5:0:0:0: [sr0] tag#0 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
> [ 388.967773] sr 5:0:0:0: [sr0] tag#0 CDB: ATA command pass through(12)/Blank a1 11 00 00 00 00 00 00 00 00 00 00
> [ 388.967803] sr 5:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
> [ 388.967803] sr 5:0:0:0: [sr0] tag#0 Add. Sense: Invalid field in cdb
> [ 388.967803] sr 5:0:0:0: [sr0] tag#0 scsi host busy 1 failed 0
> [ 388.967803] sr 5:0:0:0: Notifying upper driver of completion (result 8100002)
> [ 388.967834] sr 5:0:0:0: [sr0] tag#0 0 sectors total, 0 bytes done.
>
> For the EL-R12 standard optical drive, all operational commands and
> usage scenarios were tested without the IGNORE_RESIDUE quirk, and no
> issues were encountered. It can reasonably be concluded that removing
> the IGNORE_RESIDUE quirk has no impact.
>
> Signed-off-by: Chen Changcheng <chenchangcheng(a)kylinos.cn>
> Link: https://patch.msgid.link/20251121064020.29332-1-chenchangcheng@kylinos.cn
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h
> index 1477e31d7763..b695f5ba9a40 100644
> --- a/drivers/usb/storage/unusual_uas.h
> +++ b/drivers/usb/storage/unusual_uas.h
> @@ -98,7 +98,7 @@ UNUSUAL_DEV(0x125f, 0xa94a, 0x0160, 0x0160,
> US_FL_NO_ATA_1X),
>
> /* Reported-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com> */
> -UNUSUAL_DEV(0x13fd, 0x3940, 0x0000, 0x9999,
> +UNUSUAL_DEV(0x13fd, 0x3940, 0x0309, 0x0309,
> "Initio Corporation",
> "INIC-3069",
> USB_SC_DEVICE, USB_PR_DEVICE, NULL,
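
For reference, UNUSUAL_DEV entries match on a bcdDevice range as well as
VID:PID, which is what the quoted patch narrows. A standalone sketch of that
matching (illustrative flag value and simplified table): the INIC-3069 keeps
the quirk at bcdDevice 0x0309, while the EL-R12 reports Rev 3.10 (0x0310) and
now matches nothing.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct quirk { uint16_t vid, pid, bcd_lo, bcd_hi; unsigned int flags; };

    #define US_FL_NO_ATA_1X 0x1             /* illustrative flag value */

    static const struct quirk table[] = {
            /* was 0x0000..0x9999, now pinned to the INIC-3069 revision */
            { 0x13fd, 0x3940, 0x0309, 0x0309, US_FL_NO_ATA_1X },
    };

    static unsigned int lookup(uint16_t vid, uint16_t pid, uint16_t bcd)
    {
            for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
                    if (table[i].vid == vid && table[i].pid == pid &&
                        bcd >= table[i].bcd_lo && bcd <= table[i].bcd_hi)
                            return table[i].flags;
            return 0;
    }

    int main(void)
    {
            printf("INIC-3069 (3.09): flags=%u\n",
                   lookup(0x13fd, 0x3940, 0x0309));
            printf("EL-R12    (3.10): flags=%u\n",
                   lookup(0x13fd, 0x3940, 0x0310));
            return 0;
    }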
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1f73b8b56cf35de29a433aee7bfff26cea98be3f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120110-deuce-arrange-e66c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f73b8b56cf35de29a433aee7bfff26cea98be3f Mon Sep 17 00:00:00 2001
From: Łukasz Bartosik <ukaszb(a)chromium.org>
Date: Wed, 19 Nov 2025 21:29:09 +0000
Subject: [PATCH] xhci: dbgtty: fix device unregister
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When DbC is disconnected, xhci_dbc_tty_unregister_device() is called.
However, if any user space process is blocked on a write to the DbC
terminal device, it will never be signalled and thus stays blocked
indefinitely.
Fix this by adding a tty_vhangup() call in
xhci_dbc_tty_unregister_device(). The hangup wakes up any blocked
writers and causes subsequent write attempts to the DbC terminal device
to fail.
Cc: stable <stable(a)kernel.org>
Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Link: https://patch.msgid.link/20251119212910.1245694-1-ukaszb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index b7f95565524d..57cdda4e09c8 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -550,6 +550,12 @@ static void xhci_dbc_tty_unregister_device(struct xhci_dbc *dbc)
if (!port->registered)
return;
+ /*
+ * Hang up the TTY. This wakes up any blocked
+ * writers and causes subsequent writes to fail.
+ */
+ tty_vhangup(port->port.tty);
+
tty_unregister_device(dbc_tty_driver, port->minor);
xhci_dbc_tty_exit_port(port);
port->registered = false;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2ea6190f42d0416a4310e60a7fcb0b49fcbbd4fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122952-supervise-regulate-a26b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea6190f42d0416a4310e60a7fcb0b49fcbbd4fb Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 5 Dec 2025 19:55:16 +0100
Subject: [PATCH] mptcp: schedule rtx timer only after pushing data
The MPTCP protocol usually schedules the retransmission timer only
when there is some chance for such retransmissions to happen, with one
notable exception: __mptcp_push_pending() currently schedules the timer
unconditionally, potentially leading to unnecessary rtx timer
expirations.
The issue has been present since the blamed commit below but became
easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
optimization in __mptcp_check_push()")
Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-3-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e212c1374bd0..d8a7f7029164 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1623,7 +1623,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
struct mptcp_sendmsg_info info = {
.flags = flags,
};
- bool do_check_data_fin = false;
+ bool copied = false;
int push_count = 1;
while (mptcp_send_head(sk) && (push_count > 0)) {
@@ -1665,7 +1665,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
push_count--;
continue;
}
- do_check_data_fin = true;
+ copied = true;
}
}
}
@@ -1674,11 +1674,14 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
if (ssk)
mptcp_push_release(ssk, &info);
- /* ensure the rtx timer is running */
- if (!mptcp_rtx_timer_pending(sk))
- mptcp_reset_rtx_timer(sk);
- if (do_check_data_fin)
+ /* Avoid scheduling the rtx timer if no data has been pushed; the timer
+ * will be updated on positive acks by __mptcp_cleanup_una().
+ */
+ if (copied) {
+ if (!mptcp_rtx_timer_pending(sk))
+ mptcp_reset_rtx_timer(sk);
mptcp_check_send_data_fin(sk);
+ }
}
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2ea6190f42d0416a4310e60a7fcb0b49fcbbd4fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122951-slurp-ascent-4d97@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea6190f42d0416a4310e60a7fcb0b49fcbbd4fb Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 5 Dec 2025 19:55:16 +0100
Subject: [PATCH] mptcp: schedule rtx timer only after pushing data
The MPTCP protocol usually schedules the retransmission timer only
when there is some chance for such retransmissions to happen, with one
notable exception: __mptcp_push_pending() currently schedules the timer
unconditionally, potentially leading to unnecessary rtx timer
expirations.
The issue has been present since the blamed commit below but became
easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
optimization in __mptcp_check_push()")
Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-3-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e212c1374bd0..d8a7f7029164 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1623,7 +1623,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
struct mptcp_sendmsg_info info = {
.flags = flags,
};
- bool do_check_data_fin = false;
+ bool copied = false;
int push_count = 1;
while (mptcp_send_head(sk) && (push_count > 0)) {
@@ -1665,7 +1665,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
push_count--;
continue;
}
- do_check_data_fin = true;
+ copied = true;
}
}
}
@@ -1674,11 +1674,14 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
if (ssk)
mptcp_push_release(ssk, &info);
- /* ensure the rtx timer is running */
- if (!mptcp_rtx_timer_pending(sk))
- mptcp_reset_rtx_timer(sk);
- if (do_check_data_fin)
+ /* Avoid scheduling the rtx timer if no data has been pushed; the timer
+ * will be updated on positive acks by __mptcp_cleanup_una().
+ */
+ if (copied) {
+ if (!mptcp_rtx_timer_pending(sk))
+ mptcp_reset_rtx_timer(sk);
mptcp_check_send_data_fin(sk);
+ }
}
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122947-enforced-elective-e753@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Fri, 5 Dec 2025 19:55:15 +0100
Subject: [PATCH] selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoint dumps.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index ec6a87588191..123d9d7a0278 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -192,6 +192,10 @@ check "show_endpoints" \
flush_endpoint
check "show_endpoints" "" "flush addrs"
+add_endpoint 10.0.1.1 flags unknown
+check "show_endpoints" "$(format_endpoints "1,10.0.1.1")" "ignore unknown flags"
+flush_endpoint
+
set_limits 9 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 65b374232ff5..99eecccbf0c8 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -24,6 +24,8 @@
#define IPPROTO_MPTCP 262
#endif
+#define MPTCP_PM_ADDR_FLAG_UNKNOWN _BITUL(7)
+
static void syntax(char *argv[])
{
fprintf(stderr, "%s add|ann|rem|csf|dsf|get|set|del|flush|dump|events|listen|accept [<args>]\n", argv[0]);
@@ -836,6 +838,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
else if (!strcmp(tok, "fullmesh"))
flags |= MPTCP_PM_ADDR_FLAG_FULLMESH;
+ else if (!strcmp(tok, "unknown"))
+ flags |= MPTCP_PM_ADDR_FLAG_UNKNOWN;
else
error(1, errno,
"unknown flag %s", argv[arg]);
@@ -1048,6 +1052,13 @@ static void print_addr(struct rtattr *attrs, int len)
printf(",");
}
+ if (flags & MPTCP_PM_ADDR_FLAG_UNKNOWN) {
+ printf("unknown");
+ flags &= ~MPTCP_PM_ADDR_FLAG_UNKNOWN;
+ if (flags)
+ printf(",");
+ }
+
/* bump unknown flags, if any */
if (flags)
printf("0x%x", flags);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122947-clapping-zookeeper-5d32@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Fri, 5 Dec 2025 19:55:15 +0100
Subject: [PATCH] selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoint dumps.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index ec6a87588191..123d9d7a0278 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -192,6 +192,10 @@ check "show_endpoints" \
flush_endpoint
check "show_endpoints" "" "flush addrs"
+add_endpoint 10.0.1.1 flags unknown
+check "show_endpoints" "$(format_endpoints "1,10.0.1.1")" "ignore unknown flags"
+flush_endpoint
+
set_limits 9 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 65b374232ff5..99eecccbf0c8 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -24,6 +24,8 @@
#define IPPROTO_MPTCP 262
#endif
+#define MPTCP_PM_ADDR_FLAG_UNKNOWN _BITUL(7)
+
static void syntax(char *argv[])
{
fprintf(stderr, "%s add|ann|rem|csf|dsf|get|set|del|flush|dump|events|listen|accept [<args>]\n", argv[0]);
@@ -836,6 +838,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
else if (!strcmp(tok, "fullmesh"))
flags |= MPTCP_PM_ADDR_FLAG_FULLMESH;
+ else if (!strcmp(tok, "unknown"))
+ flags |= MPTCP_PM_ADDR_FLAG_UNKNOWN;
else
error(1, errno,
"unknown flag %s", argv[arg]);
@@ -1048,6 +1052,13 @@ static void print_addr(struct rtattr *attrs, int len)
printf(",");
}
+ if (flags & MPTCP_PM_ADDR_FLAG_UNKNOWN) {
+ printf("unknown");
+ flags &= ~MPTCP_PM_ADDR_FLAG_UNKNOWN;
+ if (flags)
+ printf(",");
+ }
+
/* bump unknown flags, if any */
if (flags)
printf("0x%x", flags);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122946-wharf-satin-c518@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Fri, 5 Dec 2025 19:55:15 +0100
Subject: [PATCH] selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoint dumps.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index ec6a87588191..123d9d7a0278 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -192,6 +192,10 @@ check "show_endpoints" \
flush_endpoint
check "show_endpoints" "" "flush addrs"
+add_endpoint 10.0.1.1 flags unknown
+check "show_endpoints" "$(format_endpoints "1,10.0.1.1")" "ignore unknown flags"
+flush_endpoint
+
set_limits 9 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 65b374232ff5..99eecccbf0c8 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -24,6 +24,8 @@
#define IPPROTO_MPTCP 262
#endif
+#define MPTCP_PM_ADDR_FLAG_UNKNOWN _BITUL(7)
+
static void syntax(char *argv[])
{
fprintf(stderr, "%s add|ann|rem|csf|dsf|get|set|del|flush|dump|events|listen|accept [<args>]\n", argv[0]);
@@ -836,6 +838,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
else if (!strcmp(tok, "fullmesh"))
flags |= MPTCP_PM_ADDR_FLAG_FULLMESH;
+ else if (!strcmp(tok, "unknown"))
+ flags |= MPTCP_PM_ADDR_FLAG_UNKNOWN;
else
error(1, errno,
"unknown flag %s", argv[arg]);
@@ -1048,6 +1052,13 @@ static void print_addr(struct rtattr *attrs, int len)
printf(",");
}
+ if (flags & MPTCP_PM_ADDR_FLAG_UNKNOWN) {
+ printf("unknown");
+ flags &= ~MPTCP_PM_ADDR_FLAG_UNKNOWN;
+ if (flags)
+ printf(",");
+ }
+
/* bump unknown flags, if any */
if (flags)
printf("0x%x", flags);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122946-puppet-visiting-41e4@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Fri, 5 Dec 2025 19:55:15 +0100
Subject: [PATCH] selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoint dumps.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index ec6a87588191..123d9d7a0278 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -192,6 +192,10 @@ check "show_endpoints" \
flush_endpoint
check "show_endpoints" "" "flush addrs"
+add_endpoint 10.0.1.1 flags unknown
+check "show_endpoints" "$(format_endpoints "1,10.0.1.1")" "ignore unknown flags"
+flush_endpoint
+
set_limits 9 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 65b374232ff5..99eecccbf0c8 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -24,6 +24,8 @@
#define IPPROTO_MPTCP 262
#endif
+#define MPTCP_PM_ADDR_FLAG_UNKNOWN _BITUL(7)
+
static void syntax(char *argv[])
{
fprintf(stderr, "%s add|ann|rem|csf|dsf|get|set|del|flush|dump|events|listen|accept [<args>]\n", argv[0]);
@@ -836,6 +838,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
else if (!strcmp(tok, "fullmesh"))
flags |= MPTCP_PM_ADDR_FLAG_FULLMESH;
+ else if (!strcmp(tok, "unknown"))
+ flags |= MPTCP_PM_ADDR_FLAG_UNKNOWN;
else
error(1, errno,
"unknown flag %s", argv[arg]);
@@ -1048,6 +1052,13 @@ static void print_addr(struct rtattr *attrs, int len)
printf(",");
}
+ if (flags & MPTCP_PM_ADDR_FLAG_UNKNOWN) {
+ printf("unknown");
+ flags &= ~MPTCP_PM_ADDR_FLAG_UNKNOWN;
+ if (flags)
+ printf(",");
+ }
+
/* bump unknown flags, if any */
if (flags)
printf("0x%x", flags);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122945-litigator-machinist-2f66@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29f4801e9c8dfd12bdcb33b61a6ac479c7162bd7 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Fri, 5 Dec 2025 19:55:15 +0100
Subject: [PATCH] selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoint dumps.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index ec6a87588191..123d9d7a0278 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -192,6 +192,10 @@ check "show_endpoints" \
flush_endpoint
check "show_endpoints" "" "flush addrs"
+add_endpoint 10.0.1.1 flags unknown
+check "show_endpoints" "$(format_endpoints "1,10.0.1.1")" "ignore unknown flags"
+flush_endpoint
+
set_limits 9 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 65b374232ff5..99eecccbf0c8 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -24,6 +24,8 @@
#define IPPROTO_MPTCP 262
#endif
+#define MPTCP_PM_ADDR_FLAG_UNKNOWN _BITUL(7)
+
static void syntax(char *argv[])
{
fprintf(stderr, "%s add|ann|rem|csf|dsf|get|set|del|flush|dump|events|listen|accept [<args>]\n", argv[0]);
@@ -836,6 +838,8 @@ int add_addr(int fd, int pm_family, int argc, char *argv[])
flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
else if (!strcmp(tok, "fullmesh"))
flags |= MPTCP_PM_ADDR_FLAG_FULLMESH;
+ else if (!strcmp(tok, "unknown"))
+ flags |= MPTCP_PM_ADDR_FLAG_UNKNOWN;
else
error(1, errno,
"unknown flag %s", argv[arg]);
@@ -1048,6 +1052,13 @@ static void print_addr(struct rtattr *attrs, int len)
printf(",");
}
+ if (flags & MPTCP_PM_ADDR_FLAG_UNKNOWN) {
+ printf("unknown");
+ flags &= ~MPTCP_PM_ADDR_FLAG_UNKNOWN;
+ if (flags)
+ printf(",");
+ }
+
/* bump unknown flags, if any */
if (flags)
printf("0x%x", flags);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 8a0e4bdddd1c998b894d879a1d22f1e745606215
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122926-sizably-bountiful-a98b@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a0e4bdddd1c998b894d879a1d22f1e745606215 Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang(a)gmail.com>
Date: Thu, 6 Nov 2025 03:41:55 +0000
Subject: [PATCH] mm/huge_memory: merge uniform_split_supported() and
non_uniform_split_supported()
uniform_split_supported() and non_uniform_split_supported() share
substantially similar logic.
The only functional difference is that uniform_split_supported() includes
an additional check on the requested @new_order.
The reason for this check comes from the following two aspects:
* some file systems and the swap cache support only order-0 folios
* the behavioral difference between uniform and non-uniform split
The behavioral difference between uniform split and non-uniform split:
* uniform split splits the folio directly to @new_order
* non-uniform split creates after-split folios with orders from
folio_order(folio) - 1 down to @new_order
This means that for a non-uniform split, or a uniform split to a
non-zero @new_order, we should check the file system and swap cache
support respectively.
This commit unifies the logic and merges the two functions into a single
combined helper, removing redundant code and simplifying the split
support checking mechanism.
Link: https://lkml.kernel.org/r/20251106034155.21398-3-richard.weiyang@gmail.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Nico Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b74708dc5b5f..19d4a5f52ca2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -374,10 +374,8 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
unsigned int new_order, bool unmapped);
int min_order_for_split(struct folio *folio);
int split_folio_to_list(struct folio *folio, struct list_head *list);
-bool uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns);
-bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns);
+bool folio_split_supported(struct folio *folio, unsigned int new_order,
+ enum split_type split_type, bool warns);
int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
struct list_head *list);
@@ -408,7 +406,7 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
static inline int try_folio_split_to_order(struct folio *folio,
struct page *page, unsigned int new_order)
{
- if (!non_uniform_split_supported(folio, new_order, /* warns= */ false))
+ if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
return split_huge_page_to_order(&folio->page, new_order);
return folio_split(folio, new_order, page, NULL);
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4118f330c55e..d79a4bb363de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3593,8 +3593,8 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
return 0;
}
-bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns)
+bool folio_split_supported(struct folio *folio, unsigned int new_order,
+ enum split_type split_type, bool warns)
{
if (folio_test_anon(folio)) {
/* order-1 is not supported for anonymous THP. */
@@ -3602,48 +3602,41 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
"Cannot split to order-1 folio");
if (new_order == 1)
return false;
- } else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- !mapping_large_folio_support(folio->mapping)) {
- /*
- * No split if the file system does not support large folio.
- * Note that we might still have THPs in such mappings due to
- * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
- * does not actually support large folios properly.
- */
- VM_WARN_ONCE(warns,
- "Cannot split file folio to non-0 order");
- return false;
- }
-
- /* Only swapping a whole PMD-mapped folio is supported */
- if (folio_test_swapcache(folio)) {
- VM_WARN_ONCE(warns,
- "Cannot split swapcache folio to non-0 order");
- return false;
- }
-
- return true;
-}
-
-/* See comments in non_uniform_split_supported() */
-bool uniform_split_supported(struct folio *folio, unsigned int new_order,
- bool warns)
-{
- if (folio_test_anon(folio)) {
- VM_WARN_ONCE(warns && new_order == 1,
- "Cannot split to order-1 folio");
- if (new_order == 1)
- return false;
- } else if (new_order) {
+ } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
!mapping_large_folio_support(folio->mapping)) {
+ /*
+ * We can always split a folio down to a single page
+ * (new_order == 0) uniformly.
+ *
+ * For any other scenario
+ * a) uniform split targeting a large folio
+ * (new_order > 0)
+ * b) any non-uniform split
+ * we must confirm that the file system supports large
+ * folios.
+ *
+ * Note that we might still have THPs in such
+ * mappings, which is created from khugepaged when
+ * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
+ * case, the mapping does not actually support large
+ * folios properly.
+ */
VM_WARN_ONCE(warns,
"Cannot split file folio to non-0 order");
return false;
}
}
- if (new_order && folio_test_swapcache(folio)) {
+ /*
+ * swapcache folio could only be split to order 0
+ *
+ * non-uniform split creates after-split folios with orders from
+ * folio_order(folio) - 1 to new_order, making it not suitable for any
+ * swapcache folio split. Only uniform split to order-0 can be used
+ * here.
+ */
+ if ((split_type == SPLIT_TYPE_NON_UNIFORM || new_order) && folio_test_swapcache(folio)) {
VM_WARN_ONCE(warns,
"Cannot split swapcache folio to non-0 order");
return false;
@@ -3711,11 +3704,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
if (new_order >= old_order)
return -EINVAL;
- if (split_type == SPLIT_TYPE_UNIFORM && !uniform_split_supported(folio, new_order, true))
- return -EINVAL;
-
- if (split_type == SPLIT_TYPE_NON_UNIFORM &&
- !non_uniform_split_supported(folio, new_order, true))
+ if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
return -EINVAL;
is_hzp = is_huge_zero_folio(folio);
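To make the uniform/non-uniform distinction in the commit message
concrete: a uniform split of an order-old folio directly produces
2^(old - new) folios of order new, while a non-uniform split repeatedly
halves the folio, keeping the half that contains the target page, so
the after-split folios have one of each order from old - 1 down to
new + 1, plus two at new. A small arithmetic sketch (illustration only,
not kernel code):

/* Enumerate the after-split folio orders for uniform vs. non-uniform
 * split; arithmetic illustration only. */
#include <stdio.h>

int main(void)
{
	int old_order = 9, new_order = 2;	/* e.g. PMD folio to order-2 */

	printf("uniform: %d folios of order %d\n",
	       1 << (old_order - new_order), new_order);

	printf("non-uniform orders:");
	for (int o = old_order - 1; o > new_order; o--)
		printf(" %d", o);
	/* the final halving leaves two folios at new_order */
	printf(" %d %d\n", new_order, new_order);
	return 0;
}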
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 935a20d1bebf6236076785fac3ff81e3931834e9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122946-opt-excuse-6662@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 935a20d1bebf6236076785fac3ff81e3931834e9 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche(a)acm.org>
Date: Fri, 14 Nov 2025 13:04:07 -0800
Subject: [PATCH] block: Remove queue freezing from several sysfs store
callbacks
Freezing the request queue from inside sysfs store callbacks may cause a
deadlock in combination with the dm-multipath driver and the
queue_if_no_path option. Additionally, freezing the request queue slows
down system boot on systems where sysfs attributes are set synchronously.
Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue()
calls from the store callbacks that do not strictly need them.
Add the __data_racy annotation to request_queue.rq_timeout to suppress
KCSAN data race reports about the rq_timeout reads.
This patch may cause a small delay in applying the new settings.
For all the attributes affected by this patch, I/O will complete
correctly whether the old or the new value of the attribute is used.
This patch affects the following sysfs attributes:
* io_poll_delay
* io_timeout
* nomerges
* read_ahead_kb
* rq_affinity
Here is an example of a deadlock triggered by running test srp/002
if this patch is not applied:
task:multipathd
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
schedule_preempt_disabled+0x1c/0x30
__mutex_lock+0xb89/0x1650
mutex_lock_nested+0x1f/0x30
dm_table_set_restrictions+0x823/0xdf0
__bind+0x166/0x590
dm_swap_table+0x2a7/0x490
do_resume+0x1b1/0x610
dev_suspend+0x55/0x1a0
ctl_ioctl+0x3a5/0x7e0
dm_ctl_ioctl+0x12/0x20
__x64_sys_ioctl+0x127/0x1a0
x64_sys_call+0xe2b/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
task:(udev-worker)
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
blk_mq_freeze_queue_wait+0xf2/0x140
blk_mq_freeze_queue_nomemsave+0x23/0x30
queue_ra_store+0x14e/0x290
queue_attr_store+0x23e/0x2c0
sysfs_kf_write+0xde/0x140
kernfs_fop_write_iter+0x3b2/0x630
vfs_write+0x4fd/0x1390
ksys_write+0xfd/0x230
__x64_sys_write+0x76/0xc0
x64_sys_call+0x276/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Nilay Shroff <nilay(a)linux.ibm.com>
Cc: Martin Wilck <mwilck(a)suse.com>
Cc: Benjamin Marzinski <bmarzins(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: af2814149883 ("block: freeze the queue in queue_attr_store")
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
Reviewed-by: Nilay Shroff <nilay(a)linux.ibm.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 76c47fe9b8d6..8684c57498cc 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -143,21 +143,22 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
{
unsigned long ra_kb;
ssize_t ret;
- unsigned int memflags;
struct request_queue *q = disk->queue;
ret = queue_var_store(&ra_kb, page, count);
if (ret < 0)
return ret;
/*
- * ->ra_pages is protected by ->limits_lock because it is usually
- * calculated from the queue limits by queue_limits_commit_update.
+ * The ->ra_pages change below is protected by ->limits_lock because it
+ * is usually calculated from the queue limits by
+ * queue_limits_commit_update().
+ *
+ * bdi->ra_pages reads are not serialized against bdi->ra_pages writes.
+ * Use WRITE_ONCE() to write bdi->ra_pages once.
*/
mutex_lock(&q->limits_lock);
- memflags = blk_mq_freeze_queue(q);
- disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
+ WRITE_ONCE(disk->bdi->ra_pages, ra_kb >> (PAGE_SHIFT - 10));
mutex_unlock(&q->limits_lock);
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -375,21 +376,18 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
size_t count)
{
unsigned long nm;
- unsigned int memflags;
struct request_queue *q = disk->queue;
ssize_t ret = queue_var_store(&nm, page, count);
if (ret < 0)
return ret;
- memflags = blk_mq_freeze_queue(q);
blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
if (nm == 2)
blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
else if (nm)
blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -409,7 +407,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
#ifdef CONFIG_SMP
struct request_queue *q = disk->queue;
unsigned long val;
- unsigned int memflags;
ret = queue_var_store(&val, page, count);
if (ret < 0)
@@ -421,7 +418,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
* are accessed individually using atomic test_bit operation. So we
* don't grab any lock while updating these flags.
*/
- memflags = blk_mq_freeze_queue(q);
if (val == 2) {
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
@@ -432,7 +428,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
}
- blk_mq_unfreeze_queue(q, memflags);
#endif
return ret;
}
@@ -446,11 +441,9 @@ static ssize_t queue_poll_delay_store(struct gendisk *disk, const char *page,
static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
size_t count)
{
- unsigned int memflags;
ssize_t ret = count;
struct request_queue *q = disk->queue;
- memflags = blk_mq_freeze_queue(q);
if (!(q->limits.features & BLK_FEAT_POLL)) {
ret = -EINVAL;
goto out;
@@ -459,7 +452,6 @@ static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
pr_info_ratelimited("please use driver specific parameters instead.\n");
out:
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -472,7 +464,7 @@ static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page)
static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
size_t count)
{
- unsigned int val, memflags;
+ unsigned int val;
int err;
struct request_queue *q = disk->queue;
@@ -480,9 +472,7 @@ static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
if (err || val == 0)
return -EINVAL;
- memflags = blk_mq_freeze_queue(q);
blk_queue_rq_timeout(q, msecs_to_jiffies(val));
- blk_mq_unfreeze_queue(q, memflags);
return count;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2fff8a80dbd2..cb4ba09959ee 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -495,7 +495,7 @@ struct request_queue {
*/
unsigned long queue_flags;
- unsigned int rq_timeout;
+ unsigned int __data_racy rq_timeout;
unsigned int queue_depth;
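For readers unfamiliar with the annotations above: WRITE_ONCE() and the
__data_racy marker declare that a variable is intentionally read and
written concurrently without serialization, which is tolerable here
because I/O completes correctly with either the old or the new value.
A userspace analogue, assuming C11 relaxed atomics as a stand-in for
the kernel's WRITE_ONCE()/READ_ONCE():

/* Userspace analogue (C11 relaxed atomics standing in for the kernel's
 * WRITE_ONCE()/READ_ONCE()) of an unserialized but tolerated
 * read/write pair like bdi->ra_pages. */
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long ra_pages = 32;

static void store_ra_kb(unsigned long kb)
{
	/* one atomic store; readers may briefly see the old value */
	atomic_store_explicit(&ra_pages, kb >> 2, memory_order_relaxed);
}

static unsigned long load_ra(void)
{
	return atomic_load_explicit(&ra_pages, memory_order_relaxed);
}

int main(void)
{
	store_ra_kb(512);	/* 512 KiB -> 128 pages with 4 KiB pages */
	printf("ra_pages = %lu\n", load_ra());
	return 0;
}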
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ee5a977b4e771cc181f39d504426dbd31ed701cc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122958-ambitious-prenatal-99c2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee5a977b4e771cc181f39d504426dbd31ed701cc Mon Sep 17 00:00:00 2001
From: Fedor Pchelkin <pchelkin(a)ispras.ru>
Date: Sat, 1 Nov 2025 19:04:28 +0300
Subject: [PATCH] ext4: fix string copying in parse_apply_sb_mount_options()
strscpy_pad() can't be used to copy a non-NUL-terminated string into a
NUL-terminated buffer of possibly bigger size. Commit 0efc5990bca5 ("string.h: Introduce
memtostr() and memtostr_pad()") provides additional information in that
regard. So if this happens, the following warning is observed:
strnlen: detected buffer overflow: 65 byte read of buffer size 64
WARNING: CPU: 0 PID: 28655 at lib/string_helpers.c:1032 __fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Modules linked in:
CPU: 0 UID: 0 PID: 28655 Comm: syz-executor.3 Not tainted 6.12.54-syzkaller-00144-g5f0270f1ba00 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Call Trace:
<TASK>
__fortify_panic+0x1f/0x30 lib/string_helpers.c:1039
strnlen include/linux/fortify-string.h:235 [inline]
sized_strscpy include/linux/fortify-string.h:309 [inline]
parse_apply_sb_mount_options fs/ext4/super.c:2504 [inline]
__ext4_fill_super fs/ext4/super.c:5261 [inline]
ext4_fill_super+0x3c35/0xad00 fs/ext4/super.c:5706
get_tree_bdev_flags+0x387/0x620 fs/super.c:1636
vfs_get_tree+0x93/0x380 fs/super.c:1814
do_new_mount fs/namespace.c:3553 [inline]
path_mount+0x6ae/0x1f70 fs/namespace.c:3880
do_mount fs/namespace.c:3893 [inline]
__do_sys_mount fs/namespace.c:4103 [inline]
__se_sys_mount fs/namespace.c:4080 [inline]
__x64_sys_mount+0x280/0x300 fs/namespace.c:4080
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x64/0x140 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Since userspace is expected to provide an s_mount_opts field that is at
most 63 characters long, with the final byte being the NUL terminator,
use a 64-byte buffer matching the size of s_mount_opts so that
strscpy_pad() does its job properly. Return an error if the user still
managed to provide a non-NUL-terminated string here.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 8ecb790ea8c3 ("ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
Reviewed-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Message-ID: <20251101160430.222297-1-pchelkin(a)ispras.ru>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7de15249e826..d1ba894c0e0a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2476,7 +2476,7 @@ static int parse_apply_sb_mount_options(struct super_block *sb,
struct ext4_fs_context *m_ctx)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
- char s_mount_opts[65];
+ char s_mount_opts[64];
struct ext4_fs_context *s_ctx = NULL;
struct fs_context *fc = NULL;
int ret = -ENOMEM;
@@ -2484,7 +2484,8 @@ static int parse_apply_sb_mount_options(struct super_block *sb,
if (!sbi->s_es->s_mount_opts[0])
return 0;
- strscpy_pad(s_mount_opts, sbi->s_es->s_mount_opts);
+ if (strscpy_pad(s_mount_opts, sbi->s_es->s_mount_opts) < 0)
+ return -E2BIG;
fc = kzalloc(sizeof(struct fs_context), GFP_KERNEL);
if (!fc)
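A userspace model of the contract being fixed, assuming a simplified
stand-in for the kernel's strscpy_pad() (which fails when no NUL
terminator is found within the destination size): with a 64-byte
destination and an explicit error check, a fully non-NUL-terminated
64-byte s_mount_opts is rejected instead of over-read.

/* Simplified stand-in for the kernel's strscpy_pad(): fails if src has
 * no NUL within dst_size bytes, else copies and zero-pads dst. */
#include <stdio.h>
#include <string.h>

#define ERR_E2BIG (-7)

static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n])
		n++;
	return n;
}

static long strscpy_pad_model(char *dst, const char *src, size_t dst_size)
{
	size_t len = bounded_len(src, dst_size);

	if (len == dst_size)
		return ERR_E2BIG;	/* no NUL terminator in range */
	memcpy(dst, src, len + 1);
	memset(dst + len + 1, 0, dst_size - len - 1);
	return (long)len;
}

int main(void)
{
	char s_mount_opts[64];
	char es_opts[64];

	memset(es_opts, 'x', sizeof(es_opts));	/* corrupted: no NUL byte */
	if (strscpy_pad_model(s_mount_opts, es_opts, sizeof(s_mount_opts)) < 0)
		printf("rejected non-NUL-terminated mount options\n");
	return 0;
}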
Hi Greg,
On 2025-12-29, <gregkh(a)linuxfoundation.org> wrote:
> This is a note to let you know that I've just added the patch titled
>
> printk: Avoid scheduling irq_work on suspend
>
> to the 6.18-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> printk-avoid-scheduling-irq_work-on-suspend.patch
> and it can be found in the queue-6.18 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This patch should be accompanied by the preceding commit:
d01ff281bd9b ("printk: Allow printk_trigger_flush() to flush all types")
and the later commit:
66e7c1e0ee08 ("printk: Avoid irq_work for printk_deferred() on suspend")
in order to completely avoid irq_work triggering during suspend for all
possible call paths.
John Ogness
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1f73b8b56cf35de29a433aee7bfff26cea98be3f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120110-coastal-litigator-8952@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f73b8b56cf35de29a433aee7bfff26cea98be3f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=81ukasz=20Bartosik?= <ukaszb(a)chromium.org>
Date: Wed, 19 Nov 2025 21:29:09 +0000
Subject: [PATCH] xhci: dbgtty: fix device unregister
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When DbC is disconnected, xhci_dbc_tty_unregister_device() is called.
However, if any user space process is blocked on a write to the DbC
terminal device, it will never be signalled and will stay blocked
indefinitely.
This fix adds a tty_vhangup() call in xhci_dbc_tty_unregister_device().
tty_vhangup() wakes up any blocked writers and causes subsequent
write attempts to the DbC terminal device to fail.
Cc: stable <stable(a)kernel.org>
Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Link: https://patch.msgid.link/20251119212910.1245694-1-ukaszb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index b7f95565524d..57cdda4e09c8 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -550,6 +550,12 @@ static void xhci_dbc_tty_unregister_device(struct xhci_dbc *dbc)
if (!port->registered)
return;
+ /*
+ * Hang up the TTY. This wakes up any blocked
+ * writers and causes subsequent writes to fail.
+ */
+ tty_vhangup(port->port.tty);
+
tty_unregister_device(dbc_tty_driver, port->minor);
xhci_dbc_tty_exit_port(port);
port->registered = false;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 9f769637a93fac81689b80df6855f545839cf999
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122940-sneak-unvocal-9de2@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9f769637a93fac81689b80df6855f545839cf999 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj(a)kernel.org>
Date: Tue, 9 Dec 2025 11:04:33 -1000
Subject: [PATCH] sched_ext: Fix bypass depth leak on scx_enable() failure
scx_enable() calls scx_bypass(true) to initialize in bypass mode and then
scx_bypass(false) on success to exit. If scx_enable() fails during task
initialization - e.g. scx_cgroup_init() or scx_init_task() returns an error -
it jumps to err_disable while bypass is still active. scx_disable_workfn()
then calls scx_bypass(true/false) for its own bypass, leaving the bypass depth
at 1 instead of 0. This causes the system to remain permanently in bypass mode
after a failed scx_enable().
Failures after task initialization is complete - e.g. scx_tryset_enable_state()
at the end - already call scx_bypass(false) before reaching the error path and
are not affected. This only affects a subset of failure modes.
Fix it by tracking whether scx_enable() called scx_bypass(true) in a bool and
having scx_disable_workfn() call an extra scx_bypass(false) to clear it. This
is a temporary measure as the bypass depth will be moved into the sched
instance, which will make this tracking unnecessary.
Fixes: 8c2090c504e9 ("sched_ext: Initialize in bypass mode")
Cc: stable(a)vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm(a)meta.com>
Reviewed-by: Emil Tsalapatis <emil(a)etsalapatis.com>
Link: https://lore.kernel.org/stable/286e6f7787a81239e1ce2989b52391ce%40kernel.org
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index bd74b371f52d..c4465ccefea4 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -41,6 +41,13 @@ static bool scx_init_task_enabled;
static bool scx_switching_all;
DEFINE_STATIC_KEY_FALSE(__scx_switched_all);
+/*
+ * Tracks whether scx_enable() called scx_bypass(true). Used to balance bypass
+ * depth on enable failure. Will be removed when bypass depth is moved into the
+ * sched instance.
+ */
+static bool scx_bypassed_for_enable;
+
static atomic_long_t scx_nr_rejected = ATOMIC_LONG_INIT(0);
static atomic_long_t scx_hotplug_seq = ATOMIC_LONG_INIT(0);
@@ -4318,6 +4325,11 @@ static void scx_disable_workfn(struct kthread_work *work)
scx_dsp_max_batch = 0;
free_kick_syncs();
+ if (scx_bypassed_for_enable) {
+ scx_bypassed_for_enable = false;
+ scx_bypass(false);
+ }
+
mutex_unlock(&scx_enable_mutex);
WARN_ON_ONCE(scx_set_enable_state(SCX_DISABLED) != SCX_DISABLING);
@@ -4970,6 +4982,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
* Init in bypass mode to guarantee forward progress.
*/
scx_bypass(true);
+ scx_bypassed_for_enable = true;
for (i = SCX_OPI_NORMAL_BEGIN; i < SCX_OPI_NORMAL_END; i++)
if (((void (**)(void))ops)[i])
@@ -5067,6 +5080,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
scx_task_iter_stop(&sti);
percpu_up_write(&scx_fork_rwsem);
+ scx_bypassed_for_enable = false;
scx_bypass(false);
if (!scx_tryset_enable_state(SCX_ENABLED, SCX_ENABLING)) {
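The shape of the fix is a general take/release balancing pattern:
record in a flag whether this path took the reference so that a shared
teardown path can release it exactly once. A minimal sketch with
hypothetical names, mirroring scx_bypassed_for_enable:

/* Minimal sketch (hypothetical names) of balancing a depth counter
 * across a failure path, as the fix does with scx_bypassed_for_enable. */
#include <stdbool.h>
#include <stdio.h>

static int bypass_depth;
static bool bypassed_for_enable;

static void bypass(bool on)
{
	bypass_depth += on ? 1 : -1;
}

static int enable(bool fail_midway)
{
	bypass(true);			/* init in bypass mode */
	bypassed_for_enable = true;
	if (fail_midway)
		return -1;		/* error path: depth still held */
	bypassed_for_enable = false;
	bypass(false);			/* normal exit */
	return 0;
}

static void disable_workfn(void)
{
	bypass(true);			/* disable's own bypass section */
	bypass(false);
	if (bypassed_for_enable) {	/* drop enable's leaked reference */
		bypassed_for_enable = false;
		bypass(false);
	}
}

int main(void)
{
	if (enable(true))
		disable_workfn();
	printf("bypass_depth = %d\n", bypass_depth);	/* 0, not 1 */
	return 0;
}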
When ECAM is enabled, the driver skipped calling dw_pcie_iatu_setup()
before configuring ECAM iATU entries. This left IO and MEM outbound
windows unprogrammed, resulting in broken IO transactions. Additionally,
dw_pcie_config_ecam_iatu() was only called during host initialization,
so ECAM-related iATU entries were not restored after suspend/resume,
leading to failures in configuration space access.
To resolve these issues, the ECAM iATU configuration is moved into
dw_pcie_setup_rc(). At the same time, dw_pcie_iatu_setup() is invoked
when ECAM is enabled.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
---
Changes in v2:
- Fixed the skipping of index 0 of the ATU window.
- Kept ob_atu_index in dw_pcie instead of dw_pcie_rp, plus a couple of nitpicks (Bjorn).
- Link to v1: https://lore.kernel.org/r/20251203-ecam_io_fix-v1-0-5cc3d3769c18@oss.qualco…
---
Krishna Chaitanya Chundru (3):
PCI: dwc: Fix skipped index 0 in outbound ATU setup
PCI: dwc: Correct iATU index increment for MSG TLP region
PCI: dwc: Fix missing iATU setup when ECAM is enabled
drivers/pci/controller/dwc/pcie-designware-host.c | 53 ++++++++++++++---------
drivers/pci/controller/dwc/pcie-designware.c | 3 ++
drivers/pci/controller/dwc/pcie-designware.h | 2 +-
3 files changed, 37 insertions(+), 21 deletions(-)
---
base-commit: 3f9f0252130e7dd60d41be0802bf58f6471c691d
change-id: 20251203-ecam_io_fix-6e060fecd3b8
Best regards,
--
Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
The driver trusts the 'num' and 'entry_size' fields read from BAR2 and
uses them directly to compute the length for memcpy_fromio() without
any bounds checking. If these fields get corrupted or otherwise contain
invalid values, num * entry_size can exceed the size of
proc_mon_info.entries and lead to a potential out-of-bounds write.
Add validation for 'entry_size' by ensuring it is non-zero and that
num * entry_size does not exceed the size of proc_mon_info.entries.
Fixes: ff428d052b3b ("misc: bcm-vk: add get_card_info, peerlog_info, and proc_mon_info")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244(a)gmail.com>
---
drivers/misc/bcm-vk/bcm_vk_dev.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c b/drivers/misc/bcm-vk/bcm_vk_dev.c
index a16b99bdaa13..a4a74c10f02b 100644
--- a/drivers/misc/bcm-vk/bcm_vk_dev.c
+++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
@@ -439,6 +439,7 @@ static void bcm_vk_get_proc_mon_info(struct bcm_vk *vk)
struct device *dev = &vk->pdev->dev;
struct bcm_vk_proc_mon_info *mon = &vk->proc_mon_info;
u32 num, entry_size, offset, buf_size;
+ size_t max_bytes = sizeof(mon->entries);
u8 *dst;
/* calculate offset which is based on peerlog offset */
@@ -458,6 +459,9 @@ static void bcm_vk_get_proc_mon_info(struct bcm_vk *vk)
num, BCM_VK_PROC_MON_MAX);
return;
}
+ if (!entry_size || (size_t)num > max_bytes / entry_size) {
+ return;
+ }
mon->num = num;
mon->entry_size = entry_size;
--
2.43.0
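One detail worth noting in the check above: comparing num against
max_bytes / entry_size is the overflow-safe way to validate the
product, since computing num * entry_size first could wrap around. A
standalone sketch of the pattern (max_bytes is assumed to be
sizeof(proc_mon_info.entries) in the driver):

/* Overflow-safe "num * entry_size <= max_bytes" check: dividing
 * instead of multiplying avoids integer wraparound. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool copy_len_ok(size_t num, size_t entry_size, size_t max_bytes)
{
	return entry_size != 0 && num <= max_bytes / entry_size;
}

int main(void)
{
	printf("%d\n", copy_len_ok(64, 64, 4096));	  /* 1: exactly fits */
	printf("%d\n", copy_len_ok(65, 64, 4096));	  /* 0: too large */
	printf("%d\n", copy_len_ok((size_t)-1, 2, 4096)); /* 0: would wrap */
	return 0;
}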
Here are some device tree updates to improve sound output on RK3576
boards.
The first two patches fix analog audio output on FriendlyElec NanoPi M5,
as it doesn't work with the current device tree.
The third one is purely cosmetic, to present a more user-friendly sound
card name to the userspace on NanoPi M5.
The rest add new functionality: HDMI sound output on three boards that
didn't enable it, and analog sound on RK3576 EVB1.
Signed-off-by: Alexey Charkov <alchark(a)gmail.com>
---
Alexey Charkov (7):
arm64: dts: rockchip: Fix headphones widget name on NanoPi M5
arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5
arm64: dts: rockchip: Use a readable audio card name on NanoPi M5
arm64: dts: rockchip: Enable HDMI sound on FriendlyElec NanoPi M5
arm64: dts: rockchip: Enable HDMI sound on Luckfox Core3576
arm64: dts: rockchip: Enable HDMI sound on RK3576 EVB1
arm64: dts: rockchip: Enable analog sound on RK3576 EVB1
arch/arm64/boot/dts/rockchip/rk3576-evb1-v10.dts | 107 +++++++++++++++++++++
.../boot/dts/rockchip/rk3576-luckfox-core3576.dtsi | 8 ++
arch/arm64/boot/dts/rockchip/rk3576-nanopi-m5.dts | 22 ++++-
3 files changed, 132 insertions(+), 5 deletions(-)
---
base-commit: cc3aa43b44bdb43dfbac0fcb51c56594a11338a8
change-id: 20251222-rk3576-sound-0c26e3b16924
Best regards,
--
Alexey Charkov <alchark(a)gmail.com>
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", the physical range that contains the
carried-over IMA measurement list may fall outside the truncated RAM,
leading to a kernel panic.
BUG: unable to handle page fault for address: ffff97793ff47000
#PF: error_code(0x0000) - not-present page
RIP: ima_restore_measurement_list+0xdc/0x45a
Other architectures already validate the range with page_is_ram(), as
done in commit cbf9c4b9617b ("of: check previous kernel's
ima-kexec-buffer against memory bounds"); do a similar check on x86.
Cc: stable(a)vger.kernel.org
Fixes: b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec")
Reported-by: Paul Webb <paul.x.webb(a)oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
I have tested kexec on an x86 kernel with IMA_KEXEC enabled, and the
above patch works well. Paul initially reported this on a 6.12 kernel,
but I was able to reproduce it on 6.18, so I replicated how this was
fixed in drivers/of/kexec.c.
---
arch/x86/kernel/setup.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b2edd07a3e1..fcef197d180e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,9 +439,23 @@ int __init ima_free_kexec_buffer(void)
int __init ima_get_kexec_buffer(void **addr, size_t *size)
{
+ unsigned long start_pfn, end_pfn;
+
if (!ima_kexec_buffer_size)
return -ENOENT;
+ /*
+ * Calculate the PFNs for the buffer and ensure
+ * they are within addressable memory.
+ */
+ start_pfn = PFN_DOWN(ima_kexec_buffer_phys);
+ end_pfn = PFN_DOWN(ima_kexec_buffer_phys + ima_kexec_buffer_size - 1);
+ if (!pfn_range_is_mapped(start_pfn, end_pfn)) {
+ pr_warn("IMA buffer at 0x%llx, size = 0x%zx beyond memory\n",
+ ima_kexec_buffer_phys, ima_kexec_buffer_size);
+ return -EINVAL;
+ }
+
*addr = __va(ima_kexec_buffer_phys);
*size = ima_kexec_buffer_size;
--
2.50.1
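For reference, the PFN arithmetic in the hunk above: PFN_DOWN() is a
right shift by PAGE_SHIFT, and the end PFN is computed from the last
byte (phys + size - 1) so that a buffer ending exactly on a page
boundary does not claim one page too many. A quick standalone model
(PAGE_SHIFT of 12, i.e. 4 KiB pages, is assumed for illustration):

/* Model of the PFN range computed in the fix; PAGE_SHIFT = 12 is an
 * assumption for illustration. */
#include <stdio.h>

#define PAGE_SHIFT	12
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

int main(void)
{
	unsigned long long phys = 0x100000, size = 0x2000; /* two pages */
	unsigned long long start_pfn = PFN_DOWN(phys);
	unsigned long long end_pfn = PFN_DOWN(phys + size - 1);

	/* 0x100..0x101: the "- 1" keeps the range at exactly two pages */
	printf("pfns 0x%llx..0x%llx\n", start_pfn, end_pfn);
	return 0;
}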
From: Wen Yang <wen.yang(a)linux.dev>
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function.
Since do_softirq_post_smp_call_flush() is called with preempt disabled,
raising a SOFTIRQ during flush_smp_call_function_queue() can lead to
longer preempt disabled sections.
RPS distributes network processing load across CPUs by enqueuing
packets on a remote CPU's backlog and raising NET_RX_SOFTIRQ to
process them.
The following patches fix this issue.
Sebastian Andrzej Siewior (2):
net: Remove conditional threaded-NAPI wakeup based on task state.
net: Allow to use SMP threads for backlog NAPI.
net/core/dev.c | 162 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 115 insertions(+), 47 deletions(-)
--
2.25.1
From: Alexis Lothoré <alexis.lothore(a)bootlin.com>
If the ptp_rate recorded earlier in the driver happens to be 0, this
bogus value will propagate up to EST configuration, where it will
trigger a division by 0.
Prevent this division by 0 by adding the corresponding check and error
code.
Suggested-by: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Fixes: 8572aec3d0dc ("net: stmmac: Add basic EST support for XGMAC")
Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-2-d73340a794d5@bootl…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ The context change is due to commit c3f3b97238f6
("net: stmmac: Refactor EST implementation"),
which does not affect the logic of this patch. ]
Signed-off-by: Rahul Sharma <black.hawk(a)163.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 5 +++++
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
index 8fd167501fa0..0afd4644a985 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
@@ -597,6 +597,11 @@ int dwmac5_est_configure(void __iomem *ioaddr, struct stmmac_est *cfg,
int i, ret = 0x0;
u32 ctrl;
+ if (!ptp_rate) {
+ pr_warn("Dwmac5: Invalid PTP rate");
+ return -EINVAL;
+ }
+
ret |= dwmac5_est_write(ioaddr, BTR_LOW, cfg->btr[0], false);
ret |= dwmac5_est_write(ioaddr, BTR_HIGH, cfg->btr[1], false);
ret |= dwmac5_est_write(ioaddr, TER, cfg->ter, false);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index 0bcb378fa0bc..aab02328a613 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -1537,6 +1537,11 @@ static int dwxgmac3_est_configure(void __iomem *ioaddr, struct stmmac_est *cfg,
int i, ret = 0x0;
u32 ctrl;
+ if (!ptp_rate) {
+ pr_warn("Dwxgmac2: Invalid PTP rate");
+ return -EINVAL;
+ }
+
ret |= dwxgmac3_est_write(ioaddr, XGMAC_BTR_LOW, cfg->btr[0], false);
ret |= dwxgmac3_est_write(ioaddr, XGMAC_BTR_HIGH, cfg->btr[1], false);
ret |= dwxgmac3_est_write(ioaddr, XGMAC_TER, cfg->ter, false);
--
2.34.1
The patch titled
Subject: mm: add missing static initializer for init_mm::mm_cid.lock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-add-missing-static-initializer-for-init_mm-mm_cidlock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: add missing static initializer for init_mm::mm_cid.lock
Date: Wed, 24 Dec 2025 12:33:56 -0500
Initialize the mm_cid.lock struct member of init_mm.
Link: https://lkml.kernel.org/r/20251224173358.647691-2-mathieu.desnoyers@efficio…
Fixes: 8cea569ca785 ("sched/mmcid: Use proper data structures")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: "Liam R . Howlett" <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Martin Liu <liumartin(a)google.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sweet Tea Dorminy <sweettea-kernel(a)dorminy.me>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/init-mm.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/init-mm.c~mm-add-missing-static-initializer-for-init_mm-mm_cidlock
+++ a/mm/init-mm.c
@@ -44,6 +44,9 @@ struct mm_struct init_mm = {
.mm_lock_seq = SEQCNT_ZERO(init_mm.mm_lock_seq),
#endif
.user_ns = &init_user_ns,
+#ifdef CONFIG_SCHED_MM_CID
+ .mm_cid.lock = __RAW_SPIN_LOCK_UNLOCKED(init_mm.mm_cid.lock),
+#endif
.cpu_bitmap = CPU_BITS_NONE,
INIT_MM_CONTEXT(init_mm)
};
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-add-missing-static-initializer-for-init_mm-mm_cidlock.patch
mm-rename-cpu_bitmap-field-to-flexible_array.patch
mm-take-into-account-mm_cid-size-for-mm_struct-static-definitions.patch
tsacct-skip-all-kernel-threads.patch
lib-introduce-hierarchical-per-cpu-counters.patch
mm-fix-oom-killer-inaccuracy-on-large-many-core-systems.patch
mm-implement-precise-oom-killer-task-selection.patch
The patch titled
Subject: mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Subject: mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
Date: Tue, 23 Dec 2025 22:40:37 +0100
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.
In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.
There are two optimizations to be had:
(1) When we process (unshare) multiple such PMD tables, such as during
exit(), it is sufficient to send a single IPI broadcast (as long as
we respect locking rules) instead of one per PMD table.
Locking prevents that any of these PMD tables could get reused before
we drop the lock.
(2) When we are not the last sharer (> 2 users including us), there is
no need to send the IPI broadcast. The shared PMD tables cannot
become exclusive (fully unshared) before an IPI will be broadcasted
by the last sharer.
Concurrent GUP-fast could walk into a PMD table just before we
unshared it. It could then succeed in grabbing a page from the
shared page table even after munmap() etc. succeeded (and suppressed
an IPI). But there is no difference compared to GUP-fast just
sleeping for a while after grabbing the page and re-enabling IRQs.
Most importantly, GUP-fast will never walk into page tables that are
no-longer shared, because the last sharer will issue an IPI
broadcast.
(if ever required, checking whether the PUD changed in GUP-fast
after grabbing the page like we do in the PTE case could handle
this)
So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.
To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().
We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().
From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.
Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.
Document it all properly.
Notes about tlb_remove_table_sync_one() interaction with unsharing:
There are two fairly tricky things:
(1) tlb_remove_table_sync_one() is a NOP on architectures without
CONFIG_MMU_GATHER_RCU_TABLE_FREE.
Here, the assumption is that the previous TLB flush would send an
IPI to all relevant CPUs. Careful: some architectures like x86 only
send IPIs to all relevant CPUs when tlb->freed_tables is set.
The relevant architectures should be selecting
MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
kernels and it might have been problematic before this patch.
Also, the arch flushing behavior (independent of IPIs) is different
when tlb->freed_tables is set. Do we have to enlighten them to also
take care of tlb->unshared_tables? So far we didn't care, so
hopefully we are fine. Of course, we could be setting
tlb->freed_tables as well, but that might then unnecessarily flush
too much, because the semantics of tlb->freed_tables are a bit
fuzzy.
This patch changes nothing in this regard.
(2) tlb_remove_table_sync_one() is not a NOP on architectures with
CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
we still issue IPIs during TLB flushes and don't actually need the
second tlb_remove_table_sync_one().
This optimization can be implemented on top of this, by checking e.g., in
tlb_remove_table_sync_one() whether we really need IPIs. But as
described in (1), it really must honor tlb->freed_tables then to
send IPIs to all relevant CPUs.
Notes on TLB flushing changes:
(1) Flushing for non-shared PMD tables
We're converting from flush_hugetlb_tlb_range() to
tlb_remove_huge_tlb_entry(). Given that we properly initialize the
MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
__unmap_hugepage_range(), that should be fine.
(2) Flushing for shared PMD tables
We're converting from various things (flush_hugetlb_tlb_range(),
tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().
tlb_flush_pmd_range() achieves the same that
tlb_remove_huge_tlb_entry() would achieve in these scenarios.
Note that tlb_remove_huge_tlb_entry() also calls
__tlb_remove_tlb_entry(), however that is only implemented on
powerpc, which does not support PMD table sharing.
Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
flushing keeps on working as expected.
Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.
There are plenty more cleanups to be had, but they have to wait until
this is fixed.
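To make optimization (2) concrete, here is a toy userspace model of the
last-sharer rule; pts_count and do_ipi_broadcast() are illustrative
stand-ins for the ptdesc share count and tlb_remove_table_sync_one(),
not kernel APIs:

#include <stdio.h>

static int pts_count = 2;       /* other users still sharing the PMD table */

static void do_ipi_broadcast(void)
{
        printf("IPI broadcast (expensive)\n");
}

static void unshare_one(void)
{
        pts_count--;
        /*
         * While other sharers remain, the table cannot become exclusive,
         * so the broadcast can be elided: the last sharer's IPI covers
         * every concurrent GUP-fast walker.
         */
        if (pts_count == 0)
                do_ipi_broadcast();
}

int main(void)
{
        unshare_one();          /* not the last sharer: no IPI */
        unshare_one();          /* table now exclusive: one IPI total */
        return 0;
}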
Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Reported-by: "Uschakow, Stanislav" <suschako(a)amazon.de>
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Cc: Harry Yoo <harry.yoo(a)oracle.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/asm-generic/tlb.h | 77 +++++++++++++++++++++-
include/linux/hugetlb.h | 15 ++--
include/linux/mm_types.h | 1
mm/hugetlb.c | 123 +++++++++++++++++++++---------------
mm/mmu_gather.c | 33 +++++++++
mm/rmap.c | 25 ++++---
6 files changed, 208 insertions(+), 66 deletions(-)
--- a/include/asm-generic/tlb.h~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/include/asm-generic/tlb.h
@@ -46,7 +46,8 @@
*
* The mmu_gather API consists of:
*
- * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_finish_mmu()
+ * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_gather_mmu_vma() /
+ * tlb_finish_mmu()
*
* start and finish a mmu_gather
*
@@ -364,6 +365,20 @@ struct mmu_gather {
unsigned int vma_huge : 1;
unsigned int vma_pfn : 1;
+ /*
+ * Did we unshare (unmap) any shared page tables? For now only
+ * used for hugetlb PMD table sharing.
+ */
+ unsigned int unshared_tables : 1;
+
+ /*
+ * Did we unshare any page tables such that they are now exclusive
+ * and could get reused+modified by the new owner? When setting this
+ * flag, "unshared_tables" will be set as well. For now only used
+ * for hugetlb PMD table sharing.
+ */
+ unsigned int fully_unshared_tables : 1;
+
unsigned int batch_count;
#ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -400,6 +415,7 @@ static inline void __tlb_reset_range(str
tlb->cleared_pmds = 0;
tlb->cleared_puds = 0;
tlb->cleared_p4ds = 0;
+ tlb->unshared_tables = 0;
/*
* Do not reset mmu_gather::vma_* fields here, we do not
* call into tlb_start_vma() again to set them if there is an
@@ -484,7 +500,7 @@ static inline void tlb_flush_mmu_tlbonly
* these bits.
*/
if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
- tlb->cleared_puds || tlb->cleared_p4ds))
+ tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
return;
tlb_flush(tlb);
@@ -773,6 +789,63 @@ static inline bool huge_pmd_needs_flush(
}
#endif
+#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
+static inline void tlb_unshare_pmd_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt,
+ unsigned long addr)
+{
+ /*
+ * The caller must make sure that concurrent unsharing + exclusive
+ * reuse is impossible until tlb_flush_unshared_tables() was called.
+ */
+ VM_WARN_ON_ONCE(!ptdesc_pmd_is_shared(pt));
+ ptdesc_pmd_pts_dec(pt);
+
+ /* Clearing a PUD pointing at a PMD table with PMD leaves. */
+ tlb_flush_pmd_range(tlb, addr & PUD_MASK, PUD_SIZE);
+
+ /*
+ * If the page table is now exclusively owned, we fully unshared
+ * a page table.
+ */
+ if (!ptdesc_pmd_is_shared(pt))
+ tlb->fully_unshared_tables = true;
+ tlb->unshared_tables = true;
+}
+
+static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
+{
+ /*
+ * As soon as the caller drops locks to allow for reuse of
+ * previously-shared tables, these tables could get modified and
+ * even reused outside of hugetlb context, so we have to make sure that
+ * any page table walkers (incl. TLB, GUP-fast) are aware of that
+ * change.
+ *
+ * Even if we are not fully unsharing a PMD table, we must
+ * flush the TLB for the unsharer now.
+ */
+ if (tlb->unshared_tables)
+ tlb_flush_mmu_tlbonly(tlb);
+
+ /*
+ * Similarly, we must make sure that concurrent GUP-fast will not
+ * walk previously-shared page tables that are getting modified+reused
+ * elsewhere. So broadcast an IPI to wait for any concurrent GUP-fast.
+ *
+ * We only perform this when we are the last sharer of a page table,
+ * as the IPI will reach all CPUs: any GUP-fast.
+ *
+ * Note that on configs where tlb_remove_table_sync_one() is a NOP,
+ * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
+ * required IPIs already for us.
+ */
+ if (tlb->fully_unshared_tables) {
+ tlb_remove_table_sync_one();
+ tlb->fully_unshared_tables = false;
+ }
+}
+#endif /* CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
+
#endif /* CONFIG_MMU */
#endif /* _ASM_GENERIC__TLB_H */
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/include/linux/hugetlb.h
@@ -240,8 +240,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz);
unsigned long hugetlb_mask_last_page(struct hstate *h);
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep);
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma);
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
@@ -300,13 +301,17 @@ static inline struct address_space *huge
return NULL;
}
-static inline int huge_pmd_unshare(struct mm_struct *mm,
- struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+static inline int huge_pmd_unshare(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
return 0;
}
+static inline void huge_pmd_unshare_flush(struct mmu_gather *tlb,
+ struct vm_area_struct *vma)
+{
+}
+
static inline void adjust_range_if_pmd_sharing_possible(
struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
--- a/include/linux/mm_types.h~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/include/linux/mm_types.h
@@ -1530,6 +1530,7 @@ static inline unsigned int mm_cid_size(v
struct mmu_gather;
extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma);
extern void tlb_finish_mmu(struct mmu_gather *tlb);
struct vm_fault;
--- a/mm/hugetlb.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/hugetlb.c
@@ -5112,7 +5112,7 @@ int move_hugetlb_page_tables(struct vm_a
unsigned long last_addr_mask;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
- bool shared_pmd = false;
+ struct mmu_gather tlb;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, old_addr,
old_end);
@@ -5122,6 +5122,7 @@ int move_hugetlb_page_tables(struct vm_a
* range.
*/
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
last_addr_mask = hugetlb_mask_last_page(h);
@@ -5138,8 +5139,7 @@ int move_hugetlb_page_tables(struct vm_a
if (huge_pte_none(huge_ptep_get(mm, old_addr, src_pte)))
continue;
- if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
- shared_pmd = true;
+ if (huge_pmd_unshare(&tlb, vma, old_addr, src_pte)) {
old_addr |= last_addr_mask;
new_addr |= last_addr_mask;
continue;
@@ -5150,15 +5150,16 @@ int move_hugetlb_page_tables(struct vm_a
break;
move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte, sz);
+ tlb_remove_huge_tlb_entry(h, &tlb, src_pte, old_addr);
}
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, old_end - len, old_end);
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
+
mmu_notifier_invalidate_range_end(&range);
i_mmap_unlock_write(mapping);
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
return len + old_addr - old_end;
}
@@ -5177,7 +5178,6 @@ void __unmap_hugepage_range(struct mmu_g
unsigned long sz = huge_page_size(h);
bool adjust_reservation;
unsigned long last_addr_mask;
- bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
@@ -5200,10 +5200,8 @@ void __unmap_hugepage_range(struct mmu_g
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(tlb, vma, address, ptep)) {
spin_unlock(ptl);
- tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
- force_flush = true;
address |= last_addr_mask;
continue;
}
@@ -5319,14 +5317,7 @@ void __unmap_hugepage_range(struct mmu_g
}
tlb_end_vma(tlb, vma);
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (force_flush)
- tlb_flush_mmu_tlbonly(tlb);
+ huge_pmd_unshare_flush(tlb, vma);
}
void __hugetlb_zap_begin(struct vm_area_struct *vma,
@@ -6425,11 +6416,11 @@ long hugetlb_change_protection(struct vm
pte_t pte;
struct hstate *h = hstate_vma(vma);
long pages = 0, psize = huge_page_size(h);
- bool shared_pmd = false;
struct mmu_notifier_range range;
unsigned long last_addr_mask;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+ struct mmu_gather tlb;
/*
* In the case of shared PMDs, the area to flush could be beyond
@@ -6442,6 +6433,7 @@ long hugetlb_change_protection(struct vm
BUG_ON(address >= end);
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
hugetlb_vma_lock_write(vma);
@@ -6468,7 +6460,7 @@ long hugetlb_change_protection(struct vm
}
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(&tlb, vma, address, ptep)) {
/*
* When uffd-wp is enabled on the vma, unshare
* shouldn't happen at all. Warn about it if it
@@ -6477,7 +6469,6 @@ long hugetlb_change_protection(struct vm
WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
pages++;
spin_unlock(ptl);
- shared_pmd = true;
address |= last_addr_mask;
continue;
}
@@ -6538,22 +6529,16 @@ long hugetlb_change_protection(struct vm
pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
+ tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
}
next:
spin_unlock(ptl);
cond_resched();
}
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, start, end);
+
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
/*
* No need to call mmu_notifier_arch_invalidate_secondary_tlbs() we are
* downgrading page table protection not changing it to point to a new
@@ -6564,6 +6549,7 @@ next:
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
return pages > 0 ? (pages << h->order) : pages;
}
@@ -6920,18 +6906,27 @@ out:
return pte;
}
-/*
- * unmap huge page backed by shared pte.
+/**
+ * huge_pmd_unshare - Unmap a pmd table if it is shared by multiple users
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ * @addr: the address we are trying to unshare.
+ * @ptep: pointer into the (pmd) page table.
+ *
+ * Called with the page table lock held, the i_mmap_rwsem held in write mode
+ * and the hugetlb vma lock held in write mode.
*
- * Called with page table lock held.
+ * Note: The caller must call huge_pmd_unshare_flush() before dropping the
+ * i_mmap_rwsem.
*
- * returns: 1 successfully unmapped a shared pte page
- * 0 the underlying pte page is not shared, or it is the last user
+ * Returns: 1 if it was a shared PMD table and it got unmapped, or 0 if it
+ * was not a shared PMD table.
*/
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
unsigned long sz = huge_page_size(hstate_vma(vma));
+ struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd = pgd_offset(mm, addr);
p4d_t *p4d = p4d_offset(pgd, addr);
pud_t *pud = pud_offset(p4d, addr);
@@ -6943,18 +6938,36 @@ int huge_pmd_unshare(struct mm_struct *m
i_mmap_assert_write_locked(vma->vm_file->f_mapping);
hugetlb_vma_assert_locked(vma);
pud_clear(pud);
- /*
- * Once our caller drops the rmap lock, some other process might be
- * using this page table as a normal, non-hugetlb page table.
- * Wait for pending gup_fast() in other threads to finish before letting
- * that happen.
- */
- tlb_remove_table_sync_one();
- ptdesc_pmd_pts_dec(virt_to_ptdesc(ptep));
+
+ tlb_unshare_pmd_ptdesc(tlb, virt_to_ptdesc(ptep), addr);
+
mm_dec_nr_pmds(mm);
return 1;
}
+/*
+ * huge_pmd_unshare_flush - Complete a sequence of huge_pmd_unshare() calls
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ *
+ * Perform necessary TLB flushes or IPI broadcasts to synchronize PMD table
+ * unsharing with concurrent page table walkers.
+ *
+ * This function must be called after a sequence of huge_pmd_unshare()
+ * calls while still holding the i_mmap_rwsem.
+ */
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ /*
+ * We must synchronize page table unsharing such that nobody will
+ * try reusing a previously-shared page table while it might still
+ * be in use by previous sharers (TLB, GUP_fast).
+ */
+ i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+ tlb_flush_unshared_tables(tlb);
+}
+
#else /* !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -6963,12 +6976,16 @@ pte_t *huge_pmd_share(struct mm_struct *
return NULL;
}
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
return 0;
}
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
@@ -7235,6 +7252,7 @@ static void hugetlb_unshare_pmds(struct
unsigned long sz = huge_page_size(h);
struct mm_struct *mm = vma->vm_mm;
struct mmu_notifier_range range;
+ struct mmu_gather tlb;
unsigned long address;
spinlock_t *ptl;
pte_t *ptep;
@@ -7246,6 +7264,8 @@ static void hugetlb_unshare_pmds(struct
return;
flush_cache_range(vma, start, end);
+ tlb_gather_mmu_vma(&tlb, vma);
+
/*
* No need to call adjust_range_if_pmd_sharing_possible(), because
* we have already done the PUD_SIZE alignment.
@@ -7264,10 +7284,10 @@ static void hugetlb_unshare_pmds(struct
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
- huge_pmd_unshare(mm, vma, address, ptep);
+ huge_pmd_unshare(&tlb, vma, address, ptep);
spin_unlock(ptl);
}
- flush_hugetlb_tlb_range(vma, start, end);
+ huge_pmd_unshare_flush(&tlb, vma);
if (take_locks) {
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
@@ -7277,6 +7297,7 @@ static void hugetlb_unshare_pmds(struct
* Documentation/mm/mmu_notifier.rst.
*/
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
}
/*
--- a/mm/mmu_gather.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/mmu_gather.c
@@ -10,6 +10,7 @@
#include <linux/swap.h>
#include <linux/rmap.h>
#include <linux/pgalloc.h>
+#include <linux/hugetlb.h>
#include <asm/tlb.h>
@@ -426,6 +427,7 @@ static void __tlb_gather_mmu(struct mmu_
#endif
tlb->vma_pfn = 0;
+ tlb->fully_unshared_tables = 0;
__tlb_reset_range(tlb);
inc_tlb_flush_pending(tlb->mm);
}
@@ -460,6 +462,31 @@ void tlb_gather_mmu_fullmm(struct mmu_ga
}
/**
+ * tlb_gather_mmu_vma - initialize an mmu_gather structure for operating on a single
+ * VMA
+ * @tlb: the mmu_gather structure to initialize
+ * @vma: the vm_area_struct
+ *
+ * Called to initialize an (on-stack) mmu_gather structure for operating on
+ * a single VMA. In contrast to tlb_gather_mmu(), calling this function will
+ * not require another call to tlb_start_vma(). In contrast to tlb_start_vma(),
+ * this function will *not* call flush_cache_range().
+ *
+ * For hugetlb VMAs, this function will also initialize the mmu_gather
+ * page_size accordingly, not requiring a separate call to
+ * tlb_change_page_size().
+ *
+ */
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ tlb_gather_mmu(tlb, vma->vm_mm);
+ tlb_update_vma_flags(tlb, vma);
+ if (is_vm_hugetlb_page(vma))
+ /* All entries have the same size. */
+ tlb_change_page_size(tlb, huge_page_size(hstate_vma(vma)));
+}
+
+/**
* tlb_finish_mmu - finish an mmu_gather structure
* @tlb: the mmu_gather structure to finish
*
@@ -469,6 +496,12 @@ void tlb_gather_mmu_fullmm(struct mmu_ga
void tlb_finish_mmu(struct mmu_gather *tlb)
{
/*
+ * We expect an earlier huge_pmd_unshare_flush() call to sort this out,
+ * due to complicated locking requirements with page table unsharing.
+ */
+ VM_WARN_ON_ONCE(tlb->fully_unshared_tables);
+
+ /*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock (e.g., mmap_lock read-side) but defer TLB
* flush by batching, one thread may end up seeing inconsistent PTEs
--- a/mm/rmap.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/rmap.c
@@ -76,7 +76,7 @@
#include <linux/mm_inline.h>
#include <linux/oom.h>
-#include <asm/tlbflush.h>
+#include <asm/tlb.h>
#define CREATE_TRACE_POINTS
#include <trace/events/migrate.h>
@@ -2008,13 +2008,17 @@ static bool try_to_unmap_one(struct foli
* if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma))
goto walk_abort;
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2022,6 +2026,7 @@ static bool try_to_unmap_one(struct foli
goto walk_done;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
if (pte_dirty(pteval))
@@ -2398,17 +2403,20 @@ static bool try_to_migrate_one(struct fo
* fail if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma)) {
page_vma_mapped_walk_done(&pvmw);
ret = false;
break;
}
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
+ hugetlb_vma_unlock_write(vma);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2417,6 +2425,7 @@ static bool try_to_migrate_one(struct fo
break;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
/* Nuke the hugetlb page table entry */
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
_
Patches currently in -mm which might be from david(a)kernel.org are
mm-hugetlb-fix-hugetlb_pmd_shared.patch
mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch
mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch
mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch
mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather-fix.patch
The patch titled
Subject: mm/hugetlb: fix hugetlb_pmd_shared()
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-hugetlb_pmd_shared.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Subject: mm/hugetlb: fix hugetlb_pmd_shared()
Date: Tue, 23 Dec 2025 22:40:34 +0100
Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using
mmu_gather)", v3.
One functional fix, one performance regression fix, and two related
comment fixes.
I cleaned up the prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point.
While doing that, I identified the other issues.
The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.
Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().
The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated
There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.
Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.
This patch (of 4):
We switched from (wrongly) using the page count to an independent shared
count. Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to identify
sharing.
We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.
Page migration, like mbind() or migrate_pages(), would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive. In smaps we would account them as "private" although they are
"shared", and we would wrongly set PM_MMAP_EXCLUSIVE in the pagemap
interface.
Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().
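A toy contrast of the two predicates, with refcount and pt_share_count
as plain fields whose semantics are assumed for illustration, rather
than the real page/ptdesc machinery:

#include <stdbool.h>
#include <stdio.h>

struct toy_ptdesc {
        int refcount;           /* stays at 1 for shared tables nowadays */
        int pt_share_count;     /* the independent share counter */
};

static bool shared_by_refcount(const struct toy_ptdesc *pt)
{
        return pt->refcount > 1;        /* old check: now always false */
}

static bool shared_by_share_count(const struct toy_ptdesc *pt)
{
        return pt->pt_share_count > 0;  /* what the fix switches to */
}

int main(void)
{
        const struct toy_ptdesc pt = { .refcount = 1, .pt_share_count = 3 };

        printf("old: %d, new: %d\n",
               shared_by_refcount(&pt), shared_by_share_count(&pt));
        return 0;
}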
Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1]
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2]
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Reviewed-by: Lance Yang <lance.yang(a)linux.dev>
Tested-by: Lance Yang <lance.yang(a)linux.dev>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: "Uschakow, Stanislav" <suschako(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-hugetlb_pmd_shared
+++ a/include/linux/hugetlb.h
@@ -1326,7 +1326,7 @@ static inline __init void hugetlb_cma_re
#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
static inline bool hugetlb_pmd_shared(pte_t *pte)
{
- return page_count(virt_to_page(pte)) > 1;
+ return ptdesc_pmd_is_shared(virt_to_ptdesc(pte));
}
#else
static inline bool hugetlb_pmd_shared(pte_t *pte)
_
Patches currently in -mm which might be from david(a)kernel.org are
mm-hugetlb-fix-hugetlb_pmd_shared.patch
mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch
mm-rmap-fix-two-comments-related-to-huge_pmd_unshare.patch
mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch
mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather-fix.patch
The patch titled
Subject: mm: take into account mm_cid size for mm_struct static definitions
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-take-into-account-mm_cid-size-for-mm_struct-static-definitions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: take into account mm_cid size for mm_struct static definitions
Date: Wed, 24 Dec 2025 12:33:58 -0500
Both init_mm and efi_mm static definitions need to make room for the 2
mm_cid cpumasks.
This fixes possible out-of-bounds accesses to init_mm and efi_mm.
Add a space between # and define for the mm_alloc_cid() definition to make
it consistent with the coding style used in the rest of this header file.
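The underlying trick, sketched in plain C (relying on GCC extensions, as
the kernel does); the struct and macro names here are stand-ins, and
STATIC_EXTRA plays the role of MM_CID_STATIC_SIZE:

#include <stdio.h>

#define STATIC_EXTRA 16         /* stand-in for MM_CID_STATIC_SIZE */

struct mm_like {
        long some_field;
        /* dynamically sized at runtime, statically sized below */
        char flexible_array[] __attribute__((aligned(__alignof__(long))));
};

/* The initializer itself reserves the worst-case bytes (GCC extension). */
#define FLEX_INIT { [0 ... sizeof(long) + STATIC_EXTRA - 1] = 0 }

static struct mm_like init_mm_like = {
        .some_field = 42,
        .flexible_array = FLEX_INIT,
};

int main(void)
{
        printf("%ld\n", init_mm_like.some_field);
        return 0;
}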
Link: https://lkml.kernel.org/r/20251224173358.647691-4-mathieu.desnoyers@efficio…
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: "Liam R . Howlett" <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Martin Liu <liumartin(a)google.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sweet Tea Dorminy <sweettea-kernel(a)dorminy.me>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/include/linux/mm_types.h~mm-take-into-account-mm_cid-size-for-mm_struct-static-definitions
+++ a/include/linux/mm_types.h
@@ -1368,7 +1368,7 @@ extern struct mm_struct init_mm;
#define MM_STRUCT_FLEXIBLE_ARRAY_INIT \
{ \
- [0 ... sizeof(cpumask_t)-1] = 0 \
+ [0 ... sizeof(cpumask_t) + MM_CID_STATIC_SIZE - 1] = 0 \
}
/* Pointer magic because the dynamic array size confuses some compilers. */
@@ -1500,7 +1500,7 @@ static inline int mm_alloc_cid_noprof(st
mm_init_cid(mm, p);
return 0;
}
-#define mm_alloc_cid(...) alloc_hooks(mm_alloc_cid_noprof(__VA_ARGS__))
+# define mm_alloc_cid(...) alloc_hooks(mm_alloc_cid_noprof(__VA_ARGS__))
static inline void mm_destroy_cid(struct mm_struct *mm)
{
@@ -1514,6 +1514,8 @@ static inline unsigned int mm_cid_size(v
return cpumask_size() + bitmap_size(num_possible_cpus());
}
+/* Use 2 * NR_CPUS as worse case for static allocation. */
+# define MM_CID_STATIC_SIZE (2 * sizeof(cpumask_t))
#else /* CONFIG_SCHED_MM_CID */
static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { }
static inline int mm_alloc_cid(struct mm_struct *mm, struct task_struct *p) { return 0; }
@@ -1522,6 +1524,7 @@ static inline unsigned int mm_cid_size(v
{
return 0;
}
+# define MM_CID_STATIC_SIZE 0
#endif /* CONFIG_SCHED_MM_CID */
struct mmu_gather;
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-add-missing-static-initializer-for-init_mm-mm_cidlock.patch
mm-rename-cpu_bitmap-field-to-flexible_array.patch
mm-take-into-account-mm_cid-size-for-mm_struct-static-definitions.patch
tsacct-skip-all-kernel-threads.patch
lib-introduce-hierarchical-per-cpu-counters.patch
mm-fix-oom-killer-inaccuracy-on-large-many-core-systems.patch
mm-implement-precise-oom-killer-task-selection.patch
The patch titled
Subject: mm: rename cpu_bitmap field to flexible_array
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-rename-cpu_bitmap-field-to-flexible_array.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: rename cpu_bitmap field to flexible_array
Date: Wed, 24 Dec 2025 12:33:57 -0500
The cpu_bitmap flexible array now contains more than just the cpu_bitmap.
In preparation for changing the static mm_struct definitions to cover for
the additional space required, change the cpu_bitmap type from "unsigned
long" to "char", require an unsigned long alignment of the flexible array,
and rename the field from "cpu_bitmap" to "flexible_array".
Introduce the MM_STRUCT_FLEXIBLE_ARRAY_INIT macro to statically initialize
the flexible array. This covers the init_mm and efi_mm static
definitions.
This is a preparation step for fixing the missing mm_cid size for static
mm_struct definitions.
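For illustration, a sketch of how two regions are carved out of the
single trailing array, mirroring what mm_cpumask() and mm_cpus_allowed()
do; the sizes and names below are made up:

#include <stddef.h>
#include <stdio.h>

#define CPUMASK_BYTES 8         /* stand-in for cpumask_size() */

struct mm_like {
        long some_field;
        char flexible_array[] __attribute__((aligned(__alignof__(long))));
};

/* Static init of a flexible array member is a GCC extension. */
static struct mm_like mm = {
        .flexible_array = { [0 ... 2 * CPUMASK_BYTES - 1] = 0 },
};

static void *mm_cpumask_like(struct mm_like *m)
{
        /* first region starts at the flexible array itself */
        return (char *)m + offsetof(struct mm_like, flexible_array);
}

static void *mm_cpus_allowed_like(struct mm_like *m)
{
        /* second region starts right after the first cpumask */
        return (char *)mm_cpumask_like(m) + CPUMASK_BYTES;
}

int main(void)
{
        printf("offset delta = %td\n",
               (char *)mm_cpus_allowed_like(&mm) -
               (char *)mm_cpumask_like(&mm));
        return 0;
}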
Link: https://lkml.kernel.org/r/20251224173358.647691-3-mathieu.desnoyers@efficio…
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: "Liam R . Howlett" <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Martin Liu <liumartin(a)google.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sweet Tea Dorminy <sweettea-kernel(a)dorminy.me>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/firmware/efi/efi.c | 2 +-
include/linux/mm_types.h | 13 +++++++++----
mm/init-mm.c | 2 +-
3 files changed, 11 insertions(+), 6 deletions(-)
--- a/drivers/firmware/efi/efi.c~mm-rename-cpu_bitmap-field-to-flexible_array
+++ a/drivers/firmware/efi/efi.c
@@ -73,10 +73,10 @@ struct mm_struct efi_mm = {
MMAP_LOCK_INITIALIZER(efi_mm)
.page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
- .cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0},
#ifdef CONFIG_SCHED_MM_CID
.mm_cid.lock = __RAW_SPIN_LOCK_UNLOCKED(efi_mm.mm_cid.lock),
#endif
+ .flexible_array = MM_STRUCT_FLEXIBLE_ARRAY_INIT,
};
struct workqueue_struct *efi_rts_wq;
--- a/include/linux/mm_types.h~mm-rename-cpu_bitmap-field-to-flexible_array
+++ a/include/linux/mm_types.h
@@ -1329,7 +1329,7 @@ struct mm_struct {
* The mm_cpumask needs to be at the end of mm_struct, because it
* is dynamically sized based on nr_cpu_ids.
*/
- unsigned long cpu_bitmap[];
+ char flexible_array[] __aligned(__alignof__(unsigned long));
};
/* Copy value to the first system word of mm flags, non-atomically. */
@@ -1366,19 +1366,24 @@ static inline void __mm_flags_set_mask_b
MT_FLAGS_USE_RCU)
extern struct mm_struct init_mm;
+#define MM_STRUCT_FLEXIBLE_ARRAY_INIT \
+{ \
+ [0 ... sizeof(cpumask_t)-1] = 0 \
+}
+
/* Pointer magic because the dynamic array size confuses some compilers. */
static inline void mm_init_cpumask(struct mm_struct *mm)
{
unsigned long cpu_bitmap = (unsigned long)mm;
- cpu_bitmap += offsetof(struct mm_struct, cpu_bitmap);
+ cpu_bitmap += offsetof(struct mm_struct, flexible_array);
cpumask_clear((struct cpumask *)cpu_bitmap);
}
/* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
{
- return (struct cpumask *)&mm->cpu_bitmap;
+ return (struct cpumask *)&mm->flexible_array;
}
#ifdef CONFIG_LRU_GEN
@@ -1469,7 +1474,7 @@ static inline cpumask_t *mm_cpus_allowed
{
unsigned long bitmap = (unsigned long)mm;
- bitmap += offsetof(struct mm_struct, cpu_bitmap);
+ bitmap += offsetof(struct mm_struct, flexible_array);
/* Skip cpu_bitmap */
bitmap += cpumask_size();
return (struct cpumask *)bitmap;
--- a/mm/init-mm.c~mm-rename-cpu_bitmap-field-to-flexible_array
+++ a/mm/init-mm.c
@@ -47,7 +47,7 @@ struct mm_struct init_mm = {
#ifdef CONFIG_SCHED_MM_CID
.mm_cid.lock = __RAW_SPIN_LOCK_UNLOCKED(init_mm.mm_cid.lock),
#endif
- .cpu_bitmap = CPU_BITS_NONE,
+ .flexible_array = MM_STRUCT_FLEXIBLE_ARRAY_INIT,
INIT_MM_CONTEXT(init_mm)
};
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-add-missing-static-initializer-for-init_mm-mm_cidlock.patch
mm-rename-cpu_bitmap-field-to-flexible_array.patch
mm-take-into-account-mm_cid-size-for-mm_struct-static-definitions.patch
tsacct-skip-all-kernel-threads.patch
lib-introduce-hierarchical-per-cpu-counters.patch
mm-fix-oom-killer-inaccuracy-on-large-many-core-systems.patch
mm-implement-precise-oom-killer-task-selection.patch
The patch titled
Subject: mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
has been added to the -mm mm-new branch. Its filename is
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
Date: Wed, 24 Dec 2025 18:30:37 -0800
When a DAMOS-scheme DAMON sysfs directory setup fails after setup of the
access_pattern/ directory, subdirectories of the access_pattern/ directory
are not cleaned up. As a result, the DAMON sysfs interface is nearly
broken until the system reboots, and the memory for the unremoved
directories is leaked.
Clean up the directories on such failures.
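The general shape of the bug and the fix, as a self-contained C sketch
of the goto-unwind pattern; the setup_*/teardown_* functions are
illustrative stand-ins, not DAMON symbols:

#include <stdio.h>

static int setup_parent(void)    { return 0; }
static int setup_children(void)  { return 0; }
static int setup_next_dir(void)  { return -1; }   /* pretend this fails */

static void teardown_children(void) { printf("children removed\n"); }
static void teardown_parent(void)   { printf("parent released\n"); }

static int add_dirs(void)
{
        int err;

        err = setup_parent();
        if (err)
                return err;
        err = setup_children();
        if (err)
                goto put_parent;
        err = setup_next_dir();
        if (err)
                goto rmdir_put_parent;  /* children exist: remove them too */
        return 0;

rmdir_put_parent:
        teardown_children();            /* the step the patch adds */
put_parent:
        teardown_parent();
        return err;
}

int main(void)
{
        return add_dirs() ? 1 : 0;
}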
Link: https://lkml.kernel.org/r/20251225023043.18579-5-sj@kernel.org
Fixes: 9bbb820a5bd5 ("mm/damon/sysfs: support DAMOS quotas")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: chongjiapeng <jiapeng.chong(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> # 5.18.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs-schemes.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/damon/sysfs-schemes.c~mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure
+++ a/mm/damon/sysfs-schemes.c
@@ -2193,7 +2193,7 @@ static int damon_sysfs_scheme_add_dirs(s
return err;
err = damos_sysfs_set_dests(scheme);
if (err)
- goto put_access_pattern_out;
+ goto rmdir_put_access_pattern_out;
err = damon_sysfs_scheme_set_quotas(scheme);
if (err)
goto put_dests_out;
@@ -2231,7 +2231,8 @@ rmdir_put_quotas_access_pattern_out:
put_dests_out:
kobject_put(&scheme->dests->kobj);
scheme->dests = NULL;
-put_access_pattern_out:
+rmdir_put_access_pattern_out:
+ damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
kobject_put(&scheme->access_pattern->kobj);
scheme->access_pattern = NULL;
return err;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
The patch titled
Subject: mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
has been added to the -mm mm-new branch. Its filename is
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
Date: Wed, 24 Dec 2025 18:30:36 -0800
When a DAMOS-scheme DAMON sysfs directory setup fails after setup of the
quotas/ directory, subdirectories of the quotas/ directory are not cleaned
up. As a result, the DAMON sysfs interface is nearly broken until the
system reboots, and the memory for the unremoved directories is leaked.
Clean up the directories on such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-4-sj@kernel.org
Fixes: 1b32234ab087 ("mm/damon/sysfs: support DAMOS watermarks")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: chongjiapeng <jiapeng.chong(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> # 5.18.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs-schemes.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/damon/sysfs-schemes.c~mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure
+++ a/mm/damon/sysfs-schemes.c
@@ -2199,7 +2199,7 @@ static int damon_sysfs_scheme_add_dirs(s
goto put_dests_out;
err = damon_sysfs_scheme_set_watermarks(scheme);
if (err)
- goto put_quotas_access_pattern_out;
+ goto rmdir_put_quotas_access_pattern_out;
err = damos_sysfs_set_filter_dirs(scheme);
if (err)
goto put_watermarks_quotas_access_pattern_out;
@@ -2224,7 +2224,8 @@ put_filters_watermarks_quotas_access_pat
put_watermarks_quotas_access_pattern_out:
kobject_put(&scheme->watermarks->kobj);
scheme->watermarks = NULL;
-put_quotas_access_pattern_out:
+rmdir_put_quotas_access_pattern_out:
+ damon_sysfs_quotas_rm_dirs(scheme->quotas);
kobject_put(&scheme->quotas->kobj);
scheme->quotas = NULL;
put_dests_out:
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
The patch titled
Subject: mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
has been added to the -mm mm-new branch. Its filename is
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
Date: Wed, 24 Dec 2025 18:30:35 -0800
When context DAMON sysfs directory setup fails after the attrs/ directory
has been set up, the subdirectories of attrs/ are not cleaned up. As a
result, the DAMON sysfs interface is nearly broken until the system
reboots, and the memory for the unremoved directories is leaked.
Clean up the directories on such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-3-sj@kernel.org
Fixes: c951cd3b8901 ("mm/damon: implement a minimal stub for sysfs-based DAMON interface")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: chongjiapeng <jiapeng.chong(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> # 5.18.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure
+++ a/mm/damon/sysfs.c
@@ -950,7 +950,7 @@ static int damon_sysfs_context_add_dirs(
err = damon_sysfs_context_set_targets(context);
if (err)
- goto put_attrs_out;
+ goto rmdir_put_attrs_out;
err = damon_sysfs_context_set_schemes(context);
if (err)
@@ -960,7 +960,8 @@ static int damon_sysfs_context_add_dirs(
put_targets_attrs_out:
kobject_put(&context->targets->kobj);
context->targets = NULL;
-put_attrs_out:
+rmdir_put_attrs_out:
+ damon_sysfs_attrs_rm_dirs(context->attrs);
kobject_put(&context->attrs->kobj);
context->attrs = NULL;
return err;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
The patch titled
Subject: mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
has been added to the -mm mm-new branch. Its filename is
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
Date: Wed, 24 Dec 2025 18:30:34 -0800
Patch series "mm/damon/sysfs: free setup failures generated zombie sub-sub
dirs".
Some DAMON sysfs directory setup functions generate their sub- and
sub-sub-directories. For example, 'monitoring_attrs/' directory setup
creates 'intervals/' and 'intervals/intervals_goal/' directories under
the 'monitoring_attrs/' directory. When such sub-sub directories are
successfully created but a follow-up setup step fails, the setup function
should recursively clean up the subdirectories.
However, such setup functions only drop the subdirectory reference
counters. As a result, under certain setup failures, the sub-sub
directories keep non-zero reference counters. The directories therefore
linger like zombies: they cannot be removed, and their memory cannot be
freed.
The user impact of this issue is limited, for the following reasons.
When the issue happens, the zombie directories still occupy their paths,
so attempts to create the directories again fail without leaking
additional memory; the total size of the leak is therefore bounded.
Nonetheless, this also means that controlling DAMON through any feature
that requires the setup-failed sysfs files is impossible until the
system reboots.
Also, the setup operations are quite simple, so the relevant failures
should only rarely happen and are difficult to trigger artificially.
This patch (of 4):
When attrs/ DAMON sysfs directory setup fails after the intervals/
directory has been set up, the intervals/intervals_goal/ directory is not
cleaned up. As a result, the DAMON sysfs interface is nearly broken until
the system reboots, and the memory for the unremoved directory is leaked.
Clean up the directory on such failures.
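Before the diff below, for reference, a minimal sketch of the unwind
ordering this series establishes; damon_sysfs_intervals_rm_dirs() and
kobject_put() are the real calls from the fix, while the two setup
helpers are hypothetical simplifications:

/*
 * Sketch only: on failure after a subdirectory with its own sub-sub
 * directories has been set up, recursively remove the sub-sub
 * directories (rm_dirs) before dropping the subdirectory's own
 * reference with kobject_put().
 */
static int example_attrs_add_dirs(struct damon_sysfs_attrs *attrs)
{
	int err;

	/* hypothetical: creates intervals/ and intervals/intervals_goal/ */
	err = example_setup_intervals(attrs);
	if (err)
		return err;

	/* hypothetical later setup step that may fail */
	err = example_setup_nr_regions(attrs);
	if (err)
		goto rmdir_put_intervals_out;
	return 0;

rmdir_put_intervals_out:
	damon_sysfs_intervals_rm_dirs(attrs->intervals);  /* recursive cleanup first */
	kobject_put(&attrs->intervals->kobj);             /* then drop the subdir itself */
	attrs->intervals = NULL;
	return err;
}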
Link: https://lkml.kernel.org/r/20251225023043.18579-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251225023043.18579-2-sj@kernel.org
Fixes: 8fbbcbeaafeb ("mm/damon/sysfs: implement intervals tuning goal directory")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: chongjiapeng <jiapeng.chong(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> # 6.15.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure
+++ a/mm/damon/sysfs.c
@@ -792,7 +792,7 @@ static int damon_sysfs_attrs_add_dirs(st
nr_regions_range = damon_sysfs_ul_range_alloc(10, 1000);
if (!nr_regions_range) {
err = -ENOMEM;
- goto put_intervals_out;
+ goto rmdir_put_intervals_out;
}
err = kobject_init_and_add(&nr_regions_range->kobj,
@@ -806,6 +806,8 @@ static int damon_sysfs_attrs_add_dirs(st
put_nr_regions_intervals_out:
kobject_put(&nr_regions_range->kobj);
attrs->nr_regions_range = NULL;
+rmdir_put_intervals_out:
+ damon_sysfs_intervals_rm_dirs(intervals);
put_intervals_out:
kobject_put(&intervals->kobj);
attrs->intervals = NULL;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
Initialize the eb.vma array with values of 0 when the eb structure is
first set up. In particular, this sets the eb->vma[i].vma pointers to
NULL, simplifying cleanup and getting rid of the bug described below.
During the execution of eb_lookup_vmas(), the eb->vma array is
successively filled up with struct eb_vma objects. This process includes
calling eb_add_vma(), which might fail; however, even in the event of
failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which
prompts a call to eb_release_vmas() to clean up the mess. Since
eb_lookup_vmas() might fail while processing any (possibly not the first)
buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know
at what point the lookup function failed.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper
function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is
set to NULL in case i915_gem_object_userptr_submit_init() fails; the
current one needs to be cleaned up by eb_release_vmas() at this point,
so the next one is set. If eb_add_vma() fails, neither the current nor
the next vma is set to NULL, which is a source of a NULL deref bug
described in the issue linked in the Closes tag.
When entering eb_lookup_vmas(), the vma pointers are set to the slab
poison value, not NULL. This doesn't matter for the actual lookup, since
the values get overwritten anyway; however, eb_release_vmas() only
recognizes NULL as the stopping value, so until now the pointers had to
be set to NULL one by one in case of an intermediate failure. This patch
changes the approach to filling them all with NULL at the start instead,
rather than handling that manually on failure.
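To illustrate, a simplified sketch (not the driver's exact release path)
of why the NULL sentinel lets cleanup stop exactly where lookup stopped:

/*
 * Sketch: with eb->vma zero-filled at setup, the release loop can
 * treat the first NULL vma as "lookup never got this far" and stop,
 * regardless of which buffer the lookup failed on.
 */
static void example_release_vmas(struct i915_execbuffer *eb)
{
	unsigned int i;

	for (i = 0; i < eb->buffer_count; i++) {
		struct eb_vma *ev = &eb->vma[i];

		if (!ev->vma)
			break;
		i915_vma_put(ev->vma);	/* the real code also unpins, etc. */
	}
}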
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15062
Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Reported-by: Gangmin Kim <km.kim1503(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 5.16.x
Signed-off-by: Krzysztof Niemiec <krzysztof.niemiec(a)intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas(a)intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
---
I messed up the continuity in previous revisions; the original patch
was sent as [1], and the first revision (which I didn't mark as v2 due
to the title change) was sent as [2].
This is the full current changelog:
v5:
- improve style and fix nits in commit log (Andi)
- fix typos and style in the code and comments (Andi)
- set args->buffer_count + 1 values to 0 instead of just
args->buffer_count (Andi)
v4:
- delete an empty line (Janusz), reword the comment a bit (Krzysztof,
Janusz)
v3:
- use memset() to fill the entire eb.vma array with zeros instead of
looping through the elements (Janusz)
- add a comment clarifying the mechanism of the initial allocation (Janusz)
- change the commit log again, including title
- rearrange the tags to keep checkpatch happy
v2:
- set the eb->vma[i].vma pointers to NULL during setup instead of
ad-hoc at failure (Janusz)
- romanize the reporter's name (Andi, offline)
- change the commit log, including title
[1] https://patchwork.freedesktop.org/series/156832/
[2] https://patchwork.freedesktop.org/series/158036/
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 37 +++++++++----------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b057c2fa03a4..d49e96f9be51 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -951,13 +951,13 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
vma = eb_lookup_vma(eb, eb->exec[i].handle);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
- goto err;
+ return err;
}
err = eb_validate_vma(eb, &eb->exec[i], vma);
if (unlikely(err)) {
i915_vma_put(vma);
- goto err;
+ return err;
}
err = eb_add_vma(eb, ¤t_batch, i, vma);
@@ -966,19 +966,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
if (i915_gem_object_is_userptr(vma->obj)) {
err = i915_gem_object_userptr_submit_init(vma->obj);
- if (err) {
- if (i + 1 < eb->buffer_count) {
- /*
- * Execbuffer code expects last vma entry to be NULL,
- * since we already initialized this entry,
- * set the next value to NULL or we mess up
- * cleanup handling.
- */
- eb->vma[i + 1].vma = NULL;
- }
-
+ if (err)
return err;
- }
eb->vma[i].flags |= __EXEC_OBJECT_USERPTR_INIT;
eb->args->flags |= __EXEC_USERPTR_USED;
@@ -986,10 +975,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
}
return 0;
-
-err:
- eb->vma[i].vma = NULL;
- return err;
}
static int eb_lock_vmas(struct i915_execbuffer *eb)
@@ -3375,7 +3360,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
eb.exec = exec;
eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
- eb.vma[0].vma = NULL;
+ memset(eb.vma, 0, (args->buffer_count + 1) * sizeof(struct eb_vma));
+
eb.batch_pool = NULL;
eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
@@ -3584,7 +3570,18 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
if (err)
return err;
- /* Allocate extra slots for use by the command parser */
+ /*
+ * Allocate extra slots for use by the command parser.
+ *
+ * Note that this allocation handles two different arrays (the
+ * exec2_list array, and the eventual eb.vma array introduced in
+ * i915_gem_do_execbuffer()), that reside in virtually contiguous
+ * memory. Also note that the allocation intentionally doesn't fill the
+ * area with zeros, because the exec2_list part doesn't need to be, as
+ * it's immediately overwritten by user data a few lines below.
+ * However, the eb.vma part is explicitly zeroed later in
+ * i915_gem_do_execbuffer().
+ */
exec2_list = kvmalloc_array(count + 2, eb_element_size(),
__GFP_NOWARN | GFP_KERNEL);
if (exec2_list == NULL) {
--
2.45.2
The patch titled
Subject: mm/damon/core: remove call_control in inactive contexts
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-remove-call_control-in-inactive-contexts.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: remove call_control in inactive contexts
Date: Sun, 28 Dec 2025 10:31:01 -0800
If damon_call() is executed against a DAMON context that is not running,
the function returns error while keeping the damon_call_control object
linked to the context's call_controls list. Let's suppose the object is
deallocated after the damon_call(), and yet another damon_call() is
executed against the same context. The function tries to add the new
damon_call_control object to the call_controls list, which still has the
pointer to the previous damon_call_control object, which is deallocated.
As a result, use-after-free happens.
This can actually be triggered using the DAMON sysfs interface. It is
not easily exploitable, though, since it requires sysfs write permission
and a decidedly weird sequence of file writes. Please refer to the report
for more details about the reproduction steps.
Fix the issue by making damon_call() clean up the damon_call_control
object before returning the error.
Link: https://lkml.kernel.org/r/20251228183105.289441-1-sj@kernel.org
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: JaeJoon Jung <rgbi3307(a)gmail.com>
Closes: https://lore.kernel.org/20251224094401.20384-1-rgbi3307@gmail.com
Cc: <stable(a)vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
--- a/mm/damon/core.c~mm-damon-core-remove-call_control-in-inactive-contexts
+++ a/mm/damon/core.c
@@ -1431,6 +1431,35 @@ bool damon_is_running(struct damon_ctx *
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle a DAMON call request added to
+ * an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called when @control has been added to @ctx but @ctx is
+ * not running (inactive). Check whether @ctx handled @control, and clean up
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1461,7 +1490,7 @@ int damon_call(struct damon_ctx *ctx, st
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
When simple_write_to_buffer() succeeds, it returns the number of bytes
actually copied to the buffer, which may be less than the requested
'count' if the buffer size is insufficient. However, the current code
incorrectly uses 'count' as the index for the NUL terminator instead of
the actual number of bytes copied, leading to an out-of-bounds write.
Add a check for the count and use the return value as the index.
Found via static analysis. This is similar to the
commit da9374819eb3 ("iio: backend: fix out-of-bound write")
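For clarity, the safe pattern in general form (a minimal sketch, not the
full ad3552r-hs handler; simple_write_to_buffer() is from <linux/fs.h>):

/*
 * Sketch: reject writes that cannot fit up front, and NUL-terminate at
 * the number of bytes simple_write_to_buffer() actually copied, which
 * may be less than 'count'.
 */
static ssize_t example_debugfs_write(struct file *f, const char __user *userbuf,
				     size_t count, loff_t *ppos)
{
	char buf[8];
	ssize_t ret;

	if (count >= sizeof(buf))
		return -ENOSPC;

	ret = simple_write_to_buffer(buf, sizeof(buf) - 1, ppos, userbuf, count);
	if (ret < 0)
		return ret;

	buf[ret] = '\0';	/* 'ret' is the copied length; 'count' may be larger */
	return count;
}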
Fixes: b1c5d68ea66e ("iio: dac: ad3552r-hs: add support for internal ramp")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
---
drivers/iio/dac/ad3552r-hs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/dac/ad3552r-hs.c b/drivers/iio/dac/ad3552r-hs.c
index 41b96b48ba98..a9578afa7015 100644
--- a/drivers/iio/dac/ad3552r-hs.c
+++ b/drivers/iio/dac/ad3552r-hs.c
@@ -549,12 +549,15 @@ static ssize_t ad3552r_hs_write_data_source(struct file *f,
guard(mutex)(&st->lock);
+ if (count >= sizeof(buf))
+ return -ENOSPC;
+
ret = simple_write_to_buffer(buf, sizeof(buf) - 1, ppos, userbuf,
count);
if (ret < 0)
return ret;
- buf[count] = '\0';
+ buf[ret] = '\0';
ret = match_string(dbgfs_attr_source, ARRAY_SIZE(dbgfs_attr_source),
buf);
--
2.39.5 (Apple Git-154)
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
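For reference, a minimal sketch of the pattern (assuming <linux/of.h>,
which defines the device_node cleanup class; the probe function itself
is hypothetical):

/*
 * Sketch: __free(device_node) arranges for of_node_put() to run
 * automatically when 'np' goes out of scope, on every return path.
 */
static int example_probe(struct platform_device *pdev)
{
	struct device_node *np __free(device_node) =
		of_get_child_by_name(pdev->dev.of_node, "pgc");

	if (!np)
		return -ENODEV;

	/* use np; no explicit of_node_put() is needed below */
	return 0;
}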
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
Change in V4:
- Fix typo error in code
Change in V3:
- Ensure variable is assigned when using cleanup attribute
Change in V2:
- Use __free() attribute instead of explicit of_node_put() calls
---
drivers/pmdomain/imx/gpc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index f18c7e6e75dd..56a78cc86584 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -403,13 +403,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
--
2.34.1
In the pruss_clk_mux_setup(), the devm_add_action_or_reset() indirectly
calls pruss_of_free_clk_provider(), which calls of_node_put(clk_mux_np)
on the error path. However, after the devm_add_action_or_reset()
returns, the of_node_put(clk_mux_np) is called again, causing a double
free.
Fix by using a separate label to avoid the duplicate of_node_put().
Fixes: ba59c9b43c86 ("soc: ti: pruss: support CORECLK_MUX and IEPCLK_MUX")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/soc/ti/pruss.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/ti/pruss.c b/drivers/soc/ti/pruss.c
index d7634bf5413a..c16d96bebe3f 100644
--- a/drivers/soc/ti/pruss.c
+++ b/drivers/soc/ti/pruss.c
@@ -368,13 +368,14 @@ static int pruss_clk_mux_setup(struct pruss *pruss, struct clk *clk_mux,
clk_mux_np);
if (ret) {
dev_err(dev, "failed to add clkmux free action %d", ret);
- goto put_clk_mux_np;
+ goto ret_error;
}
return 0;
put_clk_mux_np:
of_node_put(clk_mux_np);
+ret_error:
return ret;
}
--
2.34.1
for_each_available_child_of_node() calls of_node_put() to release
child_np on each successful loop iteration. After breaking from the
loop, child_np has already been released, but if
devm_request_threaded_irq() fails the code jumps to the put_child
label and calls of_node_put() again. This causes a double free bug.
Fix by using a separate label to avoid the duplicate of_node_put().
Fixes: ed2b5a8e6b98 ("phy: phy-rockchip-inno-usb2: support muxed interrupts")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/phy/rockchip/phy-rockchip-inno-usb2.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
index b0f23690ec30..f754c3b1c357 100644
--- a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
+++ b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
@@ -1491,7 +1491,7 @@ static int rockchip_usb2phy_probe(struct platform_device *pdev)
rphy);
if (ret) {
dev_err_probe(rphy->dev, ret, "failed to request usb2phy irq handle\n");
- goto put_child;
+ goto ret_error;
}
}
@@ -1499,6 +1499,7 @@ static int rockchip_usb2phy_probe(struct platform_device *pdev)
put_child:
of_node_put(child_np);
+ret_error:
return ret;
}
--
2.34.1
In w1_attach_slave_device(), if __w1_attach_slave_device() fails,
put_device() -> w1_slave_release() is called to do the cleanup job.
In w1_slave_release(), sl->family->refcnt and sl->master->slave_count
have already been decremented. There is no need to decrement twice
in w1_attach_slave_device().
Fixes: 2c927c0c73fd ("w1: Fix slave count on 1-Wire bus (resend)")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
---
drivers/w1/w1.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index 002d2639aa12..5f78b0a0b766 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -758,8 +758,6 @@ int w1_attach_slave_device(struct w1_master *dev, struct w1_reg_num *rn)
if (err < 0) {
dev_err(&dev->dev, "%s: Attaching %s failed.\n", __func__,
sl->name);
- dev->slave_count--;
- w1_family_put(sl->family);
atomic_dec(&sl->master->refcnt);
kfree(sl);
return err;
--
2.25.1
A deadlock can occur between nfc_unregister_device() and rfkill_fop_write()
due to lock ordering inversion between device_lock and rfkill_global_mutex.
The problematic lock order is:
Thread A (rfkill_fop_write):
rfkill_fop_write()
mutex_lock(&rfkill_global_mutex)
rfkill_set_block()
nfc_rfkill_set_block()
nfc_dev_down()
device_lock(&dev->dev) <- waits for device_lock
Thread B (nfc_unregister_device):
nfc_unregister_device()
device_lock(&dev->dev)
rfkill_unregister()
mutex_lock(&rfkill_global_mutex) <- waits for rfkill_global_mutex
This creates a classic ABBA deadlock scenario.
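In the abstract, the shape of the bug looks like this (a toy sketch, not
the NFC code; lock_a stands in for rfkill_global_mutex and lock_b for the
device lock):

static DEFINE_MUTEX(lock_a);
static DEFINE_MUTEX(lock_b);

static void thread_a(void)		/* rfkill_fop_write() path */
{
	mutex_lock(&lock_a);
	mutex_lock(&lock_b);		/* blocks while thread_b holds lock_b */
	mutex_unlock(&lock_b);
	mutex_unlock(&lock_a);
}

static void thread_b(void)		/* nfc_unregister_device() path */
{
	mutex_lock(&lock_b);
	mutex_lock(&lock_a);		/* blocks while thread_a holds lock_a */
	mutex_unlock(&lock_a);
	mutex_unlock(&lock_b);
}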
Fix this by moving rfkill_unregister() and rfkill_destroy() outside the
device_lock critical section. Store the rfkill pointer in a local variable
before releasing the lock, then call rfkill_unregister() after releasing
device_lock.
This change is safe because rfkill_fop_write() holds rfkill_global_mutex
while calling the rfkill callbacks, and rfkill_unregister() also acquires
rfkill_global_mutex before cleanup. Therefore, rfkill_unregister() will
wait for any ongoing callback to complete before proceeding, and
device_del() is only called after rfkill_unregister() returns, preventing
any use-after-free.
A similar lock ordering in nfc_register_device() (device_lock ->
rfkill_global_mutex via rfkill_register) is safe because during
registration the device is not yet in rfkill_list, so no concurrent
rfkill operations can occur on this device.
Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+4ef89409a235d804c6c2(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4ef89409a235d804c6c2
Link: https://lore.kernel.org/all/20251217054908.178907-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
---
v2:
- Added explanation of why UAF is not possible
- Added explanation of why nfc_register_device() is safe
- Added Fixes and Cc: stable tags
- Fixed blank line after variable declaration (kept it)
---
net/nfc/core.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/nfc/core.c b/net/nfc/core.c
index ae1c842f9c64..82f023f37754 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -1154,6 +1154,7 @@ EXPORT_SYMBOL(nfc_register_device);
void nfc_unregister_device(struct nfc_dev *dev)
{
int rc;
+ struct rfkill *rfk = NULL;
pr_debug("dev_name=%s\n", dev_name(&dev->dev));
@@ -1164,13 +1165,17 @@ void nfc_unregister_device(struct nfc_dev *dev)
device_lock(&dev->dev);
if (dev->rfkill) {
- rfkill_unregister(dev->rfkill);
- rfkill_destroy(dev->rfkill);
+ rfk = dev->rfkill;
dev->rfkill = NULL;
}
dev->shutting_down = true;
device_unlock(&dev->dev);
+ if (rfk) {
+ rfkill_unregister(rfk);
+ rfkill_destroy(rfk);
+ }
+
if (dev->ops->check_presence) {
timer_delete_sync(&dev->check_pres_timer);
cancel_work_sync(&dev->check_pres_work);
--
2.43.0
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
IVAC), that hits on unique dirty data in the CPU or DSU cache. This
results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The workaround issues a second loop of CMOs over the same virtual
addresses for operations that meet the erratum conditions, waiting until
the cache data has been cleaned to the PoC. This implementation mitigates
the performance penalty compared to purely duplicating the original CMO.
Cc: stable(a)vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei(a)google.com>
---
Documentation/arch/arm64/silicon-errata.rst | 3 ++
arch/arm64/Kconfig | 19 +++++++++++++
arch/arm64/include/asm/assembler.h | 10 +++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
arch/arm64/mm/cache.S | 13 ++++++++-
arch/arm64/tools/cpucaps | 1 +
6 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..98efdf528719 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | SI L1 | #4311569 | ARM64_ERRATUM_4311569 |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..89326bb26f48 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1155,6 +1155,25 @@ config ARM64_ERRATUM_3194386
If unsure, say Y.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
+
+ The erratum of SI L1 can cause an early response to a combined write
+ and cache maintenance operation (WR+CMO) before the operation is fully
+ completed to the Point of Serialization (POS).
+ This can result in a non-I/O coherent agent observing stale data,
+ potentially leading to system instability or incorrect behavior.
+
+ Enabling this option implements a software workaround by issuing a
+ second loop of Cache Maintenance Operations (CMOs) at the end of the
+ affected maintenance routines. This ensures that the data is correctly
+ serialized before the buffer is handed off to a non-coherent agent.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
sub \tmp, \linesz, #1
bic \start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+ mov \tmp, \start
+alternative_else_nop_endif
.Ldcache_op\@:
.ifc \op, cvau
__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
add \start, \start, \linesz
cmp \start, \end
b.lo .Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+ .ifnc \op, cvau
+ mov \start, \tmp
+ mov \tmp, xzr
+ cbnz \start, .Ldcache_op\@
+ .endif
+alternative_else_nop_endif
dsb \domain
_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..c69678c512f1 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+ static_branch_enable(&arm_si_l1_workaround_4311569);
+ pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+ return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * Cache maintenance functions such as dcache_inval_poc() and dcache_clean_poc()
+ * are called from head.S before the decision to turn on this workaround can be
+ * made. Since the scope of this workaround is limited to non-coherent DMA agents,
+ * it is safe to have the workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+ {
+ .capability = ARM64_WORKAROUND_4311569,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = need_arm_si_l1_workaround_4311569,
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
{
.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 503567c864fd..ddf0097624ed 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
* - end - kernel end address of region
*/
SYM_FUNC_START(__pi_dcache_inval_poc)
+alternative_if ARM64_WORKAROUND_4311569
+ mov x4, x0
+ mov x5, x1
+ mov x6, #1
+alternative_else_nop_endif
dcache_line_size x2, x3
sub x3, x2, #1
- tst x1, x3 // end cache line aligned?
+again: tst x1, x3 // end cache line aligned?
bic x1, x1, x3
b.eq 1f
dc civac, x1 // clean & invalidate D / U line
@@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
3: add x0, x0, x2
cmp x0, x1
b.lo 2b
+alternative_if ARM64_WORKAROUND_4311569
+ mov x0, x4
+ mov x1, x5
+ sub x6, x6, #1
+ cbz x6, again
+alternative_else_nop_endif
dsb sy
ret
SYM_FUNC_END(__pi_dcache_inval_poc)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 0fac75f01534..856b6cf6e71e 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -103,6 +103,7 @@ WORKAROUND_2077057
WORKAROUND_2457168
WORKAROUND_2645198
WORKAROUND_2658417
+WORKAROUND_4311569
WORKAROUND_AMPERE_AC03_CPU_38
WORKAROUND_AMPERE_AC04_CPU_23
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
--
2.52.0.358.g0dd7633a29-goog
If SMT is disabled or a partial SMT state is enabled, and a new kernel
image is loaded for kexec, the following warning is observed on reboot:
kexec: Waking offline cpu 228.
WARNING: CPU: 0 PID: 9062 at arch/powerpc/kexec/core_64.c:223 kexec_prepare_cpus+0x1b0/0x1bc
[snip]
NIP kexec_prepare_cpus+0x1b0/0x1bc
LR kexec_prepare_cpus+0x1a0/0x1bc
Call Trace:
kexec_prepare_cpus+0x1a0/0x1bc (unreliable)
default_machine_kexec+0x160/0x19c
machine_kexec+0x80/0x88
kernel_kexec+0xd0/0x118
__do_sys_reboot+0x210/0x2c4
system_call_exception+0x124/0x320
system_call_vectored_common+0x15c/0x2ec
This occurs because add_cpu() fails due to cpu_bootable() returning false
for CPUs that fail the cpu_smt_thread_allowed() check, or for non-primary
threads if SMT is disabled.
Fix the issue by enabling SMT and resetting the number of SMT threads to
the number of threads per core, before attempting to wake up all present
CPUs.
Fixes: 38253464bc82 ("cpu/SMT: Create topology_smt_thread_allowed()")
Reported-by: Sachin P Bappalige <sachinpb(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org # v6.6+
Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com>
---
arch/powerpc/kexec/core_64.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 222aa326dace..ff6df43720c4 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -216,6 +216,11 @@ static void wake_offline_cpus(void)
{
int cpu = 0;
+ lock_device_hotplug();
+ cpu_smt_num_threads = threads_per_core;
+ cpu_smt_control = CPU_SMT_ENABLED;
+ unlock_device_hotplug();
+
for_each_present_cpu(cpu) {
if (!cpu_online(cpu)) {
printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
--
2.51.0
Since Linux v6.7, booting using BootX on an Old World PowerMac produces
an early crash. Stan Johnson writes, "the symptoms are that the screen
goes blank and the backlight stays on, and the system freezes (Linux
doesn't boot)."
Further testing revealed that the failure can be avoided by disabling
CONFIG_BOOTX_TEXT. Bisection revealed that the regression was caused by
a change to the font bitmap pointer that's used when btext_init() begins
painting characters on the display, early in the boot process.
Christophe Leroy explains, "before kernel text is relocated to its final
location ... data is addressed with an offset which is added to the
Global Offset Table (GOT) entries at the start of bootx_init()
by function reloc_got2(). But the pointers that are located inside a
structure are not referenced in the GOT and are therefore not updated by
reloc_got2(). It is therefore needed to apply the offset manually by using
PTRRELOC() macro."
Cc: Cedar Maxwell <cedarmaxwell(a)mac.com>
Cc: Stan Johnson <userm57(a)yahoo.com>
Cc: "Dr. David Alan Gilbert" <linux(a)treblig.org>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: stable(a)vger.kernel.org
Link: https://lists.debian.org/debian-powerpc/2025/10/msg00111.html
Link: https://lore.kernel.org/linuxppc-dev/d81ddca8-c5ee-d583-d579-02b19ed95301@y…
Reported-by: Cedar Maxwell <cedarmaxwell(a)mac.com>
Closes: https://lists.debian.org/debian-powerpc/2025/09/msg00031.html
Bisected-by: Stan Johnson <userm57(a)yahoo.com>
Tested-by: Stan Johnson <userm57(a)yahoo.com>
Fixes: 0ebc7feae79a ("powerpc: Use shared font data")
Suggested-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Finn Thain <fthain(a)linux-m68k.org>
---
Changed since v1:
- Improved commit log entry to better explain the need for PTRRELOC().
---
arch/powerpc/kernel/btext.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 7f63f1cdc6c3..ca00c4824e31 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -20,6 +20,7 @@
#include <asm/io.h>
#include <asm/processor.h>
#include <asm/udbg.h>
+#include <asm/setup.h>
#define NO_SCROLL
@@ -463,7 +464,7 @@ static noinline void draw_byte(unsigned char c, long locX, long locY)
{
unsigned char *base = calc_base(locX << 3, locY << 4);
unsigned int font_index = c * 16;
- const unsigned char *font = font_sun_8x16.data + font_index;
+ const unsigned char *font = PTRRELOC(font_sun_8x16.data) + font_index;
int rb = dispDeviceRowBytes;
rmci_maybe_on();
--
2.49.1
Dear linux-fbdev, stable,
On Mon, Oct 20, 2025 at 09:47:01PM +0800, Junjie Cao wrote:
> bit_putcs_aligned()/unaligned() derived the glyph pointer from the
> character value masked by 0xff/0x1ff, which may exceed the actual font's
> glyph count and read past the end of the built-in font array.
> Clamp the index to the actual glyph count before computing the address.
>
> This fixes a global out-of-bounds read reported by syzbot.
>
> Reported-by: syzbot+793cf822d213be1a74f2(a)syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=793cf822d213be1a74f2
> Tested-by: syzbot+793cf822d213be1a74f2(a)syzkaller.appspotmail.com
> Signed-off-by: Junjie Cao <junjie.cao(a)intel.com>
This commit is applied to v5.10.247 and causes a regression: when
switching VT with ctrl-alt-f2, the screen is blank or completely filled
with angle characters, and new text does not appear (or is not visible).
This commit is found with git bisect from v5.10.246 to v5.10.247:
0998a6cb232674408a03e8561dc15aa266b2f53b is the first bad commit
commit 0998a6cb232674408a03e8561dc15aa266b2f53b
Author: Junjie Cao <junjie.cao(a)intel.com>
AuthorDate: 2025-10-20 21:47:01 +0800
Commit: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
CommitDate: 2025-12-07 06:08:07 +0900
fbdev: bitblit: bound-check glyph index in bit_putcs*
commit 18c4ef4e765a798b47980555ed665d78b71aeadf upstream.
bit_putcs_aligned()/unaligned() derived the glyph pointer from the
character value masked by 0xff/0x1ff, which may exceed the actual font's
glyph count and read past the end of the built-in font array.
Clamp the index to the actual glyph count before computing the address.
This fixes a global out-of-bounds read reported by syzbot.
Reported-by: syzbot+793cf822d213be1a74f2(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=793cf822d213be1a74f2
Tested-by: syzbot+793cf822d213be1a74f2(a)syzkaller.appspotmail.com
Signed-off-by: Junjie Cao <junjie.cao(a)intel.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
drivers/video/fbdev/core/bitblit.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
The minimal reproducer on the command line, after the kernel has booted:
date >/dev/tty2; chvt 2
and the date does not appear.
Thanks,
#regzbot introduced: 0998a6cb232674408a03e8561dc15aa266b2f53b
> ---
> v1: https://lore.kernel.org/linux-fbdev/5d237d1a-a528-4205-a4d8-71709134f1e1@su…
> v1 -> v2:
> - Fix indentation and add blank line after declarations with the .pl helper
> - No functional changes
>
> drivers/video/fbdev/core/bitblit.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/video/fbdev/core/bitblit.c b/drivers/video/fbdev/core/bitblit.c
> index 9d2e59796c3e..085ffb44c51a 100644
> --- a/drivers/video/fbdev/core/bitblit.c
> +++ b/drivers/video/fbdev/core/bitblit.c
> @@ -79,12 +79,16 @@ static inline void bit_putcs_aligned(struct vc_data *vc, struct fb_info *info,
> struct fb_image *image, u8 *buf, u8 *dst)
> {
> u16 charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;
> + unsigned int charcnt = vc->vc_font.charcount;
> u32 idx = vc->vc_font.width >> 3;
> u8 *src;
>
> while (cnt--) {
> - src = vc->vc_font.data + (scr_readw(s++)&
> - charmask)*cellsize;
> + u16 ch = scr_readw(s++) & charmask;
> +
> + if (ch >= charcnt)
> + ch = 0;
> + src = vc->vc_font.data + (unsigned int)ch * cellsize;
>
> if (attr) {
> update_attr(buf, src, attr, vc);
> @@ -112,14 +116,18 @@ static inline void bit_putcs_unaligned(struct vc_data *vc,
> u8 *dst)
> {
> u16 charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;
> + unsigned int charcnt = vc->vc_font.charcount;
> u32 shift_low = 0, mod = vc->vc_font.width % 8;
> u32 shift_high = 8;
> u32 idx = vc->vc_font.width >> 3;
> u8 *src;
>
> while (cnt--) {
> - src = vc->vc_font.data + (scr_readw(s++)&
> - charmask)*cellsize;
> + u16 ch = scr_readw(s++) & charmask;
> +
> + if (ch >= charcnt)
> + ch = 0;
> + src = vc->vc_font.data + (unsigned int)ch * cellsize;
>
> if (attr) {
> update_attr(buf, src, attr, vc);
> --
> 2.48.1
>
The commit under Fixes enabled loadable module support for the driver
under the assumption that it would be the sole user of the Cadence Host
and Endpoint library APIs. This assumption guarantees that we won't end
up in a case where the driver is built-in while the library support is
built as a loadable module.
With the introduction of [1], this assumption is no longer valid. The
SG2042 driver can be built as a loadable module, which selects the
Cadence Host library as a loadable module as well. Meanwhile, the
pci-j721e.c driver can be built-in (CONFIG_PCI_J721E=y), which makes the
Cadence Endpoint library built-in. Although both library drivers are
built as specified by their respective consumers, the pci-j721e.c driver
also references the Cadence Host library APIs, so we run into a build
error as reported at [0].
Fix this by adding config guards as a temporary workaround. The proper
fix is to split the 'pci-j721e.c' driver into independent Host and
Endpoint drivers, as agreed at [2].
Fixes: a2790bf81f0f ("PCI: j721e: Add support to build as a loadable module")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511111705.MZ7ls8Hm-lkp@intel.com/
Cc: <stable(a)vger.kernel.org>
[0]: https://lore.kernel.org/r/202511111705.MZ7ls8Hm-lkp@intel.com/
[1]: commit 1c72774df028 ("PCI: sg2042: Add Sophgo SG2042 PCIe driver")
[2]: https://lore.kernel.org/r/37f6f8ce-12b2-44ee-a94c-f21b29c98821@app.fastmail…
Suggested-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Siddharth Vadapalli <s-vadapalli(a)ti.com>
---
drivers/pci/controller/cadence/pci-j721e.c | 43 +++++++++++++---------
1 file changed, 26 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c
index 5bc5ab20aa6d..67c5e02afccf 100644
--- a/drivers/pci/controller/cadence/pci-j721e.c
+++ b/drivers/pci/controller/cadence/pci-j721e.c
@@ -628,10 +628,12 @@ static int j721e_pcie_probe(struct platform_device *pdev)
gpiod_set_value_cansleep(gpiod, 1);
}
- ret = cdns_pcie_host_setup(rc);
- if (ret < 0) {
- clk_disable_unprepare(pcie->refclk);
- goto err_pcie_setup;
+ if (IS_ENABLED(CONFIG_PCI_J721E_HOST)) {
+ ret = cdns_pcie_host_setup(rc);
+ if (ret < 0) {
+ clk_disable_unprepare(pcie->refclk);
+ goto err_pcie_setup;
+ }
}
break;
@@ -642,9 +644,11 @@ static int j721e_pcie_probe(struct platform_device *pdev)
goto err_get_sync;
}
- ret = cdns_pcie_ep_setup(ep);
- if (ret < 0)
- goto err_pcie_setup;
+ if (IS_ENABLED(CONFIG_PCI_J721E_EP)) {
+ ret = cdns_pcie_ep_setup(ep);
+ if (ret < 0)
+ goto err_pcie_setup;
+ }
break;
}
@@ -669,10 +673,11 @@ static void j721e_pcie_remove(struct platform_device *pdev)
struct cdns_pcie_ep *ep;
struct cdns_pcie_rc *rc;
- if (pcie->mode == PCI_MODE_RC) {
+ if (IS_ENABLED(CONFIG_PCI_J721E_HOST) &&
+ pcie->mode == PCI_MODE_RC) {
rc = container_of(cdns_pcie, struct cdns_pcie_rc, pcie);
cdns_pcie_host_disable(rc);
- } else {
+ } else if (IS_ENABLED(CONFIG_PCI_J721E_EP)) {
ep = container_of(cdns_pcie, struct cdns_pcie_ep, pcie);
cdns_pcie_ep_disable(ep);
}
@@ -739,10 +744,12 @@ static int j721e_pcie_resume_noirq(struct device *dev)
gpiod_set_value_cansleep(pcie->reset_gpio, 1);
}
- ret = cdns_pcie_host_link_setup(rc);
- if (ret < 0) {
- clk_disable_unprepare(pcie->refclk);
- return ret;
+ if (IS_ENABLED(CONFIG_PCI_J721E_HOST)) {
+ ret = cdns_pcie_host_link_setup(rc);
+ if (ret < 0) {
+ clk_disable_unprepare(pcie->refclk);
+ return ret;
+ }
}
/*
@@ -752,10 +759,12 @@ static int j721e_pcie_resume_noirq(struct device *dev)
for (enum cdns_pcie_rp_bar bar = RP_BAR0; bar <= RP_NO_BAR; bar++)
rc->avail_ib_bar[bar] = true;
- ret = cdns_pcie_host_init(rc);
- if (ret) {
- clk_disable_unprepare(pcie->refclk);
- return ret;
+ if (IS_ENABLED(CONFIG_PCI_J721E_HOST)) {
+ ret = cdns_pcie_host_init(rc);
+ if (ret) {
+ clk_disable_unprepare(pcie->refclk);
+ return ret;
+ }
}
}
--
2.51.1
Hi Alekséi,
On 2025/12/26 20:17, Alekséi Naidénov wrote:
> Hello,
>
> I am reporting a regression in the 6.12 stable series related to EROFS
> file-backed mounts.
>
> After updating from Linux 6.12.62 to 6.12.63, a previously working setup
> using OSTree-backed composefs mounts as Podman rootfs no longer works.
>
> The regression appears to be caused by the following commit:
>
> 34447aeedbaea8f9aad3da5b07030a1c0e124639 ("erofs: limit the level of fs
> stacking for file-backed mounts")
> (backport of upstream commit d53cd891f0e4311889349fff3a784dc552f814b9)
>
> ## Setup description
>
> We use OSTree to materialize filesystem trees, which are mounted via
> composefs (EROFS + overlayfs) as a read-only filesystem. This mounted
> composefs tree is then used as a Podman rootfs, with Podman mounting a
> writable overlayfs on top for each container.
>
> This setup worked correctly on Linux 6.12.62 and earlier.
The following issue just tracks this:
https://github.com/coreos/fedora-coreos-tracker/issues/2087
I don't think more information is needed, but I really think the EROFS
commit is needed to avoid kernel stack overflow due to nested fses.
>
> In short, the stacking looks like:
>
> EROFS (file-backed)
> -> composefs (EROFS + overlayfs with ostree repo as datadir, read-only)
> -> Podman rootfs overlays (RW upperdir)
>
> There is no recursive or self-stacking of EROFS.
Yes, but there are already two overlayfs layers plus one file-backed
EROFS, which exceeds FILESYSTEM_MAX_STACK_DEPTH.
That is, overlayfs refuses to mount the nested fses.
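Roughly, the accounting works like this (paraphrasing the overlayfs mount
path, where FILESYSTEM_MAX_STACK_DEPTH is 2; the depth values below are
an interpretation of this setup, not output from the report):

/*
 * s_stack_depth after the cited EROFS commit:
 *   backing fs holding the EROFS image       -> 0
 *   file-backed EROFS                        -> 1
 *   composefs overlayfs (read-only)          -> 2 (max of lowers + 1)
 *   Podman RW overlayfs on top               -> would be 3
 */
if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
	pr_err("maximum fs stacking depth exceeded\n");
	return -EINVAL;		/* the 'invalid argument' Podman reports */
}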
Thanks,
Gao Xiang
Hello,
I am reporting a regression in the 6.12 stable series related to EROFS file-backed mounts.
After updating from Linux 6.12.62 to 6.12.63, a previously working setup using OSTree-backed
composefs mounts as Podman rootfs no longer works.
The regression appears to be caused by the following commit:
34447aeedbaea8f9aad3da5b07030a1c0e124639 ("erofs: limit the level of fs stacking for file-backed mounts")
(backport of upstream commit d53cd891f0e4311889349fff3a784dc552f814b9)
## Setup description
We use OSTree to materialize filesystem trees, which are mounted via composefs (EROFS + overlayfs)
as a read-only filesystem. This mounted composefs tree is then used as a Podman rootfs, with
Podman mounting a writable overlayfs on top for each container.
This setup worked correctly on Linux 6.12.62 and earlier.
In short, the stacking looks like:
EROFS (file-backed)
-> composefs (EROFS + overlayfs with ostree repo as datadir, read-only)
-> Podman rootfs overlays (RW upperdir)
There is no recursive or self-stacking of EROFS.
## Working case (6.12.62)
The composefs mount exists and Podman can successfully start a container using it as rootfs.
Example composefs mount:
❯ mount | grep a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c
a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c on /home/growler/.local/share/containers/ostree/a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c type overlay (ro,noatime,lowerdir+=/proc/self/fd/10,datadir+=/proc/self/fd/7,redirect_dir=on,metacopy=on)
(lowerdir is a handle for the EROFS file-backed mount, datadir is a handle for the ostree
repository objects directory)
Running Podman:
❯ podman run --rm -it --rootfs $HOME/.local/share/containers/ostree/a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c:O bash -l
root@d691e785bba3:/# uname -a
Linux d691e785bba3 6.12.62 #1-NixOS SMP PREEMPT_DYNAMIC Fri Dec 12 17:37:22 UTC 2025 x86_64 GNU/Linux
root@d691e785bba3:/#
(succeed)
## Failing case (6.12.63)
After upgrading to 6.12.63, the same command fails when Podman tries to create the writable overlay
on top of the composefs mount.
Error:
❯ podman run --rm -it --rootfs $HOME/.local/share/containers/ostree/a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c:O bash -l
Error: rootfs-overlay: creating overlay failed "/home/growler/.local/share/containers/ostree/a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c" from native overlay: mount overlay:/home/growler/.local/share/containers/storage/overlay-containers/a0851294d6b5b18062d4f5316032ee84d7bae700ea7d12c5be949d9e1999b0a1/rootfs/merge, flags: 0x4, data: lowerdir=/home/growler/.local/share/containers/ostree/a31550cc69eef0e3227fa700623250592711fdfd51b5403a74288b55e89e7e8c,upperdir=/home/growler/.local/share/containers/storage/overlay-containers/a0851294d6b5b18062d4f5316032ee84d7bae700ea7d12c5be949d9e1999b0a1/rootfs/upper,workdir=/home/growler/.local/share/containers/storage/overlay-containers/a0851294d6b5b18062d4f5316032ee84d7bae700ea7d12c5be949d9e1999b0a1/rootfs/work,userxattr: invalid argument
❯ uname -a
Linux ci-node-09 6.12.63 #1-NixOS SMP PREEMPT_DYNAMIC Thu Dec 18 12:55:23 UTC 2025 x86_64 GNU/Linux
## Expected behavior
Using a composefs (EROFS + overlayfs) read-only mount as the lowerdir for a container rootfs overlay
should continue to work as it did in 6.12.62.
## Actual behavior
Overlayfs mounting fails with EINVAL when stacking on top of the composefs mount backed by EROFS.
## Notes
The setup does not involve recursive EROFS mounting or unbounded stacking depth. It appears the new stacking
limit rejects this valid and previously supported container use case.
Please let me know if further details or testing would be helpful.
Thank you,
--
Alekséi Nadénov
Hi maintainers,
This is a v6.6 backport mainly for an upstream commit `5ace7ef87f05 net:
openvswitch: fix middle attribute validation in push_nsh() action`.
I built the kernel then tested it with selftest. The selftest that ran
with a a bunch of SyntaxWarning.
Example:
/ovs-dpctl.py:598: SyntaxWarning: invalid escape sequence '\d'
actstr, ":", "(\d+)", int, False
/ovs-dpctl.py:601: SyntaxWarning: invalid escape sequence '\d'
actstr, "-", "(\d+)", int, False
/ovs-dpctl.py:505: SyntaxWarning: invalid escape sequence '\d'
elif parse_starts_block(actstr, "^(\d+)", False, True):
These warnings were then easily fixed with another minimal backport for
the file tools/testing/selftests/net/openvswitch/ovs-dpctl.py. Hence the
series.
Both patches applied cleanly and were tested with the selftests, which
passed, though the timeout had to be increased for drop_reason to pass.
Adrian Moreno (1):
selftests: openvswitch: Fix escape chars in regexp.
Ilya Maximets (1):
net: openvswitch: fix middle attribute validation in push_nsh() action
net/openvswitch/flow_netlink.c | 13 ++++++++++---
.../selftests/net/openvswitch/ovs-dpctl.py | 16 ++++++++--------
2 files changed, 18 insertions(+), 11 deletions(-)
--
2.52.0
A potential memory leak exists in the gssx_dec_status function (in
net/sunrpc/auth_gss/gss_rpc_xdr.c) and its dependent gssx_dec_buffer
function. The leak occurs when gssx_dec_buffer allocates memory via
kmemdup for gssx_buffer fields, but the allocated memory is not freed
in error paths of the chained decoding process in gssx_dec_status.
The gssx_dec_buffer function allocates memory using kmemdup when
buf->data is NULL (to store decoded XDR buffer data). This allocation
is not paired with a release mechanism in case of subsequent decoding
failures.
gssx_dec_status sequentially decodes multiple gssx_buffer fields
(e.g., mech, major_status_string, minor_status_string, server_ctx) by
calling gssx_dec_buffer. If a later decoding step fails (e.g.,
gssx_dec_buffer returns -ENOSPC or -ENOMEM), the function immediately
returns the error without freeing the memory allocated for earlier
gssx_buffer fields. This results in persistent kernel memory leaks.
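A simplified sketch of the pattern (illustrative only; the sequence is reconstructed from the
description above, not copied from the upstream code):

    /* Each call may kmemdup() into its gssx_buffer... */
    err = gssx_dec_buffer(xdr, &status->mech);
    if (err)
            return err;
    err = gssx_dec_buffer(xdr, &status->major_status_string);
    if (err)
            return err;     /* ...but status->mech.data is never freed here */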
This memory allocation is conditional. I traced upward through the caller
gssx_dec_status and found that it is ultimately invoked via the interface
gssx_dec_accept_sec_context. Although I have not identified the specific
execution path that triggers this memory leak, I believe this coding
pattern is highly prone to causing confusion between callers and callees,
which in turn leads to memory leaks.
Relevant code links:
https://github.com/torvalds/linux/blob/ccd1cdca5cd433c8a5dff78b69a79b31d9b7…
https://github.com/torvalds/linux/blob/ccd1cdca5cd433c8a5dff78b69a79b31d9b7…
I have searched Bugzilla, lore.kernel.org, and client.linux-nfs.org,
but no related issues were found.
syzbot reported a KASAN out-of-bounds Read in ext4_xattr_set_entry()[1].
When xattr_find_entry() returns -ENODATA, search.here still points to the
position after the last valid entry. ext4_xattr_block_set() clones the xattr
block because the original block may be shared and must not be modified in
place.
In the cloned block, search.here is recomputed unconditionally from the old
offset, which may place it past search.first. This results in a negative
size calculation and an out-of-bounds memmove() in ext4_xattr_set_entry().
Fix this by initializing search.here correctly when search.not_found is set.
[1] https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
Fixes: fd48e9acdf2 ("ext4: Unindent codeblock in ext4_xattr_block_set")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f792df426ff0f5ceb8d1(a)syzkaller.appspotmail.com
Signed-off-by: Jinchao Wang <wangjinchao600(a)gmail.com>
---
fs/ext4/xattr.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 2e02efbddaac..cc30abeb7f30 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1980,7 +1980,10 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
goto cleanup;
s->first = ENTRY(header(s->base)+1);
header(s->base)->h_refcount = cpu_to_le32(1);
- s->here = ENTRY(s->base + offset);
+ if (s->not_found)
+ s->here = s->first;
+ else
+ s->here = ENTRY(s->base + offset);
s->end = s->base + bs->bh->b_size;
/*
--
2.43.0
From: Shigeru Yoshida <syoshida(a)redhat.com>
commit fc1092f51567277509563800a3c56732070b6aa4 upstream.
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb()
tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a
race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL
while __ip_make_skb() is running, the function will access icmphdr in the
skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL
on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These
are union members in struct flowi4 and are implicitly initialized by
flowi4_init_output(), but we should not rely on the specific union layout.
Initialize them explicitly in raw_sendmsg().
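To make the race concrete, an assumed interleaving (reconstructed from the description above;
two user threads sharing one SOCK_RAW socket):

    int one = 1, zero = 0;

    setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one));
    /* thread A */ sendto(fd, pkt, len, 0, (struct sockaddr *)&dst, sizeof(dst));
    /* raw_sendmsg() sees hdrincl == 1 and never parses fl4->fl4_icmp_type */
    /* thread B */ setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &zero, sizeof(zero));
    /* __ip_make_skb() then re-reads hdrincl as 0 and consumes the
     * still-uninitialized fl4->fl4_icmp_type */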
[1]
BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
__ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
ip_finish_skb include/net/ip.h:243 [inline]
ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508
raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
__ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128
ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365
raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()")
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida(a)redhat.com>
Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Shubham Kulkarni <skulkarni(a)mvista.com>
---
Referred stable v6.1.y version of the patch to generate this one
[ v6.1 link: https://github.com/gregkh/linux/commit/55bf541e018b76b3750cb6c6ea18c46e1ac5… ]
---
net/ipv4/ip_output.c | 3 ++-
net/ipv4/raw.c | 3 +++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 1e430e135aa6..70c316c0537f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1572,7 +1572,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
* so icmphdr does not in skb linear region and can not get icmp_type
* by icmp_hdr(skb)->type.
*/
- if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+ if (sk->sk_type == SOCK_RAW &&
+ !(fl4->flowi4_flags & FLOWI_FLAG_KNOWN_NH))
icmp_type = fl4->fl4_icmp_type;
else
icmp_type = icmp_hdr(skb)->type;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 650da4d8f7ad..5a8d5ebe6590 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -634,6 +634,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
+ fl4.fl4_icmp_type = 0;
+ fl4.fl4_icmp_code = 0;
+
if (!hdrincl) {
rfv.msg = msg;
rfv.hlen = 0;
--
2.25.1
__io_openat_prep() allocates a struct filename using getname(). However,
when the file is to be installed in the fixed file table and also has the
O_CLOEXEC flag set, the function returns early. At that point, the request
doesn't have the REQ_F_NEED_CLEANUP flag set. Because of this, the memory
for the newly allocated struct filename is not cleaned up, causing a
memory leak.
Fix this by setting REQ_F_NEED_CLEANUP on the request just after the
successful getname() call, so that when the request is torn down, the
filename is cleaned up along with the other resources needing cleanup.
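For context, the cleanup side that REQ_F_NEED_CLEANUP arranges to run is io_uring's open
cleanup handler, roughly (abridged from io_uring/openclose.c; exact shape may differ per tree):

    void io_open_cleanup(struct io_kiocb *req)
    {
            struct io_open *open = io_kiocb_to_cmd(req, struct io_open);

            if (open->filename)
                    putname(open->filename);
    }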
Reported-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
---
io_uring/openclose.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..15dde9bd6ff6 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -73,13 +73,13 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
open->filename = NULL;
return ret;
}
+ req->flags |= REQ_F_NEED_CLEANUP;
open->file_slot = READ_ONCE(sqe->file_index);
if (open->file_slot && (open->how.flags & O_CLOEXEC))
return -EINVAL;
open->nofile = rlimit(RLIMIT_NOFILE);
- req->flags |= REQ_F_NEED_CLEANUP;
if (io_openat_force_async(open))
req->flags |= REQ_F_FORCE_ASYNC;
return 0;
base-commit: b927546677c876e26eba308550207c2ddf812a43
--
2.34.1
Hello,
please backport
commit fbf5892df21a8ccfcb2fda0fd65bc3169c89ed28
Author: Martin Nybo Andersen <tweek(a)tweek.dk>
Date: Fri Sep 15 12:15:39 2023 +0200
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Kmod is now (since kmod commit 09c9f8c5df04 ("libkmod: Use kernel
decompression when available")) using the kernel decompressor, when
loading compressed modules.
However, the kernel XZ decompressor is XZ Embedded, which doesn't
handle CRC64 and dictionaries larger than 1MiB.
Use CRC32 and 1MiB dictionary when XZ compressing and installing
kernel modules.
to the 6.1 stable kernel, and possibly older ones as well.
The commit message actually has it all, so just my story: there's a piece
of hardware that has or had issues with newer kernels (no time to check),
so my kernel for this board is usually static. But after building a kernel
with xz-compressed modules, they wouldn't load and instead triggered
"decompression failed with status 6". Investigation led to a CRC64 check
on these files, and eventually to the above commit.
The commit applies (with an offset), the resulting modules work as
expected.
Kernel 6.6 and newer already have that commit. Older kernels could
possibly benefit from this as well, I haven't checked.
Kind regards,
Christoph
Memory region assignment in ath11k_qmi_assign_target_mem_chunk()
assumes that:
1. firmware will make a HOST_DDR_REGION_TYPE request, and
2. this request is processed before CALDB_MEM_REGION_TYPE
In this case, CALDB_MEM_REGION_TYPE can safely be assigned immediately
after the host region.
However, if the HOST_DDR_REGION_TYPE request is not made, or the
reserved-memory node is not present, then res.start and res.end are 0,
and host_ddr_sz remains uninitialized. The physical address should
fall back to ATH11K_QMI_CALDB_ADDRESS. That doesn't happen:
resource_size(&res) returns 1 for an empty resource, and thus the if
clause never takes the fallback path. ab->qmi.target_mem[idx].paddr
is assigned the uninitialized value of host_ddr_sz + 0 (res.start).
Use "if (res.end > res.start)" for the predicate, which correctly
falls back to ATH11K_QMI_CALDB_ADDRESS.
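For reference, resource_size() is defined in include/linux/ioport.h as:

    static inline resource_size_t resource_size(const struct resource *res)
    {
            return res->end - res->start + 1;
    }

so an empty resource with res.start == res.end == 0 yields 1, which is truthy and skips the
fallback branch.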
Fixes: 900730dc4705 ("wifi: ath: Use of_reserved_mem_region_to_resource() for "memory-region"")
Cc: stable(a)vger.kernel.org # v6.18
Signed-off-by: Alexandru Gagniuc <mr.nuke.me(a)gmail.com>
---
drivers/net/wireless/ath/ath11k/qmi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/qmi.c b/drivers/net/wireless/ath/ath11k/qmi.c
index aea56c38bf8f3..6cc26d1c1e2a4 100644
--- a/drivers/net/wireless/ath/ath11k/qmi.c
+++ b/drivers/net/wireless/ath/ath11k/qmi.c
@@ -2054,7 +2054,7 @@ static int ath11k_qmi_assign_target_mem_chunk(struct ath11k_base *ab)
return ret;
}
- if (res.end - res.start + 1 < ab->qmi.target_mem[i].size) {
+ if (resource_size(&res) < ab->qmi.target_mem[i].size) {
ath11k_dbg(ab, ATH11K_DBG_QMI,
"fail to assign memory of sz\n");
return -EINVAL;
@@ -2086,7 +2086,7 @@ static int ath11k_qmi_assign_target_mem_chunk(struct ath11k_base *ab)
}
if (ath11k_core_coldboot_cal_support(ab)) {
- if (resource_size(&res)) {
+ if (res.end > res.start) {
ab->qmi.target_mem[idx].paddr =
res.start + host_ddr_sz;
ab->qmi.target_mem[idx].iaddr =
--
2.45.1
__io_openat_prep() allocates a struct filename using getname(), but it
isn't freed when the file is to be installed in the fixed file table and,
simultaneously, O_CLOEXEC is set in the open->how.flags field.
This is an erroneous combination: a file installed in the fixed file table
is never installed in the normal file table, so it cannot support close on
exec. The code previously just returned -EINVAL for this condition, but
the memory allocated for the struct filename wasn't freed, resulting in a
memory leak.
Hence, the case of the file being installed in the fixed file table while
also having O_CLOEXEC set in open->how.flags is addressed by using
putname() to release the memory allocated for the struct filename, setting
open->filename to NULL, and then returning -EINVAL.
Reported-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
---
io_uring/openclose.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..fc190a3d8112 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -75,8 +75,11 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
}
open->file_slot = READ_ONCE(sqe->file_index);
- if (open->file_slot && (open->how.flags & O_CLOEXEC))
+ if (open->file_slot && (open->how.flags & O_CLOEXEC)) {
+ putname(open->filename);
+ open->filename = NULL;
return -EINVAL;
+ }
open->nofile = rlimit(RLIMIT_NOFILE);
req->flags |= REQ_F_NEED_CLEANUP;
base-commit: b927546677c876e26eba308550207c2ddf812a43
--
2.34.1
Hi Greg, thanks for looking into this..
The full commit hash is 807221d3c5ff6e3c91ff57bc82a0b7a541462e20
Note: apologies if you received this multiple times, the previous one
got bounced due to html
Cheers,
JP
________________________________
From: Greg KH <gregkh(a)linuxfoundation.org>
Sent: Tuesday, December 23, 2025 6:39:22 PM
To: JP Dehollain <jpdehollain(a)gmail.com>
Cc: stable(a)vger.kernel.org <stable(a)vger.kernel.org>
Subject: Re: Request to add mainline merged patch to stable kernels
On Tue, Dec 23, 2025 at 04:05:24PM +1100, JP Dehollain wrote:
> Hello,
> I recently used the patch misc: rtsx_pci: Add separate CD/WP pin
> polarity reversal support with commit ID 807221d, to fix a bug causing
> the cardreader driver to always load sd cards in read-only mode.
> On the suggestion of the driver maintainer, I am requesting that this
> patch be applied to all stable kernel versions, as it is currently
> only applied to >=6.18.
> Thanks,
> JP
>
What is the git id of the commit you are looking to have backported?
thanks,
greg k-h
When compiling the device tree for the Integrator/AP with IM-PD1, the
following warning is observed regarding the display controller node:
arch/arm/boot/dts/arm/integratorap-im-pd1.dts:251.3-14: Warning
(dma_ranges_format):
/bus@c0000000/bus@c0000000/display@1000000:dma-ranges: empty
"dma-ranges" property but its #address-cells (2) differs from
/bus@c0000000/bus@c0000000 (1)
The display node specifies an empty "dma-ranges" property, intended to
describe a 1:1 identity mapping. However, the node lacks explicit
"#address-cells" and "#size-cells" properties. In this case, the device
tree compiler defaults the address cells to 2 (64-bit), which conflicts
with the parent bus configuration (32-bit, 1 cell).
Fix this by explicitly defining "#address-cells" and "#size-cells" as
1. This matches the 32-bit architecture of the Integrator platform and
ensures the address translation range is correctly parsed by the
compiler.
Fixes: 7bea67a99430 ("ARM: dts: integrator: Fix DMA ranges")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
---
arch/arm/boot/dts/arm/integratorap-im-pd1.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/boot/dts/arm/integratorap-im-pd1.dts b/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
index db13e09f2fab..6d90d9a42dc9 100644
--- a/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
+++ b/arch/arm/boot/dts/arm/integratorap-im-pd1.dts
@@ -248,6 +248,8 @@ display@1000000 {
/* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */
max-memory-bandwidth = <40000000>;
memory-region = <&impd1_ram>;
+ #address-cells = <1>;
+ #size-cells = <1>;
dma-ranges;
port@0 {
--
2.52.0.177.g9f829587af-goog
Hi Andrew,
Here are 2 fixes for missing mm_cid fields for init_mm and efi_mm static
initialization. The renaming of cpu_bitmap to flexible_array (patch 2)
is needed for patch 3.
Those are relevant for mainline, with CC stable. They are based on
v6.19-rc2.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Mathieu Desnoyers (3):
mm: Add missing static initializer for init_mm::mm_cid.lock
mm: Rename cpu_bitmap field to flexible_array
mm: Take into account mm_cid size for mm_struct static definitions
drivers/firmware/efi/efi.c | 2 +-
include/linux/mm_types.h | 18 +++++++++++++-----
mm/init-mm.c | 5 ++++-
3 files changed, 18 insertions(+), 7 deletions(-)
--
2.39.5
A process issuing blocking writes to a virtio console may get stuck
indefinitely if another thread polls the device. Here is how to trigger
the bug:
- Thread A writes to the port until the virtqueue is full.
- Thread A calls wait_port_writable() and goes to sleep, waiting on
port->waitqueue.
- The host processes some of the write, marks buffers as used and raises
an interrupt.
- Before the interrupt is serviced, thread B executes port_fops_poll().
This calls reclaim_consumed_buffers() via will_write_block() and
consumes all used buffers.
- The interrupt is serviced. vring_interrupt() finds no used buffers
via more_used() and returns without waking port->waitqueue.
- Thread A is still in wait_event(port->waitqueue), waiting for a
wakeup that never arrives.
The crux is that invoking reclaim_consumed_buffers() may cause
vring_interrupt() to omit wakeups.
Fix this by calling reclaim_consumed_buffers() in out_intr() before
waking. This is similar to the call to discard_port_data() in
in_intr(), which also frees buffers from a non-sleepable context.
This in turn guarantees that port->outvq_full is up to date when
handling polling. Since in_intr() already populates port->inbuf, we
use that to avoid changing reader state.
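A condensed sketch of the mechanism at fault (abridged from drivers/char/virtio_console.c;
details may vary across stable trees):

    static void reclaim_consumed_buffers(struct port *port)
    {
            struct port_buffer *buf;
            unsigned int len;

            /* consumes every used-ring entry of the out vq */
            while ((buf = virtqueue_get_buf(port->out_vq, &len))) {
                    free_buf(buf, false);
                    port->outvq_full = false;
            }
    }

If poll() runs this before the interrupt is serviced, vring_interrupt()'s more_used() check
finds nothing and skips the callback that would have woken port->waitqueue.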
Cc: stable(a)vger.kernel.org
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
As far as I can tell all currently maintained stable series kernels need
this commit. Applies and builds cleanly on 5.10.247, verified to fix
the issue.
---
Changes in v3:
- Use spin_lock_irq in port_fops_poll (Arnd)
- Use spin_lock in out_intr (Arnd)
- Link to v2: https://lore.kernel.org/r/20251222-virtio-console-lost-wakeup-v2-1-5de93cb3…
Changes in v2:
- Call reclaim_consumed_buffers() in out_intr instead of
issuing another wake.
- Link to v1: https://lore.kernel.org/r/20251215-virtio-console-lost-wakeup-v1-1-79a5c578…
---
drivers/char/virtio_console.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 088182e54debd6029ea2c2a5542d7a28500e67b8..e6048e04c3b23d008caa2a1d31d4ac6b2841045f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -971,10 +971,17 @@ static __poll_t port_fops_poll(struct file *filp, poll_table *wait)
return EPOLLHUP;
}
ret = 0;
- if (!will_read_block(port))
+
+ spin_lock_irq(&port->inbuf_lock);
+ if (port->inbuf)
ret |= EPOLLIN | EPOLLRDNORM;
- if (!will_write_block(port))
+ spin_unlock_irq(&port->inbuf_lock);
+
+ spin_lock_irq(&port->outvq_lock);
+ if (!port->outvq_full)
ret |= EPOLLOUT;
+ spin_unlock_irq(&port->outvq_lock);
+
if (!port->host_connected)
ret |= EPOLLHUP;
@@ -1705,6 +1712,10 @@ static void out_intr(struct virtqueue *vq)
return;
}
+ spin_lock(&port->outvq_lock);
+ reclaim_consumed_buffers(port);
+ spin_unlock(&port->outvq_lock);
+
wake_up_interruptible(&port->waitqueue);
}
---
base-commit: d358e5254674b70f34c847715ca509e46eb81e6f
change-id: 20251215-virtio-console-lost-wakeup-0f566c5cd35f
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
The hw_pp pointer is checked almost everywhere in
dpu_encoder_phys_wb_setup_ctl(), but in a single place the check is missing.
Also use the convenient locals instead of phys_enc->* where available.
Cc: stable(a)vger.kernel.org
Fixes: d7d0e73f7de33 ("drm/msm/dpu: introduce the dpu_encoder_phys_* for writeback")
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
index 46f348972a97..6d28f2281c76 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
@@ -247,14 +247,12 @@ static void dpu_encoder_phys_wb_setup_ctl(struct dpu_encoder_phys *phys_enc)
if (hw_cdm)
intf_cfg.cdm = hw_cdm->idx;
- if (phys_enc->hw_pp->merge_3d && phys_enc->hw_pp->merge_3d->ops.setup_3d_mode)
- phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
- mode_3d);
+ if (hw_pp && hw_pp->merge_3d && hw_pp->merge_3d->ops.setup_3d_mode)
+ hw_pp->merge_3d->ops.setup_3d_mode(hw_pp->merge_3d, mode_3d);
/* setup which pp blk will connect to this wb */
- if (hw_pp && phys_enc->hw_wb->ops.bind_pingpong_blk)
- phys_enc->hw_wb->ops.bind_pingpong_blk(phys_enc->hw_wb,
- phys_enc->hw_pp->idx);
+ if (hw_pp && hw_wb->ops.bind_pingpong_blk)
+ hw_wb->ops.bind_pingpong_blk(hw_wb, hw_pp->idx);
phys_enc->hw_ctl->ops.setup_intf_cfg(phys_enc->hw_ctl, &intf_cfg);
} else if (phys_enc->hw_ctl && phys_enc->hw_ctl->ops.setup_intf_cfg) {
--
2.34.1
When starting a multi-core LoongArch virtual machine on a LoongArch
physical machine, loading a livepatch on the physical machine causes an
error similar to the following:
[ 411.686289] livepatch: klp_try_switch_task: CPU 31/KVM:3116 has an
unreliable stack
The specific test steps are as follows:
1.Start a multi-core virtual machine on a physical machine
2.Enter the following command on the physical machine to turn on the debug
switch:
echo "file kernel/livepatch/transition.c +p" > /sys/kernel/debug/\
dynamic_debug/control
3.Load livepatch:
modprobe livepatch-sample
Through the above steps, similar prints can be viewed in dmesg.
The reason for this issue is that the code of the kvm_exc_entry function
is copied in kvm_loongarch_env_init(). When the CPU needs to execute
kvm_exc_entry, it switches to the copied address for execution. The copied
address of kvm_exc_entry cannot be recognized by ORC, which eventually
leads to arch_stack_walk_reliable() returning an error and printing the
exception message above.
To solve this, we compile the switch.S file directly into the kernel
instead of the module. This way, the kvm_exc_entry function no longer
needs to be copied.
Changelog:
V2<-V1:
1. Roll back the modification of function parameter types such as
kvm_save_fpu. In the asm-prototypes.h header file, include only the
parameter types it depends on.
Cc: Huacai Chen <chenhuacai(a)kernel.org>
Cc: WANG Xuerui <kernel(a)xen0n.name>
Cc: Tianrui Zhao <zhaotianrui(a)loongson.cn>
Cc: Bibo Mao <maobibo(a)loongson.cn>
Cc: Charlie Jenkins <charlie(a)rivosinc.com>
Cc: Xianglai Li <lixianglai(a)loongson.cn>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Xianglai Li (2):
LoongArch: KVM: Compile the switch.S file directly into the kernel
LoongArch: KVM: fix "unreliable stack" issue
arch/loongarch/Kbuild | 2 +-
arch/loongarch/include/asm/asm-prototypes.h | 21 +++++++++++++
arch/loongarch/include/asm/kvm_host.h | 3 --
arch/loongarch/kvm/Makefile | 2 +-
arch/loongarch/kvm/main.c | 35 ++-------------------
arch/loongarch/kvm/switch.S | 24 +++++++++++---
6 files changed, 45 insertions(+), 42 deletions(-)
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
--
2.39.1
In vmw_compat_shader_add(), the return value of vmw_shader_alloc() is
checked incorrectly: the stale 'ret' is tested instead of the returned
pointer. Check the return pointer 'res' with IS_ERR() instead.
Found by code review; compile-tested on Ubuntu 20.04.
Fixes: 18e4a4669c50 ("drm/vmwgfx: Fix compat shader namespace")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang(a)isrc.iscas.ac.cn>
---
drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
index 69dfe69ce0f8..7ed938710342 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
@@ -923,8 +923,10 @@ int vmw_compat_shader_add(struct vmw_private *dev_priv,
ttm_bo_unreserve(&buf->tbo);
res = vmw_shader_alloc(dev_priv, buf, size, 0, shader_type);
- if (unlikely(ret != 0))
+ if (IS_ERR(res)) {
+ ret = PTR_ERR(res);
goto no_reserve;
+ }
ret = vmw_cmdbuf_res_add(man, vmw_cmdbuf_res_shader,
vmw_shader_key(user_key, shader_type),
--
2.25.1
From: Zack Rusin <zack.rusin(a)broadcom.com>
[ Upstream commit 5ac2c0279053a2c5265d46903432fb26ae2d0da2 ]
Check that the resource which is converted to a surface exists before
trying to use the cursor snooper on it.
vmw_cmd_res_check allows explicit invalid (SVGA3D_INVALID_ID) identifiers
because some svga commands accept SVGA3D_INVALID_ID to mean "no surface",
unfortunately functions that accept the actual surfaces as objects might
(and in case of the cursor snooper, do not) be able to handle null
objects. Make sure that we validate not only the identifier (via the
vmw_cmd_res_check) but also check that the actual resource exists before
trying to do something with it.
Fixes unchecked null-ptr reference in the snooping code.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: c0951b797e7d ("drm/vmwgfx: Refactor resource management")
Reported-by: Kuzey Arda Bulut <kuzeyardabulut(a)gmail.com>
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Reviewed-by: Ian Forbes <ian.forbes(a)broadcom.com>
Link: https://lore.kernel.org/r/20250917153655.1968583-1-zack.rusin@broadcom.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on v5.10.y-v6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 0d12d6af6..5d3827b5d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -1507,6 +1507,7 @@ static int vmw_cmd_dma(struct vmw_private *dev_priv,
SVGA3dCmdHeader *header)
{
struct vmw_buffer_object *vmw_bo = NULL;
+ struct vmw_resource *res;
struct vmw_surface *srf = NULL;
VMW_DECLARE_CMD_VAR(*cmd, SVGA3dCmdSurfaceDMA);
int ret;
@@ -1542,18 +1543,24 @@ static int vmw_cmd_dma(struct vmw_private *dev_priv,
dirty = (cmd->body.transfer == SVGA3D_WRITE_HOST_VRAM) ?
VMW_RES_DIRTY_SET : 0;
- ret = vmw_cmd_res_check(dev_priv, sw_context, vmw_res_surface,
- dirty, user_surface_converter,
- &cmd->body.host.sid, NULL);
+ ret = vmw_cmd_res_check(dev_priv, sw_context, vmw_res_surface, dirty,
+ user_surface_converter, &cmd->body.host.sid,
+ NULL);
if (unlikely(ret != 0)) {
if (unlikely(ret != -ERESTARTSYS))
VMW_DEBUG_USER("could not find surface for DMA.\n");
return ret;
}
- srf = vmw_res_to_srf(sw_context->res_cache[vmw_res_surface].res);
+ res = sw_context->res_cache[vmw_res_surface].res;
+ if (!res) {
+ VMW_DEBUG_USER("Invalid DMA surface.\n");
+ return -EINVAL;
+ }
- vmw_kms_cursor_snoop(srf, sw_context->fp->tfile, &vmw_bo->base, header);
+ srf = vmw_res_to_srf(res);
+ vmw_kms_cursor_snoop(srf, sw_context->fp->tfile, &vmw_bo->base,
+ header);
return 0;
}
--
2.40.4
The functions ocfs2_reserve_suballoc_bits(), ocfs2_block_group_alloc(),
ocfs2_block_group_alloc_contig() and ocfs2_find_smallest_chain() trust
the on-disk values related to the allocation chain. However, a KASAN bug
was triggered in these functions, and the kernel panicked when accessing
redzoned memory. This occurred due to a corrupted `cl_count` field in
`struct ocfs2_chain_list`. Upon analysis, the value of `cl_count` was
observed to be overwhelmingly large, causing the code to index past the
end of the chain record array into redzoned memory.
The fix introduces an if statement that validates `cl_count` against both
a lower and an upper bound. The lower-bound check ensures `cl_count` is
not zero, and the upper-bound check ensures `cl_count` does not exceed the
maximum number of chain records that fit in one block after the
`struct ocfs2_chain_list` header.
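For illustration, under assumed sizes (a 4 KiB block, a 16-byte chain-list header and 16-byte
`struct ocfs2_chain_rec` entries; hypothetical values, the code itself uses offsetof/sizeof):

    max_cl_count = (block_size - offsetof(struct ocfs2_chain_list, cl_recs))
                    / sizeof(struct ocfs2_chain_rec);
    /* = (4096 - 16) / 16 = 255 chain records under the assumed sizes */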
Reported-by: syzbot+af14efe17dfa46173239(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=af14efe17dfa46173239
Tested-by: syzbot+af14efe17dfa46173239(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
---
fs/ocfs2/suballoc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index f7b483f0de2a..7ea63e9cc4f8 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -671,6 +671,21 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
BUG_ON(ocfs2_is_cluster_bitmap(alloc_inode));
cl = &fe->id2.i_chain;
+ unsigned int block_size = osb->sb->s_blocksize;
+ unsigned int max_cl_count =
+ (block_size - offsetof(struct ocfs2_chain_list, cl_recs)) /
+ sizeof(struct ocfs2_chain_rec);
+
+ if (!le16_to_cpu(cl->cl_count) ||
+ le16_to_cpu(cl->cl_count) > max_cl_count) {
+ ocfs2_error(osb->sb,
+ "Invalid chain list: cl_count %u "
+ "exceeds max %u",
+ le16_to_cpu(cl->cl_count), max_cl_count);
+ status = -EIO;
+ goto bail;
+ }
+
status = ocfs2_reserve_clusters_with_limit(osb,
le16_to_cpu(cl->cl_cpg),
max_block, flags, &ac);
base-commit: 36c254515dc6592c44db77b84908358979dd6b50
--
2.34.1
Hi, all
We hit hard-lockups when the Intel IOMMU waits indefinitely for an ATS invalidation
that cannot complete, especially under GDR high-load conditions.
1. Hard-lock when a passthrough PCIe NIC with ATS enabled goes link-down in Intel IOMMU
non-scalable mode. Two scenarios exist: NIC link-down with an explicit link-down
event, and link-down without any event.
a) NIC link-down with an explicit link-down event.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
iommu_deinit_device
__iommu_group_remove_device
iommu_release_device
iommu_bus_notifier
blocking_notifier_call_chain
bus_notify
device_del
pci_remove_bus_device
pci_stop_and_remove_bus_device
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist
b) NIC link-down without an event - hard-lock on VM destroy.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
__iommu_attach_device
__iommu_device_set_domain
__iommu_group_set_domain_internal
iommu_detach_group
vfio_iommu_type1_detach_group
vfio_group_detach_container
vfio_group_fops_release
__fput
2. Hard-lock when a passthrough PCIe NIC with ATS enabled goes link-down in Intel IOMMU
scalable mode; NIC link-down without an event hard-locks on VM destroy.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
intel_pasid_tear_down_entry
device_block_translation
blocking_domain_attach_dev
__iommu_attach_device
__iommu_device_set_domain
__iommu_group_set_domain_internal
iommu_detach_group
vfio_iommu_type1_detach_group
vfio_group_detach_container
vfio_group_fops_release
__fput
Fix both issues with two patches:
1. Skip dev-IOTLB flush for inaccessible devices in __context_flush_dev_iotlb() using
pci_device_is_present().
2. Use pci_device_is_present() instead of pci_dev_is_disconnected() to decide when to
skip ATS invalidation in devtlb_invalidation_with_pasid().
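A hedged sketch of the shape both fixes take (exact placement differs between the two code
paths; pdev here stands for the PCI device being torn down):

    /* Skip the ATS (dev-IOTLB) invalidation when the device can no
     * longer respond, so qi_submit_sync() cannot wait forever. */
    if (!pci_device_is_present(pdev))
            return;
    qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, addr, mask);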
Best Regards,
Jinhui
---
v1: https://lore.kernel.org/all/20251210171431.1589-1-guojinhui.liam@bytedance.…
Changelog in v1 -> v2 (suggested by Baolu Lu)
- Simplify the pci_device_is_present() check in __context_flush_dev_iotlb().
- Add Cc: stable(a)vger.kernel.org to both patches.
Jinhui Guo (2):
iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without
scalable mode
iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in
scalable mode
drivers/iommu/intel/pasid.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--
2.20.1
Hi,
I would like to request backporting 5326ab737a47 ("virtio_console: fix
order of fields cols and rows") to all LTS kernels.
I'm working on QEMU patches that add virtio console size support.
Without the fix, rows and columns will be swapped.
As far as I know, there are no device implementations that use the
wrong order and would be broken by the fix.
Note: A previous version [1] of the patch contained "Cc: stable" and
"Fixes:" tags, but they seem to have been accidentally left out from
the final version.
[1]: https://lore.kernel.org/all/20250320172654.624657-1-maxbr@linux.ibm.com/
Thanks,
Filip Hejsek
We switched from (wrongly) using the page count to an independent
shared count. Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to
identify sharing.
We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.
Page migration, like mbind() or migrate_pages(), would allow migrating
folios mapped into such shared PMD tables, even though the folios are
not exclusive. In smaps we would account them as "private" although they
are "shared", and we would wrongly set PM_MMAP_EXCLUSIVE in the
pagemap interface.
Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().
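A minimal sketch of the helper relied upon here, assuming the pt_share_count scheme described
above (the exact upstream definition may differ):

    static inline bool ptdesc_pmd_is_shared(struct ptdesc *ptdesc)
    {
            /* nonzero share count == mapped by more than one MM */
            return !!atomic_read(&ptdesc->pt_share_count);
    }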
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Reviewed-by: Lance Yang <lance.yang(a)linux.dev>
Tested-by: Lance Yang <lance.yang(a)linux.dev>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
---
include/linux/hugetlb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 019a1c5281e4e..03c8725efa289 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1326,7 +1326,7 @@ static inline __init void hugetlb_cma_reserve(int order)
#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
static inline bool hugetlb_pmd_shared(pte_t *pte)
{
- return page_count(virt_to_page(pte)) > 1;
+ return ptdesc_pmd_is_shared(virt_to_ptdesc(pte));
}
#else
static inline bool hugetlb_pmd_shared(pte_t *pte)
--
2.52.0
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.
In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.
There are two optimizations to be had:
(1) When we process (unshare) multiple such PMD tables, such as during
exit(), it is sufficient to send a single IPI broadcast (as long as
we respect locking rules) instead of one per PMD table.
Locking prevents that any of these PMD tables could get reused before
we drop the lock.
(2) When we are not the last sharer (> 2 users including us), there is
no need to send the IPI broadcast. The shared PMD tables cannot
become exclusive (fully unshared) before an IPI will be broadcasted
by the last sharer.
Concurrent GUP-fast could walk into a PMD table just before we
unshared it. It could then succeed in grabbing a page from the
shared page table even after munmap() etc. succeeded (and suppressed
an IPI). But there is no difference compared to GUP-fast simply
sleeping for a while after grabbing the page and re-enabling IRQs.
Most importantly, GUP-fast will never walk into page tables that are
no-longer shared, because the last sharer will issue an IPI
broadcast.
(if ever required, checking whether the PUD changed in GUP-fast
after grabbing the page like we do in the PTE case could handle
this)
So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.
To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().
We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().
From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.
Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.
Document it all properly.
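Pulled together, the intended calling pattern (as wired up in the diff below) looks like:

    struct mmu_gather tlb;

    tlb_gather_mmu_vma(&tlb, vma);
    /* ... page-table work, possibly several huge_pmd_unshare(&tlb, ...) calls ... */
    huge_pmd_unshare_flush(&tlb, vma);      /* while still holding i_mmap_rwsem */
    tlb_finish_mmu(&tlb);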
Notes about tlb_remove_table_sync_one() interaction with unsharing:
There are two fairly tricky things:
(1) tlb_remove_table_sync_one() is a NOP on architectures without
CONFIG_MMU_GATHER_RCU_TABLE_FREE.
Here, the assumption is that the previous TLB flush would send an
IPI to all relevant CPUs. Careful: some architectures like x86 only
send IPIs to all relevant CPUs when tlb->freed_tables is set.
The relevant architectures should be selecting
MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
kernels and it might have been problematic before this patch.
Also, the arch flushing behavior (independent of IPIs) is different
when tlb->freed_tables is set. Do we have to enlighten them to also
take care of tlb->unshared_tables? So far we didn't care, so
hopefully we are fine. Of course, we could be setting
tlb->freed_tables as well, but that might then unnecessarily flush
too much, because the semantics of tlb->freed_tables are a bit
fuzzy.
This patch changes nothing in this regard.
(2) tlb_remove_table_sync_one() is not a NOP on architectures with
CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
we still issue IPIs during TLB flushes and don't actually need the
second tlb_remove_table_sync_one().
This optimization can be implemented on top of this, by checking, e.g.,
in tlb_remove_table_sync_one() whether we really need IPIs. But as
described in (1), it must then honor tlb->freed_tables to
send IPIs to all relevant CPUs.
Notes on TLB flushing changes:
(1) Flushing for non-shared PMD tables
We're converting from flush_hugetlb_tlb_range() to
tlb_remove_huge_tlb_entry(). Given that we properly initialize the
MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
__unmap_hugepage_range(), that should be fine.
(2) Flushing for shared PMD tables
We're converting from various things (flush_hugetlb_tlb_range(),
tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().
tlb_flush_pmd_range() achieves the same that
tlb_remove_huge_tlb_entry() would achieve in these scenarios.
Note that tlb_remove_huge_tlb_entry() also calls
__tlb_remove_tlb_entry(), however that is only implemented on
powerpc, which does not support PMD table sharing.
Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
flushing keeps on working as expected.
Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.
There are plenty more cleanups to be had, but they have to wait until
this is fixed.
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Reported-by: "Uschakow, Stanislav" <suschako(a)amazon.de>
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
---
include/asm-generic/tlb.h | 77 +++++++++++++++++++++++-
include/linux/hugetlb.h | 15 +++--
include/linux/mm_types.h | 1 +
mm/hugetlb.c | 123 ++++++++++++++++++++++----------------
mm/mmu_gather.c | 33 ++++++++++
mm/rmap.c | 25 +++++---
6 files changed, 208 insertions(+), 66 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 1fff717cae510..4d679d2a206b4 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -46,7 +46,8 @@
*
* The mmu_gather API consists of:
*
- * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_finish_mmu()
+ * - tlb_gather_mmu() / tlb_gather_mmu_fullmm() / tlb_gather_mmu_vma() /
+ * tlb_finish_mmu()
*
* start and finish a mmu_gather
*
@@ -364,6 +365,20 @@ struct mmu_gather {
unsigned int vma_huge : 1;
unsigned int vma_pfn : 1;
+ /*
+ * Did we unshare (unmap) any shared page tables? For now only
+ * used for hugetlb PMD table sharing.
+ */
+ unsigned int unshared_tables : 1;
+
+ /*
+ * Did we unshare any page tables such that they are now exclusive
+ * and could get reused+modified by the new owner? When setting this
+ * flag, "unshared_tables" will be set as well. For now only used
+ * for hugetlb PMD table sharing.
+ */
+ unsigned int fully_unshared_tables : 1;
+
unsigned int batch_count;
#ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -400,6 +415,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->cleared_pmds = 0;
tlb->cleared_puds = 0;
tlb->cleared_p4ds = 0;
+ tlb->unshared_tables = 0;
/*
* Do not reset mmu_gather::vma_* fields here, we do not
* call into tlb_start_vma() again to set them if there is an
@@ -484,7 +500,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
* these bits.
*/
if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
- tlb->cleared_puds || tlb->cleared_p4ds))
+ tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
return;
tlb_flush(tlb);
@@ -773,6 +789,63 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
}
#endif
+#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
+static inline void tlb_unshare_pmd_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt,
+ unsigned long addr)
+{
+ /*
+ * The caller must make sure that concurrent unsharing + exclusive
+ * reuse is impossible until tlb_flush_unshared_tables() was called.
+ */
+ VM_WARN_ON_ONCE(!ptdesc_pmd_is_shared(pt));
+ ptdesc_pmd_pts_dec(pt);
+
+ /* Clearing a PUD pointing at a PMD table with PMD leaves. */
+ tlb_flush_pmd_range(tlb, addr & PUD_MASK, PUD_SIZE);
+
+ /*
+ * If the page table is now exclusively owned, we fully unshared
+ * a page table.
+ */
+ if (!ptdesc_pmd_is_shared(pt))
+ tlb->fully_unshared_tables = true;
+ tlb->unshared_tables = true;
+}
+
+static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
+{
+ /*
+ * As soon as the caller drops locks to allow for reuse of
+ * previously-shared tables, these tables could get modified and
+ * even reused outside of hugetlb context, so we have to make sure that
+ * any page table walkers (incl. TLB, GUP-fast) are aware of that
+ * change.
+ *
+ * Even if we are not fully unsharing a PMD table, we must
+ * flush the TLB for the unsharer now.
+ */
+ if (tlb->unshared_tables)
+ tlb_flush_mmu_tlbonly(tlb);
+
+ /*
+ * Similarly, we must make sure that concurrent GUP-fast will not
+ * walk previously-shared page tables that are getting modified+reused
+ * elsewhere. So broadcast an IPI to wait for any concurrent GUP-fast.
+ *
+ * We only perform this when we are the last sharer of a page table,
+ * as the IPI will reach all CPUs: any GUP-fast.
+ *
+ * Note that on configs where tlb_remove_table_sync_one() is a NOP,
+ * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
+ * required IPIs already for us.
+ */
+ if (tlb->fully_unshared_tables) {
+ tlb_remove_table_sync_one();
+ tlb->fully_unshared_tables = false;
+ }
+}
+#endif /* CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
+
#endif /* CONFIG_MMU */
#endif /* _ASM_GENERIC__TLB_H */
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03c8725efa289..e51b8ef0cebd9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -240,8 +240,9 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz);
unsigned long hugetlb_mask_last_page(struct hstate *h);
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep);
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma);
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end);
@@ -300,13 +301,17 @@ static inline struct address_space *hugetlb_folio_mapping_lock_write(
return NULL;
}
-static inline int huge_pmd_unshare(struct mm_struct *mm,
- struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+static inline int huge_pmd_unshare(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
return 0;
}
+static inline void huge_pmd_unshare_flush(struct mmu_gather *tlb,
+ struct vm_area_struct *vma)
+{
+}
+
static inline void adjust_range_if_pmd_sharing_possible(
struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 42af2292951d4..d1053b2c1f800 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1522,6 +1522,7 @@ static inline unsigned int mm_cid_size(void)
struct mmu_gather;
extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma);
extern void tlb_finish_mmu(struct mmu_gather *tlb);
struct vm_fault;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c77cdef12a32..2609b6d58f99e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5096,7 +5096,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
unsigned long last_addr_mask;
pte_t *src_pte, *dst_pte;
struct mmu_notifier_range range;
- bool shared_pmd = false;
+ struct mmu_gather tlb;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, old_addr,
old_end);
@@ -5106,6 +5106,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
* range.
*/
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
last_addr_mask = hugetlb_mask_last_page(h);
@@ -5122,8 +5123,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
if (huge_pte_none(huge_ptep_get(mm, old_addr, src_pte)))
continue;
- if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
- shared_pmd = true;
+ if (huge_pmd_unshare(&tlb, vma, old_addr, src_pte)) {
old_addr |= last_addr_mask;
new_addr |= last_addr_mask;
continue;
@@ -5134,15 +5134,16 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
break;
move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte, sz);
+ tlb_remove_huge_tlb_entry(h, &tlb, src_pte, old_addr);
}
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, old_end - len, old_end);
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
+
mmu_notifier_invalidate_range_end(&range);
i_mmap_unlock_write(mapping);
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
return len + old_addr - old_end;
}
@@ -5161,7 +5162,6 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
bool adjust_reservation;
unsigned long last_addr_mask;
- bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON(start & ~huge_page_mask(h));
@@ -5184,10 +5184,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(tlb, vma, address, ptep)) {
spin_unlock(ptl);
- tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
- force_flush = true;
address |= last_addr_mask;
continue;
}
@@ -5303,14 +5301,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
tlb_end_vma(tlb, vma);
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (force_flush)
- tlb_flush_mmu_tlbonly(tlb);
+ huge_pmd_unshare_flush(tlb, vma);
}
void __hugetlb_zap_begin(struct vm_area_struct *vma,
@@ -6409,11 +6400,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
pte_t pte;
struct hstate *h = hstate_vma(vma);
long pages = 0, psize = huge_page_size(h);
- bool shared_pmd = false;
struct mmu_notifier_range range;
unsigned long last_addr_mask;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+ struct mmu_gather tlb;
/*
* In the case of shared PMDs, the area to flush could be beyond
@@ -6426,6 +6417,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
BUG_ON(address >= end);
flush_cache_range(vma, range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
mmu_notifier_invalidate_range_start(&range);
hugetlb_vma_lock_write(vma);
@@ -6452,7 +6444,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
}
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (huge_pmd_unshare(&tlb, vma, address, ptep)) {
/*
* When uffd-wp is enabled on the vma, unshare
* shouldn't happen at all. Warn about it if it
@@ -6461,7 +6453,6 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
pages++;
spin_unlock(ptl);
- shared_pmd = true;
address |= last_addr_mask;
continue;
}
@@ -6522,22 +6513,16 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
+ tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
}
next:
spin_unlock(ptl);
cond_resched();
}
- /*
- * There is nothing protecting a previously-shared page table that we
- * unshared through huge_pmd_unshare() from getting freed after we
- * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
- * succeeded, flush the range corresponding to the pud.
- */
- if (shared_pmd)
- flush_hugetlb_tlb_range(vma, range.start, range.end);
- else
- flush_hugetlb_tlb_range(vma, start, end);
+
+ tlb_flush_mmu_tlbonly(&tlb);
+ huge_pmd_unshare_flush(&tlb, vma);
/*
* No need to call mmu_notifier_arch_invalidate_secondary_tlbs() we are
* downgrading page table protection not changing it to point to a new
@@ -6548,6 +6533,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
return pages > 0 ? (pages << h->order) : pages;
}
@@ -6904,18 +6890,27 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
return pte;
}
-/*
- * unmap huge page backed by shared pte.
+/**
+ * huge_pmd_unshare - Unmap a pmd table if it is shared by multiple users
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ * @addr: the address we are trying to unshare.
+ * @ptep: pointer into the (pmd) page table.
+ *
+ * Called with the page table lock held, the i_mmap_rwsem held in write mode
+ * and the hugetlb vma lock held in write mode.
*
- * Called with page table lock held.
+ * Note: The caller must call huge_pmd_unshare_flush() before dropping the
+ * i_mmap_rwsem.
*
- * returns: 1 successfully unmapped a shared pte page
- * 0 the underlying pte page is not shared, or it is the last user
+ * Returns: 1 if it was a shared PMD table and it got unmapped, or 0 if it
+ * was not a shared PMD table.
*/
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
unsigned long sz = huge_page_size(hstate_vma(vma));
+ struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd = pgd_offset(mm, addr);
p4d_t *p4d = p4d_offset(pgd, addr);
pud_t *pud = pud_offset(p4d, addr);
@@ -6927,18 +6922,36 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
i_mmap_assert_write_locked(vma->vm_file->f_mapping);
hugetlb_vma_assert_locked(vma);
pud_clear(pud);
- /*
- * Once our caller drops the rmap lock, some other process might be
- * using this page table as a normal, non-hugetlb page table.
- * Wait for pending gup_fast() in other threads to finish before letting
- * that happen.
- */
- tlb_remove_table_sync_one();
- ptdesc_pmd_pts_dec(virt_to_ptdesc(ptep));
+
+ tlb_unshare_pmd_ptdesc(tlb, virt_to_ptdesc(ptep), addr);
+
mm_dec_nr_pmds(mm);
return 1;
}
+/*
+ * huge_pmd_unshare_flush - Complete a sequence of huge_pmd_unshare() calls
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ *
+ * Perform necessary TLB flushes or IPI broadcasts to synchronize PMD table
+ * unsharing with concurrent page table walkers.
+ *
+ * This function must be called after a sequence of huge_pmd_unshare()
+ * calls while still holding the i_mmap_rwsem.
+ */
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ /*
+ * We must synchronize page table unsharing such that nobody will
+ * try reusing a previously-shared page table while it might still
+ * be in use by previous sharers (TLB, GUP_fast).
+ */
+ i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+ tlb_flush_unshared_tables(tlb);
+}
+
#else /* !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -6947,12 +6960,16 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
return NULL;
}
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
{
return 0;
}
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
@@ -7219,6 +7236,7 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long sz = huge_page_size(h);
struct mm_struct *mm = vma->vm_mm;
struct mmu_notifier_range range;
+ struct mmu_gather tlb;
unsigned long address;
spinlock_t *ptl;
pte_t *ptep;
@@ -7230,6 +7248,8 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
return;
flush_cache_range(vma, start, end);
+ tlb_gather_mmu_vma(&tlb, vma);
+
/*
* No need to call adjust_range_if_pmd_sharing_possible(), because
* we have already done the PUD_SIZE alignment.
@@ -7248,10 +7268,10 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
- huge_pmd_unshare(mm, vma, address, ptep);
+ huge_pmd_unshare(&tlb, vma, address, ptep);
spin_unlock(ptl);
}
- flush_hugetlb_tlb_range(vma, start, end);
+ huge_pmd_unshare_flush(&tlb, vma);
if (take_locks) {
i_mmap_unlock_write(vma->vm_file->f_mapping);
hugetlb_vma_unlock_write(vma);
@@ -7261,6 +7281,7 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
* Documentation/mm/mmu_notifier.rst.
*/
mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
}
/*
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 247e3f9db6c7a..cd32c2dbf501b 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -10,6 +10,7 @@
#include <linux/swap.h>
#include <linux/rmap.h>
#include <linux/pgalloc.h>
+#include <linux/hugetlb.h>
#include <asm/tlb.h>
@@ -426,6 +427,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
#endif
tlb->vma_pfn = 0;
+ tlb->fully_unshared_tables = 0;
__tlb_reset_range(tlb);
inc_tlb_flush_pending(tlb->mm);
}
@@ -459,6 +461,31 @@ void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm)
__tlb_gather_mmu(tlb, mm, true);
}
+/**
+ * tlb_gather_mmu_vma - initialize an mmu_gather structure for operating on a single
+ * VMA
+ * @tlb: the mmu_gather structure to initialize
+ * @vma: the vm_area_struct
+ *
+ * Called to initialize an (on-stack) mmu_gather structure for operating on
+ * a single VMA. In contrast to tlb_gather_mmu(), calling this function will
+ * not require another call to tlb_start_vma(). In contrast to tlb_start_vma(),
+ * this function will *not* call flush_cache_range().
+ *
+ * For hugetlb VMAs, this function will also initialize the mmu_gather
+ * page_size accordingly, not requiring a separate call to
+ * tlb_change_page_size().
+ *
+ */
+void tlb_gather_mmu_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ tlb_gather_mmu(tlb, vma->vm_mm);
+ tlb_update_vma_flags(tlb, vma);
+ if (is_vm_hugetlb_page(vma))
+ /* All entries have the same size. */
+ tlb_change_page_size(tlb, huge_page_size(hstate_vma(vma)));
+}
+
/**
* tlb_finish_mmu - finish an mmu_gather structure
* @tlb: the mmu_gather structure to finish
@@ -468,6 +495,12 @@ void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm)
*/
void tlb_finish_mmu(struct mmu_gather *tlb)
{
+ /*
+ * We expect an earlier huge_pmd_unshare_flush() call to sort this out,
+ * due to complicated locking requirements with page table unsharing.
+ */
+ VM_WARN_ON_ONCE(tlb->fully_unshared_tables);
+
/*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock (e.g., mmap_lock read-side) but defer TLB
diff --git a/mm/rmap.c b/mm/rmap.c
index 748f48727a162..7b9879ef442d9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -76,7 +76,7 @@
#include <linux/mm_inline.h>
#include <linux/oom.h>
-#include <asm/tlbflush.h>
+#include <asm/tlb.h>
#define CREATE_TRACE_POINTS
#include <trace/events/migrate.h>
@@ -2008,13 +2008,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma))
goto walk_abort;
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2022,6 +2026,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
goto walk_done;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
if (pte_dirty(pteval))
@@ -2398,17 +2403,20 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
* fail if unsuccessful.
*/
if (!anon) {
+ struct mmu_gather tlb;
+
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
if (!hugetlb_vma_trylock_write(vma)) {
page_vma_mapped_walk_done(&pvmw);
ret = false;
break;
}
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
+ hugetlb_vma_unlock_write(vma);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
/*
* The PMD table was unmapped,
* consequently unmapping the folio.
@@ -2417,6 +2425,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
}
/* Nuke the hugetlb page table entry */
pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
--
2.52.0
Add a new feature flag UBLK_F_NO_AUTO_PART_SCAN to allow users to suppress
automatic partition scanning when starting a ublk device.
This is useful for network-backed devices where partition scanning
can cause issues:
- Partition scan triggers synchronous I/O during device startup
- If userspace server crashes during scan, recovery is problematic
- For remotely-managed devices, partition probing may not be needed
Users can manually trigger partition scanning later when appropriate
using standard tools (e.g., partprobe, blockdev --rereadpt).
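For illustration, a minimal user-space sketch of such a manual rescan (not
part of the patch; the device node /dev/ublkb0 is an example). This is the
same BLKRRPART request that blockdev --rereadpt issues:
#include <fcntl.h>
#include <linux/fs.h>   /* BLKRRPART */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        /* example device node; adjust to the actual ublk block device */
        int fd = open("/dev/ublkb0", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* ask the kernel to re-read the partition table */
        if (ioctl(fd, BLKRRPART) < 0)
                perror("ioctl(BLKRRPART)");
        close(fd);
        return 0;
}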
Reported-by: Yoav Cohen <yoav(a)nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB63280C5637917C071C2F0D65A9A8A@…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
- suggest backporting to stable, which is useful for avoiding problematic
recovery; the change is also simple enough
drivers/block/ublk_drv.c | 16 +++++++++++++---
include/uapi/linux/ublk_cmd.h | 8 ++++++++
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 78f3e22151b9..ca6ec8ed443f 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -73,7 +73,8 @@
| UBLK_F_AUTO_BUF_REG \
| UBLK_F_QUIESCE \
| UBLK_F_PER_IO_DAEMON \
- | UBLK_F_BUF_REG_OFF_DAEMON)
+ | UBLK_F_BUF_REG_OFF_DAEMON \
+ | UBLK_F_NO_AUTO_PART_SCAN)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
| UBLK_F_USER_RECOVERY_REISSUE \
@@ -2930,8 +2931,13 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
ublk_apply_params(ub);
- /* don't probe partitions if any daemon task is un-trusted */
- if (ub->unprivileged_daemons)
+ /*
+ * Don't probe partitions if:
+ * - any daemon task is un-trusted, or
+ * - user explicitly requested to suppress partition scan
+ */
+ if (ub->unprivileged_daemons ||
+ (ub->dev_info.flags & UBLK_F_NO_AUTO_PART_SCAN))
set_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
ublk_get_device(ub);
@@ -2947,6 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
if (ret)
goto out_put_cdev;
+ /* allow user to probe partitions from userspace */
+ if (!ub->unprivileged_daemons &&
+ (ub->dev_info.flags & UBLK_F_NO_AUTO_PART_SCAN))
+ clear_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
set_bit(UB_STATE_USED, &ub->state);
out_put_cdev:
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index ec77dabba45b..0827db14a215 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -311,6 +311,14 @@
*/
#define UBLK_F_BUF_REG_OFF_DAEMON (1ULL << 14)
+/*
+ * If this feature is set, the kernel will not automatically scan for partitions
+ * when the device is started. This is useful for network-backed devices where
+ * partition scanning can cause deadlocks if the userspace server crashes during
+ * the scan. Users can manually trigger partition scanning later when appropriate.
+ */
+#define UBLK_F_NO_AUTO_PART_SCAN (1ULL << 15)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
--
2.47.0
The quilt patch titled
Subject: buildid: validate page-backed file before parsing build ID
has been removed from the -mm tree. Its filename was
buildid-validate-page-backed-file-before-parsing-build-id.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Jinchao Wang <wangjinchao600(a)gmail.com>
Subject: buildid: validate page-backed file before parsing build ID
Date: Tue, 23 Dec 2025 18:32:07 +0800
__build_id_parse() only works on page-backed storage. Its helper paths
eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
that do not map a regular file or lack valid address_space operations.
Link: https://lkml.kernel.org/r/20251223103214.2412446-1-wangjinchao600@gmail.com
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Signed-off-by: Jinchao Wang <wangjinchao600(a)gmail.com>
Reported-by: <syzbot+e008db2ac01e282550ee(a)syzkaller.appspotmail.com>
Tested-by: <syzbot+e008db2ac01e282550ee(a)syzkaller.appspotmail.com>
Link: https://lkml.kernel.org/r/694a67ab.050a0220.19928e.001c.GAE@google.com
Closes: https://lkml.kernel.org/r/693540fe.a70a0220.38f243.004c.GAE@google.com
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand (Red Hat) <david(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Cc: Omar Sandoval <osandov(a)fb.com>
Cc: Deepanshu Kartikey <kartikey406(a)gmail.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Hao Luo <haoluo(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: John Fastabend <john.fastabend(a)gmail.com>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Martin KaFai Lau <martin.lau(a)linux.dev>
Cc: Song Liu <song(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/buildid.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/lib/buildid.c~buildid-validate-page-backed-file-before-parsing-build-id
+++ a/lib/buildid.c
@@ -288,7 +288,10 @@ static int __build_id_parse(struct vm_ar
int ret;
/* only works for page backed storage */
- if (!vma->vm_file)
+ if (!vma->vm_file ||
+ !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
+ !vma->vm_file->f_mapping->a_ops ||
+ !vma->vm_file->f_mapping->a_ops->read_folio)
return -EINVAL;
freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
_
Patches currently in -mm which might be from wangjinchao600(a)gmail.com are
The quilt patch titled
Subject: mm/page_owner: fix memory leak in page_owner_stack_fops->release()
has been removed from the -mm tree. Its filename was
mm-page_owner-fix-memory-leak-in-page_owner_stack_fops-release.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Subject: mm/page_owner: fix memory leak in page_owner_stack_fops->release()
Date: Fri, 19 Dec 2025 07:42:32 +0000
The page_owner_stack_fops->open() callback invokes seq_open_private(),
therefore its corresponding ->release() callback must call
seq_release_private(). Otherwise it will cause a memory leak of struct
stack_print_ctx.
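As a hedged illustration of the pairing convention at issue (the example_*
names are made up; only the seq_open_private()/seq_release_private()
pairing is the point):
#include <linux/fs.h>
#include <linux/seq_file.h>

/* ->open() allocates a private context behind seq_file->private ... */
static int example_open(struct inode *inode, struct file *file)
{
        return seq_open_private(file, &example_seq_ops,
                                sizeof(struct stack_print_ctx));
}

/* ... so ->release() must free it again via seq_release_private().
 * Plain seq_release() only tears down the seq_file itself and leaks
 * the private context.
 */
static const struct file_operations example_fops = {
        .open    = example_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = seq_release_private,
};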
Link: https://lkml.kernel.org/r/20251219074232.136482-1-ranxiaokai627@163.com
Fixes: 765973a09803 ("mm,page_owner: display all stacks and their count")
Signed-off-by: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Marco Elver <elver(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_owner.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page_owner.c~mm-page_owner-fix-memory-leak-in-page_owner_stack_fops-release
+++ a/mm/page_owner.c
@@ -952,7 +952,7 @@ static const struct file_operations page
.open = page_owner_stack_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = seq_release_private,
};
static int page_owner_threshold_get(void *data, u64 *val)
_
Patches currently in -mm which might be from ran.xiaokai(a)zte.com.cn are
The quilt patch titled
Subject: mm: consider non-anon swap cache folios in folio_expected_ref_count()
has been removed from the -mm tree. Its filename was
mm-consider-non-anon-swap-cache-folios-in-folio_expected_ref_count.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Bijan Tabatabai <bijan311(a)gmail.com>
Subject: mm: consider non-anon swap cache folios in folio_expected_ref_count()
Date: Tue, 16 Dec 2025 14:07:27 -0600
Currently, folio_expected_ref_count() only adds references for the swap
cache if the folio is anonymous. However, according to the comment above
the definition of PG_swapcache in enum pageflags, shmem folios can also
have PG_swapcache set. This patch makes sure references for the swap
cache are added if folio_test_swapcache(folio) is true.
This issue was found when trying to hot-unplug memory in a QEMU/KVM
virtual machine. When initiating hot-unplug when most of the guest memory
is allocated, hot-unplug hangs partway through removal due to migration
failures. The following message would be printed several times, and would
be printed again about every five seconds:
[ 49.641309] migrating pfn b12f25 failed ret:7
[ 49.641310] page: refcount:2 mapcount:0 mapping:0000000033bd8fe2 index:0x7f404d925 pfn:0xb12f25
[ 49.641311] aops:swap_aops
[ 49.641313] flags: 0x300000000030508(uptodate|active|owner_priv_1|reclaim|swapbacked|node=0|zone=3)
[ 49.641314] raw: 0300000000030508 ffffed312c4bc908 ffffed312c4bc9c8 0000000000000000
[ 49.641315] raw: 00000007f404d925 00000000000c823b 00000002ffffffff 0000000000000000
[ 49.641315] page dumped because: migration failure
When debugging this, I found that these migration failures were due to
__migrate_folio() returning -EAGAIN for a small set of folios because the
expected reference count it calculates via folio_expected_ref_count() is
one less than the actual reference count of the folios. Furthermore, all
of the affected folios were not anonymous, but had the PG_swapcache flag
set, inspiring this patch. After applying this patch, the memory
hot-unplug behaves as expected.
I tested this on a machine running Ubuntu 24.04 with kernel version
6.8.0-90-generic and 64GB of memory. The guest VM is managed by libvirt
and runs Ubuntu 24.04 with kernel version 6.18 (though the head of the
mm-unstable branch as of Dec 16, 2025 was also tested and behaves the same)
and 48GB of memory. The libvirt XML definition for the VM can be found at
[1]. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE is set in the guest
kernel so the hot-pluggable memory is automatically onlined.
Below are the steps to reproduce this behavior:
1) Define and start the virtual machine
host$ virsh -c qemu:///system define ./test_vm.xml # test_vm.xml from [1]
host$ virsh -c qemu:///system start test_vm
2) Setup swap in the guest
guest$ sudo fallocate -l 32G /swapfile
guest$ sudo chmod 0600 /swapfile
guest$ sudo mkswap /swapfile
guest$ sudo swapon /swapfile
3) Use alloc_data [2] to allocate most of the remaining guest memory
guest$ ./alloc_data 45
4) In a separate guest terminal, monitor the amount of used memory
guest$ watch -n1 free -h
5) When alloc_data has finished allocating, initiate the memory
hot-unplug using the provided xml file [3]
host$ virsh -c qemu:///system detach-device test_vm ./remove.xml --live
After initiating the memory hot-unplug, you should see the amount of
available memory in the guest decrease, and the amount of used swap data
increase. If everything works as expected, when all of the memory is
unplugged, there should be around 8.5-9GB of data in swap. If the
unplugging is unsuccessful, the amount of used swap data will settle below
that. If that happens, you should be able to see log messages in dmesg
similar to the one posted above.
Link: https://lkml.kernel.org/r/20251216200727.2360228-1-bijan311@gmail.com
Link: https://github.com/BijanT/linux_patch_files/blob/main/test_vm.xml [1]
Link: https://github.com/BijanT/linux_patch_files/blob/main/alloc_data.c [2]
Link: https://github.com/BijanT/linux_patch_files/blob/main/remove.xml [3]
Fixes: 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation")
Signed-off-by: Bijan Tabatabai <bijan311(a)gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Kairui Song <ryncsn(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/include/linux/mm.h~mm-consider-non-anon-swap-cache-folios-in-folio_expected_ref_count
+++ a/include/linux/mm.h
@@ -2459,10 +2459,10 @@ static inline int folio_expected_ref_cou
if (WARN_ON_ONCE(page_has_type(&folio->page) && !folio_test_hugetlb(folio)))
return 0;
- if (folio_test_anon(folio)) {
- /* One reference per page from the swapcache. */
- ref_count += folio_test_swapcache(folio) << order;
- } else {
+ /* One reference per page from the swapcache. */
+ ref_count += folio_test_swapcache(folio) << order;
+
+ if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
_
Patches currently in -mm which might be from bijan311(a)gmail.com are
The quilt patch titled
Subject: rust: maple_tree: rcu_read_lock() in destructor to silence lockdep
has been removed from the -mm tree. Its filename was
rust-maple_tree-rcu_read_lock-in-destructor-to-silence-lockdep.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alice Ryhl <aliceryhl(a)google.com>
Subject: rust: maple_tree: rcu_read_lock() in destructor to silence lockdep
Date: Wed, 17 Dec 2025 13:10:37 +0000
When running the Rust maple tree kunit tests with lockdep, you may trigger
a warning that looks like this:
lib/maple_tree.c:780 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by kunit_try_catch/344.
stack backtrace:
CPU: 3 UID: 0 PID: 344 Comm: kunit_try_catch Tainted: G N 6.19.0-rc1+ #2 NONE
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x71/0x90
lockdep_rcu_suspicious+0x150/0x190
mas_start+0x104/0x150
mas_find+0x179/0x240
_RINvNtCs5QSdWC790r4_4core3ptr13drop_in_placeINtNtCs1cdwasc6FUb_6kernel10maple_tree9MapleTreeINtNtNtBL_5alloc4kbox3BoxlNtNtB1x_9allocator7KmallocEEECsgxAQYCfdR72_25doctests_kernel_generated+0xaf/0x130
rust_doctest_kernel_maple_tree_rs_0+0x600/0x6b0
? lock_release+0xeb/0x2a0
? kunit_try_catch_run+0x210/0x210
kunit_try_run_case+0x74/0x160
? kunit_try_catch_run+0x210/0x210
kunit_generic_run_threadfn_adapter+0x12/0x30
kthread+0x21c/0x230
? __do_trace_sched_kthread_stop_ret+0x40/0x40
ret_from_fork+0x16c/0x270
? __do_trace_sched_kthread_stop_ret+0x40/0x40
ret_from_fork_asm+0x11/0x20
</TASK>
This is because the destructor of maple tree calls mas_find() without
taking rcu_read_lock() or the spinlock. Doing that is actually ok in this
case since the destructor has exclusive access to the entire maple tree,
but it triggers a lockdep warning. To fix that, take the rcu read lock.
In the future, it's possible that memory reclaim could gain a feature
where it reallocates entries in maple trees even if no user-code is
touching it. If that feature is added, then this use of rcu read lock
would become load-bearing, so I did not make it conditional on lockdep.
We have to repeatedly take and release rcu because the destructor of T
might perform operations that sleep.
Link: https://lkml.kernel.org/r/20251217-maple-drop-rcu-v1-1-702af063573f@google.…
Fixes: da939ef4c494 ("rust: maple_tree: add MapleTree")
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
Reported-by: Andreas Hindborg <a.hindborg(a)kernel.org>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/x/topic/x/near/5642151…
Reviewed-by: Gary Guo <gary(a)garyguo.net>
Reviewed-by: Daniel Almeida <daniel.almeida(a)collabora.com>
Cc: Andrew Ballance <andrewjballance(a)gmail.com>
Cc: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miguel Ojeda <ojeda(a)kernel.org>
Cc: Trevor Gross <tmgross(a)umich.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
rust/kernel/maple_tree.rs | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/rust/kernel/maple_tree.rs~rust-maple_tree-rcu_read_lock-in-destructor-to-silence-lockdep
+++ a/rust/kernel/maple_tree.rs
@@ -265,7 +265,16 @@ impl<T: ForeignOwnable> MapleTree<T> {
loop {
// This uses the raw accessor because we're destroying pointers without removing them
// from the maple tree, which is only valid because this is the destructor.
- let ptr = ma_state.mas_find_raw(usize::MAX);
+ //
+ // Take the rcu lock because mas_find_raw() requires that you hold either the spinlock
+ // or the rcu read lock. This is only really required if memory reclaim might
+ // reallocate entries in the tree, as we otherwise have exclusive access. That feature
+ // doesn't exist yet, so for now, taking the rcu lock only serves the purpose of
+ // silencing lockdep.
+ let ptr = {
+ let _rcu = kernel::sync::rcu::Guard::new();
+ ma_state.mas_find_raw(usize::MAX)
+ };
if ptr.is_null() {
break;
}
_
Patches currently in -mm which might be from aliceryhl(a)google.com are
The quilt patch titled
Subject: tools/mm/page_owner_sort: fix timestamp comparison for stable sorting
has been removed from the -mm tree. Its filename was
tools-mm-page_owner_sort-fix-timestamp-comparison-for-stable-sorting.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kaushlendra Kumar <kaushlendra.kumar(a)intel.com>
Subject: tools/mm/page_owner_sort: fix timestamp comparison for stable sorting
Date: Tue, 9 Dec 2025 10:15:52 +0530
The ternary operator in compare_ts() returns 1 even when the timestamps
are equal, so cmp(a, b) and cmp(b, a) can both return 1 for equal keys,
which violates the consistency qsort() requires of its comparator and
causes unstable sorting behavior. Replace it with an explicit three-way
comparison that returns 0 for equal timestamps, ensuring consistent qsort
ordering and output.
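A minimal user-space sketch (not part of the patch) showing why a
comparator that never returns 0 misbehaves:
#include <stdio.h>
#include <stdlib.h>

/* old style: equal keys compare as "greater" from both sides */
static int cmp_broken(const void *p1, const void *p2)
{
        unsigned long long a = *(const unsigned long long *)p1;
        unsigned long long b = *(const unsigned long long *)p2;

        return a < b ? -1 : 1;
}

int main(void)
{
        unsigned long long ts[] = { 5, 5, 3, 5 };

        /* cmp_broken(x, y) == 1 and cmp_broken(y, x) == 1 for equal
         * keys, so the order qsort() produces for the three 5s is
         * unspecified and can differ from run to run.
         */
        qsort(ts, 4, sizeof(ts[0]), cmp_broken);
        for (int i = 0; i < 4; i++)
                printf("%llu ", ts[i]);
        printf("\n");
        return 0;
}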
Link: https://lkml.kernel.org/r/20251209044552.3396468-1-kaushlendra.kumar@intel.…
Fixes: 8f9c447e2e2b ("tools/vm/page_owner_sort.c: support sorting pid and time")
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar(a)intel.com>
Cc: Chongxi Zhao <zhaochongxi2019(a)email.szu.edu.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/mm/page_owner_sort.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/tools/mm/page_owner_sort.c~tools-mm-page_owner_sort-fix-timestamp-comparison-for-stable-sorting
+++ a/tools/mm/page_owner_sort.c
@@ -181,7 +181,11 @@ static int compare_ts(const void *p1, co
{
const struct block_list *l1 = p1, *l2 = p2;
- return l1->ts_nsec < l2->ts_nsec ? -1 : 1;
+ if (l1->ts_nsec < l2->ts_nsec)
+ return -1;
+ if (l1->ts_nsec > l2->ts_nsec)
+ return 1;
+ return 0;
}
static int compare_cull_condition(const void *p1, const void *p2)
_
Patches currently in -mm which might be from kaushlendra.kumar(a)intel.com are
tools-mm-thp_swap_allocator_test-fix-small-folio-alignment.patch
tools-mm-slabinfo-fix-partial-long-option-mapping.patch
The quilt patch titled
Subject: selftests/mm: fix thread state check in uffd-unit-tests
has been removed from the -mm tree. Its filename was
selftests-mm-fix-thread-state-check-in-uffd-unit-tests.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wake Liu <wakel(a)google.com>
Subject: selftests/mm: fix thread state check in uffd-unit-tests
Date: Wed, 10 Dec 2025 17:14:08 +0800
In the thread_state_get() function, the logic to find the thread's state
character was using `sizeof(header) - 1` to calculate the offset from the
"State:\t" string.
The `header` variable is a `const char *` pointer. `sizeof()` on a
pointer returns the size of the pointer itself, not the length of the
string literal it points to. This makes the code's behavior dependent on
the architecture's pointer size.
This bug was identified on a 32-bit ARM build (`gsi_tv_arm`) for Android,
running on an ARMv8-based device, compiled with Clang 19.0.1.
On this 32-bit architecture, `sizeof(char *)` is 4. The expression
`sizeof(header) - 1` resulted in an incorrect offset of 3, causing the
test to read the wrong character from `/proc/[tid]/status` and fail.
On 64-bit architectures, `sizeof(char *)` is 8, so the expression
coincidentally evaluates to 7, which matches the length of "State:\t".
This is why the bug likely remained hidden on 64-bit builds.
To fix this and make the code portable and correct across all
architectures, this patch replaces `sizeof(header) - 1` with
`strlen(header)`. The `strlen()` function correctly calculates the
string's length, ensuring the correct offset is always used.
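A small stand-alone demonstration (not from the patch) of the difference:
#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *header = "State:\t";

        /* size of the pointer itself: 4 on 32-bit, 8 on 64-bit */
        printf("sizeof(header) = %zu\n", sizeof(header));
        /* length of the string it points to: always 7 */
        printf("strlen(header) = %zu\n", strlen(header));
        return 0;
}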
Link: https://lkml.kernel.org/r/20251210091408.3781445-1-wakel@google.com
Fixes: f60b6634cd88 ("mm/selftests: add a test to verify mmap_changing race with -EAGAIN")
Signed-off-by: Wake Liu <wakel(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/uffd-unit-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~selftests-mm-fix-thread-state-check-in-uffd-unit-tests
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -1317,7 +1317,7 @@ static thread_state thread_state_get(pid
p = strstr(tmp, header);
if (p) {
/* For example, "State:\tD (disk sleep)" */
- c = *(p + sizeof(header) - 1);
+ c = *(p + strlen(header));
return c == 'D' ?
THR_STATE_UNINTERRUPTIBLE : THR_STATE_UNKNOWN;
}
_
Patches currently in -mm which might be from wakel(a)google.com are
The quilt patch titled
Subject: kernel/kexec: fix IMA when allocation happens in CMA area
has been removed from the -mm tree. Its filename was
kernel-kexec-fix-ima-when-allocation-happens-in-cma-area.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pingfan Liu <piliu(a)redhat.com>
Subject: kernel/kexec: fix IMA when allocation happens in CMA area
Date: Tue, 16 Dec 2025 09:48:52 +0800
*** Bug description ***
When I tested kexec with the latest kernel, I ran into the following warning:
[ 40.712410] ------------[ cut here ]------------
[ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198
[...]
[ 40.816047] Call trace:
[ 40.818498] kimage_map_segment+0x144/0x198 (P)
[ 40.823221] ima_kexec_post_load+0x58/0xc0
[ 40.827246] __do_sys_kexec_file_load+0x29c/0x368
[...]
[ 40.855423] ---[ end trace 0000000000000000 ]---
*** How to reproduce ***
This bug is only triggered when the kexec target address is allocated in
the CMA area. If no CMA area is reserved in the kernel, use the "cma="
option in the kernel command line to reserve one.
*** Root cause ***
The commit 07d24902977e ("kexec: enable CMA based contiguous
allocation") allocates the kexec target address directly on the CMA area
to avoid copying during the jump. In this case, there is no IND_SOURCE
for the kexec segment. But the current implementation of
kimage_map_segment() assumes that IND_SOURCE pages exist and maps them
into a contiguous virtual address range via vmap().
*** Solution ***
If the IMA segment is allocated in the CMA area, use its page_address()
directly.
Link: https://lkml.kernel.org/r/20251216014852.8737-2-piliu@redhat.com
Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation")
Signed-off-by: Pingfan Liu <piliu(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Steven Chen <chenste(a)linux.microsoft.com>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_core.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/kernel/kexec_core.c~kernel-kexec-fix-ima-when-allocation-happens-in-cma-area
+++ a/kernel/kexec_core.c
@@ -960,13 +960,17 @@ void *kimage_map_segment(struct kimage *
kimage_entry_t *ptr, entry;
struct page **src_pages;
unsigned int npages;
+ struct page *cma;
void *vaddr = NULL;
int i;
+ cma = image->segment_cma[idx];
+ if (cma)
+ return page_address(cma);
+
addr = image->segment[idx].mem;
size = image->segment[idx].memsz;
eaddr = addr + size;
-
/*
* Collect the source pages and map them in a contiguous VA range.
*/
@@ -1007,7 +1011,8 @@ void *kimage_map_segment(struct kimage *
void kimage_unmap_segment(void *segment_buffer)
{
- vunmap(segment_buffer);
+ if (is_vmalloc_addr(segment_buffer))
+ vunmap(segment_buffer);
}
struct kexec_load_limit {
_
Patches currently in -mm which might be from piliu(a)redhat.com are
The quilt patch titled
Subject: kasan: unpoison vms[area] addresses with a common tag
has been removed from the -mm tree. Its filename was
kasan-unpoison-vms-addresses-with-a-common-tag.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Subject: kasan: unpoison vms[area] addresses with a common tag
Date: Thu, 04 Dec 2025 19:00:11 +0000
A KASAN tag mismatch, possibly causing a kernel panic, can be observed on
systems with a tag-based KASAN enabled and with multiple NUMA nodes. It
was reported on arm64 and reproduced on x86. It can be explained in the
following points:
1. There can be more than one virtual memory chunk.
2. Chunk's base address has a tag.
3. The base address points at the first chunk and thus inherits
the tag of the first chunk.
4. The subsequent chunks will be accessed with the tag from the
first chunk.
5. Thus, the subsequent chunks need to have their tag set to
match that of the first chunk.
Use the new vmalloc flag that disables random tag assignment in
__kasan_unpoison_vmalloc() - pass the same random tag to all the
vm_structs by tagging the pointers before they go inside
__kasan_unpoison_vmalloc(). Assigning a common tag resolves the pcpu
chunk address mismatch.
[akpm(a)linux-foundation.org: use WARN_ON_ONCE(), per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZeuGdKSEm11oGT6FS71_vGq1vjq-xY36kxVdFvwmag2…
[maciej.wieczor-retman(a)intel.com: remove unneeded pr_warn()]
Link: https://lkml.kernel.org/r/919897daaaa3c982a27762a2ee038769ad033991.17649453…
Link: https://lkml.kernel.org/r/873821114a9f722ffb5d6702b94782e902883fdf.17648745…
Fixes: 1d96320f8d53 ("kasan, vmalloc: add vmalloc tagging for SW_TAGS")
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Marco Elver <elver(a)google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/common.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
--- a/mm/kasan/common.c~kasan-unpoison-vms-addresses-with-a-common-tag
+++ a/mm/kasan/common.c
@@ -584,11 +584,26 @@ void __kasan_unpoison_vmap_areas(struct
unsigned long size;
void *addr;
int area;
+ u8 tag;
- for (area = 0 ; area < nr_vms ; area++) {
+ /*
+ * If KASAN_VMALLOC_KEEP_TAG was set at this point, all vms[] pointers
+ * would be unpoisoned with the KASAN_TAG_KERNEL which would disable
+ * KASAN checks down the line.
+ */
+ if (WARN_ON_ONCE(flags & KASAN_VMALLOC_KEEP_TAG))
+ return;
+
+ size = vms[0]->size;
+ addr = vms[0]->addr;
+ vms[0]->addr = __kasan_unpoison_vmalloc(addr, size, flags);
+ tag = get_tag(vms[0]->addr);
+
+ for (area = 1 ; area < nr_vms ; area++) {
size = vms[area]->size;
- addr = vms[area]->addr;
- vms[area]->addr = __kasan_unpoison_vmalloc(addr, size, flags);
+ addr = set_tag(vms[area]->addr, tag);
+ vms[area]->addr =
+ __kasan_unpoison_vmalloc(addr, size, flags | KASAN_VMALLOC_KEEP_TAG);
}
}
#endif
_
Patches currently in -mm which might be from maciej.wieczor-retman(a)intel.com are
The quilt patch titled
Subject: kasan: refactor pcpu kasan vmalloc unpoison
has been removed from the -mm tree. Its filename was
kasan-refactor-pcpu-kasan-vmalloc-unpoison.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Subject: kasan: refactor pcpu kasan vmalloc unpoison
Date: Thu, 04 Dec 2025 19:00:04 +0000
A KASAN tag mismatch, possibly causing a kernel panic, can be observed
on systems with a tag-based KASAN enabled and with multiple NUMA nodes.
It was reported on arm64 and reproduced on x86. It can be explained in
the following points:
1. There can be more than one virtual memory chunk.
2. Chunk's base address has a tag.
3. The base address points at the first chunk and thus inherits
the tag of the first chunk.
4. The subsequent chunks will be accessed with the tag from the
first chunk.
5. Thus, the subsequent chunks need to have their tag set to
match that of the first chunk.
Refactor code by reusing __kasan_unpoison_vmalloc in a new helper in
preparation for the actual fix.
Link: https://lkml.kernel.org/r/eb61d93b907e262eefcaa130261a08bcb6c5ce51.17648745…
Fixes: 1d96320f8d53 ("kasan, vmalloc: add vmalloc tagging for SW_TAGS")
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Marco Elver <elver(a)google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/kasan.h | 15 +++++++++++++++
mm/kasan/common.c | 17 +++++++++++++++++
mm/vmalloc.c | 4 +---
3 files changed, 33 insertions(+), 3 deletions(-)
--- a/include/linux/kasan.h~kasan-refactor-pcpu-kasan-vmalloc-unpoison
+++ a/include/linux/kasan.h
@@ -631,6 +631,16 @@ static __always_inline void kasan_poison
__kasan_poison_vmalloc(start, size);
}
+void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms,
+ kasan_vmalloc_flags_t flags);
+static __always_inline void
+kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms,
+ kasan_vmalloc_flags_t flags)
+{
+ if (kasan_enabled())
+ __kasan_unpoison_vmap_areas(vms, nr_vms, flags);
+}
+
#else /* CONFIG_KASAN_VMALLOC */
static inline void kasan_populate_early_vm_area_shadow(void *start,
@@ -655,6 +665,11 @@ static inline void *kasan_unpoison_vmall
static inline void kasan_poison_vmalloc(const void *start, unsigned long size)
{ }
+static __always_inline void
+kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms,
+ kasan_vmalloc_flags_t flags)
+{ }
+
#endif /* CONFIG_KASAN_VMALLOC */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
--- a/mm/kasan/common.c~kasan-refactor-pcpu-kasan-vmalloc-unpoison
+++ a/mm/kasan/common.c
@@ -28,6 +28,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/bug.h>
+#include <linux/vmalloc.h>
#include "kasan.h"
#include "../slab.h"
@@ -575,3 +576,19 @@ bool __kasan_check_byte(const void *addr
}
return true;
}
+
+#ifdef CONFIG_KASAN_VMALLOC
+void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms,
+ kasan_vmalloc_flags_t flags)
+{
+ unsigned long size;
+ void *addr;
+ int area;
+
+ for (area = 0 ; area < nr_vms ; area++) {
+ size = vms[area]->size;
+ addr = vms[area]->addr;
+ vms[area]->addr = __kasan_unpoison_vmalloc(addr, size, flags);
+ }
+}
+#endif
--- a/mm/vmalloc.c~kasan-refactor-pcpu-kasan-vmalloc-unpoison
+++ a/mm/vmalloc.c
@@ -5027,9 +5027,7 @@ retry:
* With hardware tag-based KASAN, marking is skipped for
* non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc().
*/
- for (area = 0; area < nr_vms; area++)
- vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr,
- vms[area]->size, KASAN_VMALLOC_PROT_NORMAL);
+ kasan_unpoison_vmap_areas(vms, nr_vms, KASAN_VMALLOC_PROT_NORMAL);
kfree(vas);
return vms;
_
Patches currently in -mm which might be from maciej.wieczor-retman(a)intel.com are
The quilt patch titled
Subject: mm/kasan: fix incorrect unpoisoning in vrealloc for KASAN
has been removed from the -mm tree. Its filename was
mm-kasan-fix-incorrect-unpoisoning-in-vrealloc-for-kasan.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Subject: mm/kasan: fix incorrect unpoisoning in vrealloc for KASAN
Date: Thu, 04 Dec 2025 18:59:55 +0000
Patch series "kasan: vmalloc: Fixes for the percpu allocator and
vrealloc", v3.
Patches fix two issues related to KASAN and vmalloc.
The first one, a KASAN tag mismatch, possibly resulting in a kernel panic,
can be observed on systems with a tag-based KASAN enabled and with
multiple NUMA nodes. Initially it was only noticed on x86 [1] but later a
similar issue was also reported on arm64 [2].
Specifically the problem is related to how vm_structs interact with
pcpu_chunks - both when they are allocated, assigned and when pcpu_chunk
addresses are derived.
When vm_structs are allocated they are unpoisoned, each with a different
random tag, if vmalloc support is enabled along the KASAN mode. Later
when first pcpu chunk is allocated it gets its 'base_addr' field set to
the first allocated vm_struct. With that it inherits that vm_struct's
tag.
When pcpu_chunk addresses are later derived (by pcpu_chunk_addr(), for
example in pcpu_alloc_noprof()) the base_addr field is used and offsets
are added to it. If the initial conditions are satisfied then some of the
offsets will point into memory allocated with a different vm_struct. So
while the lower bits will get accurately derived the tag bits in the top
of the pointer won't match the shadow memory contents.
The solution (proposed at v2 of the x86 KASAN series [3]) is to unpoison
the vm_structs with the same tag when allocating them for the per cpu
allocator (in pcpu_get_vm_areas()).
The second one reported by syzkaller [4] is related to vrealloc and
happens because of random tag generation when unpoisoning memory without
allocating new pages. This breaks shadow memory tracking and needs to
reuse the existing tag instead of generating a new one. At the same time
an inconsistency in used flags is corrected.
This patch (of 3):
Syzkaller reported a memory out-of-bounds bug [4]. This patch fixes two
issues:
1. In vrealloc the KASAN_VMALLOC_VM_ALLOC flag is missing when
unpoisoning the extended region. This flag is required to correctly
associate the allocation with KASAN's vmalloc tracking.
Note: In contrast, vzalloc (via __vmalloc_node_range_noprof)
explicitly sets KASAN_VMALLOC_VM_ALLOC and calls
kasan_unpoison_vmalloc() with it. vrealloc must behave consistently --
especially when reusing existing vmalloc regions -- to ensure KASAN can
track allocations correctly.
2. When vrealloc reuses an existing vmalloc region (without allocating
new pages) KASAN generates a new tag, which breaks tag-based memory
access tracking.
Introduce KASAN_VMALLOC_KEEP_TAG, a new KASAN flag that allows reusing the
tag already attached to the pointer, ensuring consistent tag behavior
during reallocation.
Pass KASAN_VMALLOC_KEEP_TAG and KASAN_VMALLOC_VM_ALLOC to the
kasan_unpoison_vmalloc inside vrealloc_node_align_noprof().
Link: https://lkml.kernel.org/r/cover.1765978969.git.m.wieczorretman@pm.me
Link: https://lkml.kernel.org/r/38dece0a4074c43e48150d1e242f8242c73bf1a5.17648745…
Link: https://lore.kernel.org/all/e7e04692866d02e6d3b32bb43b998e5d17092ba4.173868… [1]
Link: https://lore.kernel.org/all/aMUrW1Znp1GEj7St@MiWiFi-R3L-srv/ [2]
Link: https://lore.kernel.org/all/CAPAsAGxDRv_uFeMYu9TwhBVWHCCtkSxoWY4xmFB_vowMbi… [3]
Link: https://syzkaller.appspot.com/bug?extid=997752115a851cb0cf36 [4]
Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing")
Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Co-developed-by: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
Reported-by: syzbot+997752115a851cb0cf36(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e243a2.050a0220.1696c6.007d.GAE@google.com/T/
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Marco Elver <elver(a)google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/kasan.h | 1 +
mm/kasan/hw_tags.c | 2 +-
mm/kasan/shadow.c | 4 +++-
mm/vmalloc.c | 4 +++-
4 files changed, 8 insertions(+), 3 deletions(-)
--- a/include/linux/kasan.h~mm-kasan-fix-incorrect-unpoisoning-in-vrealloc-for-kasan
+++ a/include/linux/kasan.h
@@ -28,6 +28,7 @@ typedef unsigned int __bitwise kasan_vma
#define KASAN_VMALLOC_INIT ((__force kasan_vmalloc_flags_t)0x01u)
#define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u)
#define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u)
+#define KASAN_VMALLOC_KEEP_TAG ((__force kasan_vmalloc_flags_t)0x08u)
#define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply exsiting page range */
#define KASAN_VMALLOC_TLB_FLUSH 0x2 /* TLB flush */
--- a/mm/kasan/hw_tags.c~mm-kasan-fix-incorrect-unpoisoning-in-vrealloc-for-kasan
+++ a/mm/kasan/hw_tags.c
@@ -361,7 +361,7 @@ void *__kasan_unpoison_vmalloc(const voi
return (void *)start;
}
- tag = kasan_random_tag();
+ tag = (flags & KASAN_VMALLOC_KEEP_TAG) ? get_tag(start) : kasan_random_tag();
start = set_tag(start, tag);
/* Unpoison and initialize memory up to size. */
--- a/mm/kasan/shadow.c~mm-kasan-fix-incorrect-unpoisoning-in-vrealloc-for-kasan
+++ a/mm/kasan/shadow.c
@@ -631,7 +631,9 @@ void *__kasan_unpoison_vmalloc(const voi
!(flags & KASAN_VMALLOC_PROT_NORMAL))
return (void *)start;
- start = set_tag(start, kasan_random_tag());
+ if (unlikely(!(flags & KASAN_VMALLOC_KEEP_TAG)))
+ start = set_tag(start, kasan_random_tag());
+
kasan_unpoison(start, size, false);
return (void *)start;
}
--- a/mm/vmalloc.c~mm-kasan-fix-incorrect-unpoisoning-in-vrealloc-for-kasan
+++ a/mm/vmalloc.c
@@ -4331,7 +4331,9 @@ void *vrealloc_node_align_noprof(const v
*/
if (size <= alloced_size) {
kasan_unpoison_vmalloc(p + old_size, size - old_size,
- KASAN_VMALLOC_PROT_NORMAL);
+ KASAN_VMALLOC_PROT_NORMAL |
+ KASAN_VMALLOC_VM_ALLOC |
+ KASAN_VMALLOC_KEEP_TAG);
/*
* No need to zero memory here, as unused memory will have
* already been zeroed at initial allocation time or during
_
Patches currently in -mm which might be from jiayuan.chen(a)linux.dev are
The quilt patch titled
Subject: idr: fix idr_alloc() returning an ID out of range
has been removed from the -mm tree. Its filename was
idr-fix-idr_alloc-returning-an-id-out-of-range.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: idr: fix idr_alloc() returning an ID out of range
Date: Fri, 28 Nov 2025 16:18:32 +0000
If you use an IDR with a non-zero base, and specify a range that lies
entirely below the base, 'max - base' becomes very large and
idr_get_free() can return an ID that lies outside of the requested range.
For example, with a base of 1, idr_alloc(&idr, ptr, 0, 1, GFP_KERNEL)
requests max = 0; 'max - base' then wraps around to a huge value, so an
ID outside [0, 1) can be returned instead of -ENOSPC.
Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org
Fixes: 6ce711f27500 ("idr: Make 1-based IDRs more efficient")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Jan Sokolowski <jan.sokolowski(a)intel.com>
Reported-by: Koen Koning <koen.koning(a)intel.com>
Reported-by: Peter Senna Tschudin <peter.senna(a)linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/idr.c | 2 ++
tools/testing/radix-tree/idr-test.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)
--- a/lib/idr.c~idr-fix-idr_alloc-returning-an-id-out-of-range
+++ a/lib/idr.c
@@ -40,6 +40,8 @@ int idr_alloc_u32(struct idr *idr, void
if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR)))
idr->idr_rt.xa_flags |= IDR_RT_MARKER;
+ if (max < base)
+ return -ENOSPC;
id = (id < base) ? 0 : id - base;
radix_tree_iter_init(&iter, id);
--- a/tools/testing/radix-tree/idr-test.c~idr-fix-idr_alloc-returning-an-id-out-of-range
+++ a/tools/testing/radix-tree/idr-test.c
@@ -57,6 +57,26 @@ void idr_alloc_test(void)
idr_destroy(&idr);
}
+void idr_alloc2_test(void)
+{
+ int id;
+ struct idr idr = IDR_INIT_BASE(idr, 1);
+
+ id = idr_alloc(&idr, idr_alloc2_test, 0, 1, GFP_KERNEL);
+ assert(id == -ENOSPC);
+
+ id = idr_alloc(&idr, idr_alloc2_test, 1, 2, GFP_KERNEL);
+ assert(id == 1);
+
+ id = idr_alloc(&idr, idr_alloc2_test, 0, 1, GFP_KERNEL);
+ assert(id == -ENOSPC);
+
+ id = idr_alloc(&idr, idr_alloc2_test, 0, 2, GFP_KERNEL);
+ assert(id == -ENOSPC);
+
+ idr_destroy(&idr);
+}
+
void idr_replace_test(void)
{
DEFINE_IDR(idr);
@@ -409,6 +429,7 @@ void idr_checks(void)
idr_replace_test();
idr_alloc_test();
+ idr_alloc2_test();
idr_null_test();
idr_nowait_test();
idr_get_next_test(0);
_
Patches currently in -mm which might be from willy(a)infradead.org are
From: Shigeru Yoshida <syoshida(a)redhat.com>
commit 4e13d3a9c25b7080f8a619f961e943fe08c2672c upstream.
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in
__ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags
instead of testing HDRINCL on the socket to avoid a race condition which
causes uninit-value access.
Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()")
Signed-off-by: Shigeru Yoshida <syoshida(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Shubham Kulkarni <skulkarni(a)mvista.com>
---
Referred to the stable v6.1.y version of the patch to generate this one
[ v6.1 link: https://github.com/gregkh/linux/commit/a05c1ede50e9656f0752e523c7b54f3a3489… ]
---
net/ipv6/ip6_output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 426330b8dfa4..99ee18b3a953 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1917,7 +1917,8 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
u8 icmp6_type;
- if (sk->sk_socket->type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+ if (sk->sk_socket->type == SOCK_RAW &&
+ !(fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH))
icmp6_type = fl6->fl6_icmp_type;
else
icmp6_type = icmp6_hdr(skb)->icmp6_type;
--
2.25.1
When a newly poisoned subpage ends up in an already poisoned hugetlb
folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats
is not. Fix the inconsistency by designating action_result() to update
them both.
While at it, define __get_huge_page_for_hwpoison() return values in terms
of symbolic names for better readability. Also rename
folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison(), since the
function does more than the conventional bit setting and three possible
return values are expected.
Fixes: 18f41fa616ee4 ("mm: memory-failure: bump memory failure stats to pglist_data")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jane Chu <jane.chu(a)oracle.com>
---
v2 -> v3:
No change.
v1 -> v2:
addressed David's and Liam's comments: define __get_huge_page_for_hwpoison()
return values in terms of symbolic names instead of naked integers for
better readability. #define is used instead of enum since the function has
a footprint outside MF; this keeps the MF specifics local.
also renamed folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison(),
since the function does more than the conventional bit setting and three
possible return values are expected.
---
mm/memory-failure.c | 56 ++++++++++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 23 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fbc5a01260c8..8b47e8a1b12d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1883,12 +1883,18 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
return count;
}
-static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
+#define MF_HUGETLB_ALREADY_POISONED 3 /* already poisoned */
+#define MF_HUGETLB_ACC_EXISTING_POISON 4 /* accessed existing poisoned page */
+/*
+ * Set hugetlb folio as hwpoisoned, update folio private raw hwpoison list
+ * to keep track of the poisoned pages.
+ */
+static int hugetlb_update_hwpoison(struct folio *folio, struct page *page)
{
struct llist_head *head;
struct raw_hwp_page *raw_hwp;
struct raw_hwp_page *p;
- int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0;
+ int ret = folio_test_set_hwpoison(folio) ? MF_HUGETLB_ALREADY_POISONED : 0;
/*
* Once the hwpoison hugepage has lost reliable raw error info,
@@ -1896,20 +1902,18 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
* so skip to add additional raw error info.
*/
if (folio_test_hugetlb_raw_hwp_unreliable(folio))
- return -EHWPOISON;
+ return MF_HUGETLB_ALREADY_POISONED;
+
head = raw_hwp_list_head(folio);
llist_for_each_entry(p, head->first, node) {
if (p->page == page)
- return -EHWPOISON;
+ return MF_HUGETLB_ACC_EXISTING_POISON;
}
raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
if (raw_hwp) {
raw_hwp->page = page;
llist_add(&raw_hwp->node, head);
- /* the first error event will be counted in action_result(). */
- if (ret)
- num_poisoned_pages_inc(page_to_pfn(page));
} else {
/*
* Failed to save raw error info. We no longer trace all
@@ -1955,32 +1959,30 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
folio_free_raw_hwp(folio, true);
}
+#define MF_HUGETLB_FREED 0 /* freed hugepage */
+#define MF_HUGETLB_IN_USED 1 /* in-use hugepage */
+#define MF_NOT_HUGETLB 2 /* not a hugepage */
+
/*
* Called from hugetlb code with hugetlb_lock held.
- *
- * Return values:
- * 0 - free hugepage
- * 1 - in-use hugepage
- * 2 - not a hugepage
- * -EBUSY - the hugepage is busy (try to retry)
- * -EHWPOISON - the hugepage is already hwpoisoned
*/
int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
bool *migratable_cleared)
{
struct page *page = pfn_to_page(pfn);
struct folio *folio = page_folio(page);
- int ret = 2; /* fallback to normal page handling */
+ int ret = MF_NOT_HUGETLB;
bool count_increased = false;
+ int rc;
if (!folio_test_hugetlb(folio))
goto out;
if (flags & MF_COUNT_INCREASED) {
- ret = 1;
+ ret = MF_HUGETLB_IN_USED;
count_increased = true;
} else if (folio_test_hugetlb_freed(folio)) {
- ret = 0;
+ ret = MF_HUGETLB_FREED;
} else if (folio_test_hugetlb_migratable(folio)) {
ret = folio_try_get(folio);
if (ret)
@@ -1991,8 +1993,9 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
goto out;
}
- if (folio_set_hugetlb_hwpoison(folio, page)) {
- ret = -EHWPOISON;
+ rc = hugetlb_update_hwpoison(folio, page);
+ if (rc >= MF_HUGETLB_ALREADY_POISONED) {
+ ret = rc;
goto out;
}
@@ -2029,22 +2032,29 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
*hugetlb = 1;
retry:
res = get_huge_page_for_hwpoison(pfn, flags, &migratable_cleared);
- if (res == 2) { /* fallback to normal page handling */
+ switch (res) {
+ case MF_NOT_HUGETLB: /* fallback to normal page handling */
*hugetlb = 0;
return 0;
- } else if (res == -EHWPOISON) {
+ case MF_HUGETLB_ALREADY_POISONED:
+ case MF_HUGETLB_ACC_EXISTING_POISON:
if (flags & MF_ACTION_REQUIRED) {
folio = page_folio(p);
res = kill_accessing_process(current, folio_pfn(folio), flags);
}
- action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
+ if (res == MF_HUGETLB_ALREADY_POISONED)
+ action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
+ else
+ action_result(pfn, MF_MSG_HUGE, MF_FAILED);
return res;
- } else if (res == -EBUSY) {
+ case -EBUSY:
if (!(flags & MF_NO_RETRY)) {
flags |= MF_NO_RETRY;
goto retry;
}
return action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+ default:
+ break;
}
folio = page_folio(p);
--
2.43.5
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 0edc78b82bea85e1b2165d8e870a5c3535919695
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122351-chip-causal-80ed@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0edc78b82bea85e1b2165d8e870a5c3535919695 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 25 Nov 2025 22:50:45 +0100
Subject: [PATCH] x86/msi: Make irq_retrigger() functional for posted MSI
Luigi reported that retriggering a posted MSI interrupt does not work
correctly.
The reason is that the retrigger happens at the vector domain by sending an
IPI to the actual vector on the target CPU. That works correctly exactly
once because the posted MSI interrupt chip does not issue an EOI as that's
only required for the posted MSI notification vector itself.
As a consequence the vector becomes stale in the ISR, which not only
affects this vector but also any lower priority vector in the affected
APIC because the ISR bit is not cleared.
Luigi proposed to set the vector in the remap PIR bitmap and raise the
posted MSI notification vector. That works, but that still does not cure a
related problem:
If there is ever a stray interrupt on such a vector, then the related
APIC ISR bit becomes stale due to the lack of EOI as described above.
Unlikely to happen, but if it happens it's not debuggable at all.
So instead of playing games with the PIR, this can actually be solved
for both cases by:
1) Keeping track of the posted interrupt vector handler state
2) Implementing a posted MSI specific irq_ack() callback which checks that
state. If the posted vector handler is inactive it issues an EOI,
otherwise it delegates that to the posted handler.
This is correct versus affinity changes and concurrent events on the posted
vector as the actual handler invocation is serialized through the interrupt
descriptor lock.
Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs")
Reported-by: Luigi Rizzo <lrizzo(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Luigi Rizzo <lrizzo(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20251125214631.044440658@linutronix.de
Closes: https://lore.kernel.org/lkml/20251124104836.3685533-1-lrizzo@google.com
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 5a0d42464d44..4e55d1755846 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -87,4 +87,11 @@ static inline void panic_if_irq_remap(const char *msg)
}
#endif /* CONFIG_IRQ_REMAP */
+
+#ifdef CONFIG_X86_POSTED_MSI
+void intel_ack_posted_msi_irq(struct irq_data *irqd);
+#else
+#define intel_ack_posted_msi_irq NULL
+#endif
+
#endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 86f4e574de02..b2fe6181960c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -397,6 +397,7 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi)
/* Posted Interrupt Descriptors for coalesced MSIs to be posted */
DEFINE_PER_CPU_ALIGNED(struct pi_desc, posted_msi_pi_desc);
+static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active);
void intel_posted_msi_init(void)
{
@@ -414,6 +415,25 @@ void intel_posted_msi_init(void)
this_cpu_write(posted_msi_pi_desc.ndst, destination);
}
+void intel_ack_posted_msi_irq(struct irq_data *irqd)
+{
+ irq_move_irq(irqd);
+
+ /*
+ * Handle the rare case that irq_retrigger() raised the actual
+ * assigned vector on the target CPU, which means that it was not
+ * invoked via the posted MSI handler below. In that case APIC EOI
+ * is required as otherwise the ISR entry becomes stale and lower
+ * priority interrupts are never going to be delivered after that.
+ *
+ * If the posted handler invoked the device interrupt handler then
+ * the EOI would be premature because it would acknowledge the
+ * posted vector.
+ */
+ if (unlikely(!__this_cpu_read(posted_msi_handler_active)))
+ apic_eoi();
+}
+
static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_regs *regs)
{
unsigned long pir_copy[NR_PIR_WORDS];
@@ -446,6 +466,8 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
pid = this_cpu_ptr(&posted_msi_pi_desc);
+ /* Mark the handler active for intel_ack_posted_msi_irq() */
+ __this_cpu_write(posted_msi_handler_active, true);
inc_irq_stat(posted_msi_notification_count);
irq_enter();
@@ -474,6 +496,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
apic_eoi();
irq_exit();
+ __this_cpu_write(posted_msi_handler_active, false);
set_irq_regs(old_regs);
}
#endif /* X86_POSTED_MSI */
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 4f9b01dc91e8..8bcbfe3d9c72 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1303,17 +1303,17 @@ static struct irq_chip intel_ir_chip = {
* irq_enter();
* handle_edge_irq()
* irq_chip_ack_parent()
- * irq_move_irq(); // No EOI
+ * intel_ack_posted_msi_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* handle_edge_irq()
* irq_chip_ack_parent()
- * irq_move_irq(); // No EOI
+ * intel_ack_posted_msi_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* handle_edge_irq()
* irq_chip_ack_parent()
- * irq_move_irq(); // No EOI
+ * intel_ack_posted_msi_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* apic_eoi()
@@ -1322,7 +1322,7 @@ static struct irq_chip intel_ir_chip = {
*/
static struct irq_chip intel_ir_chip_post_msi = {
.name = "INTEL-IR-POST",
- .irq_ack = irq_move_irq,
+ .irq_ack = intel_ack_posted_msi_irq,
.irq_set_affinity = intel_ir_set_affinity,
.irq_compose_msi_msg = intel_ir_compose_msi_msg,
.irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
The USB OTG port of the RK3308 exhibits a bug when:
- configured as peripheral, and
- used in gadget mode, and
- the USB cable is connected since before booting
The symptom is: about 6 seconds after configuring gadget mode the device is
disconnected and then re-enumerated. This happens only once per boot.
Investigation showed that in this configuration the charger detection code
turns off the PHY after 6 seconds. Patch 1 avoids this when a cable is
connected (VBUS present).
After patch 1 the connection is stable, but communication stops after 6
seconds. This is addressed by patch 2.
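To make the mechanism of patch 1 concrete, here is a minimal hedged
sketch (every name below is invented for illustration and is not the
actual phy-rockchip-inno-usb2 code): skip the charger-detection work
while VBUS is present, so its ~6 second timeout never powers the PHY
off under a live gadget connection.

/* Sketch only, hypothetical names: bail out of charger detection
 * while a cable is attached (VBUS present), leaving the PHY powered. */
static void chg_detect_work(struct work_struct *work)
{
	struct usb2phy_port *rport =
		container_of(work, struct usb2phy_port, chg_work);

	if (rport->vbus_present)	/* hypothetical flag */
		return;			/* cable attached: do nothing */

	/* ... charger-detection state machine runs only without VBUS ... */
}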
The topic had been discussed in [0]. Thanks Alan and Minas for the
discussion and Louis for having found the 1st issue, leading to patch 1.
[0] https://lore.kernel.org/lkml/20250414185458.7767aabc@booty/
Luca
Signed-off-by: Luca Ceresoli <luca.ceresoli(a)bootlin.com>
---
Changes in v2:
- Patch 1: Fixed Co-developed-by: and SoB line order
- Added missing Cc: stable(a)vger.kernel.org
- Added Théo's Reviewed-by
- Improved commit message
- Patch 2: trimmed discussion about "there is no disconnection" from
commit message to not distract from the actual issue (writes to
chg_det.opmode)
- Link to v1: https://lore.kernel.org/r/20250722-rk3308-fix-usb-gadget-phy-disconnect-v1-…
---
Louis Chauvet (1):
phy: rockchip: inno-usb2: fix disconnection in gadget mode
Luca Ceresoli (1):
phy: rockchip: inno-usb2: fix communication disruption in gadget mode
drivers/phy/rockchip/phy-rockchip-inno-usb2.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
---
base-commit: cabb748c4b98ef67bbb088be61a2e0c850ebf70d
change-id: 20250718-rk3308-fix-usb-gadget-phy-disconnect-d7de71fb28b4
Best regards,
--
Luca Ceresoli <luca.ceresoli(a)bootlin.com>
The mmio regmap that may be allocated during probe is never freed.
Switch to using the device managed allocator so that the regmap is
released on probe failures (e.g. probe deferral) and on driver unbind.
Fixes: 5ab90f40121a ("phy: ti: gmii-sel: Do not use syscon helper to build regmap")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Andrew Davis <afd(a)ti.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/phy/ti/phy-gmii-sel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c
index 50adabb867cb..26209a89703a 100644
--- a/drivers/phy/ti/phy-gmii-sel.c
+++ b/drivers/phy/ti/phy-gmii-sel.c
@@ -512,7 +512,7 @@ static int phy_gmii_sel_probe(struct platform_device *pdev)
return dev_err_probe(dev, PTR_ERR(base),
"failed to get base memory resource\n");
- priv->regmap = regmap_init_mmio(dev, base, &phy_gmii_sel_regmap_cfg);
+ priv->regmap = devm_regmap_init_mmio(dev, base, &phy_gmii_sel_regmap_cfg);
if (IS_ERR(priv->regmap))
return dev_err_probe(dev, PTR_ERR(priv->regmap),
"Failed to get syscon\n");
--
2.51.2
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3e54d3b4a8437b6783d4145c86962a2aa51022f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122306-hurry-upstream-964e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3e54d3b4a8437b6783d4145c86962a2aa51022f3 Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Mon, 1 Dec 2025 19:26:38 +0100
Subject: [PATCH] can: gs_usb: gs_can_open(): fix error handling
Commit 2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
added missing error handling to the gs_can_open() function.
The driver uses 2 USB anchors to track the allocated URBs: the TX URBs in
struct gs_can::tx_submitted for each netdev and the RX URBs in struct
gs_usb::rx_submitted for the USB device. gs_can_open() allocates the RX
URBs, while TX URBs are allocated during gs_can_start_xmit().
The cleanup in gs_can_open() kills all anchored dev->tx_submitted
URBs (which is not necessary since the netdev is not yet registered), but
misses the parent->rx_submitted URBs.
Fix the problem by killing the rx_submitted instead of the tx_submitted.
Fixes: 2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20251210-gs_usb-fix-error-handling-v1-1-d6a5a03f10…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index e29e85b67fd4..a0233e550a5a 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -1074,7 +1074,7 @@ static int gs_can_open(struct net_device *netdev)
usb_free_urb(urb);
out_usb_kill_anchored_urbs:
if (!parent->active_channels) {
- usb_kill_anchored_urbs(&dev->tx_submitted);
+ usb_kill_anchored_urbs(&parent->rx_submitted);
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(parent);
From: Franz Schnyder <franz.schnyder(a)toradex.com>
Currently, the PHY only registers the typec orientation switch when it
is built in. If the typec driver is built as a module, the switch
registration is skipped due to the preprocessor condition, causing
orientation detection to fail.
With commit
45fe729be9a6 ("usb: typec: Stub out typec_switch APIs when CONFIG_TYPEC=n")
the preprocessor condition is not needed anymore and the orientation
switch is correctly registered for both built-in and module builds.
Fixes: b58f0f86fd61 ("phy: fsl-imx8mq-usb: add tca function driver for imx95")
Cc: stable(a)vger.kernel.org
Suggested-by: Xu Yang <xu.yang_2(a)nxp.com>
Signed-off-by: Franz Schnyder <franz.schnyder(a)toradex.com>
---
v2: Drop the preprocessor condition after a better suggestion.
Neil's Reviewed-by tag not carried over, as the patch is different
---
drivers/phy/freescale/phy-fsl-imx8mq-usb.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/drivers/phy/freescale/phy-fsl-imx8mq-usb.c b/drivers/phy/freescale/phy-fsl-imx8mq-usb.c
index b94f242420fc..72e8aff38b92 100644
--- a/drivers/phy/freescale/phy-fsl-imx8mq-usb.c
+++ b/drivers/phy/freescale/phy-fsl-imx8mq-usb.c
@@ -124,8 +124,6 @@ struct imx8mq_usb_phy {
static void tca_blk_orientation_set(struct tca_blk *tca,
enum typec_orientation orientation);
-#ifdef CONFIG_TYPEC
-
static int tca_blk_typec_switch_set(struct typec_switch_dev *sw,
enum typec_orientation orientation)
{
@@ -173,18 +171,6 @@ static void tca_blk_put_typec_switch(struct typec_switch_dev *sw)
typec_switch_unregister(sw);
}
-#else
-
-static struct typec_switch_dev *tca_blk_get_typec_switch(struct platform_device *pdev,
- struct imx8mq_usb_phy *imx_phy)
-{
- return NULL;
-}
-
-static void tca_blk_put_typec_switch(struct typec_switch_dev *sw) {}
-
-#endif /* CONFIG_TYPEC */
-
static void tca_blk_orientation_set(struct tca_blk *tca,
enum typec_orientation orientation)
{
--
2.43.0
The patch titled
Subject: buildid: validate page-backed file before parsing build ID
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
buildid-validate-page-backed-file-before-parsing-build-id.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Jinchao Wang <wangjinchao600(a)gmail.com>
Subject: buildid: validate page-backed file before parsing build ID
Date: Tue, 23 Dec 2025 18:32:07 +0800
__build_id_parse() only works on page-backed storage. Its helper paths
eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
that do not map a regular file or lack valid address_space operations.
Link: https://lkml.kernel.org/r/20251223103214.2412446-1-wangjinchao600@gmail.com
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Signed-off-by: Jinchao Wang <wangjinchao600(a)gmail.com>
Reported-by: <syzbot+e008db2ac01e282550ee(a)syzkaller.appspotmail.com>
Tested-by: <syzbot+e008db2ac01e282550ee(a)syzkaller.appspotmail.com>
Link: https://lkml.kernel.org/r/694a67ab.050a0220.19928e.001c.GAE@google.com
Closes: https://lkml.kernel.org/r/693540fe.a70a0220.38f243.004c.GAE@google.com
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand (Red Hat) <david(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Cc: Omar Sandoval <osandov(a)fb.com>
Cc: Deepanshu Kartikey <kartikey406(a)gmail.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkman <daniel(a)iogearbox.net>
Cc: Hao Luo <haoluo(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: John Fastabend <john.fastabend(a)gmail.com>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Martin KaFai Lau <martin.lau(a)linux.dev>
Cc: Song Liu <song(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/buildid.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/lib/buildid.c~buildid-validate-page-backed-file-before-parsing-build-id
+++ a/lib/buildid.c
@@ -288,7 +288,10 @@ static int __build_id_parse(struct vm_ar
int ret;
/* only works for page backed storage */
- if (!vma->vm_file)
+ if (!vma->vm_file ||
+ !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
+ !vma->vm_file->f_mapping->a_ops ||
+ !vma->vm_file->f_mapping->a_ops->read_folio)
return -EINVAL;
freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
_
Patches currently in -mm which might be from wangjinchao600(a)gmail.com are
buildid-validate-page-backed-file-before-parsing-build-id.patch
There is a use-after-free error in cfg80211_shutdown_all_interfaces found
by syzkaller:
BUG: KASAN: use-after-free in cfg80211_shutdown_all_interfaces+0x213/0x220
Read of size 8 at addr ffff888112a78d98 by task kworker/0:5/5326
CPU: 0 UID: 0 PID: 5326 Comm: kworker/0:5 Not tainted 6.19.0-rc2 #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events cfg80211_rfkill_block_work
Call Trace:
<TASK>
dump_stack_lvl+0x116/0x1f0
print_report+0xcd/0x630
kasan_report+0xe0/0x110
cfg80211_shutdown_all_interfaces+0x213/0x220
cfg80211_rfkill_block_work+0x1e/0x30
process_one_work+0x9cf/0x1b70
worker_thread+0x6c8/0xf10
kthread+0x3c5/0x780
ret_from_fork+0x56d/0x700
ret_from_fork_asm+0x1a/0x30
</TASK>
The problem arises because the rfkill_block work is not cancelled when
the cfg80211 device is being freed. To fix the issue, cancel the
corresponding work before destroying rfkill in cfg80211_dev_free().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniil Dulov <d.dulov(a)aladdin.ru>
---
net/wireless/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 54a34d8d356e..e94f69205f50 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1226,6 +1226,7 @@ void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
spin_unlock_irqrestore(&rdev->wiphy_work_lock, flags);
cancel_work_sync(&rdev->wiphy_work);
+ cancel_work_sync(&rdev->rfkill_block);
rfkill_destroy(rdev->wiphy.rfkill);
list_for_each_entry_safe(reg, treg, &rdev->beacon_registrations, list) {
list_del(&reg->list);
--
2.34.1
From: Rafael Beims <rafael.beims(a)toradex.com>
After U-Boot initializes PCIe with "pcie enum", Linux fails to detect
an NVMe disk on some boot cycles with:
phy phy-32f00000.pcie-phy.0: phy poweron failed --> -110
Discussion with NXP identified that the iMX8MP PCIe PHY PLL may fail to
lock when re-initialized without a reset cycle [1].
The issue reproduces on 7% of tested hardware platforms, with a 30-40%
failure rate per affected device across boot cycles.
Insert a reset cycle in the power-on routine to ensure the PHY is
initialized from a known state.
[1] https://community.nxp.com/t5/i-MX-Processors/iMX8MP-PCIe-initialization-in-…
Signed-off-by: Rafael Beims <rafael.beims(a)toradex.com>
Cc: stable(a)vger.kernel.org
---
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index 68fcc8114d75..7f5600103a00 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -89,7 +89,8 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
writel(imx8_phy->tx_deemph_gen2,
imx8_phy->base + PCIE_PHY_TRSV_REG6);
break;
- case IMX8MP: /* Do nothing. */
+ case IMX8MP:
+ reset_control_assert(imx8_phy->reset);
break;
}
--
2.51.0
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
Both rz_dmac_disable_hw() and rz_dmac_irq_handle_channel() update the
CHCTRL register. To avoid concurrency issues when configuring the
functionality exposed by this register, take the virtual channel lock.
All other CHCTRL updates were already protected by the same lock.
Previously, rz_dmac_disable_hw() disabled and re-enabled local IRQs before
accessing the CHCTRL register, but this does not ensure race-free access.
Remove the local IRQ disable/enable code as well.
Fixes: 5000d37042a6 ("dmaengine: sh: Add DMAC driver for RZ/G2L SoC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Changes in v6:
- update patch title and description
- in rz_dmac_irq_handle_channel(), lock only around the
updates in the error path, and continue using the vc lock:
since this is the error path, the channel will be stopped
anyway. This avoids introducing another lock, as was
suggested in the review of v5, and keeps the code simpler
for a fix, w/o any impact on performance
Changes in v5:
- none, this patch is new
drivers/dma/sh/rz-dmac.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/dma/sh/rz-dmac.c b/drivers/dma/sh/rz-dmac.c
index c8e3d9f77b8a..818d1ef6f0bf 100644
--- a/drivers/dma/sh/rz-dmac.c
+++ b/drivers/dma/sh/rz-dmac.c
@@ -298,13 +298,10 @@ static void rz_dmac_disable_hw(struct rz_dmac_chan *channel)
{
struct dma_chan *chan = &channel->vc.chan;
struct rz_dmac *dmac = to_rz_dmac(chan->device);
- unsigned long flags;
dev_dbg(dmac->dev, "%s channel %d\n", __func__, channel->index);
- local_irq_save(flags);
rz_dmac_ch_writel(channel, CHCTRL_DEFAULT, CHCTRL, 1);
- local_irq_restore(flags);
}
static void rz_dmac_set_dmars_register(struct rz_dmac *dmac, int nr, u32 dmars)
@@ -569,8 +566,8 @@ static int rz_dmac_terminate_all(struct dma_chan *chan)
unsigned int i;
LIST_HEAD(head);
- rz_dmac_disable_hw(channel);
spin_lock_irqsave(&channel->vc.lock, flags);
+ rz_dmac_disable_hw(channel);
for (i = 0; i < DMAC_NR_LMDESC; i++)
lmdesc[i].header = 0;
@@ -707,7 +704,9 @@ static void rz_dmac_irq_handle_channel(struct rz_dmac_chan *channel)
if (chstat & CHSTAT_ER) {
dev_err(dmac->dev, "DMAC err CHSTAT_%d = %08X\n",
channel->index, chstat);
- rz_dmac_ch_writel(channel, CHCTRL_DEFAULT, CHCTRL, 1);
+
+ scoped_guard(spinlock_irqsave, &channel->vc.lock)
+ rz_dmac_ch_writel(channel, CHCTRL_DEFAULT, CHCTRL, 1);
goto done;
}
--
2.43.0
The local variable 'sensitivity' was never clamped to 0 or
POWERSAVE_BIAS_MAX because the return value of clamp() was not used. Fix
this by assigning the clamped value back to 'sensitivity'.
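A standalone userspace illustration of the pitfall (the simplified
clamp() macro below is a stand-in for the kernel's type-checked
version, not the driver code):

#include <stdio.h>

#define clamp(val, lo, hi) \
	((val) < (lo) ? (lo) : (val) > (hi) ? (hi) : (val))
#define POWERSAVE_BIAS_MAX 1000

int main(void)
{
	int sensitivity = 1500;

	/* Buggy pattern: clamp() returns the clamped value, it does not
	 * modify its argument, so the result is silently discarded. */
	clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
	printf("discarded: %d\n", sensitivity);	/* still 1500 */

	/* Fixed pattern: assign the result back. */
	sensitivity = clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
	printf("assigned:  %d\n", sensitivity);	/* 1000 */
	return 0;
}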
Cc: stable(a)vger.kernel.org
Fixes: 9c5320c8ea8b ("cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/cpufreq/amd_freq_sensitivity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
index 13fed4b9e02b..713ccf24c97d 100644
--- a/drivers/cpufreq/amd_freq_sensitivity.c
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -76,7 +76,7 @@ static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
sensitivity = POWERSAVE_BIAS_MAX -
(POWERSAVE_BIAS_MAX * (d_reference - d_actual) / d_reference);
- clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
+ sensitivity = clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
/* this workload is not CPU bound, so choose a lower freq */
if (sensitivity < od_tuners->powersave_bias) {
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
When ECAM is enabled, the driver skipped calling dw_pcie_iatu_setup()
before configuring ECAM iATU entries. This left IO and MEM outbound
windows unprogrammed, resulting in broken IO transactions. Additionally,
dw_pcie_config_ecam_iatu() was only called during host initialization,
so ECAM-related iATU entries were not restored after suspend/resume,
leading to failures in configuration space access.
To resolve these issues, the ECAM iATU configuration is moved into
dw_pcie_setup_rc(). At the same time, dw_pcie_iatu_setup() is invoked
when ECAM is enabled.
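A rough sketch of the described ordering, using only the function names
from this cover letter (the ecam flag, the simplified signature, and
the surrounding structure are assumptions, not the actual patch):

/* Hedged sketch, not the real code: program the ordinary outbound
 * windows first, then the ECAM iATU entries, from dw_pcie_setup_rc()
 * so the same path runs again on resume. */
int dw_pcie_setup_rc(struct dw_pcie_rp *pp)
{
	/* ... existing RC setup ... */
	dw_pcie_iatu_setup(pp);			/* IO/MEM outbound windows */
	if (pp->ecam_enabled)			/* hypothetical flag */
		dw_pcie_config_ecam_iatu(pp);	/* ECAM entries, redone on resume */
	/* ... */
	return 0;
}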
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
---
Krishna Chaitanya Chundru (2):
PCI: dwc: Correct iATU index increment for MSG TLP region
PCI: dwc: Fix missing iATU setup when ECAM is enabled
drivers/pci/controller/dwc/pcie-designware-host.c | 37 ++++++++++++++---------
drivers/pci/controller/dwc/pcie-designware.c | 3 ++
drivers/pci/controller/dwc/pcie-designware.h | 2 +-
3 files changed, 26 insertions(+), 16 deletions(-)
---
base-commit: 3f9f0252130e7dd60d41be0802bf58f6471c691d
change-id: 20251203-ecam_io_fix-6e060fecd3b8
Best regards,
--
Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
Changing the enable/disable sequence in commit c9b1150a68d9
("drm/atomic-helper: Re-order bridge chain pre-enable and post-disable")
has caused regressions on multiple platforms: R-Car, MCDE, Rockchip.
This is an alternate series to Linus' series:
https://lore.kernel.org/all/20251202-mcde-drm-regression-thirdfix-v6-0-f1bf…
This series first reverts the original commit and reverts a fix for
mediatek which is no longer needed. It then exposes helper functions
from DRM core, and finally implements the new sequence only in the tidss
driver.
There is one more fix in upstream for the original commit, commit
5d91394f2361 ("drm/exynos: fimd: Guard display clock control with
runtime PM calls"), but I have not reverted that one as it looks like a
valid patch in its own.
I added Cc stable v6.17+ to all patches, but I didn't add Fixes tags, as
I wasn't sure what they should point to. But I could perhaps add Fixes:
<original commit> to all of these.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
---
Linus Walleij (1):
drm/atomic-helper: Export and namespace some functions
Tomi Valkeinen (3):
Revert "drm/atomic-helper: Re-order bridge chain pre-enable and post-disable"
Revert "drm/mediatek: dsi: Fix DSI host and panel bridge pre-enable order"
drm/tidss: Fix enable/disable order
drivers/gpu/drm/drm_atomic_helper.c | 122 ++++++++++++++----
drivers/gpu/drm/mediatek/mtk_dsi.c | 6 -
drivers/gpu/drm/tidss/tidss_kms.c | 30 ++++-
include/drm/drm_atomic_helper.h | 22 ++++
include/drm/drm_bridge.h | 249 ++++++++++--------------------------
5 files changed, 214 insertions(+), 215 deletions(-)
---
base-commit: 88e721ab978a86426aa08da520de77430fa7bb84
change-id: 20251205-drm-seq-fix-b4ed1f56604b
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Fix a memory leak in gpi_peripheral_config() where the original memory
pointed to by gchan->config could be lost if krealloc() fails.
The issue occurs when:
1. gchan->config points to previously allocated memory
2. krealloc() fails and returns NULL
3. The function directly assigns NULL to gchan->config, losing the
reference to the original memory
4. The original memory becomes unreachable and cannot be freed
Fix this by using a temporary variable to hold the krealloc() result
and only updating gchan->config when the allocation succeeds.
Found via static analysis and code review.
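The same pitfall exists with plain realloc(); a minimal userspace
sketch of the safe idiom the fix applies (hypothetical helper, not the
driver code):

#include <stdlib.h>
#include <string.h>

/* Unsafe: cfg = realloc(cfg, size); -- on failure realloc() returns
 * NULL and the only reference to the old buffer is overwritten.
 * Safe idiom: keep the old pointer until the allocation succeeds. */
static int update_config(void **cfg, const void *src, size_t size)
{
	void *tmp = realloc(*cfg, size);

	if (!tmp)
		return -1;	/* *cfg is still valid and can be freed */
	*cfg = tmp;
	memcpy(*cfg, src, size);
	return 0;
}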
Fixes: 5d0c3533a19f ("dmaengine: qcom: Add GPI dma driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
---
drivers/dma/qcom/gpi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
index 8e87738086b2..8908b7c71900 100644
--- a/drivers/dma/qcom/gpi.c
+++ b/drivers/dma/qcom/gpi.c
@@ -1605,14 +1605,16 @@ static int
gpi_peripheral_config(struct dma_chan *chan, struct dma_slave_config *config)
{
struct gchan *gchan = to_gchan(chan);
+ void *new_config;
if (!config->peripheral_config)
return -EINVAL;
- gchan->config = krealloc(gchan->config, config->peripheral_size, GFP_NOWAIT);
- if (!gchan->config)
+ new_config = krealloc(gchan->config, config->peripheral_size, GFP_NOWAIT);
+ if (!new_config)
return -ENOMEM;
+ gchan->config = new_config;
memcpy(gchan->config, config->peripheral_config, config->peripheral_size);
return 0;
--
2.39.5 (Apple Git-154)
From: Sumeet Pawnikar <sumeet4linux(a)gmail.com>
[ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ]
Fix inconsistent error handling for sscanf() return value check.
Implicit boolean conversion is used instead of explicit return
value checks. The code checks if (!sscanf(...)) which is incorrect
because:
1. sscanf returns the number of successfully parsed items
2. On success, it returns 1 (one item passed)
3. On failure, it returns 0 or EOF
4. The check 'if (!sscanf(...))' only catches a return of 0, so it
treats EOF (a failure) as success
All occurrences of sscanf() now use an explicit return value check.
With this behavior it returns '-EINVAL' when parsing fails (returns
0 or EOF), and continues when parsing succeeds (returns 1).
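The return values involved can be demonstrated with plain C,
independent of the kernel code (the attribute-name strings are chosen
to mirror the powercap format):

#include <stdio.h>

int main(void)
{
	int id;

	/* one conversion succeeds -> returns 1 */
	printf("%d\n", sscanf("constraint_3_name", "constraint_%d_", &id));
	/* literal prefix does not match -> returns 0 */
	printf("%d\n", sscanf("no_match", "constraint_%d_", &id));
	/* input ends before the first conversion -> returns EOF (-1),
	 * which the old '!sscanf(...)' check treated as success */
	printf("%d\n", sscanf("", "constraint_%d_", &id));
	return 0;
}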
Signed-off-by: Sumeet Pawnikar <sumeet4linux(a)gmail.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
Looking at this commit carefully to assess its suitability for stable
backporting.
## Commit Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit fixes incorrect error handling for `sscanf()` return value
checks in the powercap subsystem. The message claims that `if
(!sscanf(...))` is wrong because:
- `sscanf()` returns the number of successfully parsed items
- On success: returns 1
- On failure: returns 0 or EOF (typically -1)
### 2. CODE CHANGE ANALYSIS
Three identical changes in `drivers/powercap/powercap_sys.c`:
```c
// Before:
if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id))
// After:
if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1)
```
Let me trace through the bug:
- When `sscanf()` returns 0 (no match): `!0` = 1 (true) → error path
taken ✓
- When `sscanf()` returns 1 (success): `!1` = 0 (false) → continues ✓
- When `sscanf()` returns EOF (-1): `!(-1)` = 0 (false) → **error path
NOT taken** ✗
The critical issue is the EOF case. If `sscanf()` returns EOF (e.g.,
empty string input), the check passes incorrectly and execution
continues with an **uninitialized `id` variable**. This then leads to:
```c
if (id >= power_zone->const_id_cnt) // garbage comparison
return -EINVAL;
pconst = &power_zone->constraints[id]; // POTENTIAL OUT-OF-BOUNDS ACCESS
```
### 3. CLASSIFICATION
This is a **bug fix** that prevents:
- Potential out-of-bounds array access
- Use of uninitialized variable
- Possible kernel crash or memory corruption in edge cases
Not a feature addition, code cleanup, or optimization.
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 3 lines (identical pattern)
- **Files affected**: 1 file
- **Risk**: Extremely low - the change only makes the check stricter and
more explicit
- **Could break anything?**: No - the new check `!= 1` is strictly more
conservative than `!`
### 5. USER IMPACT
The powercap subsystem manages:
- Intel RAPL power capping
- Power domain constraints
- Used by tools like powertop, thermald
While the EOF triggering condition is rare (would require malformed
attribute names), the consequence (OOB access) could be severe.
### 6. STABILITY INDICATORS
- Signed-off by Rafael J. Wysocki (Intel power management maintainer)
- Self-contained fix with no dependencies
- The powercap subsystem has existed since kernel 3.13
### 7. DEPENDENCY CHECK
- No dependencies on other commits
- Simple standalone fix
- Code exists in all active stable trees
## Summary
**What it fixes**: Incorrect sscanf error handling that could allow
execution with uninitialized data, potentially leading to out-of-bounds
array access.
**Why it matters for stable**: While the triggering condition (EOF from
sscanf) is rare, the fix prevents a potential memory safety issue. The
change is trivially correct with zero regression risk.
**Meets stable criteria**:
- ✓ Obviously correct (explicit `!= 1` check is cleaner and more robust)
- ✓ Fixes a real bug (uninitialized variable use, potential OOB access)
- ✓ Small and contained (3 identical one-line changes)
- ✓ No new features
- ✓ No dependencies
**Risk vs benefit**: The benefit (preventing potential memory
corruption) outweighs the near-zero risk of regression.
**YES**
drivers/powercap/powercap_sys.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
index d14b36b75189..1ff369880beb 100644
--- a/drivers/powercap/powercap_sys.c
+++ b/drivers/powercap/powercap_sys.c
@@ -68,7 +68,7 @@ static ssize_t show_constraint_##_attr(struct device *dev, \
int id; \
struct powercap_zone_constraint *pconst;\
\
- if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
+ if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \
return -EINVAL; \
if (id >= power_zone->const_id_cnt) \
return -EINVAL; \
@@ -93,7 +93,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\
int id; \
struct powercap_zone_constraint *pconst;\
\
- if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
+ if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \
return -EINVAL; \
if (id >= power_zone->const_id_cnt) \
return -EINVAL; \
@@ -162,7 +162,7 @@ static ssize_t show_constraint_name(struct device *dev,
ssize_t len = -ENODATA;
struct powercap_zone_constraint *pconst;
- if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id))
+ if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1)
return -EINVAL;
if (id >= power_zone->const_id_cnt)
return -EINVAL;
--
2.51.0
When ports are defined in the tcpc main node, fwnode_usb_role_switch_get
returns an error, meaning usb_role_switch_get (which would succeed)
never gets a chance to run as port->role_sw isn't NULL, causing a
regression on devices where this is the case.
Fix this by turning the NULL check into IS_ERR_OR_NULL, so
usb_role_switch_get can actually run and the device gets properly probed.
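To see why the bare NULL check misses this case, here is a tiny
userspace model of the kernel's ERR_PTR convention (the macros below
are simplified re-definitions for the demo, not the kernel headers):

#include <stdio.h>

#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)
#define IS_ERR_OR_NULL(ptr) (!(ptr) || IS_ERR(ptr))

int main(void)
{
	void *sw = ERR_PTR(-19);	/* e.g. -ENODEV from the fwnode lookup */

	/* The old check: an ERR_PTR is non-NULL, so the fallback
	 * usb_role_switch_get() path was skipped. */
	printf("!sw            = %d (fallback skipped)\n", !sw);
	/* The fixed check catches both NULL and ERR_PTR values. */
	printf("IS_ERR_OR_NULL = %d (fallback taken)\n",
	       (int)IS_ERR_OR_NULL(sw));
	return 0;
}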
Fixes: 2d8713f807a4 ("tcpm: switch check for role_sw device with fw_node")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnaud Ferraris <arnaud.ferraris(a)collabora.com>
---
drivers/usb/typec/tcpm/tcpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index cc78770509dbc..37698204d48d2 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -7877,7 +7877,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
port->partner_desc.identity = &port->partner_ident;
port->role_sw = fwnode_usb_role_switch_get(tcpc->fwnode);
- if (!port->role_sw)
+ if (IS_ERR_OR_NULL(port->role_sw))
port->role_sw = usb_role_switch_get(port->dev);
if (IS_ERR(port->role_sw)) {
err = PTR_ERR(port->role_sw);
---
base-commit: 765e56e41a5af2d456ddda6cbd617b9d3295ab4e
change-id: 20251127-fix-ppp-power-6d47f3a746f8
Best regards,
--
Arnaud Ferraris <arnaud.ferraris(a)collabora.com>
From: yangshiguang <yangshiguang(a)xiaomi.com>
Check in debugfs_read_file_str() if the string pointer is NULL.
When creating a node using debugfs_create_str(), the string parameter
value can be NULL to indicate empty/unused/ignored.
However, reading this node using debugfs_read_file_str() will cause a
kernel panic.
This should not be fatal, so return -EINVAL instead.
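A minimal reproducer sketch as a kernel module (the directory and file
names are invented for the demo; the API calls themselves are the real
debugfs ones):

#include <linux/module.h>
#include <linux/debugfs.h>

static char *my_str;		/* NULL is allowed: "empty/unused/ignored" */
static struct dentry *dir;

static int __init demo_init(void)
{
	dir = debugfs_create_dir("str-demo", NULL);
	debugfs_create_str("value", 0444, dir, &my_str);
	return 0;	/* before the fix: cat .../str-demo/value oopses */
}

static void __exit demo_exit(void)
{
	debugfs_remove(dir);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");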
Signed-off-by: yangshiguang <yangshiguang(a)xiaomi.com>
Fixes: 9af0440ec86e ("debugfs: Implement debugfs_create_str()")
Cc: stable(a)vger.kernel.org
---
fs/debugfs/file.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 3ec3324c2060..a22ff0ceb230 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -1026,6 +1026,9 @@ ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
return ret;
str = *(char **)file->private_data;
+ if (!str)
+ return -EINVAL;
+
len = strlen(str) + 1;
copy = kmalloc(len, GFP_KERNEL);
if (!copy) {
--
2.43.0
From: Oscar Maes <oscmaes92(a)gmail.com>
[ Upstream commit 5189446ba995556eaa3755a6e875bc06675b88bd ]
Commit 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
introduced a regression where local-broadcast packets would have their
gateway set in __mkroute_output, which was caused by fi = NULL being
removed.
Fix this by resetting the fib_info for local-broadcast packets. This
preserves the intended changes for directed-broadcast packets.
Cc: stable(a)vger.kernel.org
Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
Reported-by: Brett A C Sheffield <bacs(a)librecast.net>
Closes: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net
Signed-off-by: Oscar Maes <oscmaes92(a)gmail.com>
Reviewed-by: David Ahern <dsahern(a)kernel.org>
Link: https://patch.msgid.link/20250827062322.4807-1-oscmaes92@gmail.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/ipv4/route.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9a5c9497b393..261ddb6542a4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2532,12 +2532,16 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
!netif_is_l3_master(dev_out))
return ERR_PTR(-EINVAL);
- if (ipv4_is_lbcast(fl4->daddr))
+ if (ipv4_is_lbcast(fl4->daddr)) {
type = RTN_BROADCAST;
- else if (ipv4_is_multicast(fl4->daddr))
+
+ /* reset fi to prevent gateway resolution */
+ fi = NULL;
+ } else if (ipv4_is_multicast(fl4->daddr)) {
type = RTN_MULTICAST;
- else if (ipv4_is_zeronet(fl4->daddr))
+ } else if (ipv4_is_zeronet(fl4->daddr)) {
return ERR_PTR(-EINVAL);
+ }
if (dev_out->flags & IFF_LOOPBACK)
flags |= RTCF_LOCAL;
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 8cbc3001a3264d998d6b6db3e23f935c158abd4d ]
The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of use-after-free
and NULL pointer dereference bugs because we race with the endio
function that is cleaning up the bio. Instead just return BLK_STS_OK as
the repair function has to continue to process the rest of the pages,
and the endio for the repair bio will do the appropriate cleanup for the
page that it was given.
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Keerthana: Backported the patch to v5.10.y ]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram(a)broadcom.com>
---
fs/btrfs/extent_io.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 489d370ddd60..3d0b854e0c19 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2655,7 +2655,6 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
bool need_validation;
struct bio *repair_bio;
struct btrfs_io_bio *repair_io_bio;
- blk_status_t status;
btrfs_debug(fs_info,
"repair read error: read error at %llu", start);
@@ -2699,13 +2698,13 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
"repair read error: submitting new read to mirror %d, in_validation=%d",
failrec->this_mirror, failrec->in_validation);
- status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
- failrec->bio_flags);
- if (status) {
- free_io_failure(failure_tree, tree, failrec);
- bio_put(repair_bio);
- }
- return status;
+ /*
+ * At this point we have a bio, so any errors from submit_bio_hook()
+ * will be handled by the endio on the repair_bio, so we can't return an
+ * error here.
+ */
+ submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->bio_flags);
+ return BLK_STS_OK;
}
/* lots and lots of room for performance fixes in the end_bio funcs */
--
2.43.7
Hello,
I recently used the patch "misc: rtsx_pci: Add separate CD/WP pin
polarity reversal support" (commit ID 807221d) to fix a bug causing
the card reader driver to always load SD cards in read-only mode.
On the suggestion of the driver maintainer, I am requesting that this
patch be applied to all stable kernel versions, as it is currently
only applied to >=6.18.
Thanks,
JP