From: Yuezhang Mo <Yuezhang.Mo(a)sony.com>
[ Upstream commit 99f9a97dce39ad413c39b92c90393bbd6778f3fd ]
An infinite loop may occur if the following conditions occur due to
file system corruption.
(1) Condition for exfat_count_dir_entries() to loop infinitely.
- The cluster chain includes a loop.
- There is no UNUSED entry in the cluster chain.
(2) Condition for exfat_create_upcase_table() to loop infinitely.
- The cluster chain of the root directory includes a loop.
- There are no UNUSED entry and up-case table entry in the cluster
chain of the root directory.
(3) Condition for exfat_load_bitmap() to loop infinitely.
- The cluster chain of the root directory includes a loop.
- There are no UNUSED entry and bitmap entry in the cluster chain
of the root directory.
(4) Condition for exfat_find_dir_entry() to loop infinitely.
- The cluster chain includes a loop.
- The unused directory entries were exhausted by some operation.
(5) Condition for exfat_check_dir_empty() to loop infinitely.
- The cluster chain includes a loop.
- The unused directory entries were exhausted by some operation.
- All files and sub-directories under the directory are deleted.
This commit adds checks to break the above infinite loop.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo(a)sony.com>
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees for the
following critical reasons:
## Security Impact - Prevents Multiple Infinite Loop Vulnerabilities
The commit fixes **five distinct infinite loop conditions** that can
occur due to filesystem corruption, each representing a potential
denial-of-service vulnerability:
1. **exfat_count_dir_entries()** - Adds loop detection using
`sbi->used_clusters` as a bound
2. **exfat_create_upcase_table()** - Addressed through root directory
chain validation
3. **exfat_load_bitmap()** - Addressed through root directory chain
validation
4. **exfat_find_dir_entry()** - Adds loop detection using
`EXFAT_DATA_CLUSTER_COUNT(sbi)`
5. **exfat_check_dir_empty()** - Adds loop detection using
`EXFAT_DATA_CLUSTER_COUNT(sbi)`
## Critical Bug Fix Characteristics
1. **Fixes Real Security Issues**: The infinite loops can cause system
hangs and DoS conditions when mounting corrupted/malicious exFAT
filesystems
2. **Small, Contained Changes**: The fix adds simple counter checks (4-5
lines per location) without architectural changes
3. **Clear Root Cause**: Addresses missing validation of cluster chain
loops in directory traversal
4. **Pattern of Similar Fixes**: This follows three previous infinite
loop fixes in the same subsystem (commits b0522303f672, a5324b3a488d,
fee873761bd9), all of which fix similar issues dating back to the
original exfat implementation
## Code Analysis Shows Low Risk
The changes are minimal and safe:
- Adds `unsigned int clu_count = 0` declarations
- Increments counter when following cluster chains
- Breaks traversal if counter exceeds valid cluster count
- In `exfat_count_num_clusters()`: adds explicit loop detection with
error message
## Follows Stable Kernel Rules
✓ Fixes critical bugs (infinite loops/DoS)
✓ Minimal code changes (~50 lines total)
✓ No new features or API changes
✓ Similar fixes already backported (the three previous infinite loop
fixes)
✓ Clear error conditions with proper error returns (-EIO)
The commit message explicitly states these are corruption-triggered
infinite loops, and the pattern matches previous fixes that have
"Fixes:" tags pointing to the original exfat implementation. This is a
critical reliability and security fix that prevents system hangs when
handling corrupted exFAT filesystems.
fs/exfat/dir.c | 12 ++++++++++++
fs/exfat/fatent.c | 10 ++++++++++
fs/exfat/namei.c | 5 +++++
fs/exfat/super.c | 32 +++++++++++++++++++++-----------
4 files changed, 48 insertions(+), 11 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 3103b932b674..ee060e26f51d 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -996,6 +996,7 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
struct exfat_hint_femp candi_empty;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
int num_entries = exfat_calc_num_entries(p_uniname);
+ unsigned int clu_count = 0;
if (num_entries < 0)
return num_entries;
@@ -1133,6 +1134,10 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
} else {
if (exfat_get_next_cluster(sb, &clu.dir))
return -EIO;
+
+ /* break if the cluster chain includes a loop */
+ if (unlikely(++clu_count > EXFAT_DATA_CLUSTER_COUNT(sbi)))
+ goto not_found;
}
}
@@ -1195,6 +1200,7 @@ int exfat_count_dir_entries(struct super_block *sb, struct exfat_chain *p_dir)
int i, count = 0;
int dentries_per_clu;
unsigned int entry_type;
+ unsigned int clu_count = 0;
struct exfat_chain clu;
struct exfat_dentry *ep;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -1227,6 +1233,12 @@ int exfat_count_dir_entries(struct super_block *sb, struct exfat_chain *p_dir)
} else {
if (exfat_get_next_cluster(sb, &(clu.dir)))
return -EIO;
+
+ if (unlikely(++clu_count > sbi->used_clusters)) {
+ exfat_fs_error(sb, "FAT or bitmap is corrupted");
+ return -EIO;
+ }
+
}
}
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 23065f948ae7..232cc7f8ab92 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -490,5 +490,15 @@ int exfat_count_num_clusters(struct super_block *sb,
}
*ret_count = count;
+
+ /*
+ * since exfat_count_used_clusters() is not called, sbi->used_clusters
+ * cannot be used here.
+ */
+ if (unlikely(i == sbi->num_clusters && clu != EXFAT_EOF_CLUSTER)) {
+ exfat_fs_error(sb, "The cluster chain has a loop");
+ return -EIO;
+ }
+
return 0;
}
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index fede0283d6e2..f5f1c4e8a29f 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -890,6 +890,7 @@ static int exfat_check_dir_empty(struct super_block *sb,
{
int i, dentries_per_clu;
unsigned int type;
+ unsigned int clu_count = 0;
struct exfat_chain clu;
struct exfat_dentry *ep;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -926,6 +927,10 @@ static int exfat_check_dir_empty(struct super_block *sb,
} else {
if (exfat_get_next_cluster(sb, &(clu.dir)))
return -EIO;
+
+ /* break if the cluster chain includes a loop */
+ if (unlikely(++clu_count > EXFAT_DATA_CLUSTER_COUNT(sbi)))
+ break;
}
}
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 7ed858937d45..3a9ec75ab452 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -341,13 +341,12 @@ static void exfat_hash_init(struct super_block *sb)
INIT_HLIST_HEAD(&sbi->inode_hashtable[i]);
}
-static int exfat_read_root(struct inode *inode)
+static int exfat_read_root(struct inode *inode, struct exfat_chain *root_clu)
{
struct super_block *sb = inode->i_sb;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
struct exfat_inode_info *ei = EXFAT_I(inode);
- struct exfat_chain cdir;
- int num_subdirs, num_clu = 0;
+ int num_subdirs;
exfat_chain_set(&ei->dir, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
ei->entry = -1;
@@ -360,12 +359,9 @@ static int exfat_read_root(struct inode *inode)
ei->hint_stat.clu = sbi->root_dir;
ei->hint_femp.eidx = EXFAT_HINT_NONE;
- exfat_chain_set(&cdir, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
- if (exfat_count_num_clusters(sb, &cdir, &num_clu))
- return -EIO;
- i_size_write(inode, num_clu << sbi->cluster_size_bits);
+ i_size_write(inode, EXFAT_CLU_TO_B(root_clu->size, sbi));
- num_subdirs = exfat_count_dir_entries(sb, &cdir);
+ num_subdirs = exfat_count_dir_entries(sb, root_clu);
if (num_subdirs < 0)
return -EIO;
set_nlink(inode, num_subdirs + EXFAT_MIN_SUBDIR);
@@ -578,7 +574,8 @@ static int exfat_verify_boot_region(struct super_block *sb)
}
/* mount the file system volume */
-static int __exfat_fill_super(struct super_block *sb)
+static int __exfat_fill_super(struct super_block *sb,
+ struct exfat_chain *root_clu)
{
int ret;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -595,6 +592,18 @@ static int __exfat_fill_super(struct super_block *sb)
goto free_bh;
}
+ /*
+ * Call exfat_count_num_cluster() before searching for up-case and
+ * bitmap directory entries to avoid infinite loop if they are missing
+ * and the cluster chain includes a loop.
+ */
+ exfat_chain_set(root_clu, sbi->root_dir, 0, ALLOC_FAT_CHAIN);
+ ret = exfat_count_num_clusters(sb, root_clu, &root_clu->size);
+ if (ret) {
+ exfat_err(sb, "failed to count the number of clusters in root");
+ goto free_bh;
+ }
+
ret = exfat_create_upcase_table(sb);
if (ret) {
exfat_err(sb, "failed to load upcase table");
@@ -627,6 +636,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
struct exfat_sb_info *sbi = sb->s_fs_info;
struct exfat_mount_options *opts = &sbi->options;
struct inode *root_inode;
+ struct exfat_chain root_clu;
int err;
if (opts->allow_utime == (unsigned short)-1)
@@ -645,7 +655,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_time_min = EXFAT_MIN_TIMESTAMP_SECS;
sb->s_time_max = EXFAT_MAX_TIMESTAMP_SECS;
- err = __exfat_fill_super(sb);
+ err = __exfat_fill_super(sb, &root_clu);
if (err) {
exfat_err(sb, "failed to recognize exfat type");
goto check_nls_io;
@@ -680,7 +690,7 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
root_inode->i_ino = EXFAT_ROOT_INO;
inode_set_iversion(root_inode, 1);
- err = exfat_read_root(root_inode);
+ err = exfat_read_root(root_inode, &root_clu);
if (err) {
exfat_err(sb, "failed to initialize root inode");
goto put_inode;
--
2.39.5
From: Anton Nadezhdin <anton.nadezhdin(a)intel.com>
commit a5a441ae283d54ec329aadc7426991dc32786d52 upstream.
Set use_nsecs=true as timestamp is reported in ns. Lack of this result
in smaller timestamp error window which cause error during phc2sys
execution on E825 NICs:
phc2sys[1768.256]: ioctl PTP_SYS_OFFSET_PRECISE: Invalid argument
This problem was introduced in the cited commit which omitted setting
use_nsecs to true when converting the ice driver to use
convert_base_to_cs().
Testing hints (ethX is PF netdev):
phc2sys -s ethX -c CLOCK_REALTIME -O 37 -m
phc2sys[1769.256]: CLOCK_REALTIME phc offset -5 s0 freq -0 delay 0
Fixes: d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()")
Signed-off-by: Anton Nadezhdin <anton.nadezhdin(a)intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com>
Tested-by: Rinitha S <sx.rinitha(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Markus Blöchl <markus(a)blochl.de>
---
Hi Greg,
please consider this backport for linux-6.12.y
It fixes a regression from the series around
d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()")
which affected multiple drivers and occasionally
caused phc2sys to fail on ioctl(fd, PTP_SYS_OFFSET_PRECISE, ...).
This was the initial fix for ice but apparently tagging it
for stable was forgotten during submission.
The hunk was moved around slightly in the upstream commit
92456e795ac6 ("ice: Add unified ice_capture_crosststamp")
from ice_ptp_get_syncdevicetime() into another helper function
ice_capture_crosststamp() so its indentation and context have changed.
I adapted it to apply cleanly.
---
Changes in v2:
- Expand reference to upstream commit to full 40 character SHA
- Add branch 6.12 target designator to PATCH prefix
- Rebase onto current 6.12.41
- Link to v1: https://lore.kernel.org/r/20250725-ice_crosstimestamp_reporting-v1-1-3d0473…
---
drivers/net/ethernet/intel/ice/ice_ptp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index 7c6f81beaee4602050b4cf366441a2584507d949..369c968a0117d0f7012241fd3e2c0a45a059bfa4 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -2226,6 +2226,7 @@ ice_ptp_get_syncdevicetime(ktime_t *device,
hh_ts = ((u64)hh_ts_hi << 32) | hh_ts_lo;
system->cycles = hh_ts;
system->cs_id = CSID_X86_ART;
+ system->use_nsecs = true;
/* Read Device source clock time */
hh_ts_lo = rd32(hw, GLTSYN_HHTIME_L(tmr_idx));
hh_ts_hi = rd32(hw, GLTSYN_HHTIME_H(tmr_idx));
---
base-commit: 8f5ff9784f3262e6e85c68d86f8b7931827f2983
change-id: 20250716-ice_crosstimestamp_reporting-b6236a246c48
Best regards,
--
Markus Blöchl <markus(a)blochl.de>
--
When handling non-swap entries in move_pages_pte(), the error handling
for entries that are NOT migration entries fails to unmap the page table
entries before jumping to the error handling label.
This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
triggers a WARNING in kunmap_local_indexed() because the kmap stack is
corrupted.
Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
Call trace:
kunmap_local_indexed from move_pages+0x964/0x19f4
move_pages from userfaultfd_ioctl+0x129c/0x2144
userfaultfd_ioctl from sys_ioctl+0x558/0xd24
The issue was introduced with the UFFDIO_MOVE feature but became more
frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
path more commonly executed during userfaultfd operations.
Fix this by ensuring PTEs are properly unmapped in all non-swap entry
paths before jumping to the error handling label, not just for migration
entries.
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
mm/userfaultfd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8253978ee0fb1..7c298e9cbc18f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
+ src_pte = dst_pte = NULL;
if (is_migration_entry(entry)) {
- pte_unmap(src_pte);
- pte_unmap(dst_pte);
- src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
- } else
+ } else {
err = -EFAULT;
+ }
goto out;
}
--
2.39.5
Fix potential Spectre vulnerability in repoted by smatch:
warn: potential spectre issue 'vdev->hw->hws.grace_period' [w] (local cap)
warn: potential spectre issue 'vdev->hw->hws.process_grace_period' [w] (local cap)
warn: potential spectre issue 'vdev->hw->hws.process_quantum' [w] (local cap)
The priority_bands_fops_write() function in ivpu_debugfs.c uses an
index 'band' derived from user input. This index is used to write to
the vdev->hw->hws.grace_period, vdev->hw->hws.process_grace_period,
and vdev->hw->hws.process_quantum arrays.
This pattern presented a potential Spectre Variant 1 (Bounds Check
Bypass) vulnerability. An attacker-controlled 'band' value could
theoretically lead to speculative out-of-bounds array writes if the
CPU speculatively executed these assignments before the bounds check
on 'band' was fully resolved.
This commit mitigates this potential vulnerability by sanitizing the
'band' index using array_index_nospec() before it is used in the
array assignments. The array_index_nospec() function ensures that
'band' is constrained to the valid range
[0, VPU_JOB_SCHEDULING_PRIORITY_BAND_COUNT - 1], even during
speculative execution.
Fixes: 320323d2e545 ("accel/ivpu: Add debugfs interface for setting HWS priority bands")
Cc: <stable(a)vger.kernel.org> # v6.15+
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
drivers/accel/ivpu/ivpu_debugfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_debugfs.c b/drivers/accel/ivpu/ivpu_debugfs.c
index cd24ccd20ba6c..2ffe5bf8f1fab 100644
--- a/drivers/accel/ivpu/ivpu_debugfs.c
+++ b/drivers/accel/ivpu/ivpu_debugfs.c
@@ -5,6 +5,7 @@
#include <linux/debugfs.h>
#include <linux/fault-inject.h>
+#include <linux/nospec.h>
#include <drm/drm_debugfs.h>
#include <drm/drm_file.h>
@@ -464,6 +465,7 @@ priority_bands_fops_write(struct file *file, const char __user *user_buf, size_t
if (band >= VPU_JOB_SCHEDULING_PRIORITY_BAND_COUNT)
return -EINVAL;
+ band = array_index_nospec(band, VPU_JOB_SCHEDULING_PRIORITY_BAND_COUNT);
vdev->hw->hws.grace_period[band] = grace_period;
vdev->hw->hws.process_grace_period[band] = process_grace_period;
vdev->hw->hws.process_quantum[band] = process_quantum;
--
2.45.1
Commit 5304877936c0 ("NFSD: Fix strncpy() fortify warning") replaced
strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy()
already guaranteed NUL-termination of the destination buffer and
subtracting one byte potentially truncated the source string.
The incorrect size was then carried over in commit 72f78ae00a8e ("NFSD:
move from strlcpy with unused retval to strscpy") when switching from
strlcpy() to strscpy().
Fix this off-by-one error by using the full size of the destination
buffer again.
Cc: stable(a)vger.kernel.org
Fixes: 5304877936c0 ("NFSD: Fix strncpy() fortify warning")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
Changes in v2:
- Use three parameter variant of strscpy() for easier backporting
- Link to v1: https://lore.kernel.org/lkml/20250805175302.29386-2-thorsten.blum@linux.dev/
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 71b428efcbb5..954543e92988 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1469,7 +1469,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
return 0;
}
if (work) {
- strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1);
+ strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr));
refcount_set(&work->nsui_refcnt, 2);
work->nsui_busy = true;
list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list);
--
2.50.1
replace_fd() returns the number of the new file descriptor through the
return value of do_dup2(). However its callers never care about the
specific returned number. In fact the caller in receive_fd_replace() treats
any non-zero return value as an error and therefore never calls
__receive_sock() for most file descriptors, which is a bug.
To fix the bug in receive_fd_replace() and to avoid the same issue
happening in future callers, signal success through a plain zero.
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Link: https://lore.kernel.org/lkml/20250801220215.GS222315@ZenIV/
Fixes: 173817151b15 ("fs: Expand __receive_fd() to accept existing fd")
Fixes: 42eb0d54c08a ("fs: split receive_fd_replace from __receive_fd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Make commit message slightly more precise
- Avoid double-unlock of file_lock
- Link to v2: https://lore.kernel.org/r/20250804-fix-receive_fd_replace-v2-1-ecb28c7b9129…
Changes in v2:
- Move the fix to replace_fd() (Al)
- Link to v1: https://lore.kernel.org/r/20250801-fix-receive_fd_replace-v1-1-d46d600c74d6…
---
Untested, it stuck out while reading the code.
---
fs/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/file.c b/fs/file.c
index 6d2275c3be9c6967d16c75d1b6521f9b58980926..80957d0813db5946ba8a635520e8283c722982b9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1330,7 +1330,8 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
err = expand_files(files, fd);
if (unlikely(err < 0))
goto out_unlock;
- return do_dup2(files, file, fd, flags);
+ err = do_dup2(files, file, fd, flags);
+ return err < 0 ? err : 0;
out_unlock:
spin_unlock(&files->file_lock);
---
base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f
change-id: 20250801-fix-receive_fd_replace-7fdd5ce6532d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Userspace generally expects APIs that return -EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the -EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead. While we're at it, refactor some fscontext_read() into a
separate helper to make the ring buffer logic a bit easier to read.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v3:
- selftests: use EXPECT_STREQ()
- v2: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v2-0-88e9d34d142f…>
Changes in v2:
- Refactor message fetching to fetch_message_locked() which returns
ERR_PTR() in error cases. [Al Viro]
- v1: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v1-0-880597d42a5a…>
---
Aleksa Sarai (2):
fscontext: do not consume log entries when returning -EMSGSIZE
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 54 +++++-----
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 130 +++++++++++++++++++++++++
4 files changed, 162 insertions(+), 25 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
As described in commit 7a54947e727b ('Merge patch series "fs: allow
changing idmappings"'), open_tree_attr(2) was necessary in order to
allow for a detached mount to be created and have its idmappings changed
without the risk of any racing threads operating on it. For this reason,
mount_setattr(2) still does not allow for id-mappings to be changed.
However, there was a bug in commit 2462651ffa76 ("fs: allow changing
idmappings") which allowed users to bypass this restriction by calling
open_tree_attr(2) *without* OPEN_TREE_CLONE.
can_idmap_mount() prevented this bug from allowing an attached
mountpoint's id-mapping from being modified (thanks to an is_anon_ns()
check), but this still allows for detached (but visible) mounts to have
their be id-mapping changed. This risks the same UAF and locking issues
as described in the merge commit, and was likely unintentional.
For what it's worth, I found this while working on the open_tree_attr(2)
man page, and was trying to figure out what open_tree_attr(2)'s
behaviour was in the (slightly fruity) ~OPEN_TREE_CLONE case.
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Aleksa Sarai (2):
open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE
selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug
fs/namespace.c | 3 +-
.../selftests/mount_setattr/mount_setattr_test.c | 77 ++++++++++++++++++----
2 files changed, 66 insertions(+), 14 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250808-open_tree_attr-bugfix-idmap-bb741166dc04
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
This commit addresses a rarely observed endpoint command timeout
which causes kernel panic due to warn when 'panic_on_warn' is enabled
and unnecessary call trace prints when 'panic_on_warn' is disabled.
It is seen during fast software-controlled connect/disconnect testcases.
The following is one such endpoint command timeout that we observed:
1. Connect
=======
->dwc3_thread_interrupt
->dwc3_ep0_interrupt
->configfs_composite_setup
->composite_setup
->usb_ep_queue
->dwc3_gadget_ep0_queue
->__dwc3_gadget_ep0_queue
->__dwc3_ep0_do_control_data
->dwc3_send_gadget_ep_cmd
2. Disconnect
==========
->dwc3_thread_interrupt
->dwc3_gadget_disconnect_interrupt
->dwc3_ep0_reset_state
->dwc3_ep0_end_control_data
->dwc3_send_gadget_ep_cmd
In the issue scenario, in Exynos platforms, we observed that control
transfers for the previous connect have not yet been completed and end
transfer command sent as a part of the disconnect sequence and
processing of USB_ENDPOINT_HALT feature request from the host timeout.
This maybe an expected scenario since the controller is processing EP
commands sent as a part of the previous connect. It maybe better to
remove WARN_ON in all places where device endpoint commands are sent to
avoid unnecessary kernel panic due to warn.
Cc: stable(a)vger.kernel.org
Signed-off-by: Akash M <akash.m5(a)samsung.com>
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
---
Changes in v2:
- Removed the 'Fixes' tag from the commit message, as this patch does
not contain a fix.
- And Retained the 'stable' tag, as these changes are intended to be
applied across all stable kernels.
- Additionally, replaced 'dev_warn*' with 'dev_err*'."
Link to v1: https://lore.kernel.org/all/20250807005638.thhsgjn73aaov2af@synopsys.com/
---
drivers/usb/dwc3/ep0.c | 20 ++++++++++++++++----
drivers/usb/dwc3/gadget.c | 10 ++++++++--
2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 666ac432f52d..b4229aa13f37 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -288,7 +288,9 @@ void dwc3_ep0_out_start(struct dwc3 *dwc)
dwc3_ep0_prepare_one_trb(dep, dwc->ep0_trb_addr, 8,
DWC3_TRBCTL_CONTROL_SETUP, false);
ret = dwc3_ep0_start_trans(dep);
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_err(dwc->dev, "ep0 out start transfer failed: %d\n", ret);
+
for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
struct dwc3_ep *dwc3_ep;
@@ -1061,7 +1063,9 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
ret = dwc3_ep0_start_trans(dep);
}
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_err(dwc->dev,
+ "ep0 data phase start transfer failed: %d\n", ret);
}
static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
@@ -1078,7 +1082,12 @@ static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep *dep)
{
- WARN_ON(dwc3_ep0_start_control_status(dep));
+ int ret;
+
+ ret = dwc3_ep0_start_control_status(dep);
+ if (ret)
+ dev_err(dwc->dev,
+ "ep0 status phase start transfer failed: %d\n", ret);
}
static void dwc3_ep0_do_control_status(struct dwc3 *dwc,
@@ -1121,7 +1130,10 @@ void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
cmd |= DWC3_DEPCMD_PARAM(dep->resource_index);
memset(¶ms, 0, sizeof(params));
ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_err_ratelimited(dwc->dev,
+ "ep0 data phase end transfer failed: %d\n", ret);
+
dep->resource_index = 0;
}
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 4a3e97e606d1..4a3d076c1015 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1772,7 +1772,11 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int
dep->flags |= DWC3_EP_DELAY_STOP;
return 0;
}
- WARN_ON_ONCE(ret);
+
+ if (ret)
+ dev_err_ratelimited(dep->dwc->dev,
+ "end transfer failed: %d\n", ret);
+
dep->resource_index = 0;
if (!interrupt)
@@ -4039,7 +4043,9 @@ static void dwc3_clear_stall_all_ep(struct dwc3 *dwc)
dep->flags &= ~DWC3_EP_STALL;
ret = dwc3_send_clear_stall_ep_cmd(dep);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_err_ratelimited(dwc->dev,
+ "failed to clear STALL on %s\n", dep->name);
}
}
--
2.17.1
#regzbot introduced: v6.12.34..v6.12.35
After upgrade to kernel 6.12.35, vfio passthrough for my GPU has stopped working within a windows VM, it sees device in device manager but reports that it did not start correctly. I compared lspci logs in the vm before and after upgrade to 6.12.35, and here are the changes I noticed:
- the reported link speed for the passthrough GPU has changed from 2.5 to 16GT/s
- the passthrough GPU has lost it's 'BusMaster' and MSI enable flags
- latency measurement feature appeared
These entries also began appearing within the vm in dmesg when host kernel is 6.12.35 or above:
[ 1.963177] nouveau 0000:01:00.0: sec2(gsp): mbox 1c503000 00000001
[ 1.963296] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
...
[ 1.964580] nouveau 0000:01:00.0: gsp: init failed, -5
[ 1.964641] nouveau 0000:01:00.0: init failed with -5
[ 1.964681] nouveau: drm:00000000:00000080: init failed with -5
[ 1.964721] nouveau 0000:01:00.0: drm: Device allocation failed: -5
[ 1.966318] nouveau 0000:01:00.0: probe with driver nouveau failed with error -5
6.12.34 worked fine, and latest 6.12 LTS does not work either. I am using intel CPU and nvidia GPU (for passthrough, and as my GPU on linux system).
Hi there,
I have explicitly disabled mptpcp by default on my custom kernel and
this seems to be causing the test case to fail. Even after enabling
mtpcp via sysctl command or adding an entry to /etc/sysctl.conf this
fails. I don't think this test should be failing and should account for
cases where mptcp has not been enabled by default?
These are the test logs:
$ sudo tools/testing/selftests/bpf/test_progs -t mptcp
Can't find bpf_testmod.ko kernel module: -2
WARNING! Selftests relying on bpf_testmod.ko will be skipped.
run_test:PASS:bpf_prog_attach 0 nsec
run_test:PASS:connect to fd 0 nsec
verify_tsk:PASS:bpf_map_lookup_elem 0 nsec
verify_tsk:PASS:unexpected invoked count 0 nsec
verify_tsk:PASS:unexpected is_mptcp 0 nsec
test_base:PASS:run_test tcp 0 nsec
(network_helpers.c:107: errno: Protocol not available) Failed to create
server socket
test_base:FAIL:start_mptcp_server unexpected start_mptcp_server: actual
-1 < expected 0
#178/1 mptcp/base:FAIL
test_mptcpify:PASS:test__join_cgroup 0 nsec
create_netns:PASS:ip netns add mptcp_ns 0 nsec
create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec
test_mptcpify:PASS:create_netns 0 nsec
run_mptcpify:PASS:skel_open_load 0 nsec
run_mptcpify:PASS:skel_attach 0 nsec
(network_helpers.c:107: errno: Protocol not available) Failed to create
server socket
run_mptcpify:FAIL:start_server unexpected start_server: actual -1 <
expected 0
test_mptcpify:FAIL:run_mptcpify unexpected error: -5 (errno 92)
#178/2 mptcp/mptcpify:FAIL
#178 mptcp:FAIL
All error logs:
test_base:PASS:test__join_cgroup 0 nsec
create_netns:PASS:ip netns add mptcp_ns 0 nsec
create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec
test_base:PASS:create_netns 0 nsec
test_base:PASS:start_server 0 nsec
run_test:PASS:skel_open_load 0 nsec
run_test:PASS:skel_attach 0 nsec
run_test:PASS:bpf_prog_attach 0 nsec
run_test:PASS:connect to fd 0 nsec
verify_tsk:PASS:bpf_map_lookup_elem 0 nsec
verify_tsk:PASS:unexpected invoked count 0 nsec
verify_tsk:PASS:unexpected is_mptcp 0 nsec
test_base:PASS:run_test tcp 0 nsec
(network_helpers.c:107: errno: Protocol not available) Failed to create
server socket
test_base:FAIL:start_mptcp_server unexpected start_mptcp_server: actual
-1 < expected 0
#178/1 mptcp/base:FAIL
test_mptcpify:PASS:test__join_cgroup 0 nsec
create_netns:PASS:ip netns add mptcp_ns 0 nsec
create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec
test_mptcpify:PASS:create_netns 0 nsec
run_mptcpify:PASS:skel_open_load 0 nsec
run_mptcpify:PASS:skel_attach 0 nsec
(network_helpers.c:107: errno: Protocol not available) Failed to create
server socket
run_mptcpify:FAIL:start_server unexpected start_server: actual -1 <
expected 0
test_mptcpify:FAIL:run_mptcpify unexpected error: -5 (errno 92)
#178/2 mptcp/mptcpify:FAIL
#178 mptcp:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
This is the custom patch I had applied on the LTS v6.12.36 kernel and
tested it:
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index dd595d9b5e50c..bdcc4136e92ef 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -89,7 +89,7 @@ const char *mptcp_get_scheduler(const struct net *net)
static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
{
- pernet->mptcp_enabled = 1;
+ pernet->mptcp_enabled = 0;
pernet->add_addr_timeout = TCP_RTO_MAX;
pernet->blackhole_timeout = 3600;
atomic_set(&pernet->active_disable_times, 0);
--
Thanks & Regards, Harshvardhan
Our syztester report the lockdep WARNING [1]. kmemleak_scan_thread()
invokes scan_block() which may invoke a nomal printk() to print warning
message. This can cause a deadlock in the scenario reported below:
CPU0 CPU1
---- ----
lock(kmemleak_lock);
lock(&port->lock);
lock(kmemleak_lock);
lock(console_owner);
To solve this problem, switch to printk_safe mode before printing warning
message, this will redirect all printk()-s to a special per-CPU buffer,
which will be flushed later from a safe context (irq work), and this
deadlock problem can be avoided. The proper API to use should be
printk_deferred_enter()/printk_deferred_exit() if we want to deferred the
printing [2].
This patch also fix some similar cases that need to use the printk
deferring [3].
[1]
https://lore.kernel.org/all/20250730094914.566582-1-gubowen5@huawei.com/
[2]
https://lore.kernel.org/all/5ca375cd-4a20-4807-b897-68b289626550@redhat.com/
[3]
https://lore.kernel.org/all/aJCir5Wh362XzLSx@arm.com/
====================
Cc: stable(a)vger.kernel.org # 5.10
Signed-off-by: Gu Bowen <gubowen5(a)huawei.com>
---
mm/kmemleak.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 4801751cb6b6..381145dde54f 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -390,9 +390,15 @@ static struct kmemleak_object *lookup_object(unsigned long ptr, int alias)
else if (object->pointer == ptr || alias)
return object;
else {
+ /*
+ * Printk deferring due to the kmemleak_lock held.
+ * This is done to avoid deadlock.
+ */
+ printk_deferred_enter();
kmemleak_warn("Found object by alias at 0x%08lx\n",
ptr);
dump_object_info(object);
+ printk_deferred_exit();
break;
}
}
@@ -433,8 +439,15 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
list_del(&object->object_list);
else if (mem_pool_free_count)
object = &mem_pool[--mem_pool_free_count];
- else
+ else {
+ /*
+ * Printk deferring due to the kmemleak_lock held.
+ * This is done to avoid deadlock.
+ */
+ printk_deferred_enter();
pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+ printk_deferred_exit();
+ }
raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
return object;
@@ -632,6 +645,11 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
else if (parent->pointer + parent->size <= ptr)
link = &parent->rb_node.rb_right;
else {
+ /*
+ * Printk deferring due to the kmemleak_lock held.
+ * This is done to avoid deadlock.
+ */
+ printk_deferred_enter();
kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
ptr);
/*
@@ -639,6 +657,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
* be freed while the kmemleak_lock is held.
*/
dump_object_info(parent);
+ printk_deferred_exit();
kmem_cache_free(object_cache, object);
object = NULL;
goto out;
--
2.25.1
The patch titled
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
proc-proc_maps_open-allow-proc_mem_open-to-return-null.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jialin Wang <wjl.linux(a)gmail.com>
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
Date: Fri, 8 Aug 2025 00:54:55 +0800
commit 65c66047259f ("proc: fix the issue of proc_mem_open returning
NULL") breaks `perf record -g -p PID` when profiling a kernel thread.
The strace of `perf record -g -p $(pgrep kswapd0)` shows:
openat(AT_FDCWD, "/proc/65/task/65/maps", O_RDONLY) = -1 ESRCH (No such process)
This patch partially reverts the commit to fix it.
Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes: 65c66047259f ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux(a)gmail.com>
Cc: Penglei Jiang <superman.xpt(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/proc/task_mmu.c~proc-proc_maps_open-allow-proc_mem_open-to-return-null
+++ a/fs/proc/task_mmu.c
@@ -340,8 +340,8 @@ static int proc_maps_open(struct inode *
priv->inode = inode;
priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
- if (IS_ERR_OR_NULL(priv->mm)) {
- int err = priv->mm ? PTR_ERR(priv->mm) : -ESRCH;
+ if (IS_ERR(priv->mm)) {
+ int err = PTR_ERR(priv->mm);
seq_release_private(inode, file);
return err;
_
Patches currently in -mm which might be from wjl.linux(a)gmail.com are
proc-proc_maps_open-allow-proc_mem_open-to-return-null.patch
The quilt patch titled
Subject: mm: fix accounting of memmap pages for early sections
has been removed from the -mm tree. Its filename was
mm-fix-accounting-of-memmap-pages-for-early-sections.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm: fix accounting of memmap pages for early sections
Date: Mon, 4 Aug 2025 10:40:15 +0200
memmap pages can be allocated either from the memblock (boot) allocator
during early boot or from the buddy allocator.
When these memmap pages are removed via arch_remove_memory(), the
deallocation path depends on their source:
* For pages from the buddy allocator, depopulate_section_memmap() is
called, which also decrements the count of nr_memmap_pages.
* For pages from the boot allocator, free_map_bootmem() is called. But
it currently does not adjust the nr_memmap_boot_pages.
To fix this inconsistency, update free_map_bootmem() to also decrement the
nr_memmap_boot_pages count by invoking memmap_boot_pages_add(), mirroring
how free_vmemmap_page() handles this for boot-allocated pages.
This ensures correct tracking of memmap pages regardless of allocation
source.
Link: https://lkml.kernel.org/r/20250804084015.270570-1-sumanthk@linux.ibm.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/sparse.c~mm-fix-accounting-of-memmap-pages-for-early-sections
+++ a/mm/sparse.c
@@ -688,6 +688,7 @@ static void free_map_bootmem(struct page
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+ memmap_boot_pages_add(-1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
vmemmap_free(start, end, NULL);
}
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-fix-accounting-of-memmap-pages.patch
The patch titled
Subject: mm: fix accounting of memmap pages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-accounting-of-memmap-pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm: fix accounting of memmap pages
Date: Thu, 7 Aug 2025 20:35:45 +0200
For !CONFIG_SPARSEMEM_VMEMMAP, memmap page accounting is currently done
upfront in sparse_buffer_init(). However, sparse_buffer_alloc() may
return NULL in failure scenario.
Also, memmap pages may be allocated either from the memblock allocator
during early boot or from the buddy allocator. When removed via
arch_remove_memory(), accounting of memmap pages must reflect the original
allocation source.
To ensure correctness:
* Account memmap pages after successful allocation in sparse_init_nid()
and section_activate().
* Account memmap pages in section_deactivate() based on allocation
source.
Link: https://lkml.kernel.org/r/20250807183545.1424509-1-sumanthk@linux.ibm.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse-vmemmap.c | 5 -----
mm/sparse.c | 15 +++++++++------
2 files changed, 9 insertions(+), 11 deletions(-)
--- a/mm/sparse.c~mm-fix-accounting-of-memmap-pages
+++ a/mm/sparse.c
@@ -454,9 +454,6 @@ static void __init sparse_buffer_init(un
*/
sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
sparsemap_buf_end = sparsemap_buf + size;
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
- memmap_boot_pages_add(DIV_ROUND_UP(size, PAGE_SIZE));
-#endif
}
static void __init sparse_buffer_fini(void)
@@ -567,6 +564,8 @@ static void __init sparse_init_nid(int n
sparse_buffer_fini();
goto failed;
}
+ memmap_boot_pages_add(DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
+ PAGE_SIZE));
sparse_init_early_section(nid, map, pnum, 0);
}
}
@@ -680,7 +679,6 @@ static void depopulate_section_memmap(un
unsigned long start = (unsigned long) pfn_to_page(pfn);
unsigned long end = start + nr_pages * sizeof(struct page);
- memmap_pages_add(-1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
vmemmap_free(start, end, altmap);
}
static void free_map_bootmem(struct page *memmap)
@@ -857,10 +855,14 @@ static void section_deactivate(unsigned
* The memmap of early sections is always fully populated. See
* section_activate() and pfn_valid() .
*/
- if (!section_is_early)
+ if (!section_is_early) {
+ memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
depopulate_section_memmap(pfn, nr_pages, altmap);
- else if (memmap)
+ } else if (memmap) {
+ memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
+ PAGE_SIZE)));
free_map_bootmem(memmap);
+ }
if (empty)
ms->section_mem_map = (unsigned long)NULL;
@@ -905,6 +907,7 @@ static struct page * __meminit section_a
section_deactivate(pfn, nr_pages, altmap);
return ERR_PTR(-ENOMEM);
}
+ memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
return memmap;
}
--- a/mm/sparse-vmemmap.c~mm-fix-accounting-of-memmap-pages
+++ a/mm/sparse-vmemmap.c
@@ -578,11 +578,6 @@ struct page * __meminit __populate_secti
if (r < 0)
return NULL;
- if (system_state == SYSTEM_BOOTING)
- memmap_boot_pages_add(DIV_ROUND_UP(end - start, PAGE_SIZE));
- else
- memmap_pages_add(DIV_ROUND_UP(end - start, PAGE_SIZE));
-
return pfn_to_page(pfn);
}
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-fix-accounting-of-memmap-pages-for-early-sections.patch
mm-fix-accounting-of-memmap-pages.patch
When the file system is frozen in preparation for taking an LVM
snapshot, the journal is checkpointed and if the orphan_file feature
is enabled, and the orphan file is empty, we clear the orphan_present
feature flag. But if there are pending inodes that need to be removed
the orphan_present feature flag can't be cleared.
The problem comes if the block device is read-only. In that case, we
can't process the orphan inode list, so it is skipped in
ext4_orphan_cleanup(). But then in ext4_mark_recovery_complete(),
this results in the ext4 error "Orphan file not empty on read-only fs"
firing and the file system mount is aborted.
Fix this by clearing the needs_recovery flag in the block device is
read-only. We do this after the call to ext4_load_and_init-journal()
since there are some error checks need to be done in case the journal
needs to be replayed and the block device is read-only, or if the
block device containing the externa journal is read-only, etc.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108271
Cc: stable(a)vger.kernel.org
Fixes: 02f310fcf47f ("ext4: Speedup ext4 orphan inode handling")
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
fs/ext4/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c7d39da7e733..52a5f2b391fb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5414,6 +5414,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
err = ext4_load_and_init_journal(sb, es, ctx);
if (err)
goto failed_mount3a;
+ if (bdev_read_only(sb->s_bdev))
+ needs_recovery = 0;
} else if (test_opt(sb, NOLOAD) && !sb_rdonly(sb) &&
ext4_has_feature_journal_needs_recovery(sb)) {
ext4_msg(sb, KERN_ERR, "required journal recovery "
--
2.47.2
The TypeC PHY HW readout during driver loading and system resume
determines which TypeC mode the PHY is in (legacy/DP-alt/TBT-alt) and
whether the PHY is connected, based on the PHY's Owned and Ready flags.
For the PHY to be in DP-alt or legacy mode and for the PHY to be in the
connected state in these modes, both the Owned (set by the BIOS/driver)
and the Ready (set by the HW) flags should be set.
On ICL-MTL the HW kept the PHY's Ready flag set after the driver
connected the PHY by acquiring the PHY ownership (by setting the Owned
flag), until the driver disconnected the PHY by releasing the PHY
ownership (by clearing the Owned flag). On LNL+ this has changed, in
that the HW clears the Ready flag as soon as the sink gets disconnected,
even if the PHY ownership was acquired already and hence the PHY is
being used by the display.
When inheriting the HW state from BIOS for a PHY connected in DP-alt
mode on which the sink got disconnected - i.e. in a case where the sink
was connected while BIOS/GOP was running and so the sink got enabled
connecting the PHY, but the user disconnected the sink by the time the
driver loaded - the PHY Owned but not Ready state must be accounted for
on LNL+ according to the above. Do that by assuming on LNL+ that the PHY
is connected in DP-alt mode whenever the PHY Owned flag is set,
regardless of the PHY Ready flag.
This fixes a problem on LNL+, where the PHY TypeC mode / connected state
was detected incorrectly for a DP-alt sink, which got connected and then
disconnected by the user in the above way.
Cc: stable(a)vger.kernel.org # v6.8+
Reported-by: Charlton Lin <charlton.lin(a)intel.com>
Tested-by: Khaled Almahallawy <khaled.almahallawy(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_tc.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c
index 3bc57579fe53e..73a08bd84a70a 100644
--- a/drivers/gpu/drm/i915/display/intel_tc.c
+++ b/drivers/gpu/drm/i915/display/intel_tc.c
@@ -1226,14 +1226,18 @@ static void tc_phy_get_hw_state(struct intel_tc_port *tc)
tc->phy_ops->get_hw_state(tc);
}
-static bool tc_phy_is_ready_and_owned(struct intel_tc_port *tc,
- bool phy_is_ready, bool phy_is_owned)
+static bool tc_phy_in_legacy_or_dp_alt_mode(struct intel_tc_port *tc,
+ bool phy_is_ready, bool phy_is_owned)
{
struct intel_display *display = to_intel_display(tc->dig_port);
- drm_WARN_ON(display->drm, phy_is_owned && !phy_is_ready);
+ if (DISPLAY_VER(display) < 20) {
+ drm_WARN_ON(display->drm, phy_is_owned && !phy_is_ready);
- return phy_is_ready && phy_is_owned;
+ return phy_is_ready && phy_is_owned;
+ } else {
+ return phy_is_owned;
+ }
}
static bool tc_phy_is_connected(struct intel_tc_port *tc,
@@ -1244,7 +1248,7 @@ static bool tc_phy_is_connected(struct intel_tc_port *tc,
bool phy_is_owned = tc_phy_is_owned(tc);
bool is_connected;
- if (tc_phy_is_ready_and_owned(tc, phy_is_ready, phy_is_owned))
+ if (tc_phy_in_legacy_or_dp_alt_mode(tc, phy_is_ready, phy_is_owned))
is_connected = port_pll_type == ICL_PORT_DPLL_MG_PHY;
else
is_connected = port_pll_type == ICL_PORT_DPLL_DEFAULT;
@@ -1352,7 +1356,7 @@ tc_phy_get_current_mode(struct intel_tc_port *tc)
phy_is_ready = tc_phy_is_ready(tc);
phy_is_owned = tc_phy_is_owned(tc);
- if (!tc_phy_is_ready_and_owned(tc, phy_is_ready, phy_is_owned)) {
+ if (!tc_phy_in_legacy_or_dp_alt_mode(tc, phy_is_ready, phy_is_owned)) {
mode = get_tc_mode_in_phy_not_owned_state(tc, live_mode);
} else {
drm_WARN_ON(display->drm, live_mode == TC_PORT_TBT_ALT);
--
2.49.1
The BIOS can leave the AUX power well enabled on an output, even if this
isn't required (on platforms where the AUX power is only needed for an
AUX access). This was observed at least on PTL. To avoid the WARN which
would be triggered by this during the HW readout, convert the WARN to a
debug message.
Cc: stable(a)vger.kernel.org # v6.8+
Reported-by: Charlton Lin <charlton.lin(a)intel.com>
Tested-by: Khaled Almahallawy <khaled.almahallawy(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_tc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c
index 14042a64375e1..dec54cb0c8c63 100644
--- a/drivers/gpu/drm/i915/display/intel_tc.c
+++ b/drivers/gpu/drm/i915/display/intel_tc.c
@@ -1494,11 +1494,11 @@ static void intel_tc_port_reset_mode(struct intel_tc_port *tc,
intel_display_power_flush_work(display);
if (!intel_tc_cold_requires_aux_pw(dig_port)) {
enum intel_display_power_domain aux_domain;
- bool aux_powered;
aux_domain = intel_aux_power_domain(dig_port);
- aux_powered = intel_display_power_is_enabled(display, aux_domain);
- drm_WARN_ON(display->drm, aux_powered);
+ if (intel_display_power_is_enabled(display, aux_domain))
+ drm_dbg_kms(display->drm, "Port %s: AUX unexpectedly powered\n",
+ tc->port_name);
}
tc_phy_disconnect(tc);
--
2.49.1
Use the cached max lane count value on LNL+, to account for scenarios
where this value is queried after the HW cleared the corresponding pin
assignment value in the TCSS_DDI_STATUS register after the sink got
disconnected.
For consistency, follow-up changes will use the cached max lane count
value on other platforms as well and will also cache the pin assignment
value in a similar way.
Cc: stable(a)vger.kernel.org # v6.8+
Reported-by: Charlton Lin <charlton.lin(a)intel.com>
Tested-by: Khaled Almahallawy <khaled.almahallawy(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_tc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c
index ea93893980e17..14042a64375e1 100644
--- a/drivers/gpu/drm/i915/display/intel_tc.c
+++ b/drivers/gpu/drm/i915/display/intel_tc.c
@@ -395,12 +395,16 @@ static void read_pin_configuration(struct intel_tc_port *tc)
int intel_tc_port_max_lane_count(struct intel_digital_port *dig_port)
{
+ struct intel_display *display = to_intel_display(dig_port);
struct intel_tc_port *tc = to_tc_port(dig_port);
if (!intel_encoder_is_tc(&dig_port->base))
return 4;
- return get_max_lane_count(tc);
+ if (DISPLAY_VER(display) < 20)
+ return get_max_lane_count(tc);
+
+ return tc->max_lane_count;
}
void intel_tc_port_set_fia_lane_count(struct intel_digital_port *dig_port,
--
2.49.1
On LNL+ for a disconnected sink the pin assignment value gets cleared by
the HW/FW as soon as the sink gets disconnected, even if the PHY
ownership got acquired already by the BIOS/driver (and hence the PHY
itself is still connected and used by the display). During HW readout
this can result in detecting the PHY's max lane count as 0 - matching
the above cleared aka NONE pin assignment HW state. For a connected PHY
the driver in general (outside of intel_tc.c) expects the max lane count
value to be valid for the video mode enabled on the corresponding output
(1, 2 or 4). Ensure this by setting the max lane count to 4 in this
case. Note, that it doesn't matter if this lane count happened to be
more than the max lane count with which the PHY got connected and
enabled, since the only thing the driver can do with such an output -
where the DP-alt sink is disconnected - is to disable the output.
Cc: stable(a)vger.kernel.org # v6.8+
Reported-by: Charlton Lin <charlton.lin(a)intel.com>
Tested-by: Khaled Almahallawy <khaled.almahallawy(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_tc.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c
index ea6c73af683a0..ea93893980e17 100644
--- a/drivers/gpu/drm/i915/display/intel_tc.c
+++ b/drivers/gpu/drm/i915/display/intel_tc.c
@@ -23,6 +23,7 @@
#include "intel_modeset_lock.h"
#include "intel_tc.h"
+#define DP_PIN_ASSIGNMENT_NONE 0x0
#define DP_PIN_ASSIGNMENT_C 0x3
#define DP_PIN_ASSIGNMENT_D 0x4
#define DP_PIN_ASSIGNMENT_E 0x5
@@ -308,6 +309,8 @@ static int lnl_tc_port_get_max_lane_count(struct intel_digital_port *dig_port)
REG_FIELD_GET(TCSS_DDI_STATUS_PIN_ASSIGNMENT_MASK, val);
switch (pin_assignment) {
+ case DP_PIN_ASSIGNMENT_NONE:
+ return 0;
default:
MISSING_CASE(pin_assignment);
fallthrough;
@@ -1157,6 +1160,12 @@ static void xelpdp_tc_phy_get_hw_state(struct intel_tc_port *tc)
tc->lock_wakeref = tc_cold_block(tc);
read_pin_configuration(tc);
+ /*
+ * Set a valid lane count value for a DP-alt sink which got
+ * disconnected. The driver can only disable the output on this PHY.
+ */
+ if (tc->max_lane_count == 0)
+ tc->max_lane_count = 4;
drm_WARN_ON(display->drm,
(tc->mode == TC_PORT_DP_ALT || tc->mode == TC_PORT_LEGACY) &&
--
2.49.1
In __hdmi_lpe_audio_probe(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in one character less
being copied from 'card->shortname' to 'pcm->name'.
Since 'pcm->name' is a fixed-size buffer, we can safely omit the size
argument and let strscpy() infer it using sizeof(). This ensures the
card name is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: 75b1a8f9d62e ("ALSA: Convert strlcpy to strscpy when return value is unused")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
sound/x86/intel_hdmi_audio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c
index cc54539c6030..fbef0cbe8f1a 100644
--- a/sound/x86/intel_hdmi_audio.c
+++ b/sound/x86/intel_hdmi_audio.c
@@ -1765,7 +1765,7 @@ static int __hdmi_lpe_audio_probe(struct platform_device *pdev)
/* setup private data which can be retrieved when required */
pcm->private_data = ctx;
pcm->info_flags = 0;
- strscpy(pcm->name, card->shortname, strlen(card->shortname));
+ strscpy(pcm->name, card->shortname);
/* setup the ops for playback */
snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &had_pcm_ops);
--
2.50.1
From: Akash M <akash.m5(a)samsung.com>
This commit addresses a rarely observed endpoint command timeout
which causes kernel panic due to warn when 'panic_on_warn' is enabled
and unnecessary call trace prints when 'panic_on_warn' is disabled.
It is seen during fast software-controlled connect/disconnect testcases.
The following is one such endpoint command timeout that we observed:
1. Connect
=======
->dwc3_thread_interrupt
->dwc3_ep0_interrupt
->configfs_composite_setup
->composite_setup
->usb_ep_queue
->dwc3_gadget_ep0_queue
->__dwc3_gadget_ep0_queue
->__dwc3_ep0_do_control_data
->dwc3_send_gadget_ep_cmd
2. Disconnect
==========
->dwc3_thread_interrupt
->dwc3_gadget_disconnect_interrupt
->dwc3_ep0_reset_state
->dwc3_ep0_end_control_data
->dwc3_send_gadget_ep_cmd
In the issue scenario, in Exynos platforms, we observed that control
transfers for the previous connect have not yet been completed and end
transfer command sent as a part of the disconnect sequence and
processing of USB_ENDPOINT_HALT feature request from the host timeout.
This maybe an expected scenario since the controller is processing EP
commands sent as a part of the previous connect. It maybe better to
remove WARN_ON in all places where device endpoint commands are sent to
avoid unnecessary kernel panic due to warn.
Fixes: e192cc7b5239 ("usb: dwc3: gadget: move cmd_endtransfer to extra function")
Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Fixes: c7fcdeb2627c ("usb: dwc3: ep0: simplify EP0 state machine")
Fixes: f0f2b2a2db85 ("usb: dwc3: ep0: push ep0state into xfernotready processing")
Fixes: 2e3db064855a ("usb: dwc3: ep0: drop XferNotReady(DATA) support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Akash M <akash.m5(a)samsung.com>
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 666ac432f52d..7b313836f62b 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -288,7 +288,9 @@ void dwc3_ep0_out_start(struct dwc3 *dwc)
dwc3_ep0_prepare_one_trb(dep, dwc->ep0_trb_addr, 8,
DWC3_TRBCTL_CONTROL_SETUP, false);
ret = dwc3_ep0_start_trans(dep);
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_warn(dwc->dev, "ep0 out start transfer failed: %d\n", ret);
+
for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
struct dwc3_ep *dwc3_ep;
@@ -1061,7 +1063,9 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
ret = dwc3_ep0_start_trans(dep);
}
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_warn(dwc->dev, "ep0 data phase start transfer failed: %d\n",
+ ret);
}
static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
@@ -1078,7 +1082,12 @@ static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep *dep)
{
- WARN_ON(dwc3_ep0_start_control_status(dep));
+ int ret;
+
+ ret = dwc3_ep0_start_control_status(dep);
+ if (ret)
+ dev_warn(dwc->dev,
+ "ep0 status phase start transfer failed: %d\n", ret);
}
static void dwc3_ep0_do_control_status(struct dwc3 *dwc,
@@ -1121,7 +1130,10 @@ void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
cmd |= DWC3_DEPCMD_PARAM(dep->resource_index);
memset(¶ms, 0, sizeof(params));
ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_warn_ratelimited(dwc->dev,
+ "ep0 data phase end transfer failed: %d\n", ret);
+
dep->resource_index = 0;
}
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 321361288935..50e4f667b2f2 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1774,7 +1774,11 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int
dep->flags |= DWC3_EP_DELAY_STOP;
return 0;
}
- WARN_ON_ONCE(ret);
+
+ if (ret)
+ dev_warn_ratelimited(dep->dwc->dev,
+ "end transfer failed: ret = %d\n", ret);
+
dep->resource_index = 0;
if (!interrupt)
@@ -4041,7 +4045,9 @@ static void dwc3_clear_stall_all_ep(struct dwc3 *dwc)
dep->flags &= ~DWC3_EP_STALL;
ret = dwc3_send_clear_stall_ep_cmd(dep);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_warn_ratelimited(dwc->dev,
+ "failed to clear STALL on %s\n", dep->name);
}
}
--
2.17.1
The patch titled
Subject: mm: fix accounting of memmap pages for early sections
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-accounting-of-memmap-pages-for-early-sections.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm: fix accounting of memmap pages for early sections
Date: Mon, 4 Aug 2025 10:40:15 +0200
memmap pages can be allocated either from the memblock (boot) allocator
during early boot or from the buddy allocator.
When these memmap pages are removed via arch_remove_memory(), the
deallocation path depends on their source:
* For pages from the buddy allocator, depopulate_section_memmap() is
called, which also decrements the count of nr_memmap_pages.
* For pages from the boot allocator, free_map_bootmem() is called. But
it currently does not adjust the nr_memmap_boot_pages.
To fix this inconsistency, update free_map_bootmem() to also decrement the
nr_memmap_boot_pages count by invoking memmap_boot_pages_add(), mirroring
how free_vmemmap_page() handles this for boot-allocated pages.
This ensures correct tracking of memmap pages regardless of allocation
source.
Link: https://lkml.kernel.org/r/20250804084015.270570-1-sumanthk@linux.ibm.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/sparse.c~mm-fix-accounting-of-memmap-pages-for-early-sections
+++ a/mm/sparse.c
@@ -688,6 +688,7 @@ static void free_map_bootmem(struct page
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+ memmap_boot_pages_add(-1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
vmemmap_free(start, end, NULL);
}
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-fix-accounting-of-memmap-pages-for-early-sections.patch
Pentium 4's which are INTEL_P4_PRESCOTT (model 0x03) and later have
a constant TSC. This was correctly captured until commit fadb6f569b10
("x86/cpu/intel: Limit the non-architectural constant_tsc model checks").
In that commit, an error was introduced while selecting the last P4
model (0x06) as the upper bound. Model 0x06 was transposed to
INTEL_P4_WILLAMETTE, which is just plain wrong. That was presumably a
simple typo, probably just copying and pasting the wrong P4 model.
Fix the constant TSC logic to cover all later P4 models. End at
INTEL_P4_CEDARMILL which accurately corresponds to the last P4 model.
Fixes: fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks")
Cc: <stable(a)vger.kernel.org> # v6.15
Signed-off-by: Suchit Karunakaran <suchitkarunakaran(a)gmail.com>
---
Changes since v4:
- Updated the patch based on review suggestions
Changes since v3:
- Refined changelog
Changes since v2:
- Improved commit message
Changes since v1:
- Fixed incorrect logic
arch/x86/kernel/cpu/intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..98ae4c37c93e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -262,7 +262,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_WILLAMETTE) ||
+ } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_CEDARMILL) ||
(c->x86_vfm >= INTEL_CORE_YONAH && c->x86_vfm <= INTEL_IVYBRIDGE)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
}
--
2.50.1
A new warning in Clang 22 [1] complains that @clidr passed to
get_clidr_el1() is an uninitialized const pointer. get_clidr_el1()
doesn't really care since it casts away the const-ness anyways -- it is
a false positive.
| ../arch/arm64/kvm/sys_regs.c:2838:23: warning: variable 'clidr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
| 2838 | get_clidr_el1(NULL, &clidr); /* Ugly... */
| | ^~~~~
Disable this warning for sys_regs.o with an iron fist as it doesn't make
sense to waste maintainer's time or potentially break builds by
backporting large changelists from 6.2+.
This patch isn't needed for anything past 6.1 as this code section was
reworked in Commit 7af0c2534f4c ("KVM: arm64: Normalize cache
configuration").
Cc: stable(a)vger.kernel.org
Fixes: 7c8c5e6a9101e ("arm64: KVM: system register handling")
Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d44… [1]
Signed-off-by: Justin Stitt <justinstitt(a)google.com>
---
I'm sending a similar patch for 6.1.
---
arch/arm64/kvm/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..109cca425d3e 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,6 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o
+
+# Work around a false positive Clang 22 -Wuninitialized-const-pointer warning
+CFLAGS_sys_regs.o := $(call cc-disable-warning, uninitialized-const-pointer)
---
base-commit: 8bb7eca972ad531c9b149c0a51ab43a417385813
change-id: 20250728-b4-stable-disable-uninit-ptr-warn-5-15-c0c9db3df206
Best regards,
--
Justin Stitt <justinstitt(a)google.com>
A new warning in Clang 22 [1] complains that @clidr passed to
get_clidr_el1() is an uninitialized const pointer. get_clidr_el1()
doesn't really care since it casts away the const-ness anyways -- it is
a false positive.
| ../arch/arm64/kvm/sys_regs.c:2978:23: warning: variable 'clidr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
| 2978 | get_clidr_el1(NULL, &clidr); /* Ugly... */
| | ^~~~~
Disable this warning for sys_regs.o with an iron fist as it doesn't make
sense to waste maintainer's time or potentially break builds by
backporting large changelists from 6.2+.
This patch isn't needed for anything past 6.1 as this code section was
reworked in Commit 7af0c2534f4c ("KVM: arm64: Normalize cache
configuration").
Cc: stable(a)vger.kernel.org
Fixes: 7c8c5e6a9101e ("arm64: KVM: system register handling")
Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d44… [1]
Signed-off-by: Justin Stitt <justinstitt(a)google.com>
---
I've sent a similar patch for 5.15.
---
arch/arm64/kvm/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d4645a..5fdb5331bfad 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,6 +24,9 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
+# Work around a false positive Clang 22 -Wuninitialized-const-pointer warning
+CFLAGS_sys_regs.o := $(call cc-disable-warning, uninitialized-const-pointer)
+
always-y := hyp_constants.h hyp-constants.s
define rule_gen_hyp_constants
---
base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
change-id: 20250728-stable-disable-unit-ptr-warn-281fee82539c
Best regards,
--
Justin Stitt <justinstitt(a)google.com>
The patch titled
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
Date: Wed, 6 Aug 2025 15:00:22 -0700
When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with
obtaining a folio and accessing it even though the entry is swp_entry_t.
Add the missing check and let split_huge_pmd() handle migration entries.
Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: syzbot+b446dbe27035ef6bd6c2(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry
+++ a/mm/userfaultfd.c
@@ -1821,13 +1821,16 @@ ssize_t move_pages(struct userfaultfd_ct
/* Check if we can move the pmd without splitting it. */
if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
!pmd_none(dst_pmdval)) {
- struct folio *folio = pmd_folio(*src_pmd);
+ /* Can be a migration entry */
+ if (pmd_present(*src_pmd)) {
+ struct folio *folio = pmd_folio(*src_pmd);
- if (!folio || (!is_huge_zero_folio(folio) &&
- !PageAnonExclusive(&folio->page))) {
- spin_unlock(ptl);
- err = -EBUSY;
- break;
+ if (!folio || (!is_huge_zero_folio(folio) &&
+ !PageAnonExclusive(&folio->page))) {
+ spin_unlock(ptl);
+ err = -EBUSY;
+ break;
+ }
}
spin_unlock(ptl);
_
Patches currently in -mm which might be from surenb(a)google.com are
userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry.patch
mm-limit-the-scope-of-vma_start_read.patch
mm-change-vma_start_read-to-drop-rcu-lock-on-failure.patch
When UFFDIO_MOVE is used with UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES and it
encounters a non-present PMD (migration entry), it proceeds with folio
access even though the folio is not present. Add the missing check and
let split_huge_pmd() handle migration entries.
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: syzbot+b446dbe27035ef6bd6c2(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org
---
Changes since v2 [1]
- Updated the title and changelog, per David Hildenbrand
- Removed extra checks for non-present not-migration PMD entries,
per Peter Xu
[1] https://lore.kernel.org/all/20250731154442.319568-1-surenb@google.com/
mm/userfaultfd.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5431c9dd7fd7..116481606be8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1826,13 +1826,16 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
/* Check if we can move the pmd without splitting it. */
if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
!pmd_none(dst_pmdval)) {
- struct folio *folio = pmd_folio(*src_pmd);
-
- if (!folio || (!is_huge_zero_folio(folio) &&
- !PageAnonExclusive(&folio->page))) {
- spin_unlock(ptl);
- err = -EBUSY;
- break;
+ /* Can be a migration entry */
+ if (pmd_present(*src_pmd)) {
+ struct folio *folio = pmd_folio(*src_pmd);
+
+ if (!folio || (!is_huge_zero_folio(folio) &&
+ !PageAnonExclusive(&folio->page))) {
+ spin_unlock(ptl);
+ err = -EBUSY;
+ break;
+ }
}
spin_unlock(ptl);
base-commit: 8e7e0c6d09502e44aa7a8fce0821e042a6ec03d1
--
2.50.1.565.gc32cd1483b-goog
The commit referenced in the Fixes tag causes usbnet to malfunction
(identified via git bisect). Post-commit, my external RJ45 LAN cable
fails to connect. Linus also reported the same issue after pulling that
commit.
The code has a logic error: netif_carrier_on() is only called when the
link is already on. Fix this by moving the netif_carrier_on() call
outside the if-statement entirely. This ensures it is always called
when EVENT_LINK_CARRIER_ON is set and properly clears it regardless
of the link state.
Cc: stable(a)vger.kernel.org
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjqL4uF0MG_c8+xHX1Vv8==sPYQrtzbdA3kzi9628…
Closes: https://lore.kernel.org/netdev/CAHk-=wjKh8X4PT_mU1kD4GQrbjivMfPn-_hXa6han_B…
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
v3:
- Move the netif_carrier_on() call outside of the if-statement
entirely (Linus).
v2:
- Rebase on top of the latest netdev/net tree. The previous patch was
based on 0d9cfc9b8cb1. Line numbers have changed since then.
Link: https://lore.gnuweeb.org/gwml/20250801190310.58443-1-ammarfaizi2@gnuweeb.org
drivers/net/usb/usbnet.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a38ffbf4b3f0..511c4154cf74 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1113,32 +1113,32 @@ static const struct ethtool_ops usbnet_ethtool_ops = {
.set_link_ksettings = usbnet_set_link_ksettings_mii,
};
/*-------------------------------------------------------------------------*/
static void __handle_link_change(struct usbnet *dev)
{
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
if (!netif_carrier_ok(dev->net)) {
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
/*
* tx_timeout will unlink URBs for sending packets and
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
queue_work(system_bh_wq, &dev->bh_work);
}
/* hard_mtu or rx_urb_size may change during link change */
usbnet_update_max_qlen(dev);
clear_bit(EVENT_LINK_CHANGE, &dev->flags);
}
--
Ammar Faizi
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 80 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 10 ++-
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 129 insertions(+), 34 deletions(-)
--
2.50.1.552.g942d659e1b-goog
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 72 ++++++++++---------
2 files changed, 41 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index efe98ffb679a4..b2538cff222ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2570,9 +2570,6 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
adev->firmware.gpu_info_fw = NULL;
- if (adev->mman.discovery_bin)
- return 0;
-
switch (adev->asic_type) {
default:
return 0;
@@ -2594,6 +2591,8 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
chip_name = "arcturus";
break;
case CHIP_NAVI12:
+ if (adev->mman.discovery_bin)
+ return 0;
chip_name = "navi12";
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 81b3443c8d7f4..27bd7659961e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2555,40 +2555,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
switch (adev->asic_type) {
case CHIP_VEGA10:
- case CHIP_VEGA12:
- case CHIP_RAVEN:
- case CHIP_VEGA20:
- case CHIP_ARCTURUS:
- case CHIP_ALDEBARAN:
- /* this is not fatal. We have a fallback below
- * if the new firmwares are not present. some of
- * this will be overridden below to keep things
- * consistent with the current behavior.
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
*/
- r = amdgpu_discovery_reg_base_init(adev);
- if (!r) {
- amdgpu_discovery_harvest_ip(adev);
- amdgpu_discovery_get_gfx_info(adev);
- amdgpu_discovery_get_mall_info(adev);
- amdgpu_discovery_get_vcn_info(adev);
- }
- break;
- default:
- r = amdgpu_discovery_reg_base_init(adev);
- if (r) {
- drm_err(&adev->ddev, "discovery failed: %d\n", r);
- return r;
- }
-
- amdgpu_discovery_harvest_ip(adev);
- amdgpu_discovery_get_gfx_info(adev);
- amdgpu_discovery_get_mall_info(adev);
- amdgpu_discovery_get_vcn_info(adev);
- break;
- }
-
- switch (adev->asic_type) {
- case CHIP_VEGA10:
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 4;
@@ -2611,6 +2582,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 0, 0);
break;
case CHIP_VEGA12:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 4;
@@ -2633,6 +2609,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 0, 1);
break;
case CHIP_RAVEN:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 1;
adev->vcn.num_vcn_inst = 1;
@@ -2674,6 +2655,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
}
break;
case CHIP_VEGA20:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega20_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 8;
@@ -2697,6 +2683,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 1, 0);
break;
case CHIP_ARCTURUS:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
arct_reg_base_init(adev);
adev->sdma.num_instances = 8;
adev->vcn.num_vcn_inst = 2;
@@ -2725,6 +2716,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[UVD_HWIP][1] = IP_VERSION(2, 5, 0);
break;
case CHIP_ALDEBARAN:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
aldebaran_reg_base_init(adev);
adev->sdma.num_instances = 5;
adev->vcn.num_vcn_inst = 2;
@@ -2751,6 +2747,16 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 1, 0);
break;
default:
+ r = amdgpu_discovery_reg_base_init(adev);
+ if (r) {
+ drm_err(&adev->ddev, "discovery failed: %d\n", r);
+ return r;
+ }
+
+ amdgpu_discovery_harvest_ip(adev);
+ amdgpu_discovery_get_gfx_info(adev);
+ amdgpu_discovery_get_mall_info(adev);
+ amdgpu_discovery_get_vcn_info(adev);
break;
}
--
2.50.1
Make sure to drop the references taken to the PMC OF node and device by
of_parse_phandle() and of_find_device_by_node() during probe.
Note the holding a reference to the PMC device does not prevent the
PMC regmap from going away (e.g. if the PMC driver is unbound) so there
is no need to keep the reference.
Fixes: 2d1021487273 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210")
Cc: stable(a)vger.kernel.org # 5.14
Cc: JC Kuo <jckuo(a)nvidia.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/phy/tegra/xusb-tegra210.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/tegra/xusb-tegra210.c b/drivers/phy/tegra/xusb-tegra210.c
index ebc8a7e21a31..3409924498e9 100644
--- a/drivers/phy/tegra/xusb-tegra210.c
+++ b/drivers/phy/tegra/xusb-tegra210.c
@@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device *dev,
}
pdev = of_find_device_by_node(np);
+ of_node_put(np);
if (!pdev) {
dev_warn(dev, "PMC device is not available\n");
goto out;
}
- if (!platform_get_drvdata(pdev))
+ if (!platform_get_drvdata(pdev)) {
+ put_device(&pdev->dev);
return ERR_PTR(-EPROBE_DEFER);
+ }
padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
if (!padctl->regmap)
dev_info(dev, "failed to find PMC regmap\n");
+ put_device(&pdev->dev);
out:
return &padctl->base;
}
--
2.49.1
A regression in output polling was introduced by commit 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e
("drm: Check output polling initialized before disabling") in the 6.1.y stable tree.
As a result, when the i915 driver detects an HPD IRQ storm and attempts to switch
from IRQ-based hotplug detection to polling, output polling fails to resume.
The root cause is the use of dev->mode_config.poll_running. Once poll_running is set
(during the first connector detection) the calls to drm_kms_helper_poll_enable(), such as
intel_hpd_irq_storm_switch_to_polling() fails to schedule output_poll_work as expected.
Therefore, after an IRQ storm disables HPD IRQs, polling does not start, breaking hotplug detection.
The fix is to remove the dev->mode_config.poll_running in the check condition, ensuring polling
is always scheduled as requested.
Notes:
Initial analysis, assumptions, device testing details, the correct fix and detailed rationale
were discussed here https://lore.kernel.org/stable/aI32HUzrT95nS_H9@ideak-desk/
Cc: stable(a)vger.kernel.org # 6.1.y
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: Shradha Gupta <shradhagupta(a)linux.microsoft.com>
Suggested-by: Imre Deak <imre.deak(a)intel.com>
Signed-off-by: Nicusor Huhulea <nicusor.huhulea(a)siemens.com>
---
drivers/gpu/drm/drm_probe_helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
index 0e5eadc6d44d..a515b78f839e 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -250,7 +250,7 @@ void drm_kms_helper_poll_enable(struct drm_device *dev)
unsigned long delay = DRM_OUTPUT_POLL_PERIOD;
if (drm_WARN_ON_ONCE(dev, !dev->mode_config.poll_enabled) ||
- !drm_kms_helper_poll || dev->mode_config.poll_running)
+ !drm_kms_helper_poll)
return;
drm_connector_list_iter_begin(dev, &conn_iter);
--
2.39.2
In cdx_rpmsg_probe(), strscpy() is incorrectly called with the length of
the source string (excluding the NUL terminator) rather than the size of
the destination buffer. This results in one character less being copied
from 'cdx_rpmsg_id_table[0].name' to 'chinfo.name'.
Use the destination buffer size instead to ensure the name is copied
correctly.
Cc: stable(a)vger.kernel.org
Fixes: 2a226927d9b8 ("cdx: add rpmsg communication channel for CDX")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/cdx/controller/cdx_rpmsg.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cdx/controller/cdx_rpmsg.c b/drivers/cdx/controller/cdx_rpmsg.c
index 04b578a0be17..61f1a290ff08 100644
--- a/drivers/cdx/controller/cdx_rpmsg.c
+++ b/drivers/cdx/controller/cdx_rpmsg.c
@@ -129,8 +129,7 @@ static int cdx_rpmsg_probe(struct rpmsg_device *rpdev)
chinfo.src = RPMSG_ADDR_ANY;
chinfo.dst = rpdev->dst;
- strscpy(chinfo.name, cdx_rpmsg_id_table[0].name,
- strlen(cdx_rpmsg_id_table[0].name));
+ strscpy(chinfo.name, cdx_rpmsg_id_table[0].name, sizeof(chinfo.name));
cdx_mcdi->ept = rpmsg_create_ept(rpdev, cdx_rpmsg_cb, NULL, chinfo);
if (!cdx_mcdi->ept) {
--
2.50.1
Recently, we encountered the following hungtask:
INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/11:2 D 0 2981147 2 0x80004000
Workqueue: cgroup_destroy css_free_rwork_fn
Call Trace:
__schedule+0x934/0xe10
schedule+0x40/0xb0
wb_wait_for_completion+0x52/0x80
? finish_wait+0x80/0x80
mem_cgroup_css_free+0x3a/0x1b0
css_free_rwork_fn+0x42/0x380
process_one_work+0x1a2/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x110/0x130
? __kthread_cancel_work+0x40/0x40
ret_from_fork+0x1f/0x30
This is because the writeback thread has been continuously and repeatedly
throttled by wbt, but at the same time, the writes of another thread
proceed quite smoothly.
After debugging, I believe it is caused by the following reasons.
When thread A is blocked by wbt, the I/O issued by thread B will
use a deeper queue depth(rwb->rq_depth.max_depth) because it
meets the conditions of wb_recent_wait(), thus allowing thread B's
I/O to be issued smoothly and resulting in the inflight I/O of wbt
remaining relatively high.
However, when I/O completes, due to the high inflight I/O of wbt,
the condition "limit - inflight >= rwb->wb_background / 2"
in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
to remain unable to be woken up.
Some on-site information:
>>> rwb.rq_depth.max_depth
(unsigned int)48
>>> rqw.inflight.counter.value_()
44
>>> rqw.inflight.counter.value_()
35
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)3
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)2
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)20
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)12
cat wb_normal
24
cat wb_background
12
To fix this issue, we can use max_depth in wbt_rqw_done(), so that
the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
will also be consistent, which is more reasonable.
Signed-off-by: Julian Sun <sunjunchao(a)bytedance.com>
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
---
block/blk-wbt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..d6a2782d442f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
!wb_recent_wait(rwb))
limit = 0;
+ else if (wb_recent_wait(rwb))
+ limit = rwb->rq_depth.max_depth;
else
limit = rwb->wb_normal;
--
2.20.1
Using device_find_child() and of_find_device_by_node() to locate
devices could cause an imbalance in the device's reference count.
device_find_child() and of_find_device_by_node() both call
get_device() to increment the reference count of the found device
before returning the pointer. In mtk_drm_get_all_drm_priv(), these
references are never released through put_device(), resulting in
permanent reference count increments. Additionally, the
for_each_child_of_node() iterator fails to release node references in
all code paths. This leaks device node references when loop
termination occurs before reaching MAX_CRTC. These reference count
leaks may prevent device/node resources from being properly released
during driver unbind operations.
As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 27 +++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 7c0c12dde488..c78186debd3e 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -388,19 +388,24 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
of_id = of_match_node(mtk_drm_of_ids, node);
if (!of_id)
- continue;
+ goto next;
pdev = of_find_device_by_node(node);
if (!pdev)
- continue;
+ goto next;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);
- if (!drm_dev)
- continue;
+ if (!drm_dev) {
+ put_device(&pdev->dev);
+ goto next;
+ }
temp_drm_priv = dev_get_drvdata(drm_dev);
- if (!temp_drm_priv)
- continue;
+ if (!temp_drm_priv) {
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+ goto next;
+ }
if (temp_drm_priv->data->main_len)
all_drm_priv[CRTC_MAIN] = temp_drm_priv;
@@ -412,10 +417,14 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
if (temp_drm_priv->mtk_drm_bound)
cnt++;
- if (cnt == MAX_CRTC) {
- of_node_put(node);
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+
+next:
+ of_node_put(node);
+
+ if (cnt == MAX_CRTC)
break;
- }
}
if (drm_priv->data->mmsys_dev_num == cnt) {
--
2.25.1
Hi,
Please cherry-pick following patch to 6.12:
541a137254c71 accel/ivpu: Fix reset_engine debugfs file logic
It fixes a small regression introduced in:
0c3fa6e8441b1 accel/ivpu: Remove copy engine support
Thanks,
Jacek
Userspace generally expects APIs that return -EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the -EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead. While we're at it, refactor some fscontext_read() into a
separate helper to make the ring buffer logic a bit easier to read.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- Refactor message fetching to fetch_message_locked() which returns
ERR_PTR() in error cases. [Al Viro]
- v1: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v1-0-880597d42a5a…>
---
Aleksa Sarai (2):
fscontext: do not consume log entries when returning -EMSGSIZE
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 54 +++++-----
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 135 +++++++++++++++++++++++++
4 files changed, 167 insertions(+), 25 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
In __hdmi_lpe_audio_probe(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in one character less
being copied from 'card->shortname' to 'pcm->name'.
Use the destination buffer size instead to ensure the card name is
copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: 75b1a8f9d62e ("ALSA: Convert strlcpy to strscpy when return value is unused")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
Changes in v2:
- Use three parameter variant of strscpy() for backporting as suggested
by Sakari Ailus <sakari.ailus(a)linux.intel.com>
- Link to v1: https://lore.kernel.org/lkml/20250805190809.31150-1-thorsten.blum@linux.dev/
---
sound/x86/intel_hdmi_audio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c
index cc54539c6030..01f49555c5f6 100644
--- a/sound/x86/intel_hdmi_audio.c
+++ b/sound/x86/intel_hdmi_audio.c
@@ -1765,7 +1765,7 @@ static int __hdmi_lpe_audio_probe(struct platform_device *pdev)
/* setup private data which can be retrieved when required */
pcm->private_data = ctx;
pcm->info_flags = 0;
- strscpy(pcm->name, card->shortname, strlen(card->shortname));
+ strscpy(pcm->name, card->shortname, sizeof(pcm->name));
/* setup the ops for playback */
snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &had_pcm_ops);
--
2.50.1
Userspace generally expects APIs that return EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Aleksa Sarai (2):
fscontext: do not consume log entries for -EMSGSIZE case
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 22 ++--
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 137 +++++++++++++++++++++++++
4 files changed, 153 insertions(+), 9 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
Changes in v2:
- Use three parameter variant of strscpy() for easier backporting
- Link to v1: https://lore.kernel.org/lkml/20250805221424.57890-1-thorsten.blum@linux.dev/
---
fs/smb/server/smb_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
--
2.50.1
replace_fd() returns the number of the new file descriptor through the
return value of do_dup2(). However its callers never care about the
specific number. In fact the caller in receive_fd_replace() treats any
non-zero return value as an error and therefore never calls
__receive_sock() for most file descriptors, which is a bug.
To fix the bug in receive_fd_replace() and to avoid the same issue
happening in future callers, signal success through a plain zero.
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Link: https://lore.kernel.org/lkml/20250801220215.GS222315@ZenIV/
Fixes: 173817151b15 ("fs: Expand __receive_fd() to accept existing fd")
Fixes: 42eb0d54c08a ("fs: split receive_fd_replace from __receive_fd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Move the fix to replace_fd() (Al)
- Link to v1: https://lore.kernel.org/r/20250801-fix-receive_fd_replace-v1-1-d46d600c74d6…
---
Untested, it stuck out while reading the code.
---
fs/file.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/file.c b/fs/file.c
index 6d2275c3be9c6967d16c75d1b6521f9b58980926..f8a271265913951d755a5db559938d589219c4f2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1330,7 +1330,10 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
err = expand_files(files, fd);
if (unlikely(err < 0))
goto out_unlock;
- return do_dup2(files, file, fd, flags);
+ err = do_dup2(files, file, fd, flags);
+ if (err < 0)
+ goto out_unlock;
+ err = 0;
out_unlock:
spin_unlock(&files->file_lock);
---
base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f
change-id: 20250801-fix-receive_fd_replace-7fdd5ce6532d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Since 'extension' is a fixed-size buffer, we can safely omit the size
argument and let strscpy() infer it using sizeof(). This ensures that
the string "___" (three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
fs/smb/server/smb_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..92c0562283da 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___");
} else {
if (p) {
p++;
--
2.50.1
The quilt patch titled
Subject: mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
has been removed from the -mm tree. Its filename was
mm-kmemleak-avoid-soft-lockup-in-__kmemleak_do_cleanup.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
Date: Mon, 28 Jul 2025 15:02:48 -0400
A soft lockup warning was observed on a relative small system x86-64
system with 16 GB of memory when running a debug kernel with kmemleak
enabled.
watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134]
The test system was running a workload with hot unplug happening in
parallel. Then kemleak decided to disable itself due to its inability to
allocate more kmemleak objects. The debug kernel has its
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000.
The soft lockup happened in kmemleak_do_cleanup() when the existing
kmemleak objects were being removed and deleted one-by-one in a loop via a
workqueue. In this particular case, there are at least 40,000 objects
that need to be processed and given the slowness of a debug kernel and the
fact that a raw_spinlock has to be acquired and released in
__delete_object(), it could take a while to properly handle all these
objects.
As kmemleak has been disabled in this case, the object removal and
deletion process can be further optimized as locking isn't really needed.
However, it is probably not worth the effort to optimize for such an edge
case that should rarely happen. So the simple solution is to call
cond_resched() at periodic interval in the iteration loop to avoid soft
lockup.
Link: https://lkml.kernel.org/r/20250728190248.605750-1-longman@redhat.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-soft-lockup-in-__kmemleak_do_cleanup
+++ a/mm/kmemleak.c
@@ -2184,6 +2184,7 @@ static const struct file_operations kmem
static void __kmemleak_do_cleanup(void)
{
struct kmemleak_object *object, *tmp;
+ unsigned int cnt = 0;
/*
* Kmemleak has already been disabled, no need for RCU list traversal
@@ -2192,6 +2193,10 @@ static void __kmemleak_do_cleanup(void)
list_for_each_entry_safe(object, tmp, &object_list, object_list) {
__remove_object(object);
__delete_object(object);
+
+ /* Call cond_resched() once per 64 iterations to avoid soft lockup */
+ if (!(++cnt & 0x3f))
+ cond_resched();
}
}
_
Patches currently in -mm which might be from longman(a)redhat.com are
The quilt patch titled
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
has been removed from the -mm tree. Its filename was
mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Breno Leitao <leitao(a)debian.org>
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
Date: Thu, 31 Jul 2025 02:57:18 -0700
When netpoll is enabled, calling pr_warn_once() while holding
kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock
inversion with the netconsole subsystem. This occurs because
pr_warn_once() may trigger netpoll, which eventually leads to
__alloc_skb() and back into kmemleak code, attempting to reacquire
kmemleak_lock.
This is the path for the deadlock.
mem_pool_alloc()
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
-> pr_warn_once()
-> netconsole subsystem
-> netpoll
-> __alloc_skb
-> __create_object
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
Fix this by setting a flag and issuing the pr_warn_once() after
kmemleak_lock is released.
Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.o…
Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock
+++ a/mm/kmemleak.c
@@ -470,6 +470,7 @@ static struct kmemleak_object *mem_pool_
{
unsigned long flags;
struct kmemleak_object *object;
+ bool warn = false;
/* try the slab allocator first */
if (object_cache) {
@@ -488,8 +489,10 @@ static struct kmemleak_object *mem_pool_
else if (mem_pool_free_count)
object = &mem_pool[--mem_pool_free_count];
else
- pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+ warn = true;
raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
+ if (warn)
+ pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
return object;
}
_
Patches currently in -mm which might be from leitao(a)debian.org are
The quilt patch titled
Subject: kasan/test: fix protection against compiler elision
has been removed from the -mm tree. Its filename was
kasan-test-fix-protection-against-compiler-elision.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: kasan/test: fix protection against compiler elision
Date: Mon, 28 Jul 2025 22:11:54 +0200
The kunit test is using assignments to
"static volatile void *kasan_ptr_result" to prevent elision of memory
loads, but that's not working:
In this variable definition, the "volatile" applies to the "void", not to
the pointer.
To make "volatile" apply to the pointer as intended, it must follow
after the "*".
This makes the kasan_memchr test pass again on my system. The
kasan_strings test is still failing because all the definitions of
load_unaligned_zeropad() are lacking explicit instrumentation hooks and
ASAN does not instrument asm() memory operands.
Link: https://lkml.kernel.org/r/20250728-kasan-kunit-fix-volatile-v1-1-e7157c9af8…
Fixes: 5f1c8108e7ad ("mm:kasan: fix sparse warnings: Should it be static?")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Nihar Chaithanya <niharchaithanya(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan_test_c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kasan/kasan_test_c.c~kasan-test-fix-protection-against-compiler-elision
+++ a/mm/kasan/kasan_test_c.c
@@ -47,7 +47,7 @@ static struct {
* Some tests use these global variables to store return values from function
* calls that could otherwise be eliminated by the compiler as dead code.
*/
-static volatile void *kasan_ptr_result;
+static void *volatile kasan_ptr_result;
static volatile int kasan_int_result;
/* Probe for console output: obtains test_status lines of interest. */
_
Patches currently in -mm which might be from jannh(a)google.com are
kasan-add-test-for-slab_typesafe_by_rcu-quarantine-skipping.patch
kasan-add-test-for-slab_typesafe_by_rcu-quarantine-skipping-v2.patch
From: Chen-Yu Tsai <wens(a)csie.org>
[ Upstream commit 88828c7e940dd45d139ad4a39d702b23840a37c5 ]
On newer boards featuring the A523 SoC, the AXP323 (related to the
AXP313) is paired with the AXP717 and serves as a secondary PMIC
providing additional regulator outputs. However the MFD cells are all
registered with PLATFORM_DEVID_NONE, which causes the regulator cells
to conflict with each other.
Commit e37ec3218870 ("mfd: axp20x: Allow multiple regulators") attempted
to fix this by switching to PLATFORM_DEVID_AUTO so that the device names
would all be different, however that broke IIO channel mapping, which is
also tied to the device names. As a result the change was later reverted.
Instead, here we attempt to make sure the AXP313/AXP323 regulator cell
does not conflict by explicitly giving it an ID number. This was
previously done for the AXP809+AXP806 pair used with the A80 SoC.
Signed-off-by: Chen-Yu Tsai <wens(a)csie.org>
Link: https://lore.kernel.org/r/20250619173207.3367126-1-wens@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my analysis, here's my assessment:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Fixes a real bug**: The commit addresses a device naming conflict
issue when using AXP313/AXP323 PMICs alongside AXP717 as a secondary
PMIC. Without this fix, the kernel produces a sysfs duplicate
filename error and fails to properly register the secondary regulator
device.
2. **Small and contained change**: The fix is minimal - it only changes
one line of code from `MFD_CELL_NAME("axp20x-regulator")` to
`MFD_CELL_BASIC("axp20x-regulator", NULL, NULL, 0, 1)`, which
explicitly sets an ID of 1 for the AXP313 regulator cell.
3. **Follows established pattern**: The commit follows an existing
pattern already used in the same driver for the AXP806 PMIC (lines
1173-1174 in axp806_cells), which also sets an explicit ID (2) to
avoid conflicts when paired with AXP809.
4. **Minimal risk of regression**: The change only affects AXP313/AXP323
devices and doesn't touch other PMIC configurations. The explicit ID
assignment is a safe approach that doesn't break existing IIO channel
mappings (which was the problem with the previous PLATFORM_DEVID_AUTO
approach mentioned in the commit message).
5. **Clear problem and solution**: The commit message clearly explains
the issue (sysfs duplicate filename error) and references the history
of previous attempts to fix similar issues (commit e37ec3218870 and
its revert). The solution is targeted and doesn't introduce
architectural changes.
6. **Hardware enablement fix**: This fix enables proper functioning of
boards with the A523 SoC that use dual PMIC configurations (AXP323 +
AXP717), which would otherwise fail to initialize properly.
The commit meets the stable tree criteria of being an important bugfix
with minimal risk and contained scope. It fixes a specific hardware
configuration issue without introducing new features or making broad
architectural changes.
drivers/mfd/axp20x.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mfd/axp20x.c b/drivers/mfd/axp20x.c
index e9914e8a29a3..25c639b348cd 100644
--- a/drivers/mfd/axp20x.c
+++ b/drivers/mfd/axp20x.c
@@ -1053,7 +1053,8 @@ static const struct mfd_cell axp152_cells[] = {
};
static struct mfd_cell axp313a_cells[] = {
- MFD_CELL_NAME("axp20x-regulator"),
+ /* AXP323 is sometimes paired with AXP717 as sub-PMIC */
+ MFD_CELL_BASIC("axp20x-regulator", NULL, NULL, 0, 1),
MFD_CELL_RES("axp313a-pek", axp313a_pek_resources),
};
--
2.39.5
This patch series resolves a sleeping function called from invalid context
bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.
The regression was introduced by the interaction of two separate patches:
one that made kcov's internal locks sleep on PREEMPT_RT for better latency
(d5d2c51f1e5f), and another that wrapped a kcov call in the USB softirq
path with local_irq_save() to prevent re-entrancy (f85d39dd7ed8).
This combination resulted in an attempt to acquire a sleeping lock from
within an atomic context, causing a kernel BUG.
To resolve this, this series makes the kcov remote path fully compatible
with atomic contexts by converting all its internal locking primitives to
non-sleeping variants. This approach is more robust than conditional
compilation as it creates a single, unified codebase that works correctly
on both RT and non-RT kernels.
The series is structured as follows:
Patch 1 converts the global kcov locks (kcov->lock and kcov_remote_lock)
to use the non-sleeping raw_spinlock_t.
Patch 2 replace the PREEMPT_RT-specific per-CPU local_lock_t back to the
original local_irq_save/restore primitives, making the per-CPU protection
non-sleeping as well.
Patches 3 and 4 are preparatory refactoring. They move the memory
allocation for remote handles out of the locked sections in the
KCOV_REMOTE_ENABLE ioctl path, which is a prerequisite for safely
using raw_spinlock_t as it forbids sleeping functions like kmalloc
within its critical section.
With these changes, I have been able to run syzkaller fuzzing on a
PREEMPT_RT kernel for a full day with no issues reported.
Reproduction details in here.
Link: https://lore.kernel.org/all/20250725201400.1078395-2-ysk@kzalloc.com/t/#u
Signed-off-by: Yunseong Kim <ysk(a)kzalloc.com>
---
Changes from v2:
1. Updated kcov_remote_reset() to use raw_spin_lock_irqsave() /
raw_spin_unlock_irqrestore() instead of raw_spin_lock() /
raw_spin_unlock(), following the interrupt disabling pattern
used in the original function that guard kcov_remote_lock.
Changes from v1:
1. Dropped the #ifdef-based PREEMPT_RT branching.
2. Convert kcov->lock and kcov_remote_lock from spinlock_t to
raw_spinlock_t. This ensures they remain true, non-sleeping
spinlocks even on PREEMPT_RT kernels.
3. Remove the local_lock_t protection for kcov_percpu_data in
kcov_remote_start/stop(). Since local_lock_t can also sleep under
RT, and the required protection is against local interrupts when
accessing per-CPU data, it is replaced with explicit
local_irq_save/restore().
4. Refactor the KCOV_REMOTE_ENABLE path to move memory allocations
out of the critical section.
5. Modify the ioctl handling logic to utilize these pre-allocated
structures within the critical section. kcov_remote_add() is
modified to accept a pre-allocated structure instead of allocating
one internally. All necessary struct kcov_remote structures are now
pre-allocated individually in kcov_ioctl() using GFP_KERNEL
(allowing sleep) before acquiring the raw spinlocks.
Changes from v0:
1. On PREEMPT_RT, separated the handling of
kcov_remote_start_usb_softirq() and kcov_remote_stop_usb_softirq()
to allow sleeping when entering kcov_remote_start_usb() /
kcov_remote_stop().
Yunseong Kim (4):
kcov: Use raw_spinlock_t for kcov->lock and kcov_remote_lock
kcov: Replace per-CPU local_lock with local_irq_save/restore
kcov: Separate KCOV_REMOTE_ENABLE ioctl helper function
kcov: move remote handle allocation outside raw spinlock
kernel/kcov.c | 248 +++++++++++++++++++++++++++-----------------------
1 file changed, 134 insertions(+), 114 deletions(-)
base-commit: 186f3edfdd41f2ae87fc40a9ccba52a3bf930994
--
2.50.0
Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
[...]
/* Calculate the number of bytes we need to push, for this page
* specifically */
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
/* If we can't splice it, then copy it in, as normal */
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
/* Set the bvec pointing to the page, with len $bytes */
bvec_set_page(&bvec, page[i], bytes, offset);
/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
/* Sendmsg with $size size (!!!) */
rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has a been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count, plus sending the
correct byte count to tcp_sendmsg_locked.
Cc: stable(a)vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Pedro Falcato <pfalcato(a)suse.de>
---
v2:
- Add David Howells's Rb on the original patch
- Remove the offset increment, since it's dead code
drivers/infiniband/sw/siw/siw_qp_tx.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..f7dd32c6e5ba 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,18 +340,17 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
bvec_set_page(&bvec, page[i], bytes, offset);
- iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
try_page_again:
lock_sock(sk);
- rv = tcp_sendmsg_locked(sk, &msg, size);
+ rv = tcp_sendmsg_locked(sk, &msg, bytes);
release_sock(sk);
if (rv > 0) {
size -= rv;
sent += rv;
if (rv != bytes) {
- offset += rv;
bytes -= rv;
goto try_page_again;
}
--
2.50.1
When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:
CPU0 is finalizing DMA operation CPU1 is doing VHOST_NET_SET_BACKEND
// ubufs->refcount == 2
vhost_net_ubuf_put() vhost_net_ubuf_put_wait_and_free(oldubufs)
vhost_net_ubuf_put_and_wait()
vhost_net_ubuf_put()
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
// no wait occurs here because condition is already true
kfree(ubufs);
if (unlikely(!r))
wake_up(&ubufs->wait); // use-after-free
This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.
To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(),
as suggested by Hillf Danton. For this lock to take effect, free ubufs with
kfree_rcu().
Cc: stable(a)vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn(a)yandex-team.com>
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
LTS kernels
v3:
* use rcu_read_lock() with kfree_rcu() instead of completion,
as suggested by Hillf Danton
drivers/vhost/net.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6edac0c1ba9b..c6508fe0d5c8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -99,6 +99,7 @@ struct vhost_net_ubuf_ref {
atomic_t refcount;
wait_queue_head_t wait;
struct vhost_virtqueue *vq;
+ struct rcu_head rcu;
};
#define VHOST_NET_BATCH 64
@@ -250,9 +251,13 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
- int r = atomic_sub_return(1, &ubufs->refcount);
+ int r;
+
+ rcu_read_lock();
+ r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
wake_up(&ubufs->wait);
+ rcu_read_unlock();
return r;
}
@@ -265,7 +270,7 @@ static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put_and_wait(ubufs);
- kfree(ubufs);
+ kfree_rcu(ubufs, rcu);
}
static void vhost_net_clear_ubuf_info(struct vhost_net *n)
--
2.34.1
Commit 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
allowed for newer ASICs to mix GTT and VRAM, this change also noted that
some older boards, such as Stoney and Carrizo do not support this.
It appears that at least one additional ASIC does not support this which
is Raven.
We observed this issue when migrating a device from a 5.4 to 6.6 kernel
and have confirmed that Raven also needs to be excluded from mixing GTT
and VRAM.
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1+
Tested-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 73403744331a..5d7f13e25b7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1545,7 +1545,8 @@ uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
uint32_t domain)
{
if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
- ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
+ ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY) ||
+ (adev->asic_type == CHIP_RAVEN))) {
domain = AMDGPU_GEM_DOMAIN_VRAM;
if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
domain = AMDGPU_GEM_DOMAIN_GTT;
--
2.50.0.727.gbf7dc18ff4-goog
When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:
CPU0 is finalizing DMA operation CPU1 is doing VHOST_NET_SET_BACKEND
// &ubufs->refcount == 2
vhost_net_ubuf_put() vhost_net_ubuf_put_wait_and_free(oldubufs)
vhost_net_ubuf_put_and_wait()
vhost_net_ubuf_put()
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
// no wait occurs here because condition is already true
kfree(ubufs);
if (unlikely(!r))
wake_up(&ubufs->wait); // use-after-free
This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.
To prevent that use a completion instead of wait_queue as the ubufs
notification mechanism. wait_for_completion() guarantees that there will
be complete() call prior to its return.
We also need to reinit completion in vhost_net_flush(), because
refcnt == 0 does not mean freeing in that case.
Cc: stable(a)vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn(a)yandex-team.com>
Suggested-by: Andrey Smetanin <asmetanin(a)yandex-team.ru>
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Tested-by: Lei Yang <leiyang(a)redhat.com> (v1)
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
LTS kernels
drivers/vhost/net.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..69e1bfb9627e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
* >1: outstanding ubufs
*/
atomic_t refcount;
- wait_queue_head_t wait;
+ struct completion wait;
struct vhost_virtqueue *vq;
};
@@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
if (!ubufs)
return ERR_PTR(-ENOMEM);
atomic_set(&ubufs->refcount, 1);
- init_waitqueue_head(&ubufs->wait);
+ init_completion(&ubufs->wait);
ubufs->vq = vq;
return ubufs;
}
@@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
int r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
- wake_up(&ubufs->wait);
+ complete_all(&ubufs->wait);
return r;
}
static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put(ubufs);
- wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
+ wait_for_completion(&ubufs->wait);
}
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
@@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
n->tx_flush = false;
atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
+ reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
}
}
--
2.34.1
Starting with Rust 1.91.0 (expected 2025-10-30), `rustdoc` has improved
some false negatives around intra-doc links [1], and it found a broken
intra-doc link we currently have:
error: unresolved link to `include/linux/device/faux.h`
--> rust/kernel/faux.rs:7:17
|
7 | //! C header: [`include/linux/device/faux.h`]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ no item named `include/linux/device/faux.h` in scope
|
= help: to escape `[` and `]` characters, add '\' before them like `\[` or `\]`
= note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(rustdoc::broken_intra_doc_links)]`
Our `srctree/` C header links are not intra-doc links, thus they need
the link destination.
Thus fix it.
Cc: stable(a)vger.kernel.org
Link: https://github.com/rust-lang/rust/pull/132748 [1]
Fixes: 78418f300d39 ("rust/kernel: Add faux device bindings")
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
It may have been in 1.90, but the beta branch does not have it, and the
rollup PR says 1.91, unlike the PR itself, so I picked 1.91. It happened
just after the version bump to 1.91, so it may have to do with that.
rust/kernel/faux.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/faux.rs b/rust/kernel/faux.rs
index 7a906099993f..7fe2dd197e37 100644
--- a/rust/kernel/faux.rs
+++ b/rust/kernel/faux.rs
@@ -4,7 +4,7 @@
//!
//! This module provides bindings for working with faux devices in kernel modules.
//!
-//! C header: [`include/linux/device/faux.h`]
+//! C header: [`include/linux/device/faux.h`](srctree/include/linux/device/faux.h)
use crate::{bindings, device, error::code::*, prelude::*};
use core::ptr::{addr_of_mut, null, null_mut, NonNull};
base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f
--
2.50.1
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.
To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during late_initcall. This flag is checked
by both the vDSO's user-space code and the riscv_hwprobe syscall.
The syscall serves as a one-time gate, using a completion to wait
for any pending probes before populating the data and setting the
flag to 'true', thus ensuring userspace reads fresh values on its
first request.
Reported-by: Tsukasa OI <research_trasio(a)irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@ir…
Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Olof Johansson <olof(a)lixom.net>
Cc: stable(a)vger.kernel.org
Co-developed-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei(a)iscas.ac.cn>
---
Changes in v6:
- Based on Palmer's feedback, reworked the synchronization to be on-demand,
deferring the wait until the first hwprobe syscall via a 'ready' flag.
This avoids the boot-time regression from v5's approach.
Changes in v5:
- Reworked the synchronization logic to a robust "sentinel-count"
pattern based on feedback from Alexandre.
- Fixed a "multiple definition" linker error for nommu builds by changing
the header-file stub functions to `static inline`, as pointed out by Olof.
- Updated the commit message to better explain the rationale for moving
the vDSO initialization to `late_initcall`.
Changes in v4:
- Reworked the synchronization mechanism based on feedback from Palmer
and Alexandre.
- Instead of a post-hoc refresh, this version introduces a robust
completion-based framework using an atomic counter to ensure async
probes are finished before populating the vDSO.
- Moved the vdso data initialization to a late_initcall to avoid
impacting boot time.
Changes in v3:
- Retained existing blank line.
Changes in v2:
- Addressed feedback from Yixun's regarding #ifdef CONFIG_MMU usage.
- Updated commit message to provide a high-level summary.
- Added Fixes tag for commit e7c9d66e313b.
v1: https://lore.kernel.org/linux-riscv/20250521052754.185231-1-wangjingwei@isc…
arch/riscv/include/asm/hwprobe.h | 8 ++-
arch/riscv/include/asm/vdso/arch_data.h | 6 ++
arch/riscv/kernel/sys_hwprobe.c | 71 ++++++++++++++++++----
arch/riscv/kernel/unaligned_access_speed.c | 9 ++-
arch/riscv/kernel/vdso/hwprobe.c | 2 +-
5 files changed, 79 insertions(+), 17 deletions(-)
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7fe0a379474ae2c6..3b2888126e659ea1 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -40,5 +40,11 @@ static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
return pair->value == other_pair->value;
}
-
+#ifdef CONFIG_MMU
+void riscv_hwprobe_register_async_probe(void);
+void riscv_hwprobe_complete_async_probe(void);
+#else
+static inline void riscv_hwprobe_register_async_probe(void) {}
+static inline void riscv_hwprobe_complete_async_probe(void) {}
+#endif
#endif
diff --git a/arch/riscv/include/asm/vdso/arch_data.h b/arch/riscv/include/asm/vdso/arch_data.h
index da57a3786f7a53c8..88b37af55175129b 100644
--- a/arch/riscv/include/asm/vdso/arch_data.h
+++ b/arch/riscv/include/asm/vdso/arch_data.h
@@ -12,6 +12,12 @@ struct vdso_arch_data {
/* Boolean indicating all CPUs have the same static hwprobe values. */
__u8 homogeneous_cpus;
+
+ /*
+ * A gate to check and see if the hwprobe data is actually ready, as
+ * probing is deferred to avoid boot slowdowns.
+ */
+ __u8 ready;
};
#endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beba57..fecb6790fa88e96c 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -5,6 +5,8 @@
* more details.
*/
#include <linux/syscalls.h>
+#include <linux/completion.h>
+#include <linux/atomic.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
@@ -452,28 +454,36 @@ static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
return 0;
}
-static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
- size_t pair_count, size_t cpusetsize,
- unsigned long __user *cpus_user,
- unsigned int flags)
-{
- if (flags & RISCV_HWPROBE_WHICH_CPUS)
- return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+#ifdef CONFIG_MMU
- return hwprobe_get_values(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+static DECLARE_COMPLETION(boot_probes_done);
+static atomic_t pending_boot_probes = ATOMIC_INIT(1);
+
+void riscv_hwprobe_register_async_probe(void)
+{
+ atomic_inc(&pending_boot_probes);
}
-#ifdef CONFIG_MMU
+void riscv_hwprobe_complete_async_probe(void)
+{
+ if (atomic_dec_and_test(&pending_boot_probes))
+ complete(&boot_probes_done);
+}
-static int __init init_hwprobe_vdso_data(void)
+static int complete_hwprobe_vdso_data(void)
{
struct vdso_arch_data *avd = vdso_k_arch_data;
u64 id_bitsmash = 0;
struct riscv_hwprobe pair;
int key;
+ /* We've probably already produced these values. */
+ if (likely(avd->ready))
+ return 0;
+
+ if (unlikely(!atomic_dec_and_test(&pending_boot_probes)))
+ wait_for_completion(&boot_probes_done);
+
/*
* Initialize vDSO data with the answers for the "all CPUs" case, to
* save a syscall in the common case.
@@ -501,13 +511,48 @@ static int __init init_hwprobe_vdso_data(void)
* vDSO should defer to the kernel for exotic cpu masks.
*/
avd->homogeneous_cpus = id_bitsmash != 0 && id_bitsmash != -1;
+
+ /*
+ * Make sure all the VDSO values are visible before we look at them.
+ * This pairs with the implicit "no speculativly visible accesses"
+ * barrier in the VDSO hwprobe code.
+ */
+ smp_wmb();
+ avd->ready = true;
+ return 0;
+}
+
+static int __init init_hwprobe_vdso_data(void)
+{
+ struct vdso_arch_data *avd = vdso_k_arch_data;
+
+ /*
+ * Prevent the vDSO cached values from being used, as they're not ready
+ * yet.
+ */
+ avd->ready = false;
return 0;
}
-arch_initcall_sync(init_hwprobe_vdso_data);
+late_initcall(init_hwprobe_vdso_data);
#endif /* CONFIG_MMU */
+static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
+ size_t pair_count, size_t cpusetsize,
+ unsigned long __user *cpus_user,
+ unsigned int flags)
+{
+ complete_hwprobe_vdso_data();
+
+ if (flags & RISCV_HWPROBE_WHICH_CPUS)
+ return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+
+ return hwprobe_get_values(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+}
+
SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
cpus, unsigned int, flags)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index ae2068425fbcd207..4b8ad2673b0f7470 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -379,6 +379,7 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __always_unused)
{
schedule_on_each_cpu(check_vector_unaligned_access);
+ riscv_hwprobe_complete_async_probe();
return 0;
}
@@ -473,8 +474,12 @@ static int __init check_unaligned_access_all_cpus(void)
per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
- kthread_run(vec_check_unaligned_access_speed_all_cpus,
- NULL, "vec_check_unaligned_access_speed_all_cpus");
+ riscv_hwprobe_register_async_probe();
+ if (IS_ERR(kthread_run(vec_check_unaligned_access_speed_all_cpus,
+ NULL, "vec_check_unaligned_access_speed_all_cpus"))) {
+ pr_warn("Failed to create vec_unalign_check kthread\n");
+ riscv_hwprobe_complete_async_probe();
+ }
}
/*
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
index 2ddeba6c68dda09b..bf77b4c1d2d8e803 100644
--- a/arch/riscv/kernel/vdso/hwprobe.c
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -27,7 +27,7 @@ static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count,
* homogeneous, then this function can handle requests for arbitrary
* masks.
*/
- if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
+ if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus) || unlikely(!avd->ready))
return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
/* This is something we can handle, fill out the pairs. */
--
2.50.1
The error path in adfs_fplus_read() prints an error message via
adfs_error() but incorrectly returns success (0) to the caller.
This occurs because the 'ret' variable remains set to 0 from the earlier
successful call to adfs_fplus_validate_tail().
Fix by setting 'ret = -EIO' before jumping to the error exit.
This issue was detected by smatch static analysis:
warning: fs/adfs/dir_fplus.c:146: adfs_fplus_read() warn: missing error
code 'ret'.
Fixes: d79288b4f61b ("fs/adfs: bigdir: calculate and validate directory checkbyte")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni(a)easystack.cn>
---
Changes in v2:
- Add tags of 'Fixes' and 'Cc' in commit message
- Added error description and the corresponding fix in commit message
---
fs/adfs/dir_fplus.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/adfs/dir_fplus.c b/fs/adfs/dir_fplus.c
index 4a15924014da..4334279409b2 100644
--- a/fs/adfs/dir_fplus.c
+++ b/fs/adfs/dir_fplus.c
@@ -143,6 +143,7 @@ static int adfs_fplus_read(struct super_block *sb, u32 indaddr,
if (adfs_fplus_checkbyte(dir) != t->bigdircheckbyte) {
adfs_error(sb, "dir %06x checkbyte mismatch\n", indaddr);
+ ret = -EIO;
goto out;
}
--
2.20.1
Pentium 4's which are INTEL_P4_PRESCOTT (model 0x03) and later have
a constant TSC. This was correctly captured until commit fadb6f569b10
("x86/cpu/intel: Limit the non-architectural constant_tsc model checks").
In that commit, an error was introduced while selecting the last P4
model (0x06) as the upper bound. Model 0x06 was transposed to
INTEL_P4_WILLAMETTE, which is just plain wrong. That was presumably a
simple typo, probably just copying and pasting the wrong P4 model.
Fix the constant TSC logic to cover all later P4 models. End at
INTEL_P4_CEDARMILL which accurately corresponds to the last P4 model.
Fixes: fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks")
Cc: <stable(a)vger.kernel.org> # v6.15
Signed-off-by: Suchit Karunakaran <suchitkarunakaran(a)gmail.com>
Changes since v3:
- Refined changelog
Changes since v2:
- Improve commit message
Changes since v1:
- Fix incorrect logic
---
arch/x86/kernel/cpu/intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..6f5bd5dbc249 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -262,7 +262,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_WILLAMETTE) ||
+ } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_CEDARMILL) ||
(c->x86_vfm >= INTEL_CORE_YONAH && c->x86_vfm <= INTEL_IVYBRIDGE)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
}
--
2.50.1
From: Quang Le <quanglex97(a)gmail.com>
When packet_set_ring() releases po->bind_lock, another thread can
run packet_notifier() and process an NETDEV_UP event.
This race and the fix are both similar to that of commit 15fe076edea7
("net/packet: fix a race in packet_bind() and packet_notifier()").
There too the packet_notifier NETDEV_UP event managed to run while a
po->bind_lock critical section had to be temporarily released. And
the fix was similarly to temporarily set po->num to zero to keep
the socket unhooked until the lock is retaken.
The po->bind_lock in packet_set_ring and packet_notifier precede the
introduction of git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Quang Le <quanglex97(a)gmail.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
v1->v2:
- fix author attribution (From: at the top)
v1: https://lore.kernel.org/netdev/20250731175132.2592130-1-willemdebruijn.kern…
---
net/packet/af_packet.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bc438d0d96a7..a7017d7f0927 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4573,10 +4573,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
spin_lock(&po->bind_lock);
was_running = packet_sock_flag(po, PACKET_SOCK_RUNNING);
num = po->num;
- if (was_running) {
- WRITE_ONCE(po->num, 0);
+ WRITE_ONCE(po->num, 0);
+ if (was_running)
__unregister_prot_hook(sk, false);
- }
+
spin_unlock(&po->bind_lock);
synchronize_net();
@@ -4608,10 +4608,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
mutex_unlock(&po->pg_vec_lock);
spin_lock(&po->bind_lock);
- if (was_running) {
- WRITE_ONCE(po->num, num);
+ WRITE_ONCE(po->num, num);
+ if (was_running)
register_prot_hook(sk);
- }
+
spin_unlock(&po->bind_lock);
if (pg_vec && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
--
2.50.1.565.gc32cd1483b-goog
During the integration of the RTL8239 POE chip + its frontend MCU, it was
noticed that multi-byte operations were basically broken in the current
driver.
Tests using SMBus Block Writes showed that the data (after the Wr + Ack
marker) was mixed up on the wire. At first glance, it looked like an
endianness problem. But for transfers were the number of count + data bytes
was not divisible by 4, the last bytes were not looking like an endianness
problem because they were were in the wrong order but not for example 0 -
which would be the case for an endianness problem with 32 bit registers. At
the end, it turned out to be a the way how i2c_write tried to add the bytes
to the send registers.
Each 32 bit register was used similar to a shift register - shifting the
various bytes up the register while the next one is added to the least
significant byte. But the I2C controller expects the first byte of the
tranmission in the least significant byte of the first register. And the
last byte (assuming it is a 16 byte transfer) in the most significant byte
of the fourth register.
While doing these tests, it was also observed that the count byte was
missing from the SMBus Block Writes. The driver just removed them from the
data->block (from the I2C subsystem). But the I2C controller DOES NOT
automatically add this byte - for example by using the configured
transmission length.
The RTL8239 MCU is not actually an SMBus compliant device. Instead, it
expects I2C Block Reads + I2C Block Writes. But according to the already
identified bugs in the driver, it was clear that the I2C controller can
simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA.
The receive part, just needs to write the content of the receive buffer to
the correct position in data->block.
While the on-wire formwat was now correct, reads were still not possible
against the MCU (for the RTL8239 POE chip). It was always timing out
because the 2ms were not enough for sending the read request and then
receiving the 12 byte answer.
These changes were originally submitted to OpenWrt. But there are plans to
migrate OpenWrt to the upstream Linux driver. As result, the pull request
was stopped and the changes were redone against this driver.
For reasons of transparency: The work on I2C_SMBUS_I2C_BLOCK_DATA support
for the RTL8239-MCU was done on RTL931xx. All problem were therefore
detected with the patches from Jonas Jelonek [1] and not the vanilla Linux
driver. But looking through the code, it seems like these are NOT
regressions introduced by the RTL931x patchset.
[1] https://patchwork.ozlabs.org/project/linux-i2c/cover/20250727114800.3046-1-…
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Changes in v2:
- add the missing transfer width and read length increase for the SMBus
Write/Read
- Link to v1: https://lore.kernel.org/r/20250802-i2c-rtl9300-multi-byte-v1-0-5f687e0098e2…
---
Harshal Gohel (2):
i2c: rtl9300: Fix multi-byte I2C write
i2c: rtl9300: Implement I2C block read and write
Sven Eckelmann (2):
i2c: rtl9300: Increase timeout for transfer polling
i2c: rtl9300: Add missing count byte for SMBus Block Ops
drivers/i2c/busses/i2c-rtl9300.c | 43 +++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
---
base-commit: b9ddaa95fd283bce7041550ddbbe7e764c477110
change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
During the integration of the RTL8239 POE chip + its frontend MCU, it was
noticed that multi-byte operations were basically broken in the current
driver.
Tests using SMBus Block Writes showed that the data (after the Wr + Ack
marker) was mixed up on the wire. At first glance, it looked like an
endianness problem. But for transfers were the number of count + data bytes
was not divisible by 4, the last bytes were not looking like an endianness
problem because they were were in the wrong order but not for example 0 -
which would be the case for an endianness problem with 32 bit registers. At
the end, it turned out to be a the way how i2c_write tried to add the bytes
to the send registers.
Each 32 bit register was used similar to a shift register - shifting the
various bytes up the register while the next one is added to the least
significant byte. But the I2C controller expects the first byte of the
tranmission in the least significant byte of the first register. And the
last byte (assuming it is a 16 byte transfer) in the most significant byte
of the fourth register.
While doing these tests, it was also observed that the count byte was
missing from the SMBus Block Writes. The driver just removed them from the
data->block (from the I2C subsystem). But the I2C controller DOES NOT
automatically add this byte - for example by using the configured
transmission length.
The RTL8239 MCU is not actually an SMBus compliant device. Instead, it
expects I2C Block Reads + I2C Block Writes. But according to the already
identified bugs in the driver, it was clear that the I2C controller can
simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA.
The receive part, just needs to write the content of the receive buffer to
the correct position in data->block.
While the on-wire formwat was now correct, reads were still not possible
against the MCU (for the RTL8239 POE chip). It was always timing out
because the 2ms were not enough for sending the read request and then
receiving the 12 byte answer.
These changes were originally submitted to OpenWrt. But there are plans to
migrate OpenWrt to the upstream Linux driver. As result, the pull request
was stopped and the changes were redone against this driver.
For reasons of transparency: The work on I2C_SMBUS_I2C_BLOCK_DATA support
for the RTL8239-MCU was done on RTL931xx. All problem were therefore
detected with the patches from Jonas Jelonek [1] and not the vanilla Linux
driver. But looking through the code, it seems like these are NOT
regressions introduced by the RTL931x patchset.
I've picked up Alex Guo's patch [2] to reduce conflicts between pending
fixes.
[1] https://patchwork.ozlabs.org/project/linux-i2c/cover/20250727114800.3046-1-…
[2] https://lore.kernel.org/r/20250615235248.529019-1-alexguo1023@gmail.com
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Changes in v3:
- integrated patch
https://lore.kernel.org/r/20250615235248.529019-1-alexguo1023@gmail.com
to avoid conflicts in the I2C_SMBUS_BLOCK_DATA code
- added Fixes and stable(a)vger.kernel.org to Alex Guo's patch
- added Chris Packham's Reviewed-by/Acked-by
- Link to v2: https://lore.kernel.org/r/20250803-i2c-rtl9300-multi-byte-v2-0-9b7b759fe2b6…
Changes in v2:
- add the missing transfer width and read length increase for the SMBus
Write/Read
- Link to v1: https://lore.kernel.org/r/20250802-i2c-rtl9300-multi-byte-v1-0-5f687e0098e2…
---
Alex Guo (1):
i2c: rtl9300: Fix out-of-bounds bug in rtl9300_i2c_smbus_xfer
Harshal Gohel (2):
i2c: rtl9300: Fix multi-byte I2C write
i2c: rtl9300: Implement I2C block read and write
Sven Eckelmann (2):
i2c: rtl9300: Increase timeout for transfer polling
i2c: rtl9300: Add missing count byte for SMBus Block Ops
drivers/i2c/busses/i2c-rtl9300.c | 51 ++++++++++++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 7 deletions(-)
---
base-commit: 0ae982df67760cd08affa935c0fe86c8a9311797
change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.
Closes https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl(a)kde.org>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 6d34eac0539d..ae6908b57d78 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -275,7 +275,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev,
int i, ret = 0;
if (!amdgpu_sriov_vf(adev)) {
- /* It can take up to a second for IFWI init to complete on some dGPUs,
+ /* It can take up to two seconds for IFWI init to complete on some dGPUs,
* but generally it should be in the 60-100ms range. Normally this starts
* as soon as the device gets power so by the time the OS loads this has long
* completed. However, when a card is hotplugged via e.g., USB4, we need to
@@ -283,7 +283,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev,
* continue.
*/
- for (i = 0; i < 1000; i++) {
+ for (i = 0; i < 2000; i++) {
msg = RREG32(mmMP0_SMN_C2PMSG_33);
if (msg & 0x80000000)
break;
--
2.50.1
Ensure that epoll instances can never form a graph deeper than
EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and
does some recursion depth checks, but those recursion depth checks don't
limit the depth of the resulting tree for two reasons:
- They don't look upwards in the tree.
- If there are multiple downwards paths of different lengths, only one of
the paths is actually considered for the depth check since commit
28d82dc1c4ed ("epoll: limit paths").
Essentially, the current recursion depth check in ep_loop_check_proc() just
serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph
edge has already been created; this checks, among other things, that no
paths going upwards from any non-epoll file with a length of more than 5
edges exist. However, this check does not apply to non-epoll files.
As a result, it is possible to recurse to a depth of at least roughly 500,
tested on v6.15. (I am unsure if deeper recursion is possible; and this may
have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion
problem").)
To fix it:
1. In ep_loop_check_proc(), note the subtree depth of each visited node,
and use subtree depths for the total depth calculation even when a subtree
has already been visited.
2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
depth of an upwards walk.
3. In ep_loop_check(), use these values to limit the total path length
between epoll nodes to EP_MAX_NESTS edges.
Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
fs/eventpoll.c | 60 ++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 46 insertions(+), 14 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d4dbffdedd08..44648cc09250 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -218,6 +218,7 @@ struct eventpoll {
/* used to optimize loop detection check */
u64 gen;
struct hlist_head refs;
+ u8 loop_check_depth;
/*
* usage count, used together with epitem->dying to
@@ -2142,23 +2143,24 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
}
/**
- * ep_loop_check_proc - verify that adding an epoll file inside another
- * epoll structure does not violate the constraints, in
- * terms of closed loops, or too deep chains (which can
- * result in excessive stack usage).
+ * ep_loop_check_proc - verify that adding an epoll file @ep inside another
+ * epoll file does not create closed loops, and
+ * determine the depth of the subtree starting at @ep
*
* @ep: the &struct eventpoll to be currently checked.
* @depth: Current depth of the path being checked.
*
- * Return: %zero if adding the epoll @file inside current epoll
- * structure @ep does not violate the constraints, or %-1 otherwise.
+ * Return: depth of the subtree, or INT_MAX if we found a loop or went too deep.
*/
static int ep_loop_check_proc(struct eventpoll *ep, int depth)
{
- int error = 0;
+ int result = 0;
struct rb_node *rbp;
struct epitem *epi;
+ if (ep->gen == loop_check_gen)
+ return ep->loop_check_depth;
+
mutex_lock_nested(&ep->mtx, depth + 1);
ep->gen = loop_check_gen;
for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
@@ -2166,13 +2168,11 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
if (unlikely(is_file_epoll(epi->ffd.file))) {
struct eventpoll *ep_tovisit;
ep_tovisit = epi->ffd.file->private_data;
- if (ep_tovisit->gen == loop_check_gen)
- continue;
if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS)
- error = -1;
+ result = INT_MAX;
else
- error = ep_loop_check_proc(ep_tovisit, depth + 1);
- if (error != 0)
+ result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1);
+ if (result > EP_MAX_NESTS)
break;
} else {
/*
@@ -2186,9 +2186,27 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
list_file(epi->ffd.file);
}
}
+ ep->loop_check_depth = result;
mutex_unlock(&ep->mtx);
- return error;
+ return result;
+}
+
+/**
+ * ep_get_upwards_depth_proc - determine depth of @ep when traversed upwards
+ */
+static int ep_get_upwards_depth_proc(struct eventpoll *ep, int depth)
+{
+ int result = 0;
+ struct epitem *epi;
+
+ if (ep->gen == loop_check_gen)
+ return ep->loop_check_depth;
+ hlist_for_each_entry_rcu(epi, &ep->refs, fllink)
+ result = max(result, ep_get_upwards_depth_proc(epi->ep, depth + 1) + 1);
+ ep->gen = loop_check_gen;
+ ep->loop_check_depth = result;
+ return result;
}
/**
@@ -2204,8 +2222,22 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth)
*/
static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)
{
+ int depth, upwards_depth;
+
inserting_into = ep;
- return ep_loop_check_proc(to, 0);
+ /*
+ * Check how deep down we can get from @to, and whether it is possible
+ * to loop up to @ep.
+ */
+ depth = ep_loop_check_proc(to, 0);
+ if (depth > EP_MAX_NESTS)
+ return -1;
+ /* Check how far up we can go from @ep. */
+ rcu_read_lock();
+ upwards_depth = ep_get_upwards_depth_proc(ep, 0);
+ rcu_read_unlock();
+
+ return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
}
static void clear_tfile_check_list(void)
---
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
change-id: 20250711-epoll-recursion-fix-fb0e336b2aeb
--
Jann Horn <jannh(a)google.com>
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 44352df58c9e..c6d89b51c678 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4454,7 +4454,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction — typically after every 256 context
switches — to remove old entry.
To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.
If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.
The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without a
MMU context switch.
CPU 0 CPU 1
----- -----
Process P
exec swapper/1
load_elf_binary
begin_new_exc
activate_mm
switch_mm_irqs_off
switch_mmu_context
switch_slb
/*
* This invalidates all
* the entries in the HW
* and setup the new HW
* SLB entries as per the
* preload cache.
*/
context_switch
sched_migrate_task migrates process P to cpu-1
Process swapper/0 context switch (to process P)
(uses mm_struct of Process P) switch_mm_irqs_off()
switch_slb
load_slb++
/*
* load_slb becomes 0 here
* and we evict an entry from
* the preload cache with
* preload_age(). We still
* keep HW SLB and preload
* cache in sync, that is
* because all HW SLB entries
* anyways gets evicted in
* switch_slb during SLBIA.
* We then only add those
* entries back in HW SLB,
* which are currently
* present in preload_cache
* (after eviction).
*/
load_elf_binary continues...
setup_new_exec()
slb_setup_new_exec()
sched_switch event
sched_migrate_task migrates
process P to cpu-0
context_switch from swapper/0 to Process P
switch_mm_irqs_off()
/*
* Since both prev and next mm struct are same we don't call
* switch_mmu_context(). This will cause the HW SLB and SW preload
* cache to go out of sync in preload_new_slb_context. Because there
* was an SLB entry which was evicted from both HW and preload cache
* on cpu-1. Now later in preload_new_slb_context(), when we will try
* to add the same preload entry again, we will add this to the SW
* preload cache and then will add it to the HW SLB. Since on cpu-0
* this entry was never invalidated, hence adding this entry to the HW
* SLB will cause a SLB multi-hit error.
*/
load_elf_binary continues...
START_THREAD
start_thread
preload_new_slb_context
/*
* This tries to add a new EA to preload cache which was earlier
* evicted from both cpu-1 HW SLB and preload cache. This caused the
* HW SLB of cpu-0 to go out of sync with the SW preload cache. The
* reason for this was, that when we context switched back on CPU-0,
* we should have ideally called switch_mmu_context() which will
* bring the HW SLB entries on CPU-0 in sync with SW preload cache
* entries by setting up the mmu context properly. But we didn't do
* that since the prev mm_struct running on cpu-0 was same as the
* next mm_struct (which is true for swapper / kernel threads). So
* now when we try to add this new entry into the HW SLB of cpu-0,
* we hit a SLB multi-hit error.
*/
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149]
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
0x7fffceb5ffff (unreliable)
preload_new_slb_context+0x100/0x1a0
start_thread+0x26c/0x420
load_elf_binary+0x1b04/0x1c40
bprm_execve+0x358/0x680
do_execveat_common+0x1f8/0x240
sys_execve+0x58/0x70
system_call_exception+0x114/0x300
system_call_common+0x160/0x2c4
To fix this issue, we add a code change to always switch the MMU context on
hash MMU if the SLB preload cache has aged. With this change, the
SLB multi-hit error no longer occurs.
cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
cc: Michael Ellerman <mpe(a)ellerman.id.au>
cc: Nicholas Piggin <npiggin(a)gmail.com>
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable(a)vger.kernel.org
Suggested-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
---
v1 -> v2 : Changed commit message and added a comment in
switch_mm_irqs_off()
v1 - https://lore.kernel.org/all/20250731161027.966196-1-donettom@linux.ibm.com/
---
arch/powerpc/mm/book3s64/slb.c | 2 +-
arch/powerpc/mm/mmu_context.c | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..08daac3f978c 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -509,7 +509,7 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
* SLB preload cache.
*/
tsk->thread.load_slb++;
- if (!tsk->thread.load_slb) {
+ if (tsk->thread.load_slb == U8_MAX) {
unsigned long pc = KSTK_EIP(tsk);
preload_age(ti);
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 3e3af29b4523..95455d787288 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -83,8 +83,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
/* Some subarchs need to track the PGD elsewhere */
switch_mm_pgdir(tsk, next);
- /* Nothing else to do if we aren't actually switching */
- if (prev == next)
+ /*
+ * Nothing else to do if we aren't actually switching and
+ * the preload slb cache has not aged
+ */
+ if ((prev == next) && (tsk->thread.load_slb != U8_MAX))
return;
/*
--
2.50.1
Testing has shown that reading multiple registers at once (for 10 bit
adc values) does not work. Set the use_single_read regmap_config flag
to make regmap split these for is.
This should fix temperature opregion accesses done by
drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for
the upcoming drivers for the ADC and battery MFD cells.
Fixes: 6bac0606fdba ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC")
Cc: stable(a)vger.kernel.org
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Hans de Goede <hansg(a)kernel.org>
---
Changes in v2:
- Update comment to: "The hardware does not support reading multiple
registers at once"
---
drivers/mfd/intel_soc_pmic_chtdc_ti.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mfd/intel_soc_pmic_chtdc_ti.c b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
index 4c1a68c9f575..6daf33e07ea0 100644
--- a/drivers/mfd/intel_soc_pmic_chtdc_ti.c
+++ b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
@@ -82,6 +82,8 @@ static const struct regmap_config chtdc_ti_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
.max_register = 0xff,
+ /* The hardware does not support reading multiple registers at once */
+ .use_single_read = true,
};
static const struct regmap_irq chtdc_ti_irqs[] = {
--
2.49.0
Testing has shown that reading multiple registers at once (for 10 bit
adc values) does not work. Set the use_single_read regmap_config flag
to make regmap split these for is.
This should fix temperature opregion accesses done by
drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for
the upcoming drivers for the ADC and battery MFD cells.
Fixes: 6bac0606fdba ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hansg(a)kernel.org>
---
drivers/mfd/intel_soc_pmic_chtdc_ti.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mfd/intel_soc_pmic_chtdc_ti.c b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
index 4c1a68c9f575..a23bda8ddae8 100644
--- a/drivers/mfd/intel_soc_pmic_chtdc_ti.c
+++ b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
@@ -82,6 +82,8 @@ static const struct regmap_config chtdc_ti_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
.max_register = 0xff,
+ /* Reading multiple registers at once is not supported */
+ .use_single_read = true,
};
static const struct regmap_irq chtdc_ti_irqs[] = {
--
2.49.0
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b720e04ce7d8..8be033f1877d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4337,7 +4337,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
Hi,
Glad to know you and your company from Jordan.
I‘m Seven CTO of STHL We are a one-stop service provider for PCBA. We can help you with production from PCB to finished product assembly.
Why Partner With Us?
✅ One-Stop Expertise: From PCB fabrication, PCBA (SMT & Through-Hole), custom cable harnesses, , to final product assembly – we eliminate multi-vendor coordination risks.
✅ Cost Efficiency: 40%+ clients reduce logistics/QC costs through our integrated service model (ISO 9001:2015 certified).
✅ Speed-to-Market: Average 15% faster lead times achieved via in-house vertical integration.
Recent Success Case:
Helped a German IoT startup scale from prototype to 50K-unit/month production within 6 months through our:
PCB Design-for-Manufacturing (DFM) optimization Automated PCBA with 99.98% first-pass yield Mechanical housing CNC machining & IP67-rated assembly
Seven Marcus CTO
Shenzhen STHL Technology Co,Ltd
+8618569002840 Seven(a)pcba-china.com
在2025-06-04,Seven <seven(a)ems-sthi.com> 写道:-----原始邮件-----
发件人: Seven <seven(a)ems-sthi.com>
发件时间: 2025年06月04日 周三
收件人: [Linux-stable-mirror <linux-stable-mirror(a)lists.linaro.org>]
主题: Re:Jordan recommend me get in touch
Hi,
Glad to know you and your company from Jordan.
I‘m Seven CTO of STHL We are a one-stop service provider for PCBA. We can help you with production from PCB to finished product assembly.
Why Partner With Us?
✅ One-Stop Expertise: From PCB fabrication, PCBA (SMT & Through-Hole), custom cable harnesses, , to final product assembly – we eliminate multi-vendor coordination risks.
✅ Cost Efficiency: 40%+ clients reduce logistics/QC costs through our integrated service model (ISO 9001:2015 certified).
✅ Speed-to-Market: Average 15% faster lead times achieved via in-house vertical integration.
Recent Success Case:
Helped a German IoT startup scale from prototype to 50K-unit/month production within 6 months through our:
PCB Design-for-Manufacturing (DFM) optimization Automated PCBA with 99.98% first-pass yield Mechanical housing CNC machining & IP67-rated assembly
Seven Marcus CTO
Shenzhen STHL Technology Co,Ltd
+8618569002840 Seven(a)pcba-china.com
pnv_irq_domain_alloc() allocates interrupts at parent's interrupt
domain. If it fails in the progress, all allocated interrupts are
freed.
The number of successfully allocated interrupts so far is stored
"i". However, "i - 1" interrupts are freed. This is broken:
- One interrupt is not be freed
- If "i" is zero, "i - 1" wraps around
Correct the number of freed interrupts to "i".
Fixes: 0fcfe2247e75 ("powerpc/powernv/pci: Add MSI domains")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8ccf2c9b98a..0166bf39ce1e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1854,7 +1854,7 @@ static int pnv_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
return 0;
out:
- irq_domain_free_irqs_parent(domain, virq, i - 1);
+ irq_domain_free_irqs_parent(domain, virq, i);
msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq, nr_irqs);
return ret;
}
--
2.39.5
pseries_irq_domain_alloc() allocates interrupts at parent's interrupt
domain. If it fails in the progress, all allocated interrupts are
freed.
The number of successfully allocated interrupts so far is stored
"i". However, "i - 1" interrupts are freed. This is broken:
- One interrupt is not be freed
- If "i" is zero, "i - 1" wraps around
Correct the number of freed interrupts to 'i'.
Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
arch/powerpc/platforms/pseries/msi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index ee1c8c6898a3..9dc294de631f 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -593,7 +593,7 @@ static int pseries_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
out:
/* TODO: handle RTAS cleanup in ->msi_finish() ? */
- irq_domain_free_irqs_parent(domain, virq, i - 1);
+ irq_domain_free_irqs_parent(domain, virq, i);
return ret;
}
--
2.39.5
This driver, for the time being, assumes that the kernel page size is 4kB,
so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB
pages.
Signed-off-by: Simon Richter <Simon.Richter(a)hogyros.de>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/xe/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 2bb2bc052120..714d5702dfd7 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -5,6 +5,7 @@ config DRM_XE
depends on KUNIT || !KUNIT
depends on INTEL_VSEC || !INTEL_VSEC
depends on X86_PLATFORM_DEVICES || !(X86 && ACPI)
+ depends on PAGE_SIZE_4KB || COMPILE_TEST || BROKEN
select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular
# the shmem_readpage() which depends upon tmpfs
--
2.47.2
replace_fd() returns either a negative error number or the number of the
new file descriptor. The current code misinterprets any positive file
descriptor number as an error.
Only check for negative error numbers, so that __receive_sock() is called
correctly for valid file descriptors.
Fixes: 173817151b15 ("fs: Expand __receive_fd() to accept existing fd")
Fixes: 42eb0d54c08a ("fs: split receive_fd_replace from __receive_fd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Untested, it stuck out while reading the code.
---
fs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/file.c b/fs/file.c
index 6d2275c3be9c6967d16c75d1b6521f9b58980926..56c3a045121d8f43a54cf05e6ce1962f896339ac 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1387,7 +1387,7 @@ int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags)
if (error)
return error;
error = replace_fd(new_fd, file, o_flags);
- if (error)
+ if (error < 0)
return error;
__receive_sock(file);
return new_fd;
---
base-commit: 89748acdf226fd1a8775ff6fa2703f8412b286c8
change-id: 20250801-fix-receive_fd_replace-7fdd5ce6532d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The ring buffer size varies across VMBus channels. The size of sysfs
node for the ring buffer is currently hardcoded to 4 MB. Userspace
clients either use fstat() or hardcode this size for doing mmap().
To address this, make the sysfs node size dynamic to reflect the
actual ring buffer size for each channel. This will ensure that
fstat() on ring sysfs node always returns the correct size of
ring buffer.
This is a backport of the upstream commit
65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic")
with modifications, as the original patch has missing dependencies on
kernel v6.12.x. The structure "struct attribute_group" does not have
bin_size field in v6.12.x kernel so the logic of configuring size of
sysfs node for ring buffer has been moved to
vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size
related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page")
Cc: <stable(a)vger.kernel.org> # 6.12.x
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
---
This change won't apply on older kernels currently due to missing
dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has
changed completely, while the functionality remains same.
Requesting Michael, Dexuan, Wei to please review again.
---
drivers/hv/vmbus_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 1f519e925f06..616e63fb2f15 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = {
.name = "ring",
.mode = 0600,
},
- .size = 2 * SZ_2M,
.mmap = hv_mmap_ring_buffer_wrapper,
};
static struct attribute *vmbus_chan_attrs[] = {
@@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
/* Hide ring attribute if channel's ring_sysfs_visible is set to false */
if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
return 0;
+ attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
return attr->attr.mode;
}
--
2.34.1
object_err() reports details of an object for further debugging, such as
the freelist pointer, redzone, etc. However, if the pointer is invalid,
attempting to access object metadata can lead to a crash since it does
not point to a valid object.
In case check_valid_pointer() returns false for the pointer, only print
the pointer value and skip accessing metadata.
Fixes: 81819f0fc828 ("SLUB core")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Li Qiong <liqiong(a)nfschina.com>
---
v2:
- rephrase the commit message, add comment for object_err().
v3:
- check object pointer in object_err().
v4:
- restore changes in alloc_consistency_checks().
v5:
- rephrase message, fix code style.
---
mm/slub.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index 31e11ef256f9..b3eff1476c85 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1104,7 +1104,12 @@ static void object_err(struct kmem_cache *s, struct slab *slab,
return;
slab_bug(s, reason);
- print_trailer(s, slab, object);
+ if (!check_valid_pointer(s, slab, object)) {
+ print_slab_info(slab);
+ pr_err("Invalid pointer 0x%p\n", object);
+ } else {
+ print_trailer(s, slab, object);
+ }
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
WARN_ON(1);
--
2.30.2
The existing memstick core patch: commit 62c59a8786e6 ("memstick: Skip
allocating card when removing host") sets host->removing in
memstick_remove_host(),but still exists a critical time window where
memstick_check can run after host->eject is set but before removing is set.
In the rtsx_usb_ms driver, the problematic sequence is:
rtsx_usb_ms_drv_remove: memstick_check:
host->eject = true
cancel_work_sync(handle_req) if(!host->removing)
... memstick_alloc_card()
memstick_set_rw_addr()
memstick_new_req()
rtsx_usb_ms_request()
if(!host->eject)
skip schedule_work
wait_for_completion()
memstick_remove_host: [blocks indefinitely]
host->removing = true
flush_workqueue()
[block]
1. rtsx_usb_ms_drv_remove sets host->eject = true
2. cancel_work_sync(&host->handle_req) runs
3. memstick_check work may be executed here <-- danger window
4. memstick_remove_host sets removing = 1
During this window (step 3), memstick_check calls memstick_alloc_card,
which may indefinitely waiting for mrq_complete completion that will
never occur because rtsx_usb_ms_request sees eject=true and skips
scheduling work, memstick_set_rw_addr waits forever for completion.
This causes a deadlock when memstick_remove_host tries to flush_workqueue,
waiting for memstick_check to complete, while memstick_check is blocked
waiting for mrq_complete completion.
Fix this by setting removing=true at the start of rtsx_usb_ms_drv_remove,
before any work cancellation. This ensures memstick_check will see the
removing flag immediately and exit early, avoiding the deadlock.
Fixes: 62c59a8786e6 ("memstick: Skip allocating card when removing host")
Signed-off-by: Jiayi Li <lijiayi(a)kylinos.cn>
Cc: stable(a)vger.kernel.org
---
v1 -> v2:
Added Cc: stable(a)vger.kernel.org
---
drivers/memstick/core/memstick.c | 1 -
drivers/memstick/host/rtsx_usb_ms.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/memstick/core/memstick.c b/drivers/memstick/core/memstick.c
index 043b9ec756ff..95e65f4958f2 100644
--- a/drivers/memstick/core/memstick.c
+++ b/drivers/memstick/core/memstick.c
@@ -555,7 +555,6 @@ EXPORT_SYMBOL(memstick_add_host);
*/
void memstick_remove_host(struct memstick_host *host)
{
- host->removing = 1;
flush_workqueue(workqueue);
mutex_lock(&host->lock);
if (host->card)
diff --git a/drivers/memstick/host/rtsx_usb_ms.c b/drivers/memstick/host/rtsx_usb_ms.c
index 3878136227e4..5b5e9354fb2e 100644
--- a/drivers/memstick/host/rtsx_usb_ms.c
+++ b/drivers/memstick/host/rtsx_usb_ms.c
@@ -812,6 +812,7 @@ static void rtsx_usb_ms_drv_remove(struct platform_device *pdev)
int err;
host->eject = true;
+ msh->removing = true;
cancel_work_sync(&host->handle_req);
cancel_delayed_work_sync(&host->poll_card);
--
2.47.1
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 08b016864fc0..71b17a00d3ed 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4076,7 +4076,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 954cd962e113..c026e7cc0af1 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4183,7 +4183,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
From: Su Hui <suhui(a)nfschina.com>
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the
following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves
debugging of USB xHCI host controller failures by printing the actual
`xhc_state` value when `queue_command` fails. The commit message
shows real error messages users encounter ("xHCI dying or halted,
can't queue_command"), demonstrating this is a real-world debugging
problem.
2. **Minimal and Safe Change**: The change is extremely small and safe -
it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state:
0x%x\n", xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It
doesn't change any logic, control flow, or data structures. It only
adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for
USB functionality, and when it fails with "dying or halted" states,
knowing the exact state helps diagnose whether:
- `XHCI_STATE_DYING` (0x1) - controller is dying
- `XHCI_STATE_HALTED` (0x2) - controller is halted
- Both states (0x3) - controller has both flags set
This distinction is valuable for debugging hardware issues, driver
bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print
statement has no risk of introducing regressions. The worst case is
the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
- Fixes a real debugging limitation
- Is obviously correct
- Has been tested (signed-off and accepted by Greg KH)
- Is small (single line change)
- Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB
issues more effectively by providing the actual state value rather than
just saying "dying or halted", making it a valuable debugging
enhancement for stable kernels.
drivers/usb/host/xhci-ring.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cd94b0a4e021..626f02605192 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4486,7 +4486,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
if ((xhci->xhc_state & XHCI_STATE_DYING) ||
(xhci->xhc_state & XHCI_STATE_HALTED)) {
- xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+ xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+ xhci->xhc_state);
return -ESHUTDOWN;
}
--
2.39.5
During the integration of the RTL8239 POE chip + its frontend MCU, it was
noticed that multi-byte operations were basically broken in the current
driver.
Tests using SMBus Block Writes showed that the data (after the Wr + Ack
marker) was mixed up on the wire. At first glance, it looked like an
endianness problem. But for transfers were the number of count + data bytes
was not divisible by 4, the last bytes were not looking like an endianness
problem because they were were in the wrong order but not for example 0 -
which would be the case for an endianness problem with 32 bit registers. At
the end, it turned out to be a the way how i2c_write tried to add the bytes
to the send registers.
Each 32 bit register was used similar to a shift register - shifting the
various bytes up the register while the next one is added to the least
significant byte. But the I2C controller expects the first byte of the
tranmission in the least significant byte of the first register. And the
last byte (assuming it is a 16 byte transfer) in the most significant byte
of the fourth register.
While doing these tests, it was also observed that the count byte was
missing from the SMBus Block Writes. The driver just removed them from the
data->block (from the I2C subsystem). But the I2C controller DOES NOT
automatically add this byte - for example by using the configured
transmission length.
The RTL8239 MCU is not actually an SMBus compliant device. Instead, it
expects I2C Block Reads + I2C Block Writes. But according to the already
identified bugs in the driver, it was clear that the I2C controller can
simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA.
The receive part, just needs to write the content of the receive buffer to
the correct position in data->block.
While the on-wire formwat was now correct, reads were still not possible
against the MCU (for the RTL8239 POE chip). It was always timing out
because the 2ms were not enough for sending the read request and then
receiving the 12 byte answer.
These changes were originally submitted to OpenWrt. But there are plans to
migrate OpenWrt to the upstream Linux driver. As result, the pull request
was stopped and the changes were redone against this driver.
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Harshal Gohel (2):
i2c: rtl9300: Fix multi-byte I2C write
i2c: rtl9300: Implement I2C block read and write
Sven Eckelmann (2):
i2c: rtl9300: Increase timeout for transfer polling
i2c: rtl9300: Add missing count byte for SMBus Block Write
drivers/i2c/busses/i2c-rtl9300.c | 43 +++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
---
base-commit: b9ddaa95fd283bce7041550ddbbe7e764c477110
change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
The gpio-mlxbf2 driver interfaces with four GPIO controllers,
device instances 0-3. There are two IRQ resources shared between
the four controllers, and they are found in the ACPI table for
instances 0 and 3. The driver should not use platform_get_irq(),
otherwise this error is logged when probing instances 1 and 2:
mlxbf2_gpio MLNXBF22:01: error -ENXIO: IRQ index 0 not found
Fixes: 2b725265cb08 ("gpio: mlxbf2: Introduce IRQ support")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Thompson <davthompson(a)nvidia.com>
Reviewed-by: Shravan Kumar Ramani <shravankr(a)nvidia.com>
---
v4: updated logic to simply use platform_get_irq_optional()
v3: added version history
v2: added tag "Cc: stable(a)vger.kernel.org"
drivers/gpio/gpio-mlxbf2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-mlxbf2.c b/drivers/gpio/gpio-mlxbf2.c
index 6f3dda6b635f..390f2e74a9d8 100644
--- a/drivers/gpio/gpio-mlxbf2.c
+++ b/drivers/gpio/gpio-mlxbf2.c
@@ -397,7 +397,7 @@ mlxbf2_gpio_probe(struct platform_device *pdev)
gc->ngpio = npins;
gc->owner = THIS_MODULE;
- irq = platform_get_irq(pdev, 0);
+ irq = platform_get_irq_optional(pdev, 0);
if (irq >= 0) {
girq = &gs->gc.irq;
gpio_irq_chip_set_chip(girq, &mlxbf2_gpio_irq_chip);
--
2.43.2
The Qualcomm SM6375 processor is a 7nm process SoC for the mid-range market with the following features:
CPU: Eight-core design, including high-performance Kryo 670 core and efficient Kryo 265 core, optimized performance and energy efficiency.
GPU: Equipped with Adreno 642L GPU, supporting high-quality graphics and gaming experience.
AI Engine: Integrated Qualcomm AI engine to enhance intelligent features such as voice recognition and image processing.
Connectivity: Supports modern wireless standards such as 5G, Wi-Fi 6 and Bluetooth 5.2.
Multimedia: Supports 4K video encoding and decoding
Mainly used in mid-to-high-end smartphones, tablets and some IoT devices, suitable for users who need to balance cost performance and performance.
.# Part Number Manufacturer Date Code Quantity Unit Price Lead Time Condition (PCS) USD/Each one 1 SM-6375-1-PSP837-TR-00-0-AB QUALCOMM 2023+ 12000pcs US$18.00/pcs 7days New & original - stock 2 PM-6375-0-FOWNSP144-TR-01-0;TR-01-1 QUALCOMM 2023+ 12000pcs US$1.00/pcs 3 PMR-735A-0-WLNSP48-TR-05-0,TR-05-1 QUALCOMM 2023+ 12000pcs US$0.85/pcs 4 PMK-8003-0-FOWPSP36-TR-01-0 QUALCOMM 2023+ 12000pcs US$0.24/pcs 5 SDR-735-0-PSP219B-TR-01-0;TR-01-1 QUALCOMM 2023+ 12000pcs US$2.50/pcs 6 WCD-9370-0-WLPSP55-TR-01-0;TR-01-4 QUALCOMM 2023+ 12000pcs US$0.50/pcs 7 WCN-3988-0-82BWLPSP-TR-00-0 QUALCOMM 2023+ 12000pcs US$3.50/pcs 8 QET-6105-0-WLNSP24B-TR-00-1 QUALCOMM 2023+ 12000pcs US$1.20/pcs 9 QET4101-0-12WLNSP-TR-00-0 QUALCOMM 2022+ 12000pcs US$0.21/pcs
These materials are sold as a set for $28/usd, and are guaranteed to be authentic.
If you need other Qualcomm materials, please feel free to contact me
Stay in tune with product evolutions—tap . Keep Receiving Notices
Feel like taking a break? Select Configure Your Mailing.
Increase the External ROM access timeouts to prevent failures during
programming of External SPI EEPROM chips. The current timeouts are
too short for some SPI EEPROMs used with uPD720201 controllers.
The current timeout for Chip Erase in renesas_rom_erase() is 100 ms ,
the current timeout for Sector Erase issued by the controller before
Page Program in renesas_fw_download_image() is also 100 ms. Neither
timeout is sufficient for e.g. the Macronix MX25L5121E or MX25V5126F.
MX25L5121E reference manual [1] page 35 section "ERASE AND PROGRAMMING
PERFORMANCE" and page 23 section "Table 8. AC CHARACTERISTICS (Temperature
= 0°C to 70°C for Commercial grade, VCC = 2.7V ~ 3.6V)" row "tCE" indicate
that the maximum time required for Chip Erase opcode to complete is 2 s,
and for Sector Erase it is 300 ms .
MX25V5126F reference manual [2] page 47 section "13. ERASE AND PROGRAMMING
PERFORMANCE (2.3V - 3.6V)" and page 42 section "Table 8. AC CHARACTERISTICS
(Temperature = -40°C to 85°C for Industrial grade, VCC = 2.3V - 3.6V)" row
"tCE" indicate that the maximum time required for Chip Erase opcode to
complete is 3.2 s, and for Sector Erase it is 400 ms .
Update the timeouts such, that Chip Erase timeout is set to 5 seconds,
and Sector Erase timeout is set to 500 ms. Such lengthy timeouts ought
to be sufficient for majority of SPI EEPROM chips.
[1] https://www.macronix.com/Lists/Datasheet/Attachments/8634/MX25L5121E,%203V,…
[2] https://www.macronix.com/Lists/Datasheet/Attachments/8750/MX25V5126F,%202.5…
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Marek Vasut <marek.vasut+renesas(a)mailbox.org>
---
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: linux-renesas-soc(a)vger.kernel.org
Cc: linux-usb(a)vger.kernel.org
---
V2: Move Cc: stable above ---
---
drivers/usb/host/xhci-pci-renesas.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index 620f8f0febb8..86df80399c9f 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -47,8 +47,9 @@
#define RENESAS_ROM_ERASE_MAGIC 0x5A65726F
#define RENESAS_ROM_WRITE_MAGIC 0x53524F4D
-#define RENESAS_RETRY 10000
-#define RENESAS_DELAY 10
+#define RENESAS_RETRY 50000 /* 50000 * RENESAS_DELAY ~= 500ms */
+#define RENESAS_CHIP_ERASE_RETRY 500000 /* 500000 * RENESAS_DELAY ~= 5s */
+#define RENESAS_DELAY 10
#define RENESAS_FW_NAME "renesas_usb_fw.mem"
@@ -407,7 +408,7 @@ static void renesas_rom_erase(struct pci_dev *pdev)
/* sleep a bit while ROM is erased */
msleep(20);
- for (i = 0; i < RENESAS_RETRY; i++) {
+ for (i = 0; i < RENESAS_CHIP_ERASE_RETRY; i++) {
retval = pci_read_config_byte(pdev, RENESAS_ROM_STATUS,
&status);
status &= RENESAS_ROM_STATUS_ERASE;
--
2.47.2
Increase the External ROM access timeouts to prevent failures during
programming of External SPI EEPROM chips. The current timeouts are
too short for some SPI EEPROMs used with uPD720201 controllers.
The current timeout for Chip Erase in renesas_rom_erase() is 100 ms ,
the current timeout for Sector Erase issued by the controller before
Page Program in renesas_fw_download_image() is also 100 ms. Neither
timeout is sufficient for e.g. the Macronix MX25L5121E or MX25V5126F.
MX25L5121E reference manual [1] page 35 section "ERASE AND PROGRAMMING
PERFORMANCE" and page 23 section "Table 8. AC CHARACTERISTICS (Temperature
= 0°C to 70°C for Commercial grade, VCC = 2.7V ~ 3.6V)" row "tCE" indicate
that the maximum time required for Chip Erase opcode to complete is 2 s,
and for Sector Erase it is 300 ms .
MX25V5126F reference manual [2] page 47 section "13. ERASE AND PROGRAMMING
PERFORMANCE (2.3V - 3.6V)" and page 42 section "Table 8. AC CHARACTERISTICS
(Temperature = -40°C to 85°C for Industrial grade, VCC = 2.3V - 3.6V)" row
"tCE" indicate that the maximum time required for Chip Erase opcode to
complete is 3.2 s, and for Sector Erase it is 400 ms .
Update the timeouts such, that Chip Erase timeout is set to 5 seconds,
and Sector Erase timeout is set to 500 ms. Such lengthy timeouts ought
to be sufficient for majority of SPI EEPROM chips.
[1] https://www.macronix.com/Lists/Datasheet/Attachments/8634/MX25L5121E,%203V,…
[2] https://www.macronix.com/Lists/Datasheet/Attachments/8750/MX25V5126F,%202.5…
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Signed-off-by: Marek Vasut <marek.vasut+renesas(a)mailbox.org>
---
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: linux-renesas-soc(a)vger.kernel.org
Cc: linux-usb(a)vger.kernel.org
---
drivers/usb/host/xhci-pci-renesas.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index 620f8f0febb8..86df80399c9f 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -47,8 +47,9 @@
#define RENESAS_ROM_ERASE_MAGIC 0x5A65726F
#define RENESAS_ROM_WRITE_MAGIC 0x53524F4D
-#define RENESAS_RETRY 10000
-#define RENESAS_DELAY 10
+#define RENESAS_RETRY 50000 /* 50000 * RENESAS_DELAY ~= 500ms */
+#define RENESAS_CHIP_ERASE_RETRY 500000 /* 500000 * RENESAS_DELAY ~= 5s */
+#define RENESAS_DELAY 10
#define RENESAS_FW_NAME "renesas_usb_fw.mem"
@@ -407,7 +408,7 @@ static void renesas_rom_erase(struct pci_dev *pdev)
/* sleep a bit while ROM is erased */
msleep(20);
- for (i = 0; i < RENESAS_RETRY; i++) {
+ for (i = 0; i < RENESAS_CHIP_ERASE_RETRY; i++) {
retval = pci_read_config_byte(pdev, RENESAS_ROM_STATUS,
&status);
status &= RENESAS_ROM_STATUS_ERASE;
--
2.47.2
The patch that broke text mode VGA-console scrolling is this one:
"vgacon: Add check for vc_origin address range in vgacon_scroll()"
commit 864f9963ec6b4b76d104d595ba28110b87158003 upstream.
How to preproduce:
(1) boot a kernel that is configured to use text mode VGA-console
(2) type commands: ls -l /usr/bin | less -S
(3) scroll up/down with cursor-down/up keys
Above mentioned patch seems to have landed in upstream and all
kernel.org stable trees with zero testing. Even minimal testing
would have shown that it breaks text mode VGA-console scrolling.
Greg, Sasha, Linus,
Please consider reverting that buggy patch from all affected trees.
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189
The quilt patch titled
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
has been removed from the -mm tree. Its filename was
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
Date: Mon, 28 Jul 2025 10:53:55 -0700
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
[surenb(a)google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 30 ++++++++++++++++++++++++++++++
mm/mmap_lock.c | 10 +++-------
2 files changed, 33 insertions(+), 7 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwai
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -154,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped
+++ a/mm/mmap_lock.c
@@ -164,8 +164,7 @@ retry:
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ retry:
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
has been removed from the -mm tree. Its filename was
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hang
Date: Mon, 28 Jul 2025 15:52:59 +0800
The current swap-in code assumes that, when a swap entry in shmem mapping
is order 0, its cached folios (if present) must be order 0 too, which
turns out not always correct.
The problem is shmem_split_large_entry is called before verifying the
folio will eventually be swapped in, one possible race is:
CPU1 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
shmem_swapin_folio
/* S1 is swapped in */
shmem_writeout
/* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */
Now any following swapin of S1 will hang: `xa_get_order` returns 0, and
folio lookup will return a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always
return false causing swap-in to return -EEXIST.
And this looks fragile. So fix this up by allowing seeing a larger folio
in swap cache, and check the whole shmem mapping range covered by the
swapin have the right swap value upon inserting the folio. And drop the
redundant tree walks before the insertion.
This will actually improve performance, as it avoids two redundant Xarray
tree walks in the hot path, and the only side effect is that in the
failure path, shmem may redundantly reallocate a few folios causing
temporary slight memory pressure.
And worth noting, it may seems the order and value check before inserting
might help reducing the lock contention, which is not true. The swap
cache layer ensures raced swapin will either see a swap cache folio or
failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is
bypassed), so holding the folio lock and checking the folio flag is
already good enough for avoiding the lock contention. The chance that a
folio passes the swap entry value check but the shmem mapping slot has
changed should be very low.
Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
--- a/mm/shmem.c~mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hang
+++ a/mm/shmem.c
@@ -891,7 +891,9 @@ static int shmem_add_to_page_cache(struc
pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
- long nr = folio_nr_pages(folio);
+ unsigned long nr = folio_nr_pages(folio);
+ swp_entry_t iter, swap;
+ void *entry;
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -903,14 +905,25 @@ static int shmem_add_to_page_cache(struc
gfp &= GFP_RECLAIM_MASK;
folio_throttle_swaprate(folio, gfp);
+ swap = radix_to_swp_entry(expected);
do {
+ iter = swap;
xas_lock_irq(&xas);
- if (expected != xas_find_conflict(&xas)) {
- xas_set_err(&xas, -EEXIST);
- goto unlock;
+ xas_for_each_conflict(&xas, entry) {
+ /*
+ * The range must either be empty, or filled with
+ * expected swap entries. Shmem swap entries are never
+ * partially freed without split of both entry and
+ * folio, so there shouldn't be any holes.
+ */
+ if (!expected || entry != swp_to_radix_entry(iter)) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+ iter.val += 1 << xas_get_order(&xas);
}
- if (expected && xas_find_conflict(&xas)) {
+ if (expected && iter.val - nr != swap.val) {
xas_set_err(&xas, -EEXIST);
goto unlock;
}
@@ -2359,7 +2372,7 @@ static int shmem_swapin_folio(struct ino
error = -ENOMEM;
goto failed;
}
- } else if (order != folio_order(folio)) {
+ } else if (order > folio_order(folio)) {
/*
* Swap readahead may swap in order 0 folios into swapcache
* asynchronously, while the shmem mapping can still stores
@@ -2384,15 +2397,23 @@ static int shmem_swapin_folio(struct ino
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
+ } else if (order < folio_order(folio)) {
+ swap.val = round_down(swap.val, 1 << folio_order(folio));
+ index = round_down(index, 1 << folio_order(folio));
}
alloced:
- /* We have to do this with folio locked to prevent races */
+ /*
+ * We have to do this with the folio locked to prevent races.
+ * The shmem_confirm_swap below only checks if the first swap
+ * entry matches the folio, that's enough to ensure the folio
+ * is not used outside of shmem, as shmem swap entries
+ * and swap cache folios are never partially freed.
+ */
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
- folio->swap.val != swap.val ||
!shmem_confirm_swap(mapping, index, swap) ||
- xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+ folio->swap.val != swap.val) {
error = -EEXIST;
goto unlock;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
The quilt patch titled
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
has been removed from the -mm tree. Its filename was
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
This patch was dropped because it was folded into mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
Date: Tue, 29 Jul 2025 07:57:09 -0700
- Addressed Lorenzo's nits, per Lorenzo Stoakes
- Added a warning comment for vma_start_read()
- Added Reviewed-by and Acked-by, per Vlastimil Babka and Lorenzo Stoakes
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 7 +++++++
mm/mmap_lock.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/include/linux/mmap_lock.h
@@ -155,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -194,9 +198,12 @@ static inline struct vm_area_struct *vma
if (unlikely(vma->vm_mm != mm)) {
/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
struct mm_struct *other_mm = vma->vm_mm;
+
/*
* __mmdrop() is a heavy operation and we don't need RCU
* protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
*/
rcu_read_unlock();
mmgrab(other_mm);
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/mm/mmap_lock.c
@@ -235,7 +235,7 @@ retry:
goto fallback;
}
- /* Verify the vma is not behind of the last search position. */
+ /* Verify the vma is not behind the last search position. */
if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
Hello maintainers,
This series addresses a defect observed on certain hardware platforms using Linux kernel 6.1.147 with the i915 driver. The issue concerns hot plug detection (HPD) logic,
leading to unreliable or missed detection events on affected hardware. This is happening on some specific devices.
### Background
Issue:
On Simatic IPC227E, we observed unreliable or missing hot plug detection events, while on Simatic IPC227G (otherwise similar platform), expected hot plug behavior was maintained.
Affected kernel:
This patch series is intended for the Linux 6.1.y stable tree only (tested on 6.1.147)
Most of the tests were conducted on 6.1.147 (manual/standalone kernel build, CIP/Isar context).
Root cause analysis:
I do not have access to hardware signal traces or scope data to conclusively prove the root cause at electrical level. My understanding is based on observed driver behavior and logs.
Therefore my assumption as to the real cause is that on IPC227G, HPD IRQ storms are apparently not occurring, so the standard HPD IRQ-based detection works as expected. On IPC227E,
frequent HPD interrupts trigger the i915 driver’s storm detection logic, causing it to switch to polling mode. Therefore polling does not resume correctly, leading to the hotplug
issue this series addresses. Device IPC227E's behavior triggers this kernel edge case, likely due to slight variations in signal integrity, electrical margins, or internal component timing.
Device IPC227G, functions as expected, possibly due to cleaner electrical signaling or more optimal timing characteristics, thus avoiding the triggering condition.
Conclusion:
This points to a hardware-software interaction where kernel code assumes nicer signaling or margins than IPC227E is able to provide, exposing logic gaps not visible on more robust hardware.
### Patches
Patches 1-4:
- Partial backports of upstream commits; only the relevant logic or fixes are applied, with other code omitted due to downstream divergence.
- Applied minimal merging without exhaustive backport of all intermediate upstream changes.
Patch 5:
- Contains cherry-picked logic plus context/compatibility amendments as needed. Ensures that the driver builds.
- Together these fixes greatly improve reliability of hotplug detection on both devices, with no regression detected in our setups.
Thank you for your review,
Nicusor Huhulea
This patch series contains the following changes:
Dmitry Baryshkov (2):
drm/probe_helper: extract two helper functions
drm/probe-helper: enable and disable HPD on connectors
Imre Deak (2):
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
Nicusor Huhulea (1):
drm/i915: fixes for i915 Hot Plug Detection and build/runtime issues
drivers/gpu/drm/drm_probe_helper.c | 127 ++++++++++++++-----
drivers/gpu/drm/i915/display/intel_hotplug.c | 4 +-
include/drm/drm_modeset_helper_vtables.h | 22 ++++
include/drm/drm_probe_helper.h | 1 +
4 files changed, 122 insertions(+), 32 deletions(-)
--
2.39.2
The patch titled
Subject: mm: fix possible deadlock in console_trylock_spinning
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-possible-deadlock-in-console_trylock_spinning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gu Bowen <gubowen5(a)huawei.com>
Subject: mm: fix possible deadlock in console_trylock_spinning
Date: Wed, 30 Jul 2025 17:49:14 +0800
kmemleak_scan_thread() invokes scan_block() which may invoke a nomal
printk() to print warning message. This can cause a deadlock in the
scenario reported below:
CPU0 CPU1
---- ----
lock(kmemleak_lock);
lock(&port->lock);
lock(kmemleak_lock);
lock(console_owner);
To solve this problem, switch to printk_safe mode before printing warning
message, this will redirect all printk()-s to a special per-CPU buffer,
which will be flushed later from a safe context (irq work), and this
deadlock problem can be avoided.
Our syztester report the following lockdep error:
======================================================
WARNING: possible circular locking dependency detected
5.10.0-22221-gca646a51dd00 #16 Not tainted
------------------------------------------------------
kmemleak/182 is trying to acquire lock:
ffffffffaf9e9020 (console_owner){-...}-{0:0}, at: console_trylock_spinning+0xda/0x1d0 kernel/printk/printk.c:1900
but task is already holding lock:
ffffffffb007cf58 (kmemleak_lock){-.-.}-{2:2}, at: scan_block+0x3d/0x220 mm/kmemleak.c:1310
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kmemleak_lock){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
create_object.isra.0+0x36/0x80 mm/kmemleak.c:691
kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
slab_post_alloc_hook mm/slab.h:518 [inline]
slab_alloc_node mm/slub.c:2987 [inline]
slab_alloc mm/slub.c:2995 [inline]
__kmalloc+0x637/0xb60 mm/slub.c:4100
kmalloc include/linux/slab.h:620 [inline]
tty_buffer_alloc+0x127/0x140 drivers/tty/tty_buffer.c:176
__tty_buffer_request_room+0x9b/0x110 drivers/tty/tty_buffer.c:276
tty_insert_flip_string_fixed_flag+0x60/0x130 drivers/tty/tty_buffer.c:321
tty_insert_flip_string include/linux/tty_flip.h:36 [inline]
tty_insert_flip_string_and_push_buffer+0x3a/0xb0 drivers/tty/tty_buffer.c:578
process_output_block+0xc2/0x2e0 drivers/tty/n_tty.c:592
n_tty_write+0x298/0x540 drivers/tty/n_tty.c:2433
do_tty_write drivers/tty/tty_io.c:1041 [inline]
file_tty_write.constprop.0+0x29b/0x4b0 drivers/tty/tty_io.c:1147
redirected_tty_write+0x51/0x90 drivers/tty/tty_io.c:1176
call_write_iter include/linux/fs.h:2117 [inline]
do_iter_readv_writev+0x274/0x350 fs/read_write.c:741
do_iter_write+0xbb/0x1f0 fs/read_write.c:867
vfs_writev+0xfa/0x380 fs/read_write.c:940
do_writev+0xd6/0x1d0 fs/read_write.c:983
do_syscall_64+0x2b/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
-> #2 (&port->lock){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
tty_port_tty_get+0x1f/0xa0 drivers/tty/tty_port.c:289
tty_port_default_wakeup+0xb/0x30 drivers/tty/tty_port.c:48
serial8250_tx_chars+0x259/0x430 drivers/tty/serial/8250/8250_port.c:1906
__start_tx drivers/tty/serial/8250/8250_port.c:1598 [inline]
serial8250_start_tx+0x304/0x320 drivers/tty/serial/8250/8250_port.c:1720
uart_write+0x1a1/0x2e0 drivers/tty/serial/serial_core.c:635
do_output_char+0x2c0/0x370 drivers/tty/n_tty.c:444
process_output drivers/tty/n_tty.c:511 [inline]
n_tty_write+0x269/0x540 drivers/tty/n_tty.c:2445
do_tty_write drivers/tty/tty_io.c:1041 [inline]
file_tty_write.constprop.0+0x29b/0x4b0 drivers/tty/tty_io.c:1147
call_write_iter include/linux/fs.h:2117 [inline]
do_iter_readv_writev+0x274/0x350 fs/read_write.c:741
do_iter_write+0xbb/0x1f0 fs/read_write.c:867
vfs_writev+0xfa/0x380 fs/read_write.c:940
do_writev+0xd6/0x1d0 fs/read_write.c:983
do_syscall_64+0x2b/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
-> #1 (&port_lock_key){-.-.}-{2:2}:
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3b/0x60 kernel/locking/spinlock.c:164
serial8250_console_write+0x292/0x320 drivers/tty/serial/8250/8250_port.c:3458
call_console_drivers.constprop.0+0x185/0x240 kernel/printk/printk.c:1988
console_unlock+0x2b4/0x640 kernel/printk/printk.c:2648
register_console.part.0+0x2a1/0x390 kernel/printk/printk.c:3024
univ8250_console_init+0x24/0x2b drivers/tty/serial/8250/8250_core.c:724
console_init+0x188/0x24b kernel/printk/printk.c:3134
start_kernel+0x2b0/0x41e init/main.c:1072
secondary_startup_64_no_verify+0xc3/0xcb
-> #0 (console_owner){-...}-{0:0}:
check_prev_add+0xfa/0x1380 kernel/locking/lockdep.c:2988
check_prevs_add+0x1d8/0x3c0 kernel/locking/lockdep.c:3113
validate_chain+0x5df/0xac0 kernel/locking/lockdep.c:3729
__lock_acquire+0x514/0x940 kernel/locking/lockdep.c:4958
lock_acquire+0x15a/0x3a0 kernel/locking/lockdep.c:5569
console_trylock_spinning+0x10d/0x1d0 kernel/printk/printk.c:1921
vprintk_emit+0x1a5/0x270 kernel/printk/printk.c:2134
printk+0xb2/0xe7 kernel/printk/printk.c:2183
lookup_object.cold+0xf/0x24 mm/kmemleak.c:405
scan_block+0x1fa/0x220 mm/kmemleak.c:1357
scan_object+0xdd/0x140 mm/kmemleak.c:1415
scan_gray_list+0x8f/0x1c0 mm/kmemleak.c:1453
kmemleak_scan+0x649/0xf30 mm/kmemleak.c:1608
kmemleak_scan_thread+0x94/0xb6 mm/kmemleak.c:1721
kthread+0x1c4/0x210 kernel/kthread.c:328
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299
other info that might help us debug this:
Chain exists of:
console_owner --> &port->lock --> kmemleak_lock
Link: https://lkml.kernel.org/r/20250730094914.566582-1-gubowen5@huawei.com
Signed-off-by: Gu Bowen <gubowen5(a)huawei.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Lu Jialin <lujialin4(a)huawei.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kmemleak.c~mm-fix-possible-deadlock-in-console_trylock_spinning
+++ a/mm/kmemleak.c
@@ -437,9 +437,11 @@ static struct kmemleak_object *__lookup_
else if (untagged_objp == untagged_ptr || alias)
return object;
else {
+ __printk_safe_enter();
kmemleak_warn("Found object by alias at 0x%08lx\n",
ptr);
dump_object_info(object);
+ __printk_safe_exit();
break;
}
}
_
Patches currently in -mm which might be from gubowen5(a)huawei.com are
mm-fix-possible-deadlock-in-console_trylock_spinning.patch
The quilt patch titled
Subject: mm: shmem: fix the shmem large folio allocation for the i915 driver
has been removed from the -mm tree. Its filename was
mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm: shmem: fix the shmem large folio allocation for the i915 driver
Date: Mon, 28 Jul 2025 16:03:53 +0800
After commit acd7ccb284b8 ("mm: shmem: add large folio support for
tmpfs"), we extend the 'huge=' option to allow any sized large folios for
tmpfs, which means tmpfs will allow getting a highest order hint based on
the size of write() and fallocate() paths, and then will try each
allowable large order.
However, when the i915 driver allocates shmem memory, it doesn't provide
hint information about the size of the large folio to be allocated,
resulting in the inability to allocate PMD-sized shmem, which in turn
affects GPU performance.
To fix this issue, add the 'end' information for shmem_read_folio_gfp() to
help allocate PMD-sized large folios. Additionally, use the maximum
allocation chunk (via mapping_max_folio_size()) to determine the size of
the large folios to allocate in the i915 driver.
Patryk added:
: In my tests, the performance drop ranges from a few percent up to 13%
: in Unigine Superposition under heavy memory usage on the CPU Core Ultra
: 155H with the Xe 128 EU GPU. Other users have reported performance
: impact up to 30% on certain workloads. Please find more in the
: regressions reports:
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
:
: I believe the change should be backported to all active kernel branches
: after version 6.12.
Link: https://lkml.kernel.org/r/0d734549d5ed073c80b11601da3abdd5223e1889.17536898…
Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Patryk Kowalczyk <patryk(a)kowalczyk.ws>
Reported-by: Ville Syrj��l�� <ville.syrjala(a)linux.intel.com>
Tested-by: Patryk Kowalczyk <patryk(a)kowalczyk.ws>
Cc: Christan K��nig <christian.koenig(a)amd.com>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Huang Ray <Ray.Huang(a)amd.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Jonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Mathew Brost <matthew.brost(a)intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Thomas Zimemrmann <tzimmermann(a)suse.de>
Cc: Tvrtko Ursulin <tursulin(a)ursulin.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/drm_gem.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
include/linux/shmem_fs.h | 4 ++--
mm/shmem.c | 7 ++++---
5 files changed, 14 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/drm_gem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/drm_gem.c
@@ -627,7 +627,7 @@ struct page **drm_gem_get_pages(struct d
i = 0;
while (i < npages) {
long nr;
- folio = shmem_read_folio_gfp(mapping, i,
+ folio = shmem_read_folio_gfp(mapping, i, 0,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
goto fail;
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -69,6 +69,7 @@ int shmem_sg_alloc_table(struct drm_i915
struct scatterlist *sg;
unsigned long next_pfn = 0; /* suppress gcc warning */
gfp_t noreclaim;
+ size_t chunk;
int ret;
if (overflows_type(size / PAGE_SIZE, page_count))
@@ -94,6 +95,7 @@ int shmem_sg_alloc_table(struct drm_i915
mapping_set_unevictable(mapping);
noreclaim = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM);
noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
+ chunk = mapping_max_folio_size(mapping);
sg = st->sgl;
st->nents = 0;
@@ -105,10 +107,13 @@ int shmem_sg_alloc_table(struct drm_i915
0,
}, *s = shrink;
gfp_t gfp = noreclaim;
+ loff_t bytes = (page_count - i) << PAGE_SHIFT;
+ loff_t pos = i << PAGE_SHIFT;
+ bytes = min_t(loff_t, chunk, bytes);
do {
cond_resched();
- folio = shmem_read_folio_gfp(mapping, i, gfp);
+ folio = shmem_read_folio_gfp(mapping, i, pos + bytes, gfp);
if (!IS_ERR(folio))
break;
--- a/drivers/gpu/drm/ttm/ttm_backup.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/drivers/gpu/drm/ttm/ttm_backup.c
@@ -100,7 +100,7 @@ ttm_backup_backup_page(struct file *back
struct folio *to_folio;
int ret;
- to_folio = shmem_read_folio_gfp(mapping, idx, alloc_gfp);
+ to_folio = shmem_read_folio_gfp(mapping, idx, 0, alloc_gfp);
if (IS_ERR(to_folio))
return PTR_ERR(to_folio);
--- a/include/linux/shmem_fs.h~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/include/linux/shmem_fs.h
@@ -153,12 +153,12 @@ enum sgp_type {
int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
struct folio **foliop, enum sgp_type sgp);
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp);
+ pgoff_t index, loff_t end, gfp_t gfp);
static inline struct folio *shmem_read_folio(struct address_space *mapping,
pgoff_t index)
{
- return shmem_read_folio_gfp(mapping, index, mapping_gfp_mask(mapping));
+ return shmem_read_folio_gfp(mapping, index, 0, mapping_gfp_mask(mapping));
}
static inline struct page *shmem_read_mapping_page(
--- a/mm/shmem.c~mm-shmem-fix-the-shmem-large-folio-allocation-for-the-i915-driver
+++ a/mm/shmem.c
@@ -5930,6 +5930,7 @@ int shmem_zero_setup(struct vm_area_stru
* shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
* @mapping: the folio's address_space
* @index: the folio index
+ * @end: end of a read if allocating a new folio
* @gfp: the page allocator flags to use if allocating
*
* This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
@@ -5942,14 +5943,14 @@ int shmem_zero_setup(struct vm_area_stru
* with the mapping_gfp_mask(), to avoid OOMing the machine unnecessarily.
*/
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp)
+ pgoff_t index, loff_t end, gfp_t gfp)
{
#ifdef CONFIG_SHMEM
struct inode *inode = mapping->host;
struct folio *folio;
int error;
- error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
+ error = shmem_get_folio_gfp(inode, index, end, &folio, SGP_CACHE,
gfp, NULL, NULL);
if (error)
return ERR_PTR(error);
@@ -5968,7 +5969,7 @@ EXPORT_SYMBOL_GPL(shmem_read_folio_gfp);
struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp)
{
- struct folio *folio = shmem_read_folio_gfp(mapping, index, gfp);
+ struct folio *folio = shmem_read_folio_gfp(mapping, index, 0, gfp);
struct page *page;
if (IS_ERR(folio))
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:
Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 6693731487a8145a9b039bc983d77edc47693855:
vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers (2025-08-01 09:11:09 -0400)
Changes from v1:
drop commits that I put in there by mistake. Sorry!
----------------------------------------------------------------
virtio, vhost: features, fixes
vhost can now support legacy threading
if enabled in Kconfig
vsock memory allocation strategies for
large buffers have been improved,
reducing pressure on kmalloc
vhost now supports the in-order feature
guest bits missed the merge window
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Alok Tiwari (4):
virtio: Fix typo in register_virtio_device() doc comment
vhost-scsi: Fix typos and formatting in comments and logs
vhost: Fix typos
vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit
Anders Roxell (1):
vdpa: Fix IDR memory leak in VDUSE module exit
Cindy Lu (1):
vhost: Reintroduce kthread API and add mode selection
Dr. David Alan Gilbert (2):
vhost: vringh: Remove unused iotlb functions
vhost: vringh: Remove unused functions
Dragos Tatulea (2):
vdpa/mlx5: Fix needs_teardown flag calculation
vdpa/mlx5: Fix release of uninitialized resources on error path
Gerd Hoffmann (1):
drm/virtio: implement virtio_gpu_shutdown
Jason Wang (3):
vhost: fail early when __vhost_add_used() fails
vhost: basic in order support
vhost_net: basic in_order support
Michael S. Tsirkin (2):
virtio: fix comments, readability
virtio: document ENOSPC
Mike Christie (1):
vhost-scsi: Fix log flooding with target does not exist errors
Pei Xiao (1):
vhost: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
Viresh Kumar (2):
virtio-mmio: Remove virtqueue list from mmio device
virtio-vdpa: Remove virtqueue list
WangYuli (1):
virtio: virtio_dma_buf: fix missing parameter documentation
Will Deacon (9):
vhost/vsock: Avoid allocating arbitrarily-sized SKBs
vsock/virtio: Validate length in packet header before skb_put()
vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put()
vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page
vsock/virtio: Rename virtio_vsock_alloc_skb()
vsock/virtio: Move SKB allocation lower-bound check to callers
vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
vsock/virtio: Rename virtio_vsock_skb_rx_put()
vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers
drivers/gpu/drm/virtio/virtgpu_drv.c | 8 +-
drivers/vdpa/mlx5/core/mr.c | 3 +
drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 1 +
drivers/vhost/Kconfig | 18 ++
drivers/vhost/net.c | 88 +++++---
drivers/vhost/scsi.c | 24 +-
drivers/vhost/vhost.c | 377 ++++++++++++++++++++++++++++----
drivers/vhost/vhost.h | 30 ++-
drivers/vhost/vringh.c | 118 ----------
drivers/vhost/vsock.c | 15 +-
drivers/virtio/virtio.c | 7 +-
drivers/virtio/virtio_dma_buf.c | 2 +
drivers/virtio/virtio_mmio.c | 52 +----
drivers/virtio/virtio_ring.c | 4 +
drivers/virtio/virtio_vdpa.c | 44 +---
include/linux/virtio.h | 2 +-
include/linux/virtio_vsock.h | 46 +++-
include/linux/vringh.h | 12 -
include/uapi/linux/vhost.h | 29 +++
kernel/vhost_task.c | 2 +-
net/vmw_vsock/virtio_transport.c | 20 +-
net/vmw_vsock/virtio_transport_common.c | 3 +-
23 files changed, 575 insertions(+), 342 deletions(-)
The patch titled
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Breno Leitao <leitao(a)debian.org>
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
Date: Thu, 31 Jul 2025 02:57:18 -0700
When netpoll is enabled, calling pr_warn_once() while holding
kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock
inversion with the netconsole subsystem. This occurs because
pr_warn_once() may trigger netpoll, which eventually leads to
__alloc_skb() and back into kmemleak code, attempting to reacquire
kmemleak_lock.
This is the path for the deadlock.
mem_pool_alloc()
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
-> pr_warn_once()
-> netconsole subsystem
-> netpoll
-> __alloc_skb
-> __create_object
-> raw_spin_lock_irqsave(&kmemleak_lock, flags);
Fix this by setting a flag and issuing the pr_warn_once() after
kmemleak_lock is released.
Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.o…
Fixes: c5665868183fec ("mm: kmemleak: use the memory pool for early allocations")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock
+++ a/mm/kmemleak.c
@@ -470,6 +470,7 @@ static struct kmemleak_object *mem_pool_
{
unsigned long flags;
struct kmemleak_object *object;
+ bool warn = false;
/* try the slab allocator first */
if (object_cache) {
@@ -488,8 +489,10 @@ static struct kmemleak_object *mem_pool_
else if (mem_pool_free_count)
object = &mem_pool[--mem_pool_free_count];
else
- pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+ warn = true;
raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
+ if (warn)
+ pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
return object;
}
_
Patches currently in -mm which might be from leitao(a)debian.org are
mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch
The patch titled
Subject: mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sasha Levin <sashal(a)kernel.org>
Subject: mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
Date: Thu, 31 Jul 2025 10:44:31 -0400
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
addr \!= __fix_to_virt(FIX_KMAP_BEGIN + idx)
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403b8 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.
Link: https://lkml.kernel.org/r/20250731144431.773923-1-sashal@kernel.org
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c~mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte
+++ a/mm/userfaultfd.c
@@ -1453,10 +1453,15 @@ out:
folio_unlock(src_folio);
folio_put(src_folio);
}
- if (dst_pte)
- pte_unmap(dst_pte);
+ /*
+ * Unmap in reverse order (LIFO) to maintain proper kmap_local
+ * index ordering when CONFIG_HIGHPTE is enabled. We mapped dst_pte
+ * first, then src_pte, so we must unmap src_pte first, then dst_pte.
+ */
if (src_pte)
pte_unmap(src_pte);
+ if (dst_pte)
+ pte_unmap(dst_pte);
mmu_notifier_invalidate_range_end(&range);
if (si)
put_swap_device(si);
_
Patches currently in -mm which might be from sashal(a)kernel.org are
mm-userfaultfd-fix-kmap_local-lifo-ordering-for-config_highpte.patch
The patch titled
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Herton R. Krzesinski" <herton(a)redhat.com>
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
Date: Thu, 31 Jul 2025 18:40:51 -0300
The mm/debug_vm_pagetable test allocates manually page table entries for
the tests it runs, using also its manually allocated mm_struct. That in
itself is ok, but when it exits, at destroy_args() it fails to clear those
entries with the *_clear functions.
The problem is that leaves stale entries. If another process allocates an
mm_struct with a pgd at the same address, it may end up running into the
stale entry. This is happening in practice on a debug kernel with
CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra
debugging I added (it prints a warning trace if pgtables_bytes goes
negative, in addition to the warning at check_mm() function):
[ 2.539353] debug_vm_pgtable: [get_random_vaddr ]: random_vaddr is 0x7ea247140000
[ 2.539366] kmem_cache info
[ 2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508
[ 2.539447] debug_vm_pgtable: [init_args ]: args->mm is 0x000000002267cc9e
(...)
[ 2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0
[ 2.552816] Modules linked in:
[ 2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY
[ 2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries
[ 2.552872] NIP: c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90
[ 2.552885] REGS: c0000000622e73b0 TRAP: 0700 Not tainted (6.12.0-105.debug_vm2.el10.ppc64le+debug)
[ 2.552899] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 0000000a
[ 2.552954] CFAR: c0000000008f03f0 IRQMASK: 0
[ 2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001
[ 2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff
[ 2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000
[ 2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb
[ 2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0
[ 2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000
[ 2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001
[ 2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760
[ 2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0
[ 2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0
[ 2.553199] Call Trace:
[ 2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable)
[ 2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0
[ 2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570
[ 2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650
[ 2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290
[ 2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0
[ 2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870
[ 2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150
[ 2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50
[ 2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0
[ 2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
(...)
[ 2.558892] ---[ end trace 0000000000000000 ]---
[ 2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1
[ 2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144
Here the modprobe process ended up with an allocated mm_struct from the
mm_struct slab that was used before by the debug_vm_pgtable test. That is
not a problem, since the mm_struct is initialized again etc., however, if
it ends up using the same pgd table, it bumps into the old stale entry
when clearing/freeing the page table entries, so it tries to free an entry
already gone (that one which was allocated by the debug_vm_pgtable test),
which also explains the negative pgtables_bytes since it's accounting for
not allocated entries in the current process.
As far as I looked pgd_{alloc,free} etc. does not clear entries, and
clearing of the entries is explicitly done in the free_pgtables->
free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range->
free_pte_range path. However, the debug_vm_pgtable test does not call
free_pgtables, since it allocates mm_struct and entries manually for its
test and eg. not goes through page faults. So it also should clear
manually the entries before exit at destroy_args().
This problem was noticed on a reboot X number of times test being done on
a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled.
Depends on the system, but on a 100 times reboot loop the problem could
manifest once or twice, if a process ends up getting the right mm->pgd
entry with the stale entries used by mm/debug_vm_pagetable. After using
this patch, I couldn't reproduce/experience the problems anymore. I was
able to reproduce the problem as well on latest upstream kernel (6.16).
I also modified destroy_args() to use mmput() instead of mmdrop(), there
is no reason to hold mm_users reference and not release the mm_struct
entirely, and in the output above with my debugging prints I already had
patched it to use mmput, it did not fix the problem, but helped in the
debugging as well.
Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com
Fixes: 3c9b84f044a9e ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args")
Signed-off-by: Herton R. Krzesinski <herton(a)redhat.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Gavin Shan <gshan(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args
+++ a/mm/debug_vm_pgtable.c
@@ -1041,29 +1041,34 @@ static void __init destroy_args(struct p
/* Free page table entries */
if (args->start_ptep) {
+ pmd_clear(args->pmdp);
pte_free(args->mm, args->start_ptep);
mm_dec_nr_ptes(args->mm);
}
if (args->start_pmdp) {
+ pud_clear(args->pudp);
pmd_free(args->mm, args->start_pmdp);
mm_dec_nr_pmds(args->mm);
}
if (args->start_pudp) {
+ p4d_clear(args->p4dp);
pud_free(args->mm, args->start_pudp);
mm_dec_nr_puds(args->mm);
}
- if (args->start_p4dp)
+ if (args->start_p4dp) {
+ pgd_clear(args->pgdp);
p4d_free(args->mm, args->start_p4dp);
+ }
/* Free vma and mm struct */
if (args->vma)
vm_area_free(args->vma);
if (args->mm)
- mmdrop(args->mm);
+ mmput(args->mm);
}
static struct page * __init
_
Patches currently in -mm which might be from herton(a)redhat.com are
mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args.patch
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
drivers/char/tpm/Kconfig | 1 +
drivers/char/tpm/tpm2-sessions.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index dddd702b2454a..f9d8a4e966867 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -31,10 +31,11 @@ config TCG_TPM2_HMAC
bool "Use HMAC and encrypted transactions on the TPM bus"
default X86_64
select CRYPTO_ECDH
select CRYPTO_LIB_AESCFB
select CRYPTO_LIB_SHA256
+ select CRYPTO_LIB_UTILS
help
Setting this causes us to deploy a scheme which uses request
and response HMACs in addition to encryption for
communicating with the TPM to prevent or detect bus snooping
and interposer attacks (see tpm-security.rst). Saying Y
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index bdb119453dfbe..5fbd62ee50903 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -69,10 +69,11 @@
#include <linux/unaligned.h>
#include <crypto/kpp.h>
#include <crypto/ecdh.h>
#include <crypto/hash.h>
#include <crypto/hmac.h>
+#include <crypto/utils.h>
/* maximum number of names the TPM must remember for authorization */
#define AUTH_MAX_NAMES 3
#define AES_KEY_BYTES AES_KEYSIZE_128
@@ -827,16 +828,15 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf,
sha256_update(&sctx, auth->our_nonce, sizeof(auth->our_nonce));
sha256_update(&sctx, &auth->attrs, 1);
/* we're done with the rphash, so put our idea of the hmac there */
tpm2_hmac_final(&sctx, auth->session_key, sizeof(auth->session_key)
+ auth->passphrase_len, rphash);
- if (memcmp(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE) == 0) {
- rc = 0;
- } else {
+ if (crypto_memneq(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE)) {
dev_err(&chip->dev, "TPM: HMAC check failed\n");
goto out;
}
+ rc = 0;
/* now do response decryption */
if (auth->attrs & TPM2_SA_ENCRYPT) {
/* need key and IV */
tpm2_KDFa(auth->session_key, SHA256_DIGEST_SIZE
--
2.50.1
The patch titled
Subject: kunit: kasan_test: disable fortify string checker on kasan_strings() test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yeoreum Yun <yeoreum.yun(a)arm.com>
Subject: kunit: kasan_test: disable fortify string checker on kasan_strings() test
Date: Fri, 1 Aug 2025 13:02:36 +0100
Similar to commit 09c6304e38e4 ("kasan: test: fix compatibility with
FORTIFY_SOURCE") the kernel is panicing in kasan_string().
This is due to the `src` and `ptr` not being hidden from the optimizer
which would disable the runtime fortify string checker.
Call trace:
__fortify_panic+0x10/0x20 (P)
kasan_strings+0x980/0x9b0
kunit_try_run_case+0x68/0x190
kunit_generic_run_threadfn_adapter+0x34/0x68
kthread+0x1c4/0x228
ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc5e ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan_test_c.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kasan/kasan_test_c.c~kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test
+++ a/mm/kasan/kasan_test_c.c
@@ -1578,9 +1578,11 @@ static void kasan_strings(struct kunit *
ptr = kmalloc(size, GFP_KERNEL | __GFP_ZERO);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+ OPTIMIZER_HIDE_VAR(ptr);
src = kmalloc(KASAN_GRANULE_SIZE, GFP_KERNEL | __GFP_ZERO);
strscpy(src, "f0cacc1a0000000", KASAN_GRANULE_SIZE);
+ OPTIMIZER_HIDE_VAR(src);
/*
* Make sure that strscpy() does not trigger KASAN if it overreads into
_
Patches currently in -mm which might be from yeoreum.yun(a)arm.com are
kunit-kasan_test-disable-fortify-string-checker-on-kasan_strings-test.patch
The commit in the Fixes tag breaks my laptop (found by git bisect).
My home RJ45 LAN cable cannot connect after that commit.
The call to netif_carrier_on() should be done when netif_carrier_ok()
is false. Not when it's true. Because calling netif_carrier_on() when
__LINK_STATE_NOCARRIER is not set actually does nothing.
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
drivers/net/usb/usbnet.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index bc1d8631ffe0..1eb98eeb64f9 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1107,31 +1107,31 @@ static const struct ethtool_ops usbnet_ethtool_ops = {
};
/*-------------------------------------------------------------------------*/
static void __handle_link_change(struct usbnet *dev)
{
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
if (!netif_carrier_ok(dev->net)) {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
/*
* tx_timeout will unlink URBs for sending packets and
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
/* hard_mtu or rx_urb_size may change during link change */
usbnet_update_max_qlen(dev);
clear_bit(EVENT_LINK_CHANGE, &dev->flags);
}
--
Ammar Faizi
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in a RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
Call trace:
arch_local_irq_enable+0x4/0x8
local_bh_enable+0x18/0x20
__netdev_alloc_skb+0x18c/0x1cc
rx_submit+0x68/0x1f8 [usbnet]
rx_alloc_submit+0x4c/0x74 [usbnet]
usbnet_bh+0x1d8/0x218 [usbnet]
usbnet_bh_tasklet+0x10/0x18 [usbnet]
tasklet_action_common+0xa8/0x110
tasklet_action+0x2c/0x34
handle_softirqs+0x2cc/0x3a0
__do_softirq+0x10/0x18
____do_softirq+0xc/0x14
call_on_irq_stack+0x24/0x34
do_softirq_own_stack+0x18/0x20
__irq_exit_rcu+0xa8/0xb8
irq_exit_rcu+0xc/0x30
el1_interrupt+0x34/0x48
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x38/0x48
xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd]
unlink1+0xd4/0xdc [usbcore]
usb_hcd_unlink_urb+0x70/0xb0 [usbcore]
usb_unlink_urb+0x24/0x44 [usbcore]
unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet]
__handle_link_change+0x34/0x70 [usbnet]
usbnet_deferred_kevent+0x1c0/0x320 [usbnet]
process_scheduled_works+0x2d0/0x48c
worker_thread+0x150/0x1dc
kthread+0xd8/0xe8
ret_from_fork+0x10/0x20
Get around the problem by delaying the carrier on to the scheduled work.
This needs a new flag to keep track of the necessary action.
The carrier ok check cannot be removed as it remains required for the
LINK_RESET event flow.
Fixes: 4b49f58fff00 ("usbnet: handle link change")
Cc: stable(a)vger.kernel.org
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
---
I've been testing this quite aggressively over a night, and seems equally
stable to my first approach. I'm a little bit concerned that the bit stuff
can now race (although much smaller) in the opposite direction, that a
carrier off can occur between test_and_clear_bit() and the carrier on
action in the handler. Leaving the carrier on when it shouldn't be.
v2:
- target tree in patch description.
- Drop Ming Lei from address list as their address bounces.
- Rework solution based on feedback by Jakub (let me know if you want a
Suggested-by tag, if we're keeping this direction)
v1: https://lore.kernel.org/netdev/20250710085028.1070922-1-john.ernberg@actia.…
Tested on 6.12.20 and forward ported. Stack trace from 6.12.20.
---
drivers/net/usb/usbnet.c | 11 ++++++++---
include/linux/usb/usbnet.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index c04e715a4c2a..bc1d8631ffe0 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1122,6 +1122,9 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
@@ -2009,10 +2012,12 @@ EXPORT_SYMBOL(usbnet_manage_power);
void usbnet_link_change(struct usbnet *dev, bool link, bool need_reset)
{
/* update link after link is reseted */
- if (link && !need_reset)
- netif_carrier_on(dev->net);
- else
+ if (link && !need_reset) {
+ set_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
+ } else {
+ clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
netif_carrier_off(dev->net);
+ }
if (need_reset && link)
usbnet_defer_kevent(dev, EVENT_LINK_RESET);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 0b9f1e598e3a..4bc6bb01a0eb 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -76,6 +76,7 @@ struct usbnet {
# define EVENT_LINK_CHANGE 11
# define EVENT_SET_RX_MODE 12
# define EVENT_NO_IP_ALIGN 13
+# define EVENT_LINK_CARRIER_ON 14
/* This one is special, as it indicates that the device is going away
* there are cyclic dependencies between tasklet, timer and bh
* that must be broken
--
2.49.0
Hi,
The issue that I originally fixed with
commit 8ff4fb276e23 ("pinctrl: amd: Clear GPIO debounce for suspend")
I have found out will be a problem on more platforms due to more designs
switching to power button controlled this way.
I think it would be a good idea to take it back to stable kernels.
Thanks!
Hi Greg,
The below two patches are needed on linux-5.15.y and linux-6.1.y, please
help to add them to the stable tree.
b7a62611fab7 usb: chipidea: add USB PHY event
87ed257acb09 usb: phy: mxs: disconnect line when USB charger is attached
They are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git branch usb-testing
Thanks,
Xu Yang
+ stable
+ regressions
New subject
Great news.
Greg, Sasha,
Can you please pull in these 3 commits specifically to 6.6.y to fix a
regression that was reported by Morgan in 6.6.y:
commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest
performance value")
commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred
core support")
commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency
issue which limits performance")
Further details are below.
Thanks!
On 9/5/2024 16:09, Jones, Morgan wrote:
> Mario,
>
> Confirmed. Thank you for the help! Slightly different refs on my end:
>
> Remotes:
>
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (fetch)
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (push)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (fetch)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (push)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (fetch)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (push)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)
>
> Patches:
>
> git format-patch 12753d71e8c5^..12753d71e8c5
> git format-patch f3a052391822b772b4e27f2594526cf1eb103cab^..f3a052391822b772b4e27f2594526cf1eb103cab
> git format-patch bf202e654bfa57fb8cf9d93d4c6855890b70b9c4^..bf202e654bfa57fb8cf9d93d4c6855890b70b9c4
>
> Results:
>
> Linux redact 6.6.48 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux
>
> analyzing CPU 56:
> driver: amd-pstate-epp
> CPUs which run at the same hardware frequency: 56
> CPUs which need to have their frequency coordinated by software: 56
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 400 MHz - 3.35 GHz
> available cpufreq governors: performance powersave
> current policy: frequency should be within 400 MHz and 3.35 GHz.
> The governor "performance" may decide which speed to use
> within this range.
> current CPU frequency: Unable to call hardware
> current CPU frequency: 2.09 GHz (asserted by call to kernel)
> boost state support:
> Supported: yes
> Active: yes
> AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.35 GHz.
> AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
> AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
> AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
>
> And our builds are back to being fast with `amd_pstate=active amd_prefcore=enable amd_pstate.shared_mem=1`.
>
> Morgan
>
> -----Original Message-----
> From: Mario Limonciello <mario.limonciello(a)amd.com>
> Sent: Thursday, September 5, 2024 8:12 AM
> To: Jones, Morgan <Morgan.Jones(a)viasat.com>
> Cc: linux-pm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org; David Arcari <darcari(a)redhat.com>; Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>; rafael(a)kernel.org; viresh.kumar(a)linaro.org; gautham.shenoy(a)amd.com; perry.yuan(a)amd.com; skhan(a)linuxfoundation.org; li.meng(a)amd.com; ray.huang(a)amd.com
> Subject: Re: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems
>
> Hi Morgan,
>
> Please apply these 3 commits:
>
> commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest performance value") commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support") commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency issue which limits performance")
>
> The first two should help your system, the third will prevent introducing a regression on a different one.
>
> Assuming that works we should ask @stable to pull all 3 in to fix this regression.
>
> Thanks,
>
> On 9/4/2024 08:57, Mario Limonciello wrote:
>> Morgan,
>>
>> I was referring specfiically to the version that landed in Linus' tree:
>> https://urldefense.us/v3/__https://git.kernel.org/torvalds/c/8164f7433
>> 264__;!!C5Asm8uRnZQmlRln!aIZEDEbIUKD7OrxN0b0KjoqKYDL2yMkwk4EK7x_oSnyHQ
>> 6MEq7yt6JHjd0TD9DgEYEWDcF58OKL8c7G11bT3dSqL8eM$
>>
>> But yeah it's effectively the same thing. In any case, it's not the
>> solution.
>>
>> We had some internal discussion and suspect this is due to missing
>> prefcore patches in 6.6 as that feature landed in 6.9. We'll try to
>> reproduce this on a Rome system and come back with our findings and
>> suggestions what to do.
>>
>> Thanks,
>>
>
From: Nirmal Patel <nirmal.patel(a)linux.intel.com>
[ Upstream commit 886e67100b904cb1b106ed1dfa8a60696aff519a ]
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.
As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.c…
Signed-off-by: Nirmal Patel <nirmal.patel(a)linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
(cherry picked from commit 886e67100b904cb1b106ed1dfa8a60696aff519a)
[ This patch has already been backported to the Ubuntu 5.15 kernel
and fixes boot issues on Intel platforms with VMD rev 04,
confirmed on version 5.15.189. ]
Signed-off-by: Artur Piechocki <artur.piechocki(a)open-e.com>
---
drivers/pci/controller/vmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 846590706a38..d0d18e9c6968 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -799,6 +799,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
vmd_attach_resources(vmd);
if (vmd->irq_domain)
dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+ else
+ dev_set_msi_domain(&vmd->bus->dev,
+ dev_get_msi_domain(&vmd->dev->dev));
WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
"domain"), "Can't create symlink to domain\n");
--
2.50.1
When working on an issue in the MPTCP selftests due to a recent
backport, I noticed it was due to a missing backports. A few years ago,
we were not properly monitoring the failed backports, and we missed a
few patches:
- 857898eb4b28 ("selftests: mptcp: add missing join check")
- 0c1f78a49af7 ("mptcp: fix error mibs accounting")
- 31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM")
- fd37c2ecb21f ("selftests: mptcp: Initialize variables to quiet gcc 12 warnings")
- c886d70286bf ("mptcp: do not queue data on closed subflows")
An extra patch has been added to ease the other backports:
- b8e0def397d7 ("mptcp: drop unused sk in mptcp_push_release")
Geliang Tang (1):
mptcp: drop unused sk in mptcp_push_release
Mat Martineau (1):
selftests: mptcp: Initialize variables to quiet gcc 12 warnings
Matthieu Baerts (1):
selftests: mptcp: add missing join check
Paolo Abeni (3):
mptcp: fix error mibs accounting
mptcp: introduce MAPPING_BAD_CSUM
mptcp: do not queue data on closed subflows
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 17 ++++++++++------
net/mptcp/protocol.h | 11 ++++++----
net/mptcp/subflow.c | 20 +++++++++----------
.../selftests/net/mptcp/mptcp_connect.c | 2 +-
.../testing/selftests/net/mptcp/mptcp_join.sh | 1 +
6 files changed, 30 insertions(+), 22 deletions(-)
--
2.50.0
From: Willem de Bruijn <willemb(a)google.com>
When packet_set_ring() releases po->bind_lock, another thread can
run packet_notifier() and process an NETDEV_UP event.
This race and the fix are both similar to that of commit 15fe076edea7
("net/packet: fix a race in packet_bind() and packet_notifier()").
There too the packet_notifier NETDEV_UP event managed to run while a
po->bind_lock critical section had to be temporarily released. And
the fix was similarly to temporarily set po->num to zero to keep
the socket unhooked until the lock is retaken.
The po->bind_lock in packet_set_ring and packet_notifier precede the
introduction of git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Quang Le <quanglex97(a)gmail.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
net/packet/af_packet.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bc438d0d96a7..a7017d7f0927 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4573,10 +4573,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
spin_lock(&po->bind_lock);
was_running = packet_sock_flag(po, PACKET_SOCK_RUNNING);
num = po->num;
- if (was_running) {
- WRITE_ONCE(po->num, 0);
+ WRITE_ONCE(po->num, 0);
+ if (was_running)
__unregister_prot_hook(sk, false);
- }
+
spin_unlock(&po->bind_lock);
synchronize_net();
@@ -4608,10 +4608,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
mutex_unlock(&po->pg_vec_lock);
spin_lock(&po->bind_lock);
- if (was_running) {
- WRITE_ONCE(po->num, num);
+ WRITE_ONCE(po->num, num);
+ if (was_running)
register_prot_hook(sk);
- }
+
spin_unlock(&po->bind_lock);
if (pg_vec && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
--
2.50.1.565.gc32cd1483b-goog
The logic to synthesize constant_tsc for Pentium 4s (Family 15) is
wrong. Since INTEL_P4_PRESCOTT is numerically greater than
INTEL_P4_WILLAMETTE, the logic always results in false and never sets
X86_FEATURE_CONSTANT_TSC for any Pentium 4 model.
The error was introduced while replacing the x86_model check with a VFM
one. The original check was as follows:
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
Fix the logic to cover all Pentium 4 models from Prescott (model 3) to
Cedarmill (model 6) which is the last model released in Family 15.
Fixes: fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks")
Cc: <stable(a)vger.kernel.org> # v6.15
Signed-off-by: Suchit Karunakaran <suchitkarunakaran(a)gmail.com>
---
Changes since v2:
- Improve commit message
Changes since v1:
- Fix incorrect logic
arch/x86/kernel/cpu/intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..6f5bd5dbc249 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -262,7 +262,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_WILLAMETTE) ||
+ } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_CEDARMILL) ||
(c->x86_vfm >= INTEL_CORE_YONAH && c->x86_vfm <= INTEL_IVYBRIDGE)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
}
--
2.50.1
While working on a BeaglePlay with the UART controller and making it use
DMA, I figured the DMA controller completion helper was clearly
misbehaving.
On one side, with slow devices like a UART controller, the helper uses
an unusual polling delay of 1s.
On the other side, the helper can also perform 0us sleeps which means
the driver does not sleep at all in some situations and keeps the CPU
busy waiting.
Finally, while I was digging into these issues, it felt like the helper
was a bit too complex and could be simplified, which is what I did in
patch 3. I initially worked on v6.7.x which did not had the spinlocks, I
hope I didn't got them wrong.
I also tested this with v6.17-rc7.
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Miquel Raynal (3):
dmaengine: ti: k3-udma: Reduce completion polling delay
dmaengine: ti: k3-udma: Ensure a minimum polling delay
dmaengine: ti: k3-udma: Simplify the completion worker
drivers/dma/ti/k3-udma.c | 85 ++++++++++++++++++++----------------------------
1 file changed, 36 insertions(+), 49 deletions(-)
---
base-commit: b59220b9fa2c3da4295d71913146cd64b869fcf9
change-id: 20250731-ti-dma-timeout-1c64868b5baf
Best regards,
--
Miquel Raynal <miquel.raynal(a)bootlin.com>
Hi,
Hope today's treating you well!
I wanted to know if you would be interested in purchasing the attendees list of NRF 2025 Retail's Big Show ?
Attendees count: 25,000 Leads
Contact Information: Company Name, Web URL, Contact Name, Title, Direct Email, Phone Number, Mailing Address, Industry, Employee Size, Annual Sales.
If this ignites your interest please let me know, I'd be glad to offer you the pricing options, for your assessment.
I'm grateful for your quick reply. Can't wait to hear from you.
Best
Alyssa Campbell
Demand Generation Manager
B2B Connect, Inc.
Please respond with a 'REMOVE' if you don't wish to receive further emails
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when
writing to the TCR register to configure flow control levels, we are
incorrectly writing to the MSR register. The default value of $00 is then
used for TCR, which means that selectable trigger levels in FCR are used
in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2]
to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly
accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in
fact writing to MSR because TCR/TLR registers share the same address space
as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced
functions), as it is already enabled in sc16is7xx_probe() since commit
43c51bb573aa ("sc16is7xx: make sure device is in suspend once probed").
Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the
enhanced registers set, and TCR is already declared as volatile (in fact
by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by
clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be
ignored/removed as access to internal clock div registers (DLL/DLH)
is permitted only when LCR[7] is logic 1, not when enhanced features
is enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
---
drivers/tty/serial/sc16is7xx.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 5ea8aadb6e69c..9056cb82456f1 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1177,17 +1177,6 @@ static int sc16is7xx_startup(struct uart_port *port)
sc16is7xx_port_write(port, SC16IS7XX_FCR_REG,
SC16IS7XX_FCR_FIFO_BIT);
- /* Enable EFR */
- sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
- SC16IS7XX_LCR_CONF_MODE_B);
-
- regcache_cache_bypass(one->regmap, true);
-
- /* Enable write access to enhanced features and internal clock div */
- sc16is7xx_port_update(port, SC16IS7XX_EFR_REG,
- SC16IS7XX_EFR_ENABLE_BIT,
- SC16IS7XX_EFR_ENABLE_BIT);
-
/* Enable TCR/TLR */
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
SC16IS7XX_MCR_TCRTLR_BIT,
@@ -1199,7 +1188,8 @@ static int sc16is7xx_startup(struct uart_port *port)
SC16IS7XX_TCR_RX_RESUME(24) |
SC16IS7XX_TCR_RX_HALT(48));
- regcache_cache_bypass(one->regmap, false);
+ /* Disable TCR/TLR access */
+ sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0);
/* Now, initialize the UART */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8);
base-commit: 260f6f4fda93c8485c8037865c941b42b9cba5d2
--
2.39.5
From: Nirmal Patel <nirmal.patel(a)linux.intel.com>
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.
As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.c…
Cc: stable(a)vger.kernel.org
Signed-off-by: Nirmal Patel <nirmal.patel(a)linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Artur Piechocki <artur.piechocki(a)open-e.com>
---
drivers/pci/controller/vmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index eb05cceab964..5015adc04d19 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -853,6 +853,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
vmd_attach_resources(vmd);
if (vmd->irq_domain)
dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+ else
+ dev_set_msi_domain(&vmd->bus->dev,
+ dev_get_msi_domain(&vmd->dev->dev));
vmd_acpi_begin();
--
2.50.1
> This is a note to let you know that I've just added the patch titled
>
> Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
>
> to the 6.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> revert-cgroup_freezer-cgroup_freezing-check-if-not-f.patch
> and it can be found in the queue-6.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it.
>
The patch ("sched,freezer: Remove unnecessary warning in __thaw_task") should also be merged to
prevent triggering another warning in __thaw_task().
Best regards,
Ridong
>
>
> commit ee09694849a570cbc32015b5329b7c2f3f778748
> Author: Chen Ridong <chenridong(a)huawei.com>
> Date: Thu Jul 17 08:55:50 2025 +0000
>
> Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
>
> [ Upstream commit 14a67b42cb6f3ab66f41603c062c5056d32ea7dd ]
>
> This reverts commit cff5f49d433fcd0063c8be7dd08fa5bf190c6c37.
>
> Commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
> frozen") modified the cgroup_freezing() logic to verify that the FROZEN
> flag is not set, affecting the return value of the freezing() function,
> in order to address a warning in __thaw_task.
>
> A race condition exists that may allow tasks to escape being frozen. The
> following scenario demonstrates this issue:
>
> CPU 0 (get_signal path) CPU 1 (freezer.state reader)
> try_to_freeze read freezer.state
> __refrigerator freezer_read
> update_if_frozen
> WRITE_ONCE(current->__state, TASK_FROZEN);
> ...
> /* Task is now marked frozen */
> /* frozen(task) == true */
> /* Assuming other tasks are frozen */
> freezer->state |= CGROUP_FROZEN;
> /* freezing(current) returns false */
> /* because cgroup is frozen (not freezing) */
> break out
> __set_current_state(TASK_RUNNING);
> /* Bug: Task resumes running when it should remain frozen */
>
> The existing !frozen(p) check in __thaw_task makes the
> WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables
> reverting the commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check
> if not frozen") to resolve the issue.
>
> The warning has been removed in the previous patch. This patch revert the
> commit cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not
> frozen") to complete the fix.
>
> Fixes: cff5f49d433f ("cgroup_freezer: cgroup_freezing: Check if not frozen")
> Reported-by: Zhong Jiawei<zhongjiawei1(a)huawei.com>
> Signed-off-by: Chen Ridong <chenridong(a)huawei.com>
> Signed-off-by: Tejun Heo <tj(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/kernel/cgroup/legacy_freezer.c b/kernel/cgroup/legacy_freezer.c index 507b8f19a262e..dd9417425d929 100644
> --- a/kernel/cgroup/legacy_freezer.c
> +++ b/kernel/cgroup/legacy_freezer.c
> @@ -66,15 +66,9 @@ static struct freezer *parent_freezer(struct freezer *freezer) bool cgroup_freezing(struct task_struct *task) {
> bool ret;
> - unsigned int state;
>
> rcu_read_lock();
> - /* Check if the cgroup is still FREEZING, but not FROZEN. The extra
> - * !FROZEN check is required, because the FREEZING bit is not cleared
> - * when the state FROZEN is reached.
> - */
> - state = task_freezer(task)->state;
> - ret = (state & CGROUP_FREEZING) && !(state & CGROUP_FROZEN);
> + ret = task_freezer(task)->state & CGROUP_FREEZING;
> rcu_read_unlock();
>
> return ret;
>
The driver stores a reference to the `usb_device` structure (`udev`)
in its private data (`data->udev`), which can persist beyond the
immediate context of the `bfusb_probe()` function.
Without proper reference count management, this can lead to two issues:
1. A `use-after-free` scenario if `udev` is accessed after its main
reference count drops to zero (e.g., if the device is disconnected
and the `data` structure is still active).
2. A `memory leak` if `udev`'s reference count is not properly
decremented during driver disconnect, preventing the `usb_device`
object from being freed.
To correctly manage the `udev` lifetime, explicitly increment its
reference count with `usb_get_dev(udev)` when storing it in the
driver's private data. Correspondingly, decrement the reference count
with `usb_put_dev(data->udev)` in the `bfusb_disconnect()` callback.
This ensures `udev` remains valid while referenced by the driver's
private data and is properly released when no longer needed.
Signed-off-by: Salah Triki <salah.triki(a)gmail.com>
Fixes: 9c724357f432d ("[Bluetooth] Code cleanup of the drivers source code")
Cc: stable(a)vger.kernel.org
Cc: Marcel Holtmann <marcel(a)holtmann.org>
Cc: Luiz Augusto von Dentz <luiz.dentz(a)gmail.com>
Cc: linux-bluetooth(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
Changes in v3:
- Add tag Cc
Changes in v2:
- Add tags Fixes and Cc
drivers/bluetooth/bfusb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/bfusb.c b/drivers/bluetooth/bfusb.c
index 8df310983bf6..f966bd8361b0 100644
--- a/drivers/bluetooth/bfusb.c
+++ b/drivers/bluetooth/bfusb.c
@@ -622,7 +622,7 @@ static int bfusb_probe(struct usb_interface *intf, const struct usb_device_id *i
if (!data)
return -ENOMEM;
- data->udev = udev;
+ data->udev = usb_get_dev(udev);
data->bulk_in_ep = bulk_in_ep->desc.bEndpointAddress;
data->bulk_out_ep = bulk_out_ep->desc.bEndpointAddress;
data->bulk_pkt_size = le16_to_cpu(bulk_out_ep->desc.wMaxPacketSize);
@@ -701,6 +701,8 @@ static void bfusb_disconnect(struct usb_interface *intf)
usb_set_intfdata(intf, NULL);
+ usb_put_dev(data->udev);
+
bfusb_close(hdev);
hci_unregister_dev(hdev);
--
2.43.0
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken
the logic in iomap_dio_bio_iter() in a way that when the device does
support FUA (or has no writeback cache) and the direct IO happens to
freshly allocated or unwritten extents, we will *not* issue fsync after
completing direct IO O_SYNC / O_DSYNC write because the
IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by
clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as
it was originally intended.
CC: John Garry <john.g.garry(a)oracle.com>
CC: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/iomap/direct-io.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
BTW, I've spotted this because some performance tests got suspiciously fast
on recent kernels :) Sadly no easy improvement to cherry-pick for me to fix
customer issue I'm chasing...
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6f25d4cfea9f..b84f6af2eb4c 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -363,14 +363,14 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
if (iomap->flags & IOMAP_F_SHARED)
dio->flags |= IOMAP_DIO_COW;
- if (iomap->flags & IOMAP_F_NEW) {
+ if (iomap->flags & IOMAP_F_NEW)
need_zeroout = true;
- } else if (iomap->type == IOMAP_MAPPED) {
- if (iomap_dio_can_use_fua(iomap, dio))
- bio_opf |= REQ_FUA;
- else
- dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
- }
+ else if (iomap->type == IOMAP_MAPPED &&
+ iomap_dio_can_use_fua(iomap, dio))
+ bio_opf |= REQ_FUA;
+
+ if (!(bio_opf & REQ_FUA))
+ dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
/*
* We can only do deferred completion for pure overwrites that
--
2.43.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 37848a456fc38c191aedfe41f662cc24db8c23d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072839-wildly-gala-e85f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37848a456fc38c191aedfe41f662cc24db8c23d9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 15 Jul 2025 20:43:28 +0200
Subject: [PATCH] selftests: mptcp: connect: also cover alt modes
The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.
The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.
To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: stable(a)vger.kernel.org
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd824…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index e47788bfa671..c6b030babba8 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -4,7 +4,8 @@ top_srcdir = ../../../../..
CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
-TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
+TEST_PROGS := mptcp_connect.sh mptcp_connect_mmap.sh mptcp_connect_sendfile.sh \
+ pm_netlink.sh mptcp_join.sh diag.sh \
simult_flows.sh mptcp_sockopt.sh userspace_pm.sh
TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
new file mode 100755
index 000000000000..5dd30f9394af
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m mmap "${@}"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
new file mode 100755
index 000000000000..1d16fb1cc9bb
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m sendfile "${@}"
According to documentation, the eDP PHY on x1e80100 has another clock
called refclk. Rework the driver to allow different number of clocks
based on match data and add this refclk to the x1e80100. Fix the
dt-bindings schema and add the clock to the DT node as well.
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Abel Vesa (3):
dt-bindings: phy: qcom-edp: Add missing clock for X Elite
phy: qcom: edp: Add missing refclk for X1E80100
arm64: dts: qcom: Add missing TCSR refclk to the eDP PHY
.../devicetree/bindings/phy/qcom,edp-phy.yaml | 23 +++++++++++-
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 6 ++-
drivers/phy/qualcomm/phy-qcom-edp.c | 43 ++++++++++++++++++----
3 files changed, 62 insertions(+), 10 deletions(-)
---
base-commit: 79fb37f39b77bbf9a56304e9af843cd93a7a1916
change-id: 20250730-phy-qcom-edp-add-missing-refclk-5ab82828f8e7
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Fixes: 1ae1d2d6e555 ("mmc: sdhci-pci-gli: Add Genesys Logic GL9763E support")
Cc: stable(a)vger.kernel.org
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
---
drivers/mmc/host/sdhci-pci-gli.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index 436f0460222f..3a1de477e9af 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -1782,6 +1782,9 @@ static void gl9763e_hw_setting(struct sdhci_pci_slot *slot)
value |= FIELD_PREP(GLI_9763E_HS400_RXDLY, GLI_9763E_HS400_RXDLY_5);
pci_write_config_dword(pdev, PCIE_GLI_9763E_CLKRXDLY, value);
+ /* mask the replay timer timeout of AER */
+ sdhci_gli_mask_replay_timer_timeout(pdev);
+
pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
value &= ~GLI_9763E_VHS_REV;
value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
--
2.43.0
The `xa_store()` function can fail due to memory allocation issues or other
internal errors. Currently, the return value of `xa_store()` is not
checked, which can lead to a memory leak if it fails to store `numa_info`.
This patch checks the return value of `xa_store()`. If an error is
detected, the allocated `numa_info` is freed, and NULL is returned to
indicate the failure, preventing a memory leak and ensuring proper error
handling.
Signed-off-by: Salah Triki <salah.triki(a)gmail.com>
Fixes: 1cc823011a23f ("drm/amdgpu: Store additional numa node information")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Simona Vetter <simona(a)ffwll.ch>
Cc: amd-gfx(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Improve description
- Add tags Fixes and Cc
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index f5466c592d94..b4a3e4d3e957 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -876,7 +876,7 @@ static inline uint64_t amdgpu_acpi_get_numa_size(int nid)
static struct amdgpu_numa_info *amdgpu_acpi_get_numa_info(uint32_t pxm)
{
- struct amdgpu_numa_info *numa_info;
+ struct amdgpu_numa_info *numa_info, *old;
int nid;
numa_info = xa_load(&numa_info_xa, pxm);
@@ -898,7 +898,11 @@ static struct amdgpu_numa_info *amdgpu_acpi_get_numa_info(uint32_t pxm)
} else {
numa_info->size = amdgpu_acpi_get_numa_size(nid);
}
- xa_store(&numa_info_xa, numa_info->pxm, numa_info, GFP_KERNEL);
+ old = xa_store(&numa_info_xa, numa_info->pxm, numa_info, GFP_KERNEL);
+ if (xa_is_err(old)) {
+ kfree(numa_info);
+ return NULL;
+ }
}
return numa_info;
--
2.43.0
From: Mario Limonciello <mario.limonciello(a)amd.com>
This reverts commit 66abb996999de0d440a02583a6e70c2c24deab45.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel…
Reviewed-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Roman Li <roman.li(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 16347ca2396a..31ea57edeb45 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4800,16 +4800,16 @@ static int get_brightness_range(const struct amdgpu_dm_backlight_caps *caps,
return 1;
}
-/* Rescale from [min..max] to [0..MAX_BACKLIGHT_LEVEL] */
+/* Rescale from [min..max] to [0..AMDGPU_MAX_BL_LEVEL] */
static inline u32 scale_input_to_fw(int min, int max, u64 input)
{
- return DIV_ROUND_CLOSEST_ULL(input * MAX_BACKLIGHT_LEVEL, max - min);
+ return DIV_ROUND_CLOSEST_ULL(input * AMDGPU_MAX_BL_LEVEL, max - min);
}
-/* Rescale from [0..MAX_BACKLIGHT_LEVEL] to [min..max] */
+/* Rescale from [0..AMDGPU_MAX_BL_LEVEL] to [min..max] */
static inline u32 scale_fw_to_input(int min, int max, u64 input)
{
- return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), MAX_BACKLIGHT_LEVEL);
+ return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), AMDGPU_MAX_BL_LEVEL);
}
static void convert_custom_brightness(const struct amdgpu_dm_backlight_caps *caps,
--
2.34.1
Due to updates to the Device Tree (migrating to onboard USB hub nodes
instead of (badly) hacking things with a gpio regulator that doesn't
actually work properly), we now need to enable the onboard USB hub
driver in U-Boot.
This anticipates upcoming breakage when 6.16 DT will be merged into
U-Boot's dts/upstream.
The series can be applied as is before v6.16 DT is merged or only the
defconfig changes after 6.16 DT has been merged.
The last two patches are simply to avoid probing devices that aren't
actually routed on RK3399 Puma, which is nice to have but doesn't fix
anything.
Note that this depends on the following series:
https://lore.kernel.org/u-boot/20250722-usb_onboard_hub_cypress_hx3-v4-0-91…
Signed-off-by: Quentin Schulz <quentin.schulz(a)cherry.de>
---
Lukasz Czechowski (2):
dt-bindings: usb: cypress,hx3: Add support for all variants
arm64: dts: rockchip: fix internal USB hub instability on RK3399 Puma
Quentin Schulz (4):
configs: puma-rk3399: enable onboard USB hub support
dt-bindings: usb: usb-device: relax compatible pattern to a contains
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma with Haikou
configs/puma-rk3399_defconfig | 1 +
dts/upstream/Bindings/usb/cypress,hx3.yaml | 19 +++++++--
dts/upstream/Bindings/usb/usb-device.yaml | 3 +-
.../src/arm64/rockchip/rk3399-puma-haikou.dts | 8 ----
dts/upstream/src/arm64/rockchip/rk3399-puma.dtsi | 48 +++++++++++-----------
5 files changed, 43 insertions(+), 36 deletions(-)
---
base-commit: 5a8dd2e0c848135b5c96af291aa96e79acc923ec
change-id: 20250730-puma-usb-cypress-2d2957024424
prerequisite-change-id: 20250425-usb_onboard_hub_cypress_hx3-2831983f1ede:v4
prerequisite-patch-id: 515a13b22600d40716e9c36d16b084086ce7d474
prerequisite-patch-id: 9f8a11bd6c66e976c51dc0f0bc3292183f9403f3
prerequisite-patch-id: 45ef5b9422333db5fcd23d95b6570b320635c49b
prerequisite-patch-id: 19092f1c3db746292401b8513807439c87ea9589
prerequisite-patch-id: fced7578d40069c5fe83d97aa42476015cf9cbda
Best regards,
--
Quentin Schulz <quentin.schulz(a)cherry.de>
This is the start of the stable review cycle for the 6.6.100 release.
There are 111 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 24 Jul 2025 13:43:10 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.100-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.100-rc1
Michael C. Pratt <mcpratt(a)pm.me>
nvmem: layouts: u-boot-env: remove crc32 endianness conversion
Johan Hovold <johan+linaro(a)kernel.org>
i2c: omap: fix deprecated of_property_read_bool() use
Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
Revert "selftests/bpf: dummy_st_ops should reject 0 for non-nullable params"
Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
Revert "selftests/bpf: adjust dummy_st_ops_success to detect additional error"
Arun Raghavan <arun(a)asymptotic.io>
ASoC: fsl_sai: Force a software reset when starting in consumer mode
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
regulator: pwm-regulator: Manage boot-on with disabled PWM channels
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
regulator: pwm-regulator: Calculate the output voltage for disabled PWMs
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
i2c: omap: Handle omap_i2c_init() errors in omap_i2c_probe()
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
i2c: omap: Fix an error handling path in omap_i2c_probe()
Jayesh Choudhary <j-choudhary(a)ti.com>
i2c: omap: Add support for setting mux
Krishna Kurapati <krishna.kurapati(a)oss.qualcomm.com>
usb: dwc3: qcom: Don't leave BCR asserted
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Don't try to recover devices lost during warm reset.
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Fix flushing of delayed work used for post resume purposes
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: Fix flushing and scheduling of delayed work that tunes runtime pm
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: hub: fix detection of high tier USB3 devices behind suspended hubs
Mark Brown <broonie(a)kernel.org>
arm64: Filter out SME hwcaps when FEAT_SME isn't implemented
Al Viro <viro(a)zeniv.linux.org.uk>
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
Eric Dumazet <edumazet(a)google.com>
ipv6: make addrconf_wq single threaded
Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
sched: Change nr_uninterruptible type to unsigned long
Chen Ridong <chenridong(a)huawei.com>
Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
David Howells <dhowells(a)redhat.com>
rxrpc: Fix transmission of an abort in response to an abort
David Howells <dhowells(a)redhat.com>
rxrpc: Fix recv-recv race of completed call
William Liu <will(a)willsroot.io>
net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
Joseph Huang <Joseph.Huang(a)garmin.com>
net: bridge: Do not offload IGMP/MLD messages
Dong Chenchen <dongchenchen2(a)huawei.com>
net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
Jakub Kicinski <kuba(a)kernel.org>
tls: always refresh the queue when reading sock
Li Tian <litian(a)redhat.com>
hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
Florian Westphal <fw(a)strlen.de>
netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
Yue Haibing <yuehaibing(a)huawei.com>
ipv6: mcast: Delay put pmc->idev in mld_del_delrec()
Christoph Paasch <cpaasch(a)openai.com>
net/mlx5: Correctly set gso_size when LRO is used
Zijun Hu <zijun.hu(a)oss.qualcomm.com>
Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: SMP: If an unallowed command is received consider it a failure
Alessandro Gasbarroni <alex.gasbarroni(a)gmail.com>
Bluetooth: hci_sync: fix connectable extended advertising when using static random address
Kuniyuki Iwashima <kuniyu(a)google.com>
Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()
Oliver Neukum <oneukum(a)suse.com>
usb: net: sierra: check for no status endpoint
Dave Ertman <david.m.ertman(a)intel.com>
ice: add NULL check in eswitch lag check
Marius Zachmann <mail(a)mariuszachmann.de>
hwmon: (corsair-cpro) Validate the size of the received input buffer
Paolo Abeni <pabeni(a)redhat.com>
selftests: net: increase inter-packet timeout in udpgro.sh
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: remove scan request n_channels counted_by
Yu Kuai <yukuai3(a)huawei.com>
nvme: fix misaccounting of nvme-mpath inflight I/O
Sean Anderson <sean.anderson(a)linux.dev>
net: phy: Don't register LEDs for genphy
Zheng Qixing <zhengqixing(a)huawei.com>
nvme: fix inconsistent RCU list manipulation in nvme_ns_add_to_ctrl_list()
Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
smb: client: fix use-after-free in cifs_oplock_break
Kuniyuki Iwashima <kuniyu(a)google.com>
rpl: Fix use-after-free in rpl_do_srh_inline().
Xiang Mei <xmei5(a)asu.edu>
net/sched: sch_qfq: Fix race condition on qfq_aggregate
Ming Lei <ming.lei(a)redhat.com>
block: fix kobject leak in blk_unregister_queue
Alok Tiwari <alok.a.tiwari(a)oracle.com>
net: emaclite: Fix missing pointer increment in aligned_read()
Zizhi Wo <wozizhi(a)huawei.com>
cachefiles: Fix the incorrect return value in __cachefiles_write()
Paul Chaignon <paul.chaignon(a)gmail.com>
bpf: Reject %p% format string in bprintf-like helpers
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
soundwire: amd: fix for clearing command status register
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
soundwire: amd: fix for handling slave alerts after link is down
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix initialization of data for instructions that write to subdevice
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix use of uninitialized data in insn_rw_emulate_bits()
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fix some signed shift left operations
Ian Abbott <abbotti(a)mev.co.uk>
comedi: Fail COMEDI_INSNLIST ioctl if n_insns is too large
Ian Abbott <abbotti(a)mev.co.uk>
comedi: das6402: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: das16m1: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: aio_iiro_16: Fix bit shift out of bounds
Ian Abbott <abbotti(a)mev.co.uk>
comedi: pcl812: Fix bit shift out of bounds
Chen Ni <nichen(a)iscas.ac.cn>
iio: adc: stm32-adc: Fix race in installing chained IRQ handler
Fabio Estevam <festevam(a)denx.de>
iio: adc: max1363: Reorder mode_list[] entries
Fabio Estevam <festevam(a)denx.de>
iio: adc: max1363: Fix MAX1363_4X_CHANS/MAX1363_8X_CHANS[]
Sean Nyekjaer <sean(a)geanix.com>
iio: accel: fxls8962af: Fix use after free in fxls8962af_fifo_flush
Andrew Jeffery <andrew(a)codeconstruct.com.au>
soc: aspeed: lpc-snoop: Don't disable channels that aren't enabled
Andrew Jeffery <andrew(a)codeconstruct.com.au>
soc: aspeed: lpc-snoop: Cleanup resources in stack-order
Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
smb: client: fix use-after-free in crypt_message when using async crypto
Ilya Leoshkevich <iii(a)linux.ibm.com>
s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL again
Maulik Shah <maulik.shah(a)oss.qualcomm.com>
pmdomain: governor: Consider CPU latency tolerance from pm_domain_cpu_gov
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: properly reset Rx ring descriptor
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: fix the using of Rx buffer DMA
Jiawen Wu <jiawenwu(a)trustnetic.com>
net: libwx: remove duplicate page_pool_put_full_page()
Judith Mendez <jm(a)ti.com>
mmc: sdhci_am654: Workaround for Errata i2312
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
mmc: sdhci-pci: Quirk for broken command queuing on Intel GLK-based Positivo models
Thomas Fourier <fourier.thomas(a)gmail.com>
mmc: bcm2835: Fix dma_unmap_sg() nents value
Nathan Chancellor <nathan(a)kernel.org>
memstick: core: Zero initialize id_reg in h_memstick_read_dev_id()
Jan Kara <jack(a)suse.cz>
isofs: Verify inode mode when loading from disk
Dan Carpenter <dan.carpenter(a)linaro.org>
dmaengine: nbpfaxi: Fix memory corruption in probe()
Yun Lu <luyun(a)kylinos.cn>
af_packet: fix soft lockup issue caused by tpacket_snd()
Yun Lu <luyun(a)kylinos.cn>
af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
Jakob Unterwurzacher <jakob.unterwurzacher(a)cherry.de>
arm64: dts: rockchip: use cs-gpios for spi1 on ringneck
Francesco Dolcini <francesco.dolcini(a)toradex.com>
arm64: dts: freescale: imx8mm-verdin: Keep LDO5 always on
Tim Harvey <tharvey(a)gateworks.com>
arm64: dts: imx8mp-venice-gw74xx: fix TPM SPI frequency
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Update the list of the PCI supported devices
Nathan Chancellor <nathan(a)kernel.org>
phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/poll: fix POLLERR handling
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda/realtek: Add quirk for ASUS ROG Strix G712LWS
Eeli Haapalainen <eeli.haapalainen(a)protonmail.com>
drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resume
Tomas Glozar <tglozar(a)redhat.com>
tracing/osnoise: Fix crash in timerlat_dump_stack()
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add down_write(trace_event_sem) when adding trace event
Nathan Chancellor <nathan(a)kernel.org>
tracing/probes: Avoid using params uninitialized in parse_btf_arg()
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: do not bypass hid_hw_raw_request
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: ensure __hid_request reserves the report ID as the first byte
Benjamin Tissoires <bentiss(a)kernel.org>
HID: core: ensure the allocated report buffer can contain the reserved report ID
Sheng Yong <shengyong1(a)xiaomi.com>
dm-bufio: fix sched in atomic context
Cheng Ming Lin <chengminglin(a)mxic.com.tw>
spi: Add check for 8-bit transfer with 8 IO mode support
Thomas Fourier <fourier.thomas(a)gmail.com>
pch_uart: Fix dma_sync_sg_for_device() nents value
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - set correct controller type for Acer NGR200
Steffen Bätz <steffen(a)innosonix.de>
nvmem: imx-ocotp: fix MAC address byte length
Alok Tiwari <alok.a.tiwari(a)oracle.com>
thunderbolt: Fix bit masking in tb_dp_port_set_hops()
Mario Limonciello <mario.limonciello(a)amd.com>
thunderbolt: Fix wake on connect at runtime
Clément Le Goffic <clement.legoffic(a)foss.st.com>
i2c: stm32: fix the device used for the DMA map
Xinyu Liu <1171169449(a)qq.com>
usb: gadget: configfs: Fix OOB read on empty string write
Drew Hamilton <drew.hamilton(a)zetier.com>
usb: musb: fix gadget state on disconnect
Ryan Mann (NDI) <rmann(a)ndigital.com>
USB: serial: ftdi_sio: add support for NDI EMGUIDE GEMINI
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add Foxconn T99W640
Fabio Porcedda <fabio.porcedda(a)gmail.com>
USB: serial: option: add Telit Cinterion FE910C04 (ECM) composition
Haotien Hsu <haotienh(a)nvidia.com>
phy: tegra: xusb: Disable periodic tracking on Tegra234
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Decouple CYA_TRK_CODE_UPDATE_ON_IDLE from trk_hw_mode
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Fix unbalanced regulator disable in UTMI PHY mode
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/boot/dts/freescale/imx8mm-verdin.dtsi | 1 +
.../boot/dts/freescale/imx8mp-venice-gw74xx.dts | 2 +-
arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi | 23 +++++++
arch/arm64/kernel/cpufeature.c | 35 ++++++----
arch/s390/net/bpf_jit_comp.c | 10 ++-
block/blk-sysfs.c | 1 +
drivers/base/power/domain_governor.c | 18 ++++-
drivers/bluetooth/btusb.c | 78 ++++++++++++----------
drivers/comedi/comedi_fops.c | 30 ++++++++-
drivers/comedi/drivers.c | 17 +++--
drivers/comedi/drivers/aio_iiro_16.c | 3 +-
drivers/comedi/drivers/das16m1.c | 3 +-
drivers/comedi/drivers/das6402.c | 3 +-
drivers/comedi/drivers/pcl812.c | 3 +-
drivers/dma/nbpfaxi.c | 11 ++-
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 1 +
drivers/hid/hid-core.c | 21 ++++--
drivers/hwmon/corsair-cpro.c | 5 ++
drivers/i2c/busses/Kconfig | 1 +
drivers/i2c/busses/i2c-omap.c | 30 ++++++++-
drivers/i2c/busses/i2c-stm32.c | 8 +--
drivers/i2c/busses/i2c-stm32f7.c | 4 +-
drivers/iio/accel/fxls8962af-core.c | 2 +
drivers/iio/adc/max1363.c | 43 ++++++------
drivers/iio/adc/stm32-adc-core.c | 7 +-
drivers/input/joystick/xpad.c | 2 +-
drivers/md/dm-bufio.c | 6 +-
drivers/memstick/core/memstick.c | 2 +-
drivers/mmc/host/bcm2835.c | 3 +-
drivers/mmc/host/sdhci-pci-core.c | 3 +-
drivers/mmc/host/sdhci_am654.c | 9 ++-
drivers/net/ethernet/intel/ice/ice_lag.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++--
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
drivers/net/ethernet/wangxun/libwx/wx_hw.c | 7 +-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 20 ++----
drivers/net/ethernet/wangxun/libwx/wx_type.h | 2 -
drivers/net/ethernet/xilinx/xilinx_emaclite.c | 2 +-
drivers/net/hyperv/netvsc_drv.c | 5 +-
drivers/net/phy/phy_device.c | 6 +-
drivers/net/usb/sierra_net.c | 4 ++
drivers/nvme/host/core.c | 6 +-
drivers/nvmem/imx-ocotp-ele.c | 5 +-
drivers/nvmem/imx-ocotp.c | 5 +-
drivers/nvmem/u-boot-env.c | 6 +-
drivers/phy/tegra/xusb-tegra186.c | 77 ++++++++++++---------
drivers/phy/tegra/xusb.h | 1 +
drivers/regulator/pwm-regulator.c | 40 +++++++++++
drivers/soc/aspeed/aspeed-lpc-snoop.c | 13 +++-
drivers/soundwire/amd_manager.c | 4 +-
drivers/spi/spi.c | 14 ++--
drivers/thunderbolt/switch.c | 10 +--
drivers/thunderbolt/tb.h | 2 +-
drivers/thunderbolt/usb4.c | 12 ++--
drivers/tty/serial/pch_uart.c | 2 +-
drivers/usb/core/hub.c | 36 +++++++++-
drivers/usb/core/hub.h | 1 +
drivers/usb/dwc3/dwc3-qcom.c | 8 +--
drivers/usb/gadget/configfs.c | 4 ++
drivers/usb/musb/musb_gadget.c | 2 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 3 +
drivers/usb/serial/option.c | 5 ++
fs/cachefiles/io.c | 2 -
fs/cachefiles/ondemand.c | 4 +-
fs/isofs/inode.c | 9 ++-
fs/namespace.c | 5 ++
fs/smb/client/file.c | 10 ++-
fs/smb/client/smb2ops.c | 7 +-
include/net/cfg80211.h | 2 +-
include/net/netfilter/nf_conntrack.h | 15 ++++-
include/trace/events/rxrpc.h | 3 +
io_uring/net.c | 12 ++--
io_uring/poll.c | 2 -
kernel/bpf/helpers.c | 11 ++-
kernel/cgroup/legacy_freezer.c | 8 +--
kernel/sched/loadavg.c | 2 +-
kernel/sched/sched.h | 2 +-
kernel/trace/trace_events.c | 5 ++
kernel/trace/trace_osnoise.c | 2 +-
kernel/trace/trace_probe.c | 2 +-
net/8021q/vlan.c | 42 +++++++++---
net/8021q/vlan.h | 1 +
net/bluetooth/hci_sync.c | 4 +-
net/bluetooth/l2cap_core.c | 26 ++++++--
net/bluetooth/l2cap_sock.c | 3 +
net/bluetooth/smp.c | 21 +++++-
net/bluetooth/smp.h | 1 +
net/bridge/br_switchdev.c | 3 +
net/ipv6/addrconf.c | 3 +-
net/ipv6/mcast.c | 2 +-
net/ipv6/rpl_iptunnel.c | 8 +--
net/netfilter/nf_conntrack_core.c | 26 ++++++--
net/packet/af_packet.c | 27 ++++----
net/phonet/pep.c | 2 +-
net/rxrpc/call_accept.c | 1 +
net/rxrpc/output.c | 3 +
net/rxrpc/recvmsg.c | 19 +++++-
net/sched/sch_htb.c | 4 +-
net/sched/sch_qfq.c | 30 ++++++---
net/tls/tls_strp.c | 3 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/fsl/fsl_sai.c | 14 ++--
.../selftests/bpf/prog_tests/dummy_st_ops.c | 27 --------
.../selftests/bpf/progs/dummy_st_ops_success.c | 13 +---
tools/testing/selftests/net/udpgro.sh | 8 +--
107 files changed, 753 insertions(+), 351 deletions(-)
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 80 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 10 ++-
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 129 insertions(+), 34 deletions(-)
--
2.50.1.552.g942d659e1b-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 82 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 12 ++--
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 131 insertions(+), 36 deletions(-)
--
2.50.1.552.g942d659e1b-goog
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 82 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 12 ++--
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 131 insertions(+), 36 deletions(-)
--
2.50.1.552.g942d659e1b-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 37848a456fc38c191aedfe41f662cc24db8c23d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072839-machinist-coherence-ab5f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37848a456fc38c191aedfe41f662cc24db8c23d9 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 15 Jul 2025 20:43:28 +0200
Subject: [PATCH] selftests: mptcp: connect: also cover alt modes
The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.
The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.
To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: stable(a)vger.kernel.org
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd824…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index e47788bfa671..c6b030babba8 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -4,7 +4,8 @@ top_srcdir = ../../../../..
CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
-TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \
+TEST_PROGS := mptcp_connect.sh mptcp_connect_mmap.sh mptcp_connect_sendfile.sh \
+ pm_netlink.sh mptcp_join.sh diag.sh \
simult_flows.sh mptcp_sockopt.sh userspace_pm.sh
TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq mptcp_diag
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
new file mode 100755
index 000000000000..5dd30f9394af
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m mmap "${@}"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
new file mode 100755
index 000000000000..1d16fb1cc9bb
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m sendfile "${@}"
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Fixes: 1ae1d2d6e555 ("mmc: sdhci-pci-gli: Add Genesys Logic GL9763E support")
Cc: stable(a)vger.kernel.org
---
drivers/mmc/host/sdhci-pci-gli.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index 436f0460222f..3a1de477e9af 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -1782,6 +1782,9 @@ static void gl9763e_hw_setting(struct sdhci_pci_slot *slot)
value |= FIELD_PREP(GLI_9763E_HS400_RXDLY, GLI_9763E_HS400_RXDLY_5);
pci_write_config_dword(pdev, PCIE_GLI_9763E_CLKRXDLY, value);
+ /* mask the replay timer timeout of AER */
+ sdhci_gli_mask_replay_timer_timeout(pdev);
+
pci_read_config_dword(pdev, PCIE_GLI_9763E_VHS, &value);
value &= ~GLI_9763E_VHS_REV;
value |= FIELD_PREP(GLI_9763E_VHS_REV, GLI_9763E_VHS_REV_R);
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6aae87fe7f180cd93a74466cdb6cf2aa9bb28798
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025072104-bacteria-resend-dcff@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6aae87fe7f180cd93a74466cdb6cf2aa9bb28798 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Le=20Goffic?= <clement.legoffic(a)foss.st.com>
Date: Fri, 4 Jul 2025 10:39:15 +0200
Subject: [PATCH] i2c: stm32f7: unmap DMA mapped buffer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Before each I2C transfer using DMA, the I2C buffer is DMA'pped to make
sure the memory buffer is DMA'able. This is handle in the function
`stm32_i2c_prep_dma_xfer()`.
If the transfer fails for any reason the I2C buffer must be unmap.
Use the dma_callback to factorize the code and fix this issue.
Note that the `stm32f7_i2c_dma_callback()` is now called in case of DMA
transfer success and error and that the `complete()` on the dma_complete
completion structure is done inconditionnally in case of transfer
success or error as well as the `dmaengine_terminate_async()`.
This is allowed as a `complete()` in case transfer error has no effect
as well as a `dmaengine_terminate_async()` on a transfer success.
Also fix the unneeded cast and remove not more needed variables.
Fixes: 7ecc8cfde553 ("i2c: i2c-stm32f7: Add DMA support")
Signed-off-by: Clément Le Goffic <clement.legoffic(a)foss.st.com>
Cc: <stable(a)vger.kernel.org> # v4.18+
Acked-by: Alain Volmat <alain.volmat(a)foss.st.com>
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
Link: https://lore.kernel.org/r/20250704-i2c-upstream-v4-2-84a095a2c728@foss.st.c…
diff --git a/drivers/i2c/busses/i2c-stm32f7.c b/drivers/i2c/busses/i2c-stm32f7.c
index 817d081460c2..73a7b8894c0d 100644
--- a/drivers/i2c/busses/i2c-stm32f7.c
+++ b/drivers/i2c/busses/i2c-stm32f7.c
@@ -739,10 +739,11 @@ static void stm32f7_i2c_disable_dma_req(struct stm32f7_i2c_dev *i2c_dev)
static void stm32f7_i2c_dma_callback(void *arg)
{
- struct stm32f7_i2c_dev *i2c_dev = (struct stm32f7_i2c_dev *)arg;
+ struct stm32f7_i2c_dev *i2c_dev = arg;
struct stm32_i2c_dma *dma = i2c_dev->dma;
stm32f7_i2c_disable_dma_req(i2c_dev);
+ dmaengine_terminate_async(dma->chan_using);
dma_unmap_single(i2c_dev->dev, dma->dma_buf, dma->dma_len,
dma->dma_data_dir);
complete(&dma->dma_complete);
@@ -1510,7 +1511,6 @@ static irqreturn_t stm32f7_i2c_handle_isr_errs(struct stm32f7_i2c_dev *i2c_dev,
u16 addr = f7_msg->addr;
void __iomem *base = i2c_dev->base;
struct device *dev = i2c_dev->dev;
- struct stm32_i2c_dma *dma = i2c_dev->dma;
/* Bus error */
if (status & STM32F7_I2C_ISR_BERR) {
@@ -1551,10 +1551,8 @@ static irqreturn_t stm32f7_i2c_handle_isr_errs(struct stm32f7_i2c_dev *i2c_dev,
}
/* Disable dma */
- if (i2c_dev->use_dma) {
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
- }
+ if (i2c_dev->use_dma)
+ stm32f7_i2c_dma_callback(i2c_dev);
i2c_dev->master_mode = false;
complete(&i2c_dev->complete);
@@ -1600,7 +1598,6 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
{
struct stm32f7_i2c_dev *i2c_dev = data;
struct stm32f7_i2c_msg *f7_msg = &i2c_dev->f7_msg;
- struct stm32_i2c_dma *dma = i2c_dev->dma;
void __iomem *base = i2c_dev->base;
u32 status, mask;
int ret;
@@ -1619,10 +1616,8 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
dev_dbg(i2c_dev->dev, "<%s>: Receive NACK (addr %x)\n",
__func__, f7_msg->addr);
writel_relaxed(STM32F7_I2C_ICR_NACKCF, base + STM32F7_I2C_ICR);
- if (i2c_dev->use_dma) {
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
- }
+ if (i2c_dev->use_dma)
+ stm32f7_i2c_dma_callback(i2c_dev);
f7_msg->result = -ENXIO;
}
@@ -1640,8 +1635,7 @@ static irqreturn_t stm32f7_i2c_isr_event_thread(int irq, void *data)
ret = wait_for_completion_timeout(&i2c_dev->dma->dma_complete, HZ);
if (!ret) {
dev_dbg(i2c_dev->dev, "<%s>: Timed out\n", __func__);
- stm32f7_i2c_disable_dma_req(i2c_dev);
- dmaengine_terminate_async(dma->chan_using);
+ stm32f7_i2c_dma_callback(i2c_dev);
f7_msg->result = -ETIMEDOUT;
}
}
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Edip Hazuri <edip(a)medip.dev>
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.
Tested on Victus 16-S0063NT Laptop. The LED behaviour works
as intended.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Edip Hazuri <edip(a)medip.dev>
---
sound/hda/codecs/realtek/alc269.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index 05019fa732..77322ff8a6 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6528,6 +6528,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8bbe, "HP Victus 16-r0xxx (MB 8BBE)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bc8, "HP Victus 15-fa1xxx", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bcd, "HP Omen 16-xd0xxx", ALC245_FIXUP_HP_MUTE_LED_V1_COEFBIT),
+ SND_PCI_QUIRK(0x103c, 0x8bd4, "HP Victus 16-s0xxx (MB 8BD4)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bdd, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bde, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bdf, "HP Envy 15", ALC287_FIXUP_CS35L41_I2C_2),
--
2.50.1
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series backports Lorenzo's series to the 5.4 kernel.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
Lorenzo Stoakes (3):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: perform the mapping_map_writable() check after call_mmap()
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 ++--
include/linux/mm.h | 26 +++++++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/mmap.c | 26 ++++++++++++++++----------
mm/shmem.c | 2 +-
8 files changed, 42 insertions(+), 24 deletions(-)
--
2.50.1.552.g942d659e1b-goog
The process of adding an I2C adapter can invoke I2C accesses on that new
adapter (see i2c_detect()).
Ensure we have set the adapter's driver data to avoid null pointer
dereferences in the xfer functions during the adapter add.
This has been noted in the past and the same fix proposed but not
completed. See:
https://lore.kernel.org/lkml/ef597e73-ed71-168e-52af-0d19b03734ac@vigem.de/
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sumanth Gavini <sumanth.gavini(a)yahoo.com>
---
drivers/hid/hid-mcp2221.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c
index de52e9f7bb8c..9973545c1c4b 100644
--- a/drivers/hid/hid-mcp2221.c
+++ b/drivers/hid/hid-mcp2221.c
@@ -873,12 +873,12 @@ static int mcp2221_probe(struct hid_device *hdev,
"MCP2221 usb-i2c bridge on hidraw%d",
((struct hidraw *)hdev->hidraw)->minor);
+ i2c_set_adapdata(&mcp->adapter, mcp);
ret = i2c_add_adapter(&mcp->adapter);
if (ret) {
hid_err(hdev, "can't add usb-i2c adapter: %d\n", ret);
goto err_i2c;
}
- i2c_set_adapdata(&mcp->adapter, mcp);
/* Setup GPIO chip */
mcp->gc = devm_kzalloc(&hdev->dev, sizeof(*mcp->gc), GFP_KERNEL);
--
2.43.0
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Nathan, you originally proposed the check for $(CONFIG_LD_IS_LLD) [0],
could you take a look at this?
[0] https://lore.kernel.org/all/20250213175437.GA2756218@ax162/
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index c09766beb7eff4780574682b8ea44475fc0a5188..e300c6546c845c300edb5f0033719963c7da8f9b 100644
--- a/Makefile
+++ b/Makefile
@@ -1134,7 +1134,7 @@ KBUILD_USERCFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD
KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
# userspace programs are linked via the compiler, use the correct linker
-ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_LD_IS_LLD),yy)
+ifneq ($(CONFIG_CC_IS_CLANG),)
KBUILD_USERLDFLAGS += --ld-path=$(LD)
endif
---
base-commit: 6832a9317eee280117cd695fa885b2b7a7a38daf
change-id: 20250723-userprogs-clang-gnu-ld-7a1c16fc852d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This patchset fixes the xe driver probe fail with -ETIMEDOUT issue in
linux 6.12.35 and later version. The failure is caused by commit
d42b44736ea2 ("drm/xe/gt: Update handling of xe_force_wake_get return"),
which incorrectly handles the return value of xe_force_wake_get as
"refcounted domain mask" (as introduced in 6.13), rather than status
code (as used in 6.12).
In 6.12 stable kernel, xe_force_wake_get still returns a status code.
The update incorrectly treats the return value as a mask, causing the
return value of 0 to be misinterpreted as an error. As a result, the
driver probe fails with -ETIMEDOUT in xe_pci_probe -> xe_device_probe
-> xe_gt_init_hwconfig -> xe_force_wake_get.
[ 1254.323172] xe 0000:00:02.0: [drm] Found ALDERLAKE_P (device ID 46a6) display version 13.00 stepping D0
[ 1254.323175] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] ALDERLAKE_P 46a6:000c dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:yes dma_m_s:39 tc:1 gscfi:0 cscfi:0
[ 1254.323275] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:C0, M:C0, B:**)
[ 1254.323328] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[ 1254.323379] xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
[ 1254.323475] xe 0000:00:02.0: probe with driver xe failed with error -110
Similar return handling issue cause by API mismatch are also found in:
Commit 95a75ed2b005 ("drm/xe/tests/mocs: Update xe_force_wake_get() return handling")
Commit 9ffd6ec2de08 ("drm/xe/devcoredump: Update handling of xe_force_wake_get return")
This patchset fixes them by reverting them all.
Additionally, commit deb05f8431f3 ("drm/xe/forcewake: Add a helper
xe_force_wake_ref_has_domain()") is also reverted as it is not needed in
6.12.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5373
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Tomita Moeko (4):
Revert "drm/xe/gt: Update handling of xe_force_wake_get return"
Revert "drm/xe/tests/mocs: Update xe_force_wake_get() return handling"
Revert "drm/xe/devcoredump: Update handling of xe_force_wake_get
return"
Revert "drm/xe/forcewake: Add a helper xe_force_wake_ref_has_domain()"
drivers/gpu/drm/xe/tests/xe_mocs.c | 21 +++---
drivers/gpu/drm/xe/xe_devcoredump.c | 14 ++--
drivers/gpu/drm/xe/xe_force_wake.h | 16 -----
drivers/gpu/drm/xe/xe_gt.c | 105 +++++++++++++---------------
4 files changed, 63 insertions(+), 93 deletions(-)
--
2.47.2
Greg recently reported 3 patches that could not be applied without
conflicts in v6.1:
- f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic")
- def5b7b2643e ("mptcp: plug races between subflow fail and subflow creation")
- da9b2fc7b73d ("mptcp: reset fallback status gracefully at disconnect() time")
Conflicts have been resolved, and documented in each patch.
Paolo Abeni (3):
mptcp: make fallback action and fallback decision atomic
mptcp: plug races between subflow fail and subflow creation
mptcp: reset fallback status gracefully at disconnect() time
net/mptcp/options.c | 3 ++-
net/mptcp/pm.c | 8 ++++++-
net/mptcp/protocol.c | 55 ++++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 27 +++++++++++++++++-----
net/mptcp/subflow.c | 30 +++++++++++++++---------
5 files changed, 95 insertions(+), 28 deletions(-)
--
2.50.0
The patch titled
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
has been added to the -mm mm-new branch. Its filename is
mm-memory-tiering-fix-pgpromote_candidate-counting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
Date: Tue, 29 Jul 2025 11:51:01 +0800
Goto-san reported confusing pgpromote statistics where the
pgpromote_success count significantly exceeded pgpromote_candidate.
On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
# Enable demotion only
echo 1 > /sys/kernel/mm/numa/demotion_enabled
numactl -m 0-1 memhog -r200 3500M >/dev/null &
pid=$!
sleep 2
numactl memhog -r100 2500M >/dev/null &
sleep 10
kill -9 $pid # terminate the 1st memhog
# Enable promotion
echo 2 > /proc/sys/kernel/numa_balancing
After a few seconds, we observeed `pgpromote_candidate < pgpromote_success`
$ grep -e pgpromote /proc/vmstat
pgpromote_success 2579
pgpromote_candidate 0
In this scenario, after terminating the first memhog, the conditions for
pgdat_free_space_enough() are quickly met, and triggers promotion.
However, these migrated pages are only counted for in PGPROMOTE_SUCCESS,
not in PGPROMOTE_CANDIDATE.
To solve these confusing statistics, introduce PGPROMOTE_CANDIDATE_NRL to
count the missed promotion pages. And also, not counting these pages into
PGPROMOTE_CANDIDATE is to avoid changing the existing algorithm or
performance of the promotion rate limit.
Link: https://lkml.kernel.org/r/20250729035101.1601407-1-ruansy.fnst@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Signed-off-by: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Reported-by: Yasunori Gotou (Fujitsu) <y-goto(a)fujitsu.com>
Suggested-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 16 +++++++++++++++-
kernel/sched/fair.c | 5 +++--
mm/vmstat.c | 1 +
3 files changed, 19 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/include/linux/mmzone.h
@@ -234,7 +234,21 @@ enum node_stat_item {
#endif
#ifdef CONFIG_NUMA_BALANCING
PGPROMOTE_SUCCESS, /* promote successfully */
- PGPROMOTE_CANDIDATE, /* candidate pages to promote */
+ /**
+ * Candidate pages for promotion based on hint fault latency. This
+ * counter is used to control the promotion rate and adjust the hot
+ * threshold.
+ */
+ PGPROMOTE_CANDIDATE,
+ /**
+ * Not rate-limited (NRL) candidate pages for those can be promoted
+ * without considering hot threshold because of enough free pages in
+ * fast-tier node. These promotions bypass the regular hotness checks
+ * and do NOT influence the promotion rate-limiter or
+ * threshold-adjustment logic.
+ * This is for statistics/monitoring purposes.
+ */
+ PGPROMOTE_CANDIDATE_NRL,
#endif
/* PGDEMOTE_*: pages demoted */
PGDEMOTE_KSWAPD,
--- a/kernel/sched/fair.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/kernel/sched/fair.c
@@ -1940,11 +1940,13 @@ bool should_numa_migrate_memory(struct t
struct pglist_data *pgdat;
unsigned long rate_limit;
unsigned int latency, th, def_th;
+ long nr = folio_nr_pages(folio);
pgdat = NODE_DATA(dst_nid);
if (pgdat_free_space_enough(pgdat)) {
/* workload changed, reset hot threshold */
pgdat->nbp_threshold = 0;
+ mod_node_page_state(pgdat, PGPROMOTE_CANDIDATE_NRL, nr);
return true;
}
@@ -1958,8 +1960,7 @@ bool should_numa_migrate_memory(struct t
if (latency >= th)
return false;
- return !numa_promotion_rate_limit(pgdat, rate_limit,
- folio_nr_pages(folio));
+ return !numa_promotion_rate_limit(pgdat, rate_limit, nr);
}
this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
--- a/mm/vmstat.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/mm/vmstat.c
@@ -1280,6 +1280,7 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
[I(PGPROMOTE_SUCCESS)] = "pgpromote_success",
[I(PGPROMOTE_CANDIDATE)] = "pgpromote_candidate",
+ [I(PGPROMOTE_CANDIDATE_NRL)] = "pgpromote_candidate_nrl",
#endif
[I(PGDEMOTE_KSWAPD)] = "pgdemote_kswapd",
[I(PGDEMOTE_DIRECT)] = "pgdemote_direct",
_
Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are
mm-memory-tiering-fix-pgpromote_candidate-counting.patch
The patch titled
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
has been added to the -mm mm-unstable branch. Its filename is
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
Date: Tue, 29 Jul 2025 07:57:09 -0700
- Addressed Lorenzo's nits, per Lorenzo Stoakes
- Added a warning comment for vma_start_read()
- Added Reviewed-by and Acked-by, per Vlastimil Babka and Lorenzo Stoakes
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmap_lock.h | 7 +++++++
mm/mmap_lock.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
--- a/include/linux/mmap_lock.h~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/include/linux/mmap_lock.h
@@ -155,6 +155,10 @@ static inline void vma_refcount_put(stru
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -194,9 +198,12 @@ static inline struct vm_area_struct *vma
if (unlikely(vma->vm_mm != mm)) {
/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
struct mm_struct *other_mm = vma->vm_mm;
+
/*
* __mmdrop() is a heavy operation and we don't need RCU
* protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
*/
rcu_read_unlock();
mmgrab(other_mm);
--- a/mm/mmap_lock.c~mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3
+++ a/mm/mmap_lock.c
@@ -235,7 +235,7 @@ retry:
goto fallback;
}
- /* Verify the vma is not behind of the last search position. */
+ /* Verify the vma is not behind the last search position. */
if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped.patch
mm-fix-a-uaf-when-vma-mm-is-freed-after-vma-vm_refcnt-got-dropped-v3.patch
From: Filipe Manana <fdmanana(a)suse.com>
commit f9baa501b4fd6962257853d46ddffbc21f27e344 upstream.
There are a few exceptional cases where cloning an inline extent needs to
copy the inline extent data into a page of the destination inode.
When this happens, we end up starting a transaction while having a dirty
page for the destination inode and while having the range locked in the
destination's inode iotree too. Because when reserving metadata space
for a transaction we may need to flush existing delalloc in case there is
not enough free space, we have a mechanism in place to prevent a deadlock,
which was introduced in commit 3d45f221ce627d ("btrfs: fix deadlock when
cloning inline extent and low on free metadata space").
However when using qgroups, a transaction also reserves metadata qgroup
space, which can also result in flushing delalloc in case there is not
enough available space at the moment. When this happens we deadlock, since
flushing delalloc requires locking the file range in the inode's iotree
and the range was already locked at the very beginning of the clone
operation, before attempting to start the transaction.
When this issue happens, stack traces like the following are reported:
[72747.556262] task:kworker/u81:9 state:D stack: 0 pid: 225 ppid: 2 flags:0x00004000
[72747.556268] Workqueue: writeback wb_workfn (flush-btrfs-1142)
[72747.556271] Call Trace:
[72747.556273] __schedule+0x296/0x760
[72747.556277] schedule+0x3c/0xa0
[72747.556279] io_schedule+0x12/0x40
[72747.556284] __lock_page+0x13c/0x280
[72747.556287] ? generic_file_readonly_mmap+0x70/0x70
[72747.556325] extent_write_cache_pages+0x22a/0x440 [btrfs]
[72747.556331] ? __set_page_dirty_nobuffers+0xe7/0x160
[72747.556358] ? set_extent_buffer_dirty+0x5e/0x80 [btrfs]
[72747.556362] ? update_group_capacity+0x25/0x210
[72747.556366] ? cpumask_next_and+0x1a/0x20
[72747.556391] extent_writepages+0x44/0xa0 [btrfs]
[72747.556394] do_writepages+0x41/0xd0
[72747.556398] __writeback_single_inode+0x39/0x2a0
[72747.556403] writeback_sb_inodes+0x1ea/0x440
[72747.556407] __writeback_inodes_wb+0x5f/0xc0
[72747.556410] wb_writeback+0x235/0x2b0
[72747.556414] ? get_nr_inodes+0x35/0x50
[72747.556417] wb_workfn+0x354/0x490
[72747.556420] ? newidle_balance+0x2c5/0x3e0
[72747.556424] process_one_work+0x1aa/0x340
[72747.556426] worker_thread+0x30/0x390
[72747.556429] ? create_worker+0x1a0/0x1a0
[72747.556432] kthread+0x116/0x130
[72747.556435] ? kthread_park+0x80/0x80
[72747.556438] ret_from_fork+0x1f/0x30
[72747.566958] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[72747.566961] Call Trace:
[72747.566964] __schedule+0x296/0x760
[72747.566968] ? finish_wait+0x80/0x80
[72747.566970] schedule+0x3c/0xa0
[72747.566995] wait_extent_bit.constprop.68+0x13b/0x1c0 [btrfs]
[72747.566999] ? finish_wait+0x80/0x80
[72747.567024] lock_extent_bits+0x37/0x90 [btrfs]
[72747.567047] btrfs_invalidatepage+0x299/0x2c0 [btrfs]
[72747.567051] ? find_get_pages_range_tag+0x2cd/0x380
[72747.567076] __extent_writepage+0x203/0x320 [btrfs]
[72747.567102] extent_write_cache_pages+0x2bb/0x440 [btrfs]
[72747.567106] ? update_load_avg+0x7e/0x5f0
[72747.567109] ? enqueue_entity+0xf4/0x6f0
[72747.567134] extent_writepages+0x44/0xa0 [btrfs]
[72747.567137] ? enqueue_task_fair+0x93/0x6f0
[72747.567140] do_writepages+0x41/0xd0
[72747.567144] __filemap_fdatawrite_range+0xc7/0x100
[72747.567167] btrfs_run_delalloc_work+0x17/0x40 [btrfs]
[72747.567195] btrfs_work_helper+0xc2/0x300 [btrfs]
[72747.567200] process_one_work+0x1aa/0x340
[72747.567202] worker_thread+0x30/0x390
[72747.567205] ? create_worker+0x1a0/0x1a0
[72747.567208] kthread+0x116/0x130
[72747.567211] ? kthread_park+0x80/0x80
[72747.567214] ret_from_fork+0x1f/0x30
[72747.569686] task:fsstress state:D stack: 0 pid:841421 ppid:841417 flags:0x00000000
[72747.569689] Call Trace:
[72747.569691] __schedule+0x296/0x760
[72747.569694] schedule+0x3c/0xa0
[72747.569721] try_flush_qgroup+0x95/0x140 [btrfs]
[72747.569725] ? finish_wait+0x80/0x80
[72747.569753] btrfs_qgroup_reserve_data+0x34/0x50 [btrfs]
[72747.569781] btrfs_check_data_free_space+0x5f/0xa0 [btrfs]
[72747.569804] btrfs_buffered_write+0x1f7/0x7f0 [btrfs]
[72747.569810] ? path_lookupat.isra.48+0x97/0x140
[72747.569833] btrfs_file_write_iter+0x81/0x410 [btrfs]
[72747.569836] ? __kmalloc+0x16a/0x2c0
[72747.569839] do_iter_readv_writev+0x160/0x1c0
[72747.569843] do_iter_write+0x80/0x1b0
[72747.569847] vfs_writev+0x84/0x140
[72747.569869] ? btrfs_file_llseek+0x38/0x270 [btrfs]
[72747.569873] do_writev+0x65/0x100
[72747.569876] do_syscall_64+0x33/0x40
[72747.569879] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[72747.569899] task:fsstress state:D stack: 0 pid:841424 ppid:841417 flags:0x00004000
[72747.569903] Call Trace:
[72747.569906] __schedule+0x296/0x760
[72747.569909] schedule+0x3c/0xa0
[72747.569936] try_flush_qgroup+0x95/0x140 [btrfs]
[72747.569940] ? finish_wait+0x80/0x80
[72747.569967] __btrfs_qgroup_reserve_meta+0x36/0x50 [btrfs]
[72747.569989] start_transaction+0x279/0x580 [btrfs]
[72747.570014] clone_copy_inline_extent+0x332/0x490 [btrfs]
[72747.570041] btrfs_clone+0x5b7/0x7a0 [btrfs]
[72747.570068] ? lock_extent_bits+0x64/0x90 [btrfs]
[72747.570095] btrfs_clone_files+0xfc/0x150 [btrfs]
[72747.570122] btrfs_remap_file_range+0x3d8/0x4a0 [btrfs]
[72747.570126] do_clone_file_range+0xed/0x200
[72747.570131] vfs_clone_file_range+0x37/0x110
[72747.570134] ioctl_file_clone+0x7d/0xb0
[72747.570137] do_vfs_ioctl+0x138/0x630
[72747.570140] __x64_sys_ioctl+0x62/0xc0
[72747.570143] do_syscall_64+0x33/0x40
[72747.570146] entry_SYSCALL_64_after_hwframe+0x44/0xa9
So fix this by skipping the flush of delalloc for an inode that is
flagged with BTRFS_INODE_NO_DELALLOC_FLUSH, meaning it is currently under
such a special case of cloning an inline extent, when flushing delalloc
during qgroup metadata reservation.
The special cases for cloning inline extents were added in kernel 5.7 by
by commit 05a5a7621ce66c ("Btrfs: implement full reflink support for
inline extents"), while having qgroup metadata space reservation flushing
delalloc when low on space was added in kernel 5.9 by commit
c53e9653605dbf ("btrfs: qgroup: try to flush qgroup space when we get
-EDQUOT"). So use a "Fixes:" tag for the later commit to ease stable
kernel backports.
Reported-by: Wang Yugui <wangyugui(a)e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20210421083137.31E3.409509F4@e16-tech.c…
Fixes: c53e9653605dbf ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on 5.10.y, Passed false to
btrfs_start_delalloc_flush() in fs/btrfs/transaction.c file to
maintain the default behaviour]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
fs/btrfs/ctree.h | 2 +-
fs/btrfs/inode.c | 4 ++--
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/qgroup.c | 2 +-
fs/btrfs/send.c | 4 ++--
fs/btrfs/transaction.c | 2 +-
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7ad3091db571..d9d6a57acafe 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3013,7 +3013,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
struct inode *inode, u64 new_size,
u32 min_type);
-int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context);
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
bool in_reclaim_context);
int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8d7ca8a21525..99aad39fad13 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9566,7 +9566,7 @@ static int start_delalloc_inodes(struct btrfs_root *root,
return ret;
}
-int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context)
{
struct writeback_control wbc = {
.nr_to_write = LONG_MAX,
@@ -9579,7 +9579,7 @@ int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
return -EROFS;
- return start_delalloc_inodes(root, &wbc, true, false);
+ return start_delalloc_inodes(root, &wbc, true, in_reclaim_context);
}
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 24c4d059cfab..9d5dfcec22de 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1030,7 +1030,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
*/
btrfs_drew_read_lock(&root->snapshot_lock);
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
goto out;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 95a39d535a82..bc1feb97698c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3704,7 +3704,7 @@ static int try_flush_qgroup(struct btrfs_root *root)
return 0;
}
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, true);
if (ret < 0)
goto out;
btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3e7bb24eb227..d86b4d13cae4 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -7207,7 +7207,7 @@ static int flush_delalloc_roots(struct send_ctx *sctx)
int i;
if (root) {
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
return ret;
btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
@@ -7215,7 +7215,7 @@ static int flush_delalloc_roots(struct send_ctx *sctx)
for (i = 0; i < sctx->clone_roots_cnt; i++) {
root = sctx->clone_roots[i].root;
- ret = btrfs_start_delalloc_snapshot(root);
+ ret = btrfs_start_delalloc_snapshot(root, false);
if (ret)
return ret;
btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 21a5a963c70e..424b1dd3fe27 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2045,7 +2045,7 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans)
list_for_each_entry(pending, head, list) {
int ret;
- ret = btrfs_start_delalloc_snapshot(pending->root);
+ ret = btrfs_start_delalloc_snapshot(pending->root, false);
if (ret)
return ret;
}
--
2.40.4
From: Sean Christopherson <seanjc(a)google.com>
[ Upstream commit 17bcd714426386fda741a4bccd96a2870179344b ]
Free vCPUs before freeing any VM state, as both SVM and VMX may access
VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs
to be kicked out of nested guest mode.
Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was
called") partially fixed the issue, but for unknown reasons only moved the
MMU unloading before VM destruction. Complete the change, and free all
vCPU state prior to destroying VM state, as nVMX accesses even more state
than nSVM.
In addition to the AVIC, KVM can hit a use-after-free on MSR filters:
kvm_msr_allowed+0x4c/0xd0
__kvm_set_msr+0x12d/0x1e0
kvm_set_msr+0x19/0x40
load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel]
nested_vmx_vmexit+0x715/0xbd0 [kvm_intel]
nested_vmx_free_vcpu+0x33/0x50 [kvm_intel]
vmx_free_vcpu+0x54/0xc0 [kvm_intel]
kvm_arch_vcpu_destroy+0x28/0xf0
kvm_vcpu_destroy+0x12/0x50
kvm_arch_destroy_vm+0x12c/0x1c0
kvm_put_kvm+0x263/0x3c0
kvm_vm_release+0x21/0x30
and an upcoming fix to process injectable interrupts on nested VM-Exit
will access the PIC:
BUG: kernel NULL pointer dereference, address: 0000000000000090
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm]
Call Trace:
<TASK>
kvm_cpu_has_injectable_intr+0xe/0x60 [kvm]
nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel]
nested_vmx_free_vcpu+0x40/0x50 [kvm_intel]
vmx_vcpu_free+0x2d/0x80 [kvm_intel]
kvm_arch_vcpu_destroy+0x2d/0x130 [kvm]
kvm_destroy_vcpus+0x8a/0x100 [kvm]
kvm_arch_destroy_vm+0xa7/0x1d0 [kvm]
kvm_destroy_vm+0x172/0x300 [kvm]
kvm_vcpu_release+0x31/0x50 [kvm]
Inarguably, both nSVM and nVMX need to be fixed, but punt on those
cleanups for the moment. Conceptually, vCPUs should be freed before VM
state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs
are created, so it stands to reason that they must be freed _after_ vCPUs
are destroyed.
Reported-by: Aaron Lewis <aaronlewis(a)google.com>
Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com
Cc: Jim Mattson <jmattson(a)google.com>
Cc: Yan Zhao <yan.y.zhao(a)intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: Kai Huang <kai.huang(a)intel.com>
Cc: Isaku Yamahata <isaku.yamahata(a)intel.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Kevin Cheng <chengkev(a)google.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f378d479fea3f..7f91b11e6f0ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12888,11 +12888,11 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
kvm_unload_vcpu_mmus(kvm);
+ kvm_destroy_vcpus(kvm);
kvm_x86_call(vm_destroy)(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
- kvm_destroy_vcpus(kvm);
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
--
2.50.1.470.g6ba607880d-goog
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Ming Lei <ming.lei(a)redhat.com>
[ Upstream commit a647a524a46736786c95cdb553a070322ca096e3 ]
rq_qos framework is only applied on request based driver, so:
1) rq_qos_done_bio() needn't to be called for bio based driver
2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.
Especially in bio_endio():
1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases
2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()
Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.
Reported-by: Yu Kuai <yukuai3(a)huawei.com>
Cc: tj(a)kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
block/bio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c
index 88a09c31095f..7851f54edc76 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1430,7 +1430,7 @@ void bio_endio(struct bio *bio)
if (!bio_integrity_endio(bio))
return;
- if (bio->bi_disk)
+ if (bio->bi_disk && bio_flagged(bio, BIO_TRACKED))
rq_qos_done_bio(bio->bi_disk->queue, bio);
/*
--
2.40.4
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
commit ac2140449184a26eac99585b7f69814bd3ba8f2d upstream.
This commit addresses a potential null pointer dereference issue in the
`dcn32_acquire_idle_pipe_for_head_pipe_in_layer` function. The issue
could occur when `head_pipe` is null.
The fix adds a check to ensure `head_pipe` is not null before asserting
it. If `head_pipe` is null, the function returns NULL to prevent a
potential null pointer dereference.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn32/dcn32_resource.c:2690 dcn32_acquire_idle_pipe_for_head_pipe_in_layer() error: we previously assumed 'head_pipe' could be null (see line 2681)
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[ Daniil: dcn32 was moved from drivers/gpu/drm/amd/display/dc to
drivers/gpu/drm/amd/display/dc/resource since commit
8b8eed05a1c6 ("drm/amd/display: Refactor resource into component directory").
The path is changed accordingly to apply the patch on 6.1.y. and 6.6.y ]
Signed-off-by: Daniil Dulov <d.dulov(a)aladdin.ru>
---
Backport fix for CVE-2024-49918
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 1b1534ffee9f..591c3166a468 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -2563,8 +2563,10 @@ struct pipe_ctx *dcn32_acquire_idle_pipe_for_head_pipe_in_layer(
struct resource_context *old_ctx = &stream->ctx->dc->current_state->res_ctx;
int head_index;
- if (!head_pipe)
+ if (!head_pipe) {
ASSERT(0);
+ return NULL;
+ }
/*
* Modified from dcn20_acquire_idle_pipe_for_layer
--
2.34.1