From: Su Hui suhui@nfschina.com
[ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ]
When encountering errors like these:

xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device

it is hard to know whether xhc_state is dying or halted. Print xhc_state's value as well, which helps locate the reason for the bug.
Signed-off-by: Su Hui suhui@nfschina.com
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is suitable for backporting to stable kernel trees for the following reasons:
1. **Enhanced Debugging for Real-World Issues**: The commit improves debugging of USB xHCI host controller failures by printing the actual `xhc_state` value when `queue_command` fails. The commit message shows real error messages users encounter ("xHCI dying or halted, can't queue_command"), demonstrating this is a real-world debugging problem.
2. **Minimal and Safe Change**: The change is extremely small and safe - it only modifies a debug print statement from:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
```
to:
```c
xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
	 xhci->xhc_state);
```
3. **No Functional Changes**: This is a pure diagnostic improvement. It doesn't change any logic, control flow, or data structures. It only adds the state value (0x%x format) to an existing debug message.
4. **Important for Troubleshooting**: The xHCI driver is critical for USB functionality, and when it fails with "dying or halted" states, knowing the exact state helps diagnose whether:
   - `XHCI_STATE_DYING` (0x1) - controller is dying
   - `XHCI_STATE_HALTED` (0x2) - controller is halted
   - Both states (0x3) - controller has both flags set

   This distinction is valuable for debugging hardware issues, driver bugs, or system problems.
5. **Zero Risk of Regression**: Adding a parameter to a debug print statement has no risk of introducing regressions. The worst case is the debug message prints the state value.
6. **Follows Stable Rules**: This meets stable kernel criteria as it:
   - Fixes a real debugging limitation
   - Is obviously correct
   - Has been accepted upstream (signed off by Greg KH)
   - Is small (single line change)
   - Doesn't add new features, just improves existing diagnostics
The commit helps system administrators and developers diagnose USB issues more effectively by providing the actual state value rather than just saying "dying or halted", making it a valuable debugging enhancement for stable kernels.
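To read the new `state: 0x%x` value from a log, a small userspace decoder can help. This is a minimal sketch, not kernel code: the DYING and HALTED bit values follow the list above, the REMOVING bit (0x4) is an assumption taken from `drivers/usb/host/xhci.h`, and `decode_xhc_state()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* DYING and HALTED as listed above; REMOVING is an assumption taken from
 * drivers/usb/host/xhci.h and should be checked against your tree. */
#define XHCI_STATE_DYING	(1u << 0)
#define XHCI_STATE_HALTED	(1u << 1)
#define XHCI_STATE_REMOVING	(1u << 2)

/* Hypothetical helper: write a '|'-separated list of flag names into buf. */
static const char *decode_xhc_state(unsigned int state, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} flags[] = {
		{ XHCI_STATE_DYING, "DYING" },
		{ XHCI_STATE_HALTED, "HALTED" },
		{ XHCI_STATE_REMOVING, "REMOVING" },
	};
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(state & flags[i].bit))
			continue;
		used += snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", flags[i].name);
		if (used >= len)
			break;	/* output was truncated */
	}
	return buf;
}
```

With a sufficiently large buffer, `decode_xhc_state(0x3, buf, sizeof(buf))` yields `DYING|HALTED`, matching the "both states" case in the list.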
 drivers/usb/host/xhci-ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 08b016864fc0..71b17a00d3ed 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4076,7 +4076,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,

 	if ((xhci->xhc_state & XHCI_STATE_DYING) || (xhci->xhc_state & XHCI_STATE_HALTED)) {
-		xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n");
+		xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n",
+			 xhci->xhc_state);
 		return -ESHUTDOWN;
 	}
From: Prashant Malani pmalani@google.com
[ Upstream commit 0a1416a49e63c320f6e6c1c8d07e1b58c0d4a3f3 ]
AMU counters on certain CPPC-based platforms tend to yield inaccurate delivered performance measurements on systems that are idle/mostly idle. This results in an inaccurate frequency being stored by cpufreq in its policy structure when the CPU is brought online. [1]
Consequently, if the userspace governor tries to set the frequency to a new value, there is a possibility that it would be the erroneous value stored earlier. In such a scenario, cpufreq would assume that the requested frequency has already been set and return early, resulting in the correct/new frequency request never making it to the hardware.
Since the operating frequency is liable to this sort of inconsistency, mark the CPPC driver with CPUFREQ_NEED_UPDATE_LIMITS so that it is always invoked when a target frequency update is requested.
Link: https://lore.kernel.org/linux-pm/20250619000925.415528-3-pmalani@google.com/ [1]
Suggested-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Prashant Malani pmalani@google.com
Acked-by: Viresh Kumar viresh.kumar@linaro.org
Link: https://patch.msgid.link/20250722055611.130574-2-pmalani@google.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Code Change Analysis
The change is minimal and focused - it only adds the `CPUFREQ_NEED_UPDATE_LIMITS` flag to the cppc_cpufreq_driver struct:
```c
-	.flags = CPUFREQ_CONST_LOOPS,
+	.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
```
This single-line change modifies the flags of `cppc_cpufreq_driver` in drivers/cpufreq/cppc_cpufreq.c.
## Bug Being Fixed
The commit addresses a real bug where:
1. **AMU counters on CPPC platforms provide inaccurate measurements** when systems are idle/mostly idle
2. **This causes cpufreq to store incorrect frequency values** in its policy structure during CPU online
3. **Userspace frequency changes fail silently** because cpufreq's early return optimization (`if (target_freq == policy->cur)`) prevents the hardware from receiving the correct frequency request
The bug manifests as userspace being unable to set CPU frequencies correctly on affected CPPC platforms, which is a functional regression that impacts power management and performance.
## Why This Qualifies for Stable
1. **Fixes a real bug**: The issue prevents proper CPU frequency scaling on CPPC-based platforms, affecting power management functionality
2. **Minimal risk**: The change is extremely small (one flag addition) and well-contained to the CPPC driver
3. **Clear fix semantics**: The `CPUFREQ_NEED_UPDATE_LIMITS` flag is an established mechanism (introduced in commit 1c534352f47f) specifically designed for this scenario where drivers need frequency updates even when target_freq == policy->cur
4. **Similar precedent**: Other cpufreq drivers (intel_pstate, amd-pstate) already use this flag for similar reasons, showing it's a proven solution
5. **No architectural changes**: This doesn't introduce new features or change kernel architecture - it simply ensures the driver behaves correctly
6. **Recent related fixes**: The git history shows ongoing work to fix CPPC feedback counter issues (e.g., commit c47195631960), indicating this is part of stabilizing CPPC functionality
## Impact Assessment
- **Affected systems**: CPPC-based platforms with AMU counters that exhibit the idle measurement issue
- **Risk**: Very low - the flag only bypasses an optimization when needed, forcing frequency updates to reach hardware
- **Review**: The change has been acked by the cpufreq maintainer (Viresh Kumar) and suggested by the PM subsystem maintainer (Rafael J. Wysocki)
The commit follows stable tree rules by fixing an important bug with minimal changes and low regression risk.
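The interaction between the cached frequency and the flag can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the cpufreq core: `struct model_policy`, `model_set_target()` and `NEED_UPDATE_LIMITS` are hypothetical stand-ins, and only the early-return condition described above is reproduced.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CPUFREQ_NEED_UPDATE_LIMITS. */
#define NEED_UPDATE_LIMITS	(1u << 0)

/* Hypothetical model of the policy; only the fields needed here. */
struct model_policy {
	unsigned int cur;		/* frequency cpufreq believes is set (kHz) */
	unsigned int hw_requests;	/* requests that reached the "hardware" */
};

/* Returns true if the request was forwarded to the driver. */
static bool model_set_target(struct model_policy *p, unsigned int flags,
			     unsigned int target_freq)
{
	/* Without the flag, a request equal to the cached value is dropped. */
	if (target_freq == p->cur && !(flags & NEED_UPDATE_LIMITS))
		return false;

	p->hw_requests++;
	p->cur = target_freq;
	return true;
}
```

If `cur` holds a stale value that happens to equal the requested frequency, the call without the flag never reaches the hardware, while the call with the flag always does. That is the behaviour the patch relies on.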
 drivers/cpufreq/cppc_cpufreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 8910fd1ae3c6..c85b01aa801d 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -407,7 +407,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
 }

 static struct cpufreq_driver cppc_cpufreq_driver = {
-	.flags = CPUFREQ_CONST_LOOPS,
+	.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
 	.verify = cppc_verify_policy,
 	.target = cppc_cpufreq_set_target,
 	.get = cppc_cpufreq_get_rate,
From: Cynthia Huang cynthia@andestech.com
[ Upstream commit 04850819c65c8242072818655d4341e70ae998b5 ]
The kernel does not provide sys_futex() on 32-bit architectures that do not support 32-bit time representations, such as riscv32.
As a result, glibc cannot define SYS_futex, causing compilation failures in tests that rely on this syscall. Define SYS_futex as SYS_futex_time64 in such cases to ensure successful compilation and compatibility.
Signed-off-by: Cynthia Huang cynthia@andestech.com
Signed-off-by: Ben Zong-You Xie ben717@andestech.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20250710103630.3156130-1-ben717@andestech.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real compilation failure bug**: The commit addresses a build failure in kernel selftests on 32-bit architectures with 64-bit time_t, specifically riscv32. This prevents the futex selftests from compiling on these architectures, which is a functional bug that affects testing infrastructure.
2. **Simple and contained fix**: The change is minimal - it only adds a conditional preprocessor definition that maps `SYS_futex` to `SYS_futex_time64` when the former is not defined but the latter is. The fix is:
```c
#if !defined(SYS_futex) && defined(SYS_futex_time64)
#define SYS_futex SYS_futex_time64
#endif
```
3. **No risk of regression**: The change is guarded by preprocessor conditionals that only activate when `SYS_futex` is not defined AND `SYS_futex_time64` is defined. This means it has zero impact on architectures where `SYS_futex` is already defined, ensuring no regressions on existing systems.
4. **Affects kernel testing infrastructure**: While this is in the selftests directory and not core kernel code, having working selftests is critical for kernel stability and quality assurance. The futex selftests are important for validating futex functionality across different architectures.
5. **Addresses Y2038 compatibility**: This fix is part of the broader Y2038 compatibility effort where 32-bit architectures are transitioning to 64-bit time_t. As more 32-bit architectures adopt 64-bit time_t, this fix becomes increasingly important.
6. **Clear problem and solution**: The commit message clearly explains the issue (glibc cannot define SYS_futex on certain architectures) and provides a clean solution that maintains compatibility.
The fix follows stable kernel rules by being a minimal change that fixes an important bug without introducing new features or architectural changes. It's confined to the testing infrastructure and has clear boundaries with no side effects beyond enabling compilation of the futex selftests on affected architectures.
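As an illustration of how a test might consume the fallback, here is a minimal sketch of a wake-only wrapper. It assumes a Linux userspace build; `futex_wake()` is a hypothetical helper and not the selftest's own `futex()` wrapper. No timeout is passed, so the difference between 32-bit and 64-bit `timespec` layouts does not come into play here.

```c
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same fallback as the patch: use the time64 syscall number when the C
 * library does not define SYS_futex. */
#if !defined(SYS_futex) && defined(SYS_futex_time64)
#define SYS_futex SYS_futex_time64
#endif

/* Hypothetical wrapper: wake up to nr_wake waiters blocked on uaddr.
 * Returns the number of waiters woken, or -1 with errno set. */
static long futex_wake(uint32_t *uaddr, int nr_wake)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
}
```

Calling `futex_wake()` on a word nobody is waiting on should simply report zero woken waiters, which makes it a cheap way to confirm that the syscall number resolved.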
 tools/testing/selftests/futex/include/futextest.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index ddbcfc9b7bac..7a5fd1d5355e 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -47,6 +47,17 @@ typedef volatile u_int32_t futex_t;
 					 FUTEX_PRIVATE_FLAG)
 #endif

+/*
+ * SYS_futex is expected from system C library, in glibc some 32-bit
+ * architectures (e.g. RV32) are using 64-bit time_t, therefore it doesn't have
+ * SYS_futex defined but just SYS_futex_time64. Define SYS_futex as
+ * SYS_futex_time64 in this situation to ensure the compilation and the
+ * compatibility.
+ */
+#if !defined(SYS_futex) && defined(SYS_futex_time64)
+#define SYS_futex SYS_futex_time64
+#endif
+
 /**
  * futex() - SYS_futex syscall wrapper
  * @uaddr:	address of first futex
From: Romain Gantois romain.gantois@bootlin.com
[ Upstream commit d511206dc7443120637efd9cfa3ab06a26da33dd ]
The regulator_set_voltage() function may exhibit unexpected behavior if the target regulator has a maximum voltage step constraint. With such a constraint, the regulator core may clamp the requested voltage to a lesser value, to ensure that the voltage delta stays under the specified limit.
This means that the resulting regulator voltage depends on the current voltage, as well as the requested range, which invalidates the assumption that a repeated request for a specific voltage range will amount to a noop.
Considering the case of a regulator with a maximum voltage step constraint of 1V:
initial voltage: 2.5V
consumer requests 4V
expected result: 3.5V
resulting voltage: 3.5V

consumer requests 4V again
expected result: 4V
actual result: 3.5V
Correct this by repeating attempts to balance the regulator voltage until the result converges.
Signed-off-by: Romain Gantois romain.gantois@bootlin.com
Link: https://patch.msgid.link/20250718-regulator-stepping-v2-1-e28c9ac5d54a@bootl...
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit fixes a legitimate bug in the voltage regulator subsystem that affects users with stepped voltage regulators. Here's my extensive analysis:
## Bug Description and Impact
The commit addresses a bug where regulators with `max_uV_step` constraints fail to reach their target voltage when the same voltage range is requested multiple times. The bug manifests as follows:
1. When a regulator has a maximum voltage step constraint (e.g., 1V), the core limits each voltage change to stay within this step
2. The optimization at line 3814 (`if (voltage->min_uV == min_uV && voltage->max_uV == max_uV) goto out;`) assumes repeated requests are no-ops
3. This creates a situation where requesting 4V twice from 2.5V only reaches 3.5V, not the desired 4V
## Code Analysis
The fix adds a retry mechanism specifically for stepped regulators:
1. **New helper function** `regulator_get_voltage_delta()` (lines 3800-3808): Calculates the absolute difference between current and target voltage
2. **Retry loop** (lines 3865-3893): After the initial voltage setting, if `max_uV_step` is configured, it:
   - Checks whether the target voltage is still unreached (delta > 0)
   - Repeatedly calls `regulator_balance_voltage()` until the delta reaches zero
   - Includes a convergence check that bails out if the delta grows by more than one step (line 3888)
## Why This Is a Good Backport Candidate
1. **Fixes a real bug**: Users with stepped voltage regulators cannot reach target voltages, potentially causing system instability or device malfunction
2. **Minimally invasive changes**: The fix is well-contained within `regulator_set_voltage_unlocked()` and only affects regulators with `max_uV_step` constraints
3. **No API/ABI changes**: Only internal implementation changes, no external interfaces modified
4. **Low regression potential**: The new code only executes for regulators with `max_uV_step` set
5. **Safety checks included**: The convergence check (`if (new_delta - delta > rdev->constraints->max_uV_step)`) aborts with `-EWOULDBLOCK` when the voltage moves away from the target by more than one step
## Specific Code References
- The bug is in the optimization at drivers/regulator/core.c:3814-3815
- The fix adds retry logic at drivers/regulator/core.c:3865-3893
- Only affects regulators where `rdev->constraints->max_uV_step > 0`
- The existing `regulator_limit_voltage_step()` function already handles the step limiting logic
This is exactly the type of fix that belongs in stable: it addresses a specific functional bug without introducing new features or architectural changes.
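The 2.5V to 4V example from the commit message can be replayed with a small userspace model. This is a sketch only: `struct model_regulator`, `model_balance()` and `model_set_voltage()` are hypothetical, and the clamping rule is a simplification of what the regulator core does. The loop and the convergence check mirror the shape of the patch.

```c
#include <stdlib.h>

/* Hypothetical model of a regulator with a maximum voltage step. */
struct model_regulator {
	int cur_uV;		/* current output voltage */
	int max_uV_step;	/* largest change allowed per pass, must be > 0 */
};

/* One balancing pass: move toward target_uV, clamped to max_uV_step. */
static void model_balance(struct model_regulator *r, int target_uV)
{
	int diff = target_uV - r->cur_uV;

	if (diff > r->max_uV_step)
		diff = r->max_uV_step;
	else if (diff < -r->max_uV_step)
		diff = -r->max_uV_step;
	r->cur_uV += diff;
}

/* Repeat balancing until the target is reached. Returns the number of
 * passes, or -1 if the step is unset or the delta grows by more than
 * one step between passes. */
static int model_set_voltage(struct model_regulator *r, int target_uV)
{
	int delta = abs(r->cur_uV - target_uV);
	int passes = 0;

	if (r->max_uV_step <= 0)
		return -1;	/* the model only covers stepped regulators */

	while (delta > 0) {
		int new_delta;

		model_balance(r, target_uV);
		passes++;
		new_delta = abs(r->cur_uV - target_uV);
		/* Same shape as the convergence check in the patch. */
		if (new_delta - delta > r->max_uV_step)
			return -1;
		delta = new_delta;
	}
	return passes;
}
```

Starting at 2500000 uV with a 1000000 uV step, a request for 4000000 uV takes two passes (3.5V, then 4V). In this simplified model every pass strictly reduces the delta, so the bail-out branch is never taken. It is kept only to show where the patch places its check.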
 drivers/regulator/core.c | 43 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index a01a769b2f2d..8776dcc6737e 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -3486,6 +3486,16 @@ static int _regulator_do_set_suspend_voltage(struct regulator_dev *rdev,
 	return 0;
 }

+static int regulator_get_voltage_delta(struct regulator_dev *rdev, int uV)
+{
+	int current_uV = regulator_get_voltage_rdev(rdev);
+
+	if (current_uV < 0)
+		return current_uV;
+
+	return abs(current_uV - uV);
+}
+
 static int regulator_set_voltage_unlocked(struct regulator *regulator,
 					  int min_uV, int max_uV,
 					  suspend_state_t state)
@@ -3493,8 +3503,8 @@ static int regulator_set_voltage_unlocked(struct regulator *regulator,
 	struct regulator_dev *rdev = regulator->rdev;
 	struct regulator_voltage *voltage = &regulator->voltage[state];
 	int ret = 0;
+	int current_uV, delta, new_delta;
 	int old_min_uV, old_max_uV;
-	int current_uV;

 	/* If we're setting the same range as last time the change
 	 * should be a noop (some cpufreq implementations use the same
@@ -3541,6 +3551,37 @@ static int regulator_set_voltage_unlocked(struct regulator *regulator,
 		voltage->max_uV = old_max_uV;
 	}

+	if (rdev->constraints->max_uV_step > 0) {
+		/* For regulators with a maximum voltage step, reaching the desired
+		 * voltage might take a few retries.
+		 */
+		ret = regulator_get_voltage_delta(rdev, min_uV);
+		if (ret < 0)
+			goto out;
+
+		delta = ret;
+
+		while (delta > 0) {
+			ret = regulator_balance_voltage(rdev, state);
+			if (ret < 0)
+				goto out;
+
+			ret = regulator_get_voltage_delta(rdev, min_uV);
+			if (ret < 0)
+				goto out;
+
+			new_delta = ret;
+
+			/* check that voltage is converging quickly enough */
+			if (new_delta - delta > rdev->constraints->max_uV_step) {
+				ret = -EWOULDBLOCK;
+				goto out;
+			}
+
+			delta = new_delta;
+		}
+	}
+
 out:
 	return ret;
 }
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 65fc0fc137b5da3ee1f4ca4f61050fcb203d7582 ]
When a USB4 dock is unplugged from a system it won't respond to ring events. The PCI core handles the surprise removal event and notifies all PCI drivers. The XHCI PCI driver sets a flag that the device is being removed, and when the device stops responding a flag is also added to indicate it's dying.
When that flag is set don't bother to show warnings about a missing controller.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com
Acked-by: Mathias Nyman mathias.nyman@linux.intel.com
Link: https://lore.kernel.org/r/20250717073107.488599-3-mathias.nyman@linux.intel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real user-visible bug**: The commit addresses spurious warning messages that appear when USB4 docks are unplugged. When a USB4 dock is removed, the controller becomes inaccessible (reads return 0xffffffff), but the driver still attempts operations and logs warnings that are misleading to users.
2. **Small and contained fix**: The changes are minimal - only adding conditional checks before logging warnings in two functions (`xhci_halt()` and `xhci_reset()`). The fix simply checks if the `XHCI_STATE_DYING` flag is set before printing warnings.
3. **Low risk of regression**: The changes only affect warning messages, not functional behavior. The actual error paths and return values remain unchanged. This is a purely cosmetic fix that improves the user experience by avoiding confusing warnings during expected hardware removal scenarios.
4. **Clear problem and solution**: The commit message clearly describes the issue (USB4 dock removal causes warnings) and the solution (suppress warnings when controller is dying). The code changes directly implement this solution.
5. **Affects common hardware scenarios**: USB4 docks are becoming increasingly common, and surprise removal is a normal use case that shouldn't generate alarming warnings in system logs.
6. **Follows stable tree criteria**: This is a bug fix that improves user experience without introducing new features or making architectural changes. It's exactly the type of fix that stable trees are meant to include.
The code changes show:
- In `xhci_halt()`: Added check `if (!(xhci->xhc_state & XHCI_STATE_DYING))` before `xhci_warn()`
- In `xhci_reset()`: Added the same check before another `xhci_warn()`
Both changes prevent warning messages when the controller is already marked as dying, which is the expected state during USB4 dock removal. This is a classic example of a stable-worthy fix that improves system behavior without functional changes.
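The point that only the message is gated, not the error return, can be shown with a small model of the `xhci_reset()` path. This is a sketch with hypothetical names (`struct model_xhci`, `model_reset()`); a counter stands in for `xhci_warn()`, and the status register value is passed in rather than read from hardware.

```c
#include <errno.h>
#include <stdint.h>

#define XHCI_STATE_DYING	(1u << 0)

/* Hypothetical model of the controller state. */
struct model_xhci {
	unsigned int xhc_state;
	unsigned int warnings;	/* counts messages that would be logged */
};

static int model_reset(struct model_xhci *x, uint32_t status_reg)
{
	/* A controller that is no longer accessible reads back as all ones. */
	if (status_reg == ~(uint32_t)0) {
		if (!(x->xhc_state & XHCI_STATE_DYING))
			x->warnings++;	/* stands in for xhci_warn() */
		return -ENODEV;		/* returned in both cases */
	}
	return 0;
}
```

Callers see `-ENODEV` whether or not the controller is already marked dying; only the log output differs.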
 drivers/usb/host/xhci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 3383d7f0c88f..dcda4d7c2b75 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -118,7 +118,8 @@ int xhci_halt(struct xhci_hcd *xhci)
 	ret = xhci_handshake(&xhci->op_regs->status,
 			STS_HALT, STS_HALT, XHCI_MAX_HALT_USEC);
 	if (ret) {
-		xhci_warn(xhci, "Host halt failed, %d\n", ret);
+		if (!(xhci->xhc_state & XHCI_STATE_DYING))
+			xhci_warn(xhci, "Host halt failed, %d\n", ret);
 		return ret;
 	}
 	xhci->xhc_state |= XHCI_STATE_HALTED;
@@ -175,7 +176,8 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us)
 	state = readl(&xhci->op_regs->status);

 	if (state == ~(u32)0) {
-		xhci_warn(xhci, "Host not accessible, reset failed.\n");
+		if (!(xhci->xhc_state & XHCI_STATE_DYING))
+			xhci_warn(xhci, "Host not accessible, reset failed.\n");
 		return -ENODEV;
 	}
From: Jay Chen shawn2000100@gmail.com
[ Upstream commit f72b9aa821a2bfe4b6dfec4be19f264d0673b008 ]
There is a subtle contradiction between sections of the xHCI 1.2 spec regarding the initialization of Input Endpoint Context fields. Section 4.8.2 ("Endpoint Context Initialization") states that all fields should be initialized to 0. However, Section 6.2.3 ("Endpoint Context", p.453) specifies that the Average TRB Length (avg_trb_len) field shall be greater than 0, and explicitly notes (p.454): "Software shall set Average TRB Length to '8' for control endpoints."
Strictly setting all fields to 0 during initialization conflicts with the specific recommendation for control endpoints. In practice, setting avg_trb_len = 0 is not meaningful for the hardware/firmware, as the value is used for bandwidth calculation.
Motivation: Our company is developing a custom Virtual xHC hardware platform that strictly follows the xHCI spec and its recommendations. During validation, we observed that enumeration fails and a parameter error (TRB Completion Code = 5) is reported if avg_trb_len for EP0 is not set to 8 as recommended by Section 6.2.3. This demonstrates the importance of assigning a meaningful, non-zero value to avg_trb_len, even in virtualized or emulated environments.
This patch explicitly sets avg_trb_len to 8 for EP0 in xhci_setup_addressable_virt_dev(), as recommended in Section 6.2.3, to prevent potential issues with xHCI host controllers that enforce the spec strictly.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220033
Signed-off-by: Jay Chen shawn2000100@gmail.com
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com
Link: https://lore.kernel.org/r/20250717073107.488599-4-mathias.nyman@linux.intel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and code, here's my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real bug**: The commit addresses a specification compliance issue where setting `avg_trb_len = 0` for EP0 (control endpoint) violates the xHCI 1.2 specification. Section 6.2.3 explicitly states that Average TRB Length shall be greater than 0 and specifically recommends setting it to 8 for control endpoints.
2. **Small and contained fix**: The change is minimal - just adding a single line:
```c
ep0_ctx->tx_info = cpu_to_le32(EP_AVG_TRB_LENGTH(8));
```
This sets the average TRB length field to 8 as recommended by the spec.
3. **Prevents hardware failures**: The commit message indicates this causes actual enumeration failures with parameter errors (TRB Completion Code = 5) on hardware that strictly follows the xHCI specification. This means real devices can fail to enumerate without this fix.
4. **No architectural changes**: This is a simple initialization fix that doesn't change any architectural aspects of the driver. It only ensures proper initialization of a field that was previously left at 0.
5. **Low regression risk**: Setting avg_trb_len to 8 for control endpoints follows the xHCI specification recommendation. Existing hardware that doesn't strictly check this value will continue to work, while hardware that does enforce the spec will now work correctly.
6. **Clear bug with clear fix**: The contradiction between spec sections is well-documented in the commit message, and the fix directly addresses this by following the more specific recommendation for control endpoints.
The commit fixes a specification compliance bug that can cause real hardware failures during USB device enumeration. The fix is minimal, follows the xHCI specification, and has very low risk of causing regressions, making it an ideal candidate for stable backporting.
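For readers unfamiliar with the `tx_info` word, the following sketch shows what the new assignment encodes. The macro definitions are an assumption based on `drivers/usb/host/xhci.h` (Average TRB Length in bits 15:0, Max ESIT Payload Lo in bits 31:16). `avg_trb_len()` is a hypothetical helper, and the `cpu_to_le32()` conversion is left out, so values are in host byte order.

```c
#include <stdint.h>

/* Field layout assumed from drivers/usb/host/xhci.h: Average TRB Length
 * occupies bits 15:0 of tx_info, Max ESIT Payload Lo bits 31:16. */
#define EP_AVG_TRB_LENGTH(p)		((p) & 0xffff)
#define EP_MAX_ESIT_PAYLOAD_LO(p)	(((p) & 0xffff) << 16)

/* Hypothetical helper: extract Average TRB Length from a host-endian
 * tx_info value. */
static uint32_t avg_trb_len(uint32_t tx_info)
{
	return tx_info & 0xffff;
}
```

A zero-initialized context gives `avg_trb_len(0) == 0`, the value a strict controller rejects, while `EP_AVG_TRB_LENGTH(8)` gives the 8 that Section 6.2.3 calls for on control endpoints.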
 drivers/usb/host/xhci-mem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 610190bf62da..6dddd5414fe9 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1214,6 +1214,8 @@ int xhci_setup_addressable_virt_dev(struct xhci_hcd *xhci, struct usb_device *ud
 	ep0_ctx->deq = cpu_to_le64(dev->eps[0].ring->first_seg->dma |
 				   dev->eps[0].ring->cycle_state);

+	ep0_ctx->tx_info = cpu_to_le32(EP_AVG_TRB_LENGTH(8));
+
 	trace_xhci_setup_addressable_virt_device(dev);

 	/* Steps 7 and 8 were done in xhci_alloc_virt_device() */
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 4b9c60e440525b729ac5f071e00bcee12e0a7e84 ]
When a USB4 dock is unplugged from a system it won't respond to ring events. The PCI core handles the surprise removal event and notifies all PCI drivers. The XHCI PCI driver sets a flag that the device is being removed as well.
When that flag is set don't show messages in the cleanup path for marking the controller dead.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com
Acked-by: Mathias Nyman mathias.nyman@linux.intel.com
Link: https://lore.kernel.org/r/20250717073107.488599-2-mathias.nyman@linux.intel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Now let me analyze this commit properly:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a user-visible issue**: The commit addresses spurious error messages that appear when USB4 docks are unplugged. The original code shows `xhci_err(xhci, "xHCI host controller not responding, assume dead\n")` even during expected surprise removals, which can confuse users and fill system logs unnecessarily.
2. **Small and contained fix**: The change is minimal - it only adds a local `notify` variable and conditionalizes two operations based on whether the device is being removed:
   - The error message printing (line 1384 → lines 1379-1381)
   - The `usb_hc_died()` notification (reusing the same condition)
3. **No architectural changes**: This is a simple behavioral fix that doesn't modify any data structures, APIs, or core functionality. It merely suppresses error messages during an expected removal scenario.
4. **Clear bug fix**: The commit clearly fixes an issue where error messages are shown during normal USB4 dock removal operations. When `XHCI_STATE_REMOVING` is set (indicating PCI removal is in progress), the error message is now suppressed since it's an expected condition.
5. **Low risk of regression**: The change only affects logging behavior and maintains the same functional flow. The `usb_hc_died()` call was already conditional on `!XHCI_STATE_REMOVING`, so this commit just applies the same logic to the error message.
6. **Improves user experience**: USB4/Thunderbolt docks are increasingly common, and users frequently unplug them. Avoiding spurious error messages during normal operations is important for user experience and log clarity.
The commit follows stable tree rules by being a minimal fix for a real issue that affects users, without introducing new features or making risky changes to core functionality.
 drivers/usb/host/xhci-ring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 71b17a00d3ed..47326fb8b1fc 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -973,12 +973,15 @@ static void xhci_kill_endpoint_urbs(struct xhci_hcd *xhci,
  */
 void xhci_hc_died(struct xhci_hcd *xhci)
 {
+	bool notify;
 	int i, j;

 	if (xhci->xhc_state & XHCI_STATE_DYING)
 		return;

-	xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
+	notify = !(xhci->xhc_state & XHCI_STATE_REMOVING);
+	if (notify)
+		xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
 	xhci->xhc_state |= XHCI_STATE_DYING;

 	xhci_cleanup_command_queue(xhci);
@@ -992,7 +995,7 @@ void xhci_hc_died(struct xhci_hcd *xhci)
 	}

 	/* inform usb core hc died if PCI remove isn't already handling it */
-	if (!(xhci->xhc_state & XHCI_STATE_REMOVING))
+	if (notify)
 		usb_hc_died(xhci_to_hcd(xhci));
 }
From: Lifeng Zheng zhenglifeng1@huawei.com
[ Upstream commit 0ae204405095abfbc2d694ee0fbb49bcbbe55c57 ]
Detect the result of starting old governor in cpufreq_set_policy(). If it fails, exit the governor and clear policy->governor.
Signed-off-by: Lifeng Zheng zhenglifeng1@huawei.com
Link: https://patch.msgid.link/20250709104145.2348017-5-zhenglifeng1@huawei.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a **resource leak and inconsistent state bug** in the cpufreq governor switching code. Looking at the code changes in `cpufreq_set_policy()`:
### The Bug:
In the original code (lines 2716-2721), when switching governors fails and we need to restore the old governor:
```c
if (old_gov) {
	policy->governor = old_gov;
	if (cpufreq_init_governor(policy))
		policy->governor = NULL;
	else
		cpufreq_start_governor(policy); // Bug: no error handling here
}
```

If `cpufreq_start_governor()` fails, the code doesn't handle the error. This leaves the system in an **inconsistent state** where:
1. The governor is initialized (`cpufreq_init_governor` succeeded)
2. But the governor failed to start (`cpufreq_start_governor` failed)
3. The policy still points to a non-functional governor
4. Resources allocated during `cpufreq_init_governor` are **leaked** (module reference count, governor's init() allocations)

### The Fix:
```c
if (cpufreq_init_governor(policy)) {
	policy->governor = NULL;
} else if (cpufreq_start_governor(policy)) {
	cpufreq_exit_governor(policy);  // NEW: Clean up on failure
	policy->governor = NULL;        // NEW: Clear the governor pointer
}
```
## Why This Should Be Backported:
1. **Fixes a Real Bug**: This addresses a resource leak where `cpufreq_init_governor()` acquires resources (notably `try_module_get()` at line 2442 and potential governor->init() allocations) that aren't cleaned up if `cpufreq_start_governor()` fails.
2. **Small and Contained Fix**: The change is minimal - just 4 lines modified to add proper error handling. It doesn't change any APIs or introduce new functionality.
3. **Prevents System Instability**: Leaving the cpufreq subsystem in an inconsistent state (initialized but not started governor) could lead to: - Module reference count leaks - Memory leaks from governor init allocations - Potential crashes if the partially-initialized governor is accessed later
4. **Error Path Fix**: This is clearly an error handling path that was incorrectly implemented. The pattern of calling `cpufreq_exit_governor()` after a failed `cpufreq_start_governor()` is already used elsewhere in the code (line 2711).
5. **No Architectural Changes**: The fix simply adds missing cleanup code in an error path. It doesn't change the normal operation flow or introduce new features.
6. **Critical Subsystem**: The cpufreq subsystem is critical for power management and system stability. Bugs here can affect system reliability.
The commit follows the stable kernel rules perfectly - it's a clear bug fix that prevents resource leaks and system instability, with minimal risk of regression since it only affects an error path that was already broken.
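The rollback pattern described above can be sketched in plain C. This is a minimal userspace model, not the kernel code: `struct governor`, `struct policy` and the `init_governor()`/`start_governor()`/`exit_governor()` helpers are hypothetical stand-ins for the cpufreq types and functions, and the `start_fails` field exists only to force the error path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpufreq types; not the kernel API. */
struct governor { const char *name; int start_fails; };
struct policy { struct governor *governor; int initialized; };

static int init_governor(struct policy *p)   { p->initialized = 1; return 0; }
static void exit_governor(struct policy *p)  { p->initialized = 0; }
static int start_governor(struct policy *p)  { return p->governor->start_fails ? -1 : 0; }

/* Restore old_gov after a failed switch, undoing init if start fails. */
static void restore_old_governor(struct policy *p, struct governor *old_gov)
{
	if (!old_gov)
		return;

	p->governor = old_gov;
	if (init_governor(p)) {
		p->governor = NULL;
	} else if (start_governor(p)) {
		exit_governor(p);	/* release what init acquired */
		p->governor = NULL;	/* do not point at a stopped governor */
	}
}

/* Usage: a governor that fails to start must leave no state behind. */
void restore_old_governor_demo(void)
{
	struct governor old = { "old", 1 };
	struct policy p = { NULL, 0 };

	restore_old_governor(&p, &old);
	assert(p.governor == NULL);
	assert(p.initialized == 0);
}
```

The point of the sketch is the pairing: every successful init on the error path is matched by an exit before the pointer is cleared.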
 drivers/cpufreq/cpufreq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2a2fea6743aa..0bb17bb615f5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2482,10 +2482,12 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	pr_debug("starting governor %s failed\n", policy->governor->name);
 	if (old_gov) {
 		policy->governor = old_gov;
-		if (cpufreq_init_governor(policy))
+		if (cpufreq_init_governor(policy)) {
 			policy->governor = NULL;
-		else
-			cpufreq_start_governor(policy);
+		} else if (cpufreq_start_governor(policy)) {
+			cpufreq_exit_governor(policy);
+			policy->governor = NULL;
+		}
 	}
 
 	return ret;
From: Alexander Kochetkov al.kochet@gmail.com
[ Upstream commit 7cdb433bb44cdc87dc5260cdf15bf03cc1cd1814 ]
To bring up secondary CPUs, the main CPU writes trampoline code to SRAM. The trampoline code is written while the secondary CPUs are powered on (at least that is true for the RK3188 CPU). Sometimes that leads to a kernel hang, probably because a secondary CPU executes the trampoline code when the kernel doesn't expect it.
The patch moves SRAM initialization step to the point where all secondary CPUs are powered down.
That fixes rarely hangs on RK3188:
[ 0.091568] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.091996] rockchip_smp_prepare_cpus: ncores 4
Signed-off-by: Alexander Kochetkov al.kochet@gmail.com
Link: https://lore.kernel.org/r/20250703140453.1273027-1-al.kochet@gmail.com
Signed-off-by: Heiko Stuebner heiko@sntech.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and the code context, here's my determination:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real bug affecting users**: The commit explicitly mentions that it "fixes rarely hangs on RK3188" during SMP initialization. This is a critical boot-time issue that prevents the system from functioning properly.
2. **Small and contained fix**: The change is minimal - it only moves the SRAM initialization code (`rockchip_smp_prepare_sram()`) to execute after all secondary CPUs are powered down. The code movement is straightforward:
- Lines 282-286 (old location) are moved to lines 319-324 (new location), as the hunk headers in the diff indicate
- The fix is contained within a single function (`rockchip_smp_prepare_cpus()`)
3. **Clear root cause**: The commit message explains the race condition clearly - secondary CPUs may execute trampoline code while it's being written to SRAM, causing unpredictable behavior and hangs.
4. **No architectural changes**: This is purely a timing/ordering fix that doesn't introduce new features or change the architecture of the code.
5. **Critical subsystem**: SMP initialization is a critical kernel subsystem. A hang during boot makes the system unusable, which is a severe issue for affected users.
6. **Platform-specific fix**: The change only affects Rockchip ARM platforms (specifically those with Cortex-A9 CPUs like RK3188), limiting the scope of potential regressions.
7. **Follows stable tree rules**: The fix addresses an important bug (boot hang), is small and self-contained, has minimal risk of regression, and doesn't introduce new features.
The key insight from the code analysis is that the original code had a clear race condition where:
1. Secondary CPUs could still be powered on when `rockchip_smp_prepare_cpus()` ran
2. Trampoline code was written to SRAM (line 282 in the original)
3. Secondary CPUs could potentially execute incomplete/corrupted trampoline code
The fix ensures all secondary CPUs are powered down (lines 315-317 in the new version) before writing the trampoline code to SRAM (lines 319-324 in the new version), eliminating the race condition.
 arch/arm/mach-rockchip/platsmp.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
index 649e0a54784c..246a525d1d26 100644
--- a/arch/arm/mach-rockchip/platsmp.c
+++ b/arch/arm/mach-rockchip/platsmp.c
@@ -279,11 +279,6 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus)
 	}
 
 	if (read_cpuid_part() == ARM_CPU_PART_CORTEX_A9) {
-		if (rockchip_smp_prepare_sram(node)) {
-			of_node_put(node);
-			return;
-		}
-
 		/* enable the SCU power domain */
 		pmu_set_power_domain(PMU_PWRDN_SCU, true);
 
@@ -316,11 +311,19 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus)
 		asm ("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
 		ncores = ((l2ctlr >> 24) & 0x3) + 1;
 	}
-	of_node_put(node);
 
 	/* Make sure that all cores except the first are really off */
 	for (i = 1; i < ncores; i++)
 		pmu_set_power_domain(0 + i, false);
+
+	if (read_cpuid_part() == ARM_CPU_PART_CORTEX_A9) {
+		if (rockchip_smp_prepare_sram(node)) {
+			of_node_put(node);
+			return;
+		}
+	}
+
+	of_node_put(node);
 }
 
 static void __init rk3036_smp_prepare_cpus(unsigned int max_cpus)
From: Kuninori Morimoto kuninori.morimoto.gx@renesas.com
[ Upstream commit f40ecc2743652c0b0f19935f81baf57c601eb7f0 ]
ASoC has 2 functions to set bias level.
	(A) snd_soc_dapm_force_bias_level()
	(B) snd_soc_dapm_set_bias_level()

snd_soc_dapm_force_bias_level() (A) will set dapm->bias_level (a) if it succeeds.

(A)	int snd_soc_dapm_force_bias_level(...)
	{
		...
		if (ret == 0)
(a)			dapm->bias_level = level;
		...
	}

snd_soc_dapm_set_bias_level() (B) is also a function that sets bias_level. It will call snd_soc_dapm_force_bias_level() (A) inside, but doesn't set dapm->bias_level by itself. One note is that (A) might not be called.

(B)	static int snd_soc_dapm_set_bias_level(...)
	{
		...
		ret = snd_soc_card_set_bias_level(...);
		...
		if (dapm != &card->dapm)
(A)			ret = snd_soc_dapm_force_bias_level(...);
		...
		ret = snd_soc_card_set_bias_level_post(...);
		...
	}

dapm->bias_level will be set if (A) was called, but might not be set if (B) was called, even though it calls the set_bias_level() function.

We should set dapm->bias_level if we call snd_soc_dapm_set_bias_level() (B), too.
Signed-off-by: Kuninori Morimoto kuninori.morimoto.gx@renesas.com
Link: https://patch.msgid.link/87qzyn4g4h.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis, here's my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Bug Fix**: This is a clear bug fix that addresses an inconsistency in bias level management. The commit message clearly explains that `snd_soc_dapm_set_bias_level()` doesn't always update `dapm->bias_level`, particularly when dealing with card-level DAPM contexts where `snd_soc_dapm_force_bias_level()` is not called.
2. **Small and Contained**: The fix is minimal - it only adds 4 lines of code:
```c
/* success */
if (ret == 0)
	snd_soc_dapm_init_bias_level(dapm, level);
```
This ensures that `dapm->bias_level` is always updated when the bias level change succeeds.
3. **Important Correctness Issue**: The inconsistency could lead to incorrect bias level tracking, which affects power management decisions in the ASoC subsystem. Code that checks `dapm->bias_level` (as shown in lines like `if (dapm->bias_level == SND_SOC_BIAS_ON)`) may make wrong decisions if the bias level isn't properly tracked.
4. **No Architectural Changes**: This is a straightforward fix that doesn't introduce new features or change the architecture. It simply ensures consistent behavior between the two bias level setting functions.
5. **Low Risk**: The change only affects the success path (when `ret == 0`) and uses an existing inline function `snd_soc_dapm_init_bias_level()` that simply sets `dapm->bias_level = level`. This is the same operation that `snd_soc_dapm_force_bias_level()` performs on success.
6. **Affects User-Facing Functionality**: Incorrect bias level tracking can lead to power management issues, potentially causing audio devices to not power up/down correctly, which directly impacts users.
The commit fixes a real bug where the DAPM bias level state could become out of sync with the actual hardware state, particularly for card-level DAPM contexts. This is exactly the type of bug fix that stable kernels should include - it's small, targeted, fixes a real issue, and has minimal risk of introducing regressions.
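A small model may help show why the cached level can go stale for the card-level context. The sketch below assumes a simplified, hypothetical `struct dapm_ctx` with an `is_card_ctx` flag in place of the real `dapm != &card->dapm` test; it is not the ASoC implementation.

```c
#include <assert.h>

enum bias_level { BIAS_OFF, BIAS_STANDBY, BIAS_PREPARE, BIAS_ON };

/* Hypothetical, simplified context; the real struct snd_soc_dapm_context differs. */
struct dapm_ctx { enum bias_level bias_level; int is_card_ctx; };

/* Models (A): caches the level itself on success. */
static int force_bias_level(struct dapm_ctx *d, enum bias_level level)
{
	d->bias_level = level;
	return 0;
}

/* Models (B) after the fix: cache the level whenever the sequence succeeded. */
static int set_bias_level(struct dapm_ctx *d, enum bias_level level)
{
	int ret = 0;

	if (!d->is_card_ctx)		/* (A) is skipped for the card's own context */
		ret = force_bias_level(d, level);

	if (ret == 0)
		d->bias_level = level;	/* the step the patch adds */

	return ret;
}

/* Usage: the card context never reaches (A) but still ends up at BIAS_ON. */
void bias_level_demo(void)
{
	struct dapm_ctx card = { BIAS_OFF, 1 };

	assert(set_bias_level(&card, BIAS_ON) == 0);
	assert(card.bias_level == BIAS_ON);
}
```

Without the final `if (ret == 0)` step, `card.bias_level` in this model would stay at `BIAS_OFF` even though the call reported success.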
 sound/soc/soc-dapm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 9f764d92469e..6142ce468c3b 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -741,6 +741,10 @@ static int snd_soc_dapm_set_bias_level(struct snd_soc_dapm_context *dapm,
 out:
 	trace_snd_soc_bias_level_done(card, level);
 
+	/* success */
+	if (ret == 0)
+		snd_soc_dapm_init_bias_level(dapm, level);
+
 	return ret;
 }
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit a0b2a6bbff8c26aafdecd320f38f52c341d5cafa ]
regmap_update_bits() can fail, check its return value like we do elsewhere in the driver.
Link: https://lore.kernel.org/r/20250707-gpiochip-set-rv-gpio-round4-v1-2-35668aaa...
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Bug Fix with Real Impact**: The commit fixes a bug where the return value of `regmap_update_bits()` in `tps65912_gpio_direction_output()` is not checked (line 54-55 in the original code). This could lead to silent failures where the GPIO initial value is not set correctly, but the function continues to configure the direction, potentially leaving the GPIO in an inconsistent state.
2. **Consistency with Driver Pattern**: The driver already checks return values of `regmap_update_bits()` in other functions: - `tps65912_gpio_direction_input()` (line 44) properly returns the result - The second `regmap_update_bits()` call in `tps65912_gpio_direction_output()` (line 57) also returns the result - `tps65912_gpio_get_direction()` checks the return value of `regmap_read()`
3. **Small and Contained Fix**: The patch is minimal - it only adds: - An `int ret` variable declaration - Captures the return value of the first `regmap_update_bits()` call - Adds an error check that returns early if the operation failed
4. **No Side Effects**: This change doesn't introduce any new functionality or alter the existing behavior when operations succeed. It only adds proper error handling that was missing.
5. **Hardware Communication Reliability**: The TPS65912 is a PMIC (Power Management IC) that communicates over I2C/SPI. Hardware communication can fail due to various reasons (bus errors, device issues), and not checking return values can lead to incorrect GPIO states which could affect system stability or connected peripherals.
6. **Follows Kernel Best Practices**: The kernel coding standards require checking return values of functions that can fail, especially for hardware operations. This fix brings the code in line with those standards.
The commit message clearly indicates this is a bug fix ("regmap_update_bits() can fail, check its return value like we do elsewhere in the driver"), making it a perfect candidate for stable backporting as it improves driver reliability without any risk of regression.
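The effect of the early return can be illustrated with a fake register map. In this sketch `struct fake_regmap`, `fake_update_bits()` and `fake_direction_output()` are hypothetical names invented for the example; only the control flow mirrors the patched function.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical fake bus: fails the Nth write to simulate an I2C/SPI error. */
struct fake_regmap { int writes; int fail_on_write; };

static int fake_update_bits(struct fake_regmap *map)
{
	map->writes++;
	return map->writes == map->fail_on_write ? -EIO : 0;
}

static int fake_direction_output(struct fake_regmap *map)
{
	int ret;

	ret = fake_update_bits(map);	/* set the initial value */
	if (ret)
		return ret;		/* stop before configuring the direction */

	return fake_update_bits(map);	/* switch the pin to output */
}

/* Usage: a failure on the first write is reported and nothing else is written. */
void direction_output_demo(void)
{
	struct fake_regmap map = { 0, 1 };

	assert(fake_direction_output(&map) == -EIO);
	assert(map.writes == 1);
}
```

In the unpatched flow the first error would be dropped and the second write would still run, so the caller could see success with an unset initial value.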
 drivers/gpio/gpio-tps65912.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-tps65912.c b/drivers/gpio/gpio-tps65912.c
index 3ad68bd78282..736af805e54a 100644
--- a/drivers/gpio/gpio-tps65912.c
+++ b/drivers/gpio/gpio-tps65912.c
@@ -49,10 +49,13 @@ static int tps65912_gpio_direction_output(struct gpio_chip *gc,
 					  unsigned offset, int value)
 {
 	struct tps65912_gpio *gpio = gpiochip_get_data(gc);
+	int ret;
 
 	/* Set the initial value */
-	regmap_update_bits(gpio->tps->regmap, TPS65912_GPIO1 + offset,
-			   GPIO_SET_MASK, value ? GPIO_SET_MASK : 0);
+	ret = regmap_update_bits(gpio->tps->regmap, TPS65912_GPIO1 + offset,
+				 GPIO_SET_MASK, value ? GPIO_SET_MASK : 0);
+	if (ret)
+		return ret;
 
 	return regmap_update_bits(gpio->tps->regmap, TPS65912_GPIO1 + offset,
 				  GPIO_CFG_MASK, GPIO_CFG_MASK);
From: Aaron Kling webgeek1234@gmail.com
[ Upstream commit 398e67e0f5ae04b29bcc9cbf342e339fe9d3f6f1 ]
Kasan crashes the kernel trying to check boundaries when using the normal memcpy.
Signed-off-by: Aaron Kling webgeek1234@gmail.com
Link: https://lore.kernel.org/r/20250522-mach-tegra-kasan-v1-1-419041b8addb@gmail....
Signed-off-by: Thierry Reding treding@nvidia.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here's my determination:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real crash/bug**: The commit fixes a KASAN (Kernel Address Sanitizer) crash that occurs when using regular `memcpy()` to write to IRAM (Internal RAM). This is a functional bug that causes kernel crashes when KASAN is enabled.
2. **Small and contained fix**: The change is minimal - it simply replaces `memcpy()` with `memcpy_toio()` on line 66 of arch/arm/mach-tegra/reset.c. This is a one-line change that doesn't alter any logic or introduce new features.
3. **Follows proper I/O memory access patterns**: The fix is technically correct. According to the kernel documentation in Documentation/driver-api/device-io.rst, I/O memory regions (which IRAM is mapped as via `IO_ADDRESS()`) should be accessed using I/O-specific functions like `memcpy_toio()` rather than regular `memcpy()`. The documentation explicitly states: "Do not use memset or memcpy on IO addresses; they are not guaranteed to copy data in order."
4. **No architectural changes**: The commit doesn't introduce any architectural changes or new functionality. It's purely a bug fix that corrects improper memory access.
5. **Minimal risk of regression**: Since this change only affects how data is copied to IRAM during the Tegra CPU reset handler initialization, and uses the proper kernel API for I/O memory access, the risk of introducing new issues is very low.
6. **Platform-specific but important**: While this fix is specific to ARM Tegra platforms, it fixes a crash that would affect any Tegra system running with KASAN enabled. This is important for developers and users who rely on KASAN for debugging.
The fact that `iram_base` is obtained through `IO_ADDRESS()` macro clearly indicates this is I/O mapped memory that requires I/O-specific accessors. KASAN correctly identified this misuse and crashed to prevent potential issues. The fix properly uses `memcpy_toio()` which is designed for copying to I/O memory regions and won't trigger KASAN checks for regular memory access.
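To make the distinction between a plain copy and an I/O copy more concrete, here is a userspace sketch of a byte-wise copy through a `volatile` pointer. `toy_memcpy_toio()` and `fake_iram` are hypothetical names; this is only a model of the idea and not the kernel's `memcpy_toio()` implementation, which is architecture-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of an I/O copy helper: every byte goes
 * through a volatile store, so the compiler may not drop or merge the
 * writes. This is not the kernel's memcpy_toio() implementation.
 */
static void toy_memcpy_toio(volatile uint8_t *dst, const void *src, size_t n)
{
	const uint8_t *s = src;

	while (n--)
		*dst++ = *s++;
}

/* Usage: copy a fake reset handler into a buffer standing in for IRAM. */
void toy_memcpy_toio_demo(void)
{
	static volatile uint8_t fake_iram[8];
	const uint8_t handler[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	size_t i;

	toy_memcpy_toio(fake_iram, handler, sizeof(handler));
	for (i = 0; i < sizeof(handler); i++)
		assert(fake_iram[i] == handler[i]);
}
```

The design choice being illustrated is that the destination is treated as device memory with its own accessor, rather than as ordinary RAM handed to an instrumented `memcpy()`.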
 arch/arm/mach-tegra/reset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-tegra/reset.c b/arch/arm/mach-tegra/reset.c
index 76a65df42d10..dce0cb1fdd6f 100644
--- a/arch/arm/mach-tegra/reset.c
+++ b/arch/arm/mach-tegra/reset.c
@@ -63,7 +63,7 @@ static void __init tegra_cpu_reset_handler_enable(void)
 	BUG_ON(is_enabled);
 	BUG_ON(tegra_cpu_reset_handler_size > TEGRA_IRAM_RESET_HANDLER_SIZE);
 
-	memcpy(iram_base, (void *)__tegra_cpu_reset_handler_start,
+	memcpy_toio(iram_base, (void *)__tegra_cpu_reset_handler_start,
 			tegra_cpu_reset_handler_size);
 
 	err = call_firmware_op(set_cpu_boot_addr, 0, reset_address);
From: "Masami Hiramatsu (Google)" mhiramat@kernel.org
[ Upstream commit a089bb2822a49b0c5777a8936f82c1f8629231fb ]
Since commit c5b6ababd21a ("locking/mutex: implement mutex_trylock_nested") makes mutex_trylock() an inlined function when CONFIG_DEBUG_LOCK_ALLOC=y, we cannot use mutex_trylock() for testing the glob filter of ftrace.
Use mutex_unlock instead.
Link: https://lore.kernel.org/r/175151680309.2149615.9795104805153538717.stgit@mhi...
Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Acked-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Shuah Khan skhan@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Backport Status: YES
This commit should be backported to stable kernel trees for the following reasons:
1. **It fixes a test regression**: The commit c5b6ababd21a changed mutex_trylock() from being a regular function to an inline macro when CONFIG_DEBUG_LOCK_ALLOC=y. This broke the ftrace glob filter selftest that was trying to match functions with the pattern `mutex*try*`. Since mutex_trylock is no longer a regular function symbol in certain configurations, the test would fail.
2. **Simple and contained fix**: The change is minimal - it simply replaces the test pattern `mutex*try*` with `mutex*unl*` on line 36 of the test script. This is a one-line change that:
- Changes the glob pattern from matching mutex_trylock functions to mutex_unlock functions
- Maintains the same test logic and purpose (testing glob pattern matching)
- Relies on mutex_unlock remaining a regular function symbol regardless of CONFIG_DEBUG_LOCK_ALLOC
3. **No functional changes to kernel code**: This only affects a selftest, not any kernel functionality. The risk of regression is zero for normal kernel operation.
4. **Test reliability**: Without this fix, the ftrace selftest suite would fail on kernels built with CONFIG_DEBUG_LOCK_ALLOC=y after commit c5b6ababd21a is applied. This could: - Cause false test failures in CI/CD pipelines - Make it harder to detect real ftrace issues - Confuse developers running the test suite
5. **Clear dependency**: The commit message explicitly states this is needed "Since commit c5b6ababd21a" which indicates this is a direct fix for a known regression introduced by that specific commit.
The change is exactly the type that stable rules recommend: it fixes a clear bug (test regression), is minimal in scope, has no risk of breaking functionality, and maintains test coverage for an important kernel feature (ftrace glob filtering).
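The failure mode is easy to reproduce outside the kernel with a glob matcher. The sketch below uses POSIX `fnmatch()` as a stand-in for ftrace's own glob code, and the symbol list is a hypothetical one for a build in which `mutex_trylock()` is no longer a traceable function.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Count how many symbols match a shell-style glob. */
static int count_matches(const char *glob, const char *const *syms, size_t n)
{
	int hits = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (fnmatch(glob, syms[i], 0) == 0)
			hits++;
	return hits;
}

/* Usage: the old pattern finds nothing, the new pattern still has a target. */
void glob_demo(void)
{
	/* Hypothetical list: mutex_trylock is absent because it was inlined. */
	const char *const syms[] = { "mutex_lock", "mutex_unlock", "schedule" };

	assert(count_matches("mutex*try*", syms, 3) == 0);
	assert(count_matches("mutex*unl*", syms, 3) == 1);
}
```

A start*mid* test only exercises the matcher if at least one symbol matches, which is why the pattern was retargeted rather than the test logic changed.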
 .../testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
index f4e92afab14b..9a7b7c62cbd9 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-glob.tc
@@ -33,7 +33,7 @@ ftrace_filter_check 'schedule*' '^schedule.*$'
 ftrace_filter_check '*pin*lock' '.*pin.*lock$'
 
 # filter by start*mid*
-ftrace_filter_check 'mutex*try*' '^mutex.*try.*'
+ftrace_filter_check 'mutex*unl*' '^mutex.*unl.*'
 
 # Advanced full-glob matching feature is recently supported.
 # Skip the tests if we are sure the kernel does not support it.
From: "Rafael J. Wysocki" rafael.j.wysocki@intel.com
[ Upstream commit 89d9cec3b1e9c49bae9375a2db6dc49bc7468af0 ]
Clear power.needs_force_resume in pm_runtime_reinit() in case it has been set by pm_runtime_force_suspend() invoked from a driver remove callback.
Suggested-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Reviewed-by: Ulf Hansson ulf.hansson@linaro.org
Link: https://patch.msgid.link/9495163.CDJkKcVGEf@rjwysocki.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a clear bugfix that should be backported to stable kernel trees for the following reasons:
1. **It fixes a real bug**: The commit addresses a specific issue where `power.needs_force_resume` flag is not cleared in `pm_runtime_reinit()` when a device is removed. If a driver calls `pm_runtime_force_suspend()` from its remove callback (which sets `needs_force_resume = 1`), this flag remains set even after the device is removed and potentially re-probed.
2. **The fix is minimal and contained**: The change adds just 5 lines of code (including comments) to clear a single flag. The modification is:
```c
dev->power.needs_force_resume = false;
```
This is a very low-risk change that only affects the specific condition being fixed.
3. **It prevents state leakage**: Looking at the code flow:
- `pm_runtime_force_suspend()` sets `dev->power.needs_force_resume = 1` (in runtime.c)
- When a driver is removed, `pm_runtime_remove()` calls `pm_runtime_reinit()`
- Without this fix, if the device is re-probed, it would still have `needs_force_resume = 1` from the previous instance
- This could lead to incorrect PM runtime behavior where `pm_runtime_force_resume()` would incorrectly think it needs to resume a device that was never suspended in the current probe cycle
4. **Related to previous stable fixes**: The git history shows a previous commit `c745253e2a69` ("PM: runtime: Fix unpaired parent child_count for force_resume") was already marked for stable (4.16+), indicating that issues with the `needs_force_resume` flag have been problematic enough to warrant stable backports.
5. **Clear bug scenario**: The commit message describes a specific scenario where this happens - when `pm_runtime_force_suspend()` is called from a driver remove callback. This is a legitimate use case where drivers want to ensure devices are suspended before removal.
6. **No architectural changes**: This is purely a bugfix that clears a flag that should have been cleared during reinitialization. It doesn't introduce new features or change any APIs.
The fix ensures proper PM runtime state management across device removal and re-probing cycles, which is important for system stability and correct power management behavior.
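The state leak across a remove and re-probe cycle can be modeled in a few lines. Everything here is hypothetical scaffolding (`struct toy_dev_pm` and the `toy_*` helpers); it only mirrors the flag handling described above, not the runtime PM core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device PM state. */
struct toy_dev_pm { bool needs_force_resume; };

/* Models pm_runtime_force_suspend() called from a driver's remove callback. */
static void toy_force_suspend(struct toy_dev_pm *pm)
{
	pm->needs_force_resume = true;
}

/* Models pm_runtime_reinit() with the fix: stale flag is cleared. */
static void toy_reinit(struct toy_dev_pm *pm)
{
	pm->needs_force_resume = false;
}

/* Usage: after unbind, the next probe starts from a clean flag. */
void reinit_demo(void)
{
	struct toy_dev_pm pm = { false };

	toy_force_suspend(&pm);		/* driver remove path */
	assert(pm.needs_force_resume);
	toy_reinit(&pm);		/* core reinitializes the device */
	assert(!pm.needs_force_resume);
}
```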
 drivers/base/power/runtime.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 7fa231076ad5..7d362b38f2e8 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1698,6 +1698,11 @@ void pm_runtime_reinit(struct device *dev)
 				pm_runtime_put(dev->parent);
 		}
 	}
+	/*
+	 * Clear power.needs_force_resume in case it has been set by
+	 * pm_runtime_force_suspend() invoked from a driver remove callback.
+	 */
+	dev->power.needs_force_resume = false;
 }
 
 /**
From: Hsin-Te Yuan yuanhsinte@chromium.org
[ Upstream commit 1a4aabc27e95674837f2e25f4ef340c0469e6203 ]
According to the POSIX spec, EAGAIN returned by read with O_NONBLOCK set means the read would block. Hence, the common implementation of the nonblocking model polls the file when the nonblocking read returns EAGAIN. However, when the target file is a thermal zone, this mechanism malfunctions completely because thermal zones don't implement sysfs notification and thus the poll never returns.
For example, the read in Golang implements such method and sometimes hangs at reading some thermal zones via sysfs.
Change to return -ENODATA instead of -EAGAIN to userspace.
Signed-off-by: Hsin-Te Yuan yuanhsinte@chromium.org
Link: https://patch.msgid.link/20250620-temp-v3-1-6becc6aeb66c@chromium.org
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit, here is my determination:
**Backport Status: YES**
## Extensive Explanation:
This commit should be backported to stable kernel trees for the following reasons:
### 1. **Bug Fix for Real User-Facing Issue** The commit fixes a bug where userspace applications can hang when reading thermal zone temperature files via sysfs with O_NONBLOCK set. The commit message clearly states that "the read in Golang implements such method and sometimes hangs at reading some thermal zones via sysfs." This is a real-world problem affecting actual applications.
### 2. **POSIX Compliance Issue** The current behavior violates POSIX semantics. According to POSIX specifications, EAGAIN from a non-blocking read means "the read would block" - implying that the caller should poll/select and wait for data to become available. However, thermal zone sysfs files don't implement poll notifications, so applications following standard POSIX patterns will poll forever, causing hangs.
### 3. **Small and Contained Change** The code change is minimal and localized:
```c
-	if (ret)
-		return ret;
+	if (!ret)
+		return sprintf(buf, "%d\n", temperature);

-	return sprintf(buf, "%d\n", temperature);
+	if (ret == -EAGAIN)
+		return -ENODATA;
+
+	return ret;
```
It only affects the `temp_show()` function in `thermal_sysfs.c` by translating -EAGAIN to -ENODATA specifically for sysfs reads.
### 4. **Low Risk of Regression** - The change only affects error handling paths - It doesn't modify the normal success case - It only translates one specific error code (-EAGAIN) to another (-ENODATA) - ENODATA is a more appropriate error for "no data available" in a sysfs context
### 5. **Multiple Drivers Return -EAGAIN** My search found at least 13 thermal drivers that can return -EAGAIN from their get_temp operations, including:
- imx8mm_thermal.c
- imx_thermal.c
- tegra-bpmp-thermal.c
- qoriq_thermal.c
- lvts_thermal.c
- rockchip_thermal.c
- exynos_tmu.c
- sun8i_thermal.c
- stm_thermal.c
- intel_powerclamp.c
This indicates the issue affects multiple platforms and thermal drivers.
### 6. **Follows Stable Kernel Rules** - Fixes a real bug that bothers people (application hangs) - Fix is already in Linus's tree (based on the Signed-off-by from Rafael J. Wysocki) - Small change (< 100 lines) - Obviously correct and tested - Doesn't change APIs or break existing functionality
### 7. **No Architectural Changes** The commit doesn't introduce new features or change the thermal subsystem architecture. It's purely a bug fix that makes the sysfs interface behave correctly with non-blocking reads.
The fix is appropriate because sysfs files are not meant to be pollable in the traditional sense - they provide instantaneous data snapshots. Returning ENODATA instead of EAGAIN properly communicates "no data currently available" without implying that polling would help.
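The error translation can be sketched as a standalone function. `toy_temp_show()` is a hypothetical userspace model that takes the would-be return value of `thermal_zone_get_temp()` as a parameter; it uses `snprintf()` instead of the kernel's `sprintf()` on a sysfs page buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the patched show routine: `ret` stands in for
 * the result of reading the sensor.
 */
static int toy_temp_show(int ret, int temperature, char *buf, size_t len)
{
	if (!ret)
		return snprintf(buf, len, "%d\n", temperature);

	if (ret == -EAGAIN)
		return -ENODATA;	/* do not invite userspace to poll() */

	return ret;
}

/* Usage: only -EAGAIN is rewritten; success and other errors pass through. */
void temp_show_demo(void)
{
	char buf[16];

	assert(toy_temp_show(-EAGAIN, 0, buf, sizeof(buf)) == -ENODATA);
	assert(toy_temp_show(-EINVAL, 0, buf, sizeof(buf)) == -EINVAL);
	assert(toy_temp_show(0, 42000, buf, sizeof(buf)) == 6);
	assert(strcmp(buf, "42000\n") == 0);
}
```

Keeping the translation at the sysfs boundary means in-kernel callers of the sensor read still see -EAGAIN and can retry as before.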
 drivers/thermal/thermal_sysfs.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 17b2361bc8f2..fc768b61f483 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -39,10 +39,13 @@ temp_show(struct device *dev, struct device_attribute *attr, char *buf)
 
 	ret = thermal_zone_get_temp(tz, &temperature);
 
-	if (ret)
-		return ret;
+	if (!ret)
+		return sprintf(buf, "%d\n", temperature);
 
-	return sprintf(buf, "%d\n", temperature);
+	if (ret == -EAGAIN)
+		return -ENODATA;
+
+	return ret;
 }
 
 static ssize_t
From: tuhaowen tuhaowen@uniontech.com
[ Upstream commit 4266e8fa56d3d982bf451d382a410b9db432015c ]
When the computer enters a sleep state without a monitor connected, the system switches the console to the virtual terminal tty63 (SUSPEND_CONSOLE).
If a monitor is subsequently connected before waking up, the system skips the required VT restoration process during wake-up, leaving the console on tty63 instead of switching back to tty1.
To fix this issue, a global flag vt_switch_done is introduced to record whether the system has successfully switched to the suspend console via vt_move_to_console() during suspend.
If the switch was completed, vt_switch_done is set to 1. Later during resume, this flag is checked to ensure that the original console is restored properly by calling vt_move_to_console(orig_fgconsole, 0).
This prevents scenarios where the resume logic skips console restoration due to incorrect detection of the console state, especially when a monitor is reconnected before waking up.
Signed-off-by: tuhaowen tuhaowen@uniontech.com
Link: https://patch.msgid.link/20250611032345.29962-1-tuhaowen@uniontech.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
**1. Bug Fix Nature:** The commit fixes a clear user-visible bug - a black screen issue that occurs when: - A computer enters sleep without a monitor connected - A monitor is connected while the system is asleep - Upon wake, the console remains on tty63 (SUSPEND_CONSOLE) instead of returning to tty1
This is a functional regression that impacts user experience, making the system appear unresponsive or broken after resume.
**2. Small and Contained Fix:** The fix is minimal and well-contained: - Adds a single `static bool vt_switch_done` flag - Modifies only two functions: `pm_prepare_console()` and `pm_restore_console()` - Total change is just 6 lines of code (excluding whitespace) - The logic is straightforward: track whether VT switch happened during suspend, and ensure restoration happens during resume
**3. Low Risk of Side Effects:** The change has minimal risk: - The new flag (`vt_switch_done`) is only set when `vt_move_to_console()` succeeds - The restoration logic preserves existing behavior when `pm_vt_switch()` returns true - The fix only adds an additional condition `!vt_switch_done` to ensure restoration happens even when `pm_vt_switch()` returns false during resume - No architectural changes or new features are introduced
**4. Critical Subsystem but Safe Change:** While this touches the PM/console subsystem (critical for suspend/resume), the change is safe: - It doesn't modify the core VT switching mechanism - It only ensures the existing restoration path is executed when needed - The file has been stable since 2013 (last major change was f43f627d2f17e)
**5. Clear Problem Description:** The commit message clearly describes: - The specific scenario that triggers the bug - The root cause (skipped VT restoration) - The solution (tracking VT switch state)
**6. Follows Stable Rules:** This aligns perfectly with stable tree criteria: - Fixes a real bug affecting users - Minimal change (6 lines) - No new features - Clear fix for a specific problem - Low regression risk
The commit addresses a real-world scenario where users connecting monitors to sleeping systems experience black screens upon wake - a frustrating issue that makes systems appear broken. The fix is surgical, adding only the minimum tracking needed to ensure proper console restoration.
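The role of the new flag can be shown with a small state model. All names below are hypothetical (`toy_*`), and `toy_vt_switch_required` stands in for the result of `pm_vt_switch()`, which in the reported scenario differs between suspend and resume.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SUSPEND_CONSOLE 62		/* models MAX_NR_CONSOLES - 1, i.e. tty63 */

static int toy_fgconsole;		/* currently visible console */
static int toy_orig_fgconsole;
static bool toy_vt_switch_done;
static bool toy_vt_switch_required;	/* stands in for pm_vt_switch() */

static void toy_prepare_console(void)
{
	if (!toy_vt_switch_required)
		return;

	toy_orig_fgconsole = toy_fgconsole;
	toy_fgconsole = TOY_SUSPEND_CONSOLE;
	toy_vt_switch_done = true;	/* remember that we really switched */
}

static void toy_restore_console(void)
{
	if (!toy_vt_switch_required && !toy_vt_switch_done)
		return;

	toy_fgconsole = toy_orig_fgconsole;
	toy_vt_switch_done = false;
}

/* Usage: the answer flips while asleep, yet the console is still restored. */
void console_demo(void)
{
	toy_fgconsole = 0;
	toy_vt_switch_required = true;
	toy_prepare_console();
	assert(toy_fgconsole == TOY_SUSPEND_CONSOLE);

	toy_vt_switch_required = false;	/* state changed before resume */
	toy_restore_console();
	assert(toy_fgconsole == 0);
	assert(!toy_vt_switch_done);
}
```

Without the `toy_vt_switch_done` term in the restore check, the second half of the demo would return early and leave the model on the suspend console.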
 kernel/power/console.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/power/console.c b/kernel/power/console.c
index fcdf0e14a47d..19c48aa5355d 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -16,6 +16,7 @@
 #define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
 
 static int orig_fgconsole, orig_kmsg;
+static bool vt_switch_done;
 
 static DEFINE_MUTEX(vt_switch_mutex);
 
@@ -136,17 +137,21 @@ void pm_prepare_console(void)
 	if (orig_fgconsole < 0)
 		return;
 
+	vt_switch_done = true;
+
 	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
 	return;
 }
 
 void pm_restore_console(void)
 {
-	if (!pm_vt_switch())
+	if (!pm_vt_switch() && !vt_switch_done)
 		return;
 
 	if (orig_fgconsole >= 0) {
 		vt_move_to_console(orig_fgconsole, 0);
 		vt_kmsg_redirect(orig_kmsg);
 	}
+
+	vt_switch_done = false;
 }
From: Sebastian Ott sebott@redhat.com
[ Upstream commit 13edf7539211d8f7d0068ce3ed143005f1da3547 ]
Initialization of the local acpi_object in acpi_processor_get_info() only sets the first 4 bytes to zero and is thus incomplete. This is indicated by messages like: acpi ACPI0007:be: Invalid PBLK length [166288104]
Fix this by initializing all 16 bytes of the processor member of that union.
Signed-off-by: Sebastian Ott sebott@redhat.com Link: https://patch.msgid.link/20250703124215.12522-1-sebott@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis, here is my determination:
**Backport Status: YES**
## Extensive Explanation:
This commit fixes a critical initialization bug in the ACPI processor driver that causes incorrect error messages and potentially incorrect behavior. Here's why this is a good candidate for stable backport:
### 1. **Clear Bug Fix** The commit fixes an incomplete initialization issue where `union acpi_object object = { 0 };` only initializes the first 4 bytes (the size of `acpi_object_type type`), leaving the remaining bytes of the union uninitialized. This results in garbage values in fields like `object.processor.pblk_length`, causing spurious error messages like "Invalid PBLK length [166288104]".
### 2. **User-Visible Impact** The bug causes confusing error messages in system logs during ACPI processor initialization. The error message shown in the commit (`acpi ACPI0007:be: Invalid PBLK length [166288104]`) indicates that uninitialized memory is being read and interpreted as a length value.
### 3. **Minimal and Contained Fix**
The fix is extremely simple - changing:
```c
union acpi_object object = { 0 };
```
to:
```c
union acpi_object object = { .processor = { 0 } };
```
This initializes all 16 bytes of the `processor` member of the union, as stated in the commit message; by my analysis the union as a whole is 24 bytes.
### 4. **No Architectural Changes** This is a pure bug fix with no feature additions or architectural changes. It simply corrects the initialization to prevent reading uninitialized memory.
### 5. **Low Risk of Regression** The change is minimal and only affects initialization. It doesn't alter any logic or behavior beyond ensuring that the union is properly zeroed out. The designated initializer syntax is standard C and well-supported.
### 6. **Critical Subsystem** While ACPI is a critical subsystem, this fix is confined to the initialization of a local variable in `acpi_processor_get_info()`. The scope is limited and the fix is conservative.
### 7. **Follows Stable Rules**
This commit perfectly follows the stable kernel rules:
- Fixes a real bug (uninitialized memory access)
- Small change (single line)
- Obviously correct
- Already tested (merged into mainline)
- Fixes an issue that users see (error messages in logs)
The uninitialized memory access could potentially lead to more serious issues beyond just incorrect error messages, as the code makes decisions based on the `pblk_length` value (checking if it equals 6). This makes the fix even more important for stable kernels.
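The difference between the two initializers can be illustrated with a stand-in union. The layout below is hypothetical: it borrows field names from the text above but is not the ACPICA definition of `union acpi_object`, and the `model_*` names are made up for this sketch. What it shows is only that initializing through the `processor` member guarantees that member's fields start at zero, whereas `{ 0 }` only names the first member (`type`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for union acpi_object; the real layout differs. */
union model_object {
	uint32_t type;                 /* first member: all that { 0 } names */
	struct {
		uint32_t type;
		uint32_t proc_id;
		uint32_t pblk_address;
		uint32_t pblk_length;
	} processor;                   /* 16 bytes in this model */
};

/* Initializes through the processor member, as the patch does, so every
 * field of that member is guaranteed to be zero. */
uint32_t model_pblk_length_after_init(void)
{
	union model_object object = { .processor = { 0 } };

	return object.processor.pblk_length;
}

void model_union_selfcheck(void)
{
	assert(sizeof(((union model_object *)0)->processor) == 16);
	assert(model_pblk_length_after_init() == 0);
}
```

Note that many compilers happen to zero the whole object for `{ 0 }` as well, so the old form may appear to work depending on the toolchain; the sketch deliberately asserts nothing about that case.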
 drivers/acpi/acpi_processor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c8338d627857..9cbf0593da05 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -228,7 +228,7 @@ static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)

 static int acpi_processor_get_info(struct acpi_device *device)
 {
-	union acpi_object object = { 0 };
+	union acpi_object object = { .processor = { 0 } };
 	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
 	struct acpi_processor *pr = acpi_driver_data(device);
 	int device_declaration = 0;
From: Breno Leitao leitao@debian.org
[ Upstream commit 4734c8b46b901cff2feda8b82abc710b65dc31c1 ]
When a GHES (Generic Hardware Error Source) triggers a panic, add the TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the kernel as tainted due to a machine check event, improving diagnostics and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to indicate lockdep remains valid.
At large scale deployment, this helps to quickly determine panics that are coming due to hardware failures.
Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Tony Luck tony.luck@intel.com Link: https://patch.msgid.link/20250702-add_tain-v1-1-9187b10914b9@debian.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real diagnostic issue**: The commit adds the TAINT_MACHINE_CHECK flag when GHES (Generic Hardware Error Source) triggers a panic. This is important for post-mortem analysis at scale, as explicitly stated in the commit message: "At large scale deployment, this helps to quickly determine panics that are coming due to hardware failures."
2. **Small and contained change**: The patch adds a single statement (`add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);`) plus a blank line to the `__ghes_panic()` function in drivers/acpi/apei/ghes.c; in this tree the hunk starts at line 709. This is well within the stable kernel limit of 100 lines.

3. **Obviously correct**: The change follows established kernel patterns. Looking at the grep results, other machine check handlers already use this same pattern:
- arch/x86/kernel/cpu/mce/core.c:1640
- arch/powerpc/kernel/mce.c:332
- arch/x86/kernel/cpu/mce/p5.c:40
- arch/x86/kernel/cpu/mce/winchip.c:24
4. **No architectural changes**: This is purely a diagnostic improvement that adds taint information without changing any functionality or behavior of the GHES panic path.
5. **Minimal risk**: The change uses LOCKDEP_STILL_OK flag, indicating that lockdep remains valid after the taint, which is the safer option compared to LOCKDEP_NOW_UNRELIABLE used in some other machine check paths.
6. **Real benefit for users**: For organizations running Linux at scale, being able to quickly identify hardware-related panics through the taint flag provides significant operational value for triaging issues.
The commit meets all the stable kernel criteria: it's small, obviously correct, fixes a real diagnostic limitation that affects users (especially at scale), and has been reviewed by Tony Luck who is a recognized maintainer in the RAS (Reliability, Availability, and Serviceability) subsystem.
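For triage tooling, the taint value is a bitmask, so checking for a machine-check taint is a single bit test. The sketch below is a userspace illustration with hypothetical `model_*` names; it assumes that `TAINT_MACHINE_CHECK` is bit 4 (the `M` flag), which should be verified against the target tree's `include/linux/panic.h` before relying on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit index of TAINT_MACHINE_CHECK; check the target kernel tree. */
#define MODEL_TAINT_MACHINE_CHECK 4

/* Models the effect of add_taint() on the taint mask: set one bit. */
unsigned long model_add_taint(unsigned long mask, unsigned int flag)
{
	return mask | (1UL << flag);
}

/* True if a taint mask (e.g. a value read from a crash report) has the
 * machine-check bit set. */
bool model_is_machine_check_tainted(unsigned long mask)
{
	return ((mask >> MODEL_TAINT_MACHINE_CHECK) & 1UL) != 0;
}

void model_taint_selfcheck(void)
{
	unsigned long mask = model_add_taint(0, MODEL_TAINT_MACHINE_CHECK);

	assert(mask == 16);
	assert(model_is_machine_check_tainted(mask));
	assert(!model_is_machine_check_tainted(1)); /* only bit 0 set */
}
```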
 drivers/acpi/apei/ghes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 50bed5a70812..d02d7cbab387 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -709,6 +709,8 @@ static void __ghes_panic(struct ghes *ghes,

 	__ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);

+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
+
 	ghes_clear_estatus(ghes, estatus, buf_paddr, fixmap_idx);

 	if (!panic_timeout)
From: Ulf Hansson ulf.hansson@linaro.org
[ Upstream commit 47a255f7d2eabee06cfbf5b1c2379749442fd01d ]
In the error path of sd_set_power_mode() we don't update host->power_mode, which could lead to an imbalance of the runtime PM usage count. Fix this by always updating host->power_mode.
Reviewed-by: Avri Altman avri.altman@sandisk.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Acked-by: Ricky Wu ricky_wu@realtek.com Link: https://lore.kernel.org/r/20250610111633.504366-2-ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a **runtime PM reference counting bug** in the `sd_set_power_mode()` function. The issue is in the error handling path:
### Original Code Problem:
```c
if (power_mode == MMC_POWER_OFF) {
	err = sd_power_off(host);
	pm_runtime_put_noidle(sdmmc_dev(host));
} else {
	pm_runtime_get_noresume(sdmmc_dev(host));
	err = sd_power_on(host);
}

if (!err)
	host->power_mode = power_mode; // Only updated on success
```

### The Bug:
1. When transitioning to `MMC_POWER_OFF`, the code calls `pm_runtime_put_noidle()` **regardless** of whether `sd_power_off()` succeeded
2. When transitioning to `MMC_POWER_ON`, the code calls `pm_runtime_get_noresume()` **before** calling `sd_power_on()`
3. However, `host->power_mode` is only updated if there's no error

This creates a **runtime PM reference count imbalance** because:
- If `sd_power_off()` fails, the runtime PM reference is still decremented, but `host->power_mode` remains at its previous value (likely `MMC_POWER_ON`)
- If `sd_power_on()` fails, the runtime PM reference is still incremented, but `host->power_mode` remains at its previous value (likely `MMC_POWER_OFF`)
- The early return check `if (power_mode == host->power_mode)` at the beginning of the function will prevent future attempts to fix the power state, leaving the runtime PM count permanently imbalanced
## Why This Is Suitable for Stable:
1. **Fixes a real bug**: Runtime PM reference count imbalances can lead to:
   - Device staying powered when it should be suspended
   - Device being suspended when it should be active
   - Power management failures
   - Potential system hangs or crashes

2. **Minimal change**: The fix is a simple one-liner that always updates `host->power_mode` regardless of error status

3. **Low regression risk**:
   - The change doesn't introduce new functionality
   - The only control-flow change is dropping the `if (!err)` guard around the assignment
   - It simply ensures state consistency
4. **Clear bug fix**: This is not a feature addition or optimization - it's fixing incorrect error handling that violates runtime PM rules
5. **Affects users**: This bug can manifest in real-world scenarios when SD card power operations fail, particularly with flaky hardware or during system stress
6. **Subsystem contained**: The fix is limited to a single MMC host driver and doesn't affect core kernel infrastructure
The commit follows stable kernel rules by being a minimal, targeted fix for a real bug that can affect system stability and power management.
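The imbalance can be reproduced in a small userspace model. Everything here is a hypothetical sketch: `model_host`, the `model_*` functions, and the plain integer `usage_count` only stand in for the driver state and the runtime PM usage counter, and the failure is injected by a flag rather than by real hardware.

```c
#include <assert.h>
#include <stdbool.h>

enum model_power { MODEL_POWER_OFF, MODEL_POWER_ON };

struct model_host {
	enum model_power power_mode;
	int usage_count;     /* stand-in for the runtime PM usage count */
	bool power_on_fails; /* injected failure for sd_power_on() */
};

static int model_set_power_mode(struct model_host *h, enum model_power mode,
				bool patched)
{
	int err = 0;

	if (mode == h->power_mode)
		return 0;                       /* early return, as in the driver */

	if (mode == MODEL_POWER_OFF) {
		h->usage_count--;               /* models pm_runtime_put_noidle() */
	} else {
		h->usage_count++;               /* models pm_runtime_get_noresume() */
		err = h->power_on_fails ? -1 : 0;
	}

	if (patched || !err)                    /* patched: always record the mode */
		h->power_mode = mode;
	return err;
}

/* Usage count left over after a failed power-on followed by a power-off. */
int model_usage_after_failed_on_then_off(bool patched)
{
	struct model_host h = { MODEL_POWER_OFF, 0, true };

	model_set_power_mode(&h, MODEL_POWER_ON, patched);
	model_set_power_mode(&h, MODEL_POWER_OFF, patched);
	return h.usage_count;
}

void model_pm_selfcheck(void)
{
	assert(model_usage_after_failed_on_then_off(false) == 1); /* leaked reference */
	assert(model_usage_after_failed_on_then_off(true) == 0);  /* balanced */
}
```

In the unpatched model the later power-off request hits the early return because `power_mode` never left `MODEL_POWER_OFF`, so the reference taken for the failed power-on is never dropped.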
 drivers/mmc/host/rtsx_usb_sdmmc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/mmc/host/rtsx_usb_sdmmc.c b/drivers/mmc/host/rtsx_usb_sdmmc.c
index 3261560bede4..581bec5c4caf 100644
--- a/drivers/mmc/host/rtsx_usb_sdmmc.c
+++ b/drivers/mmc/host/rtsx_usb_sdmmc.c
@@ -1038,9 +1038,7 @@ static int sd_set_power_mode(struct rtsx_usb_sdmmc *host,
 		err = sd_power_on(host);
 	}

-	if (!err)
-		host->power_mode = power_mode;
-
+	host->power_mode = power_mode;
 	return err;
 }
From: Mark Brown broonie@kernel.org
[ Upstream commit c4ca928a6db1593802cd945f075a7e21dd0430c1 ]
We currently log parse failures for ELD data and some disconnection events as errors without rate limiting. These log messages can be triggered very frequently in some situations, especially ELD parsing when there is nothing connected to a HDMI port which will generate:
hdmi-audio-codec hdmi-audio-codec.1.auto: HDMI: Unknown ELD version 0
While there's doubtless work that could be done on reducing the number of connection notification callbacks it's possible these may be legitimately generated by poor quality physical connections so let's use rate limiting to mitigate the log spam for the parse errors and lower the severity for disconnect logging to debug level.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://patch.msgid.link/20250613-asoc-hdmi-eld-logging-v1-1-76d64154d969@ke... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a good candidate for backporting to stable kernel trees for the following reasons:
1. **Fixes a real user-facing bug**: The commit addresses log spam issues that can fill system logs with repeated error messages. When no HDMI display is connected, the driver continuously logs "HDMI: Unknown ELD version 0" errors, which can consume disk space and make logs difficult to analyze.
2. **Small and contained fix**: The changes are minimal and focused:
   - Changes `dev_err()` to `dev_err_ratelimited()` in two locations within `hdac_hdmi_parse_eld()` (lines 1233-1234 and 1242-1243)
   - Changes `dev_err()` to `dev_dbg()` for disconnect messages in `hdac_hdmi_present_sense()` (lines 1302-1303)
   - Total of just 3 logging statement modifications
3. **No functional changes**: The commit only modifies logging behavior without changing any driver logic, state management, or hardware interaction. This minimizes regression risk.
4. **Affects a common use case**: The issue occurs when HDMI ports are unconnected, which is a normal operating condition for many systems. Users may have HDMI ports they never use, or may frequently plug/unplug monitors.
5. **No architectural changes**: The fix doesn't introduce new features or modify the driver's architecture. It simply rate-limits existing error messages and downgrades one message's severity.
6. **Clear problem and solution**: The commit message clearly describes the issue (log spam from ELD parsing failures) and the straightforward solution (rate limiting and severity adjustment).
The specific code changes show:
- In `hdac_hdmi_parse_eld()`: Rate-limiting prevents flooding logs when ELD data is invalid/unavailable
- In `hdac_hdmi_present_sense()`: Downgrading disconnect messages from error to debug level is appropriate since disconnection is a normal event, not an error condition
This meets the stable kernel criteria of being a simple fix for an annoying bug that affects users without introducing new risks.
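To make the effect of rate limiting concrete, here is a minimal interval/burst limiter in userspace C. It is a simplified model, not the kernel's `___ratelimit()` implementation: the `model_*` names are hypothetical, time is passed in explicitly as abstract ticks, and the interval of 5000 ticks with a burst of 10 is chosen only to resemble the kernel defaults (5 seconds, 10 messages) as I understand them.

```c
#include <assert.h>
#include <stdbool.h>

struct model_ratelimit {
	long interval; /* window length, in arbitrary ticks */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages emitted in the current window */
};

/* Returns true if a message at time `now` may be emitted. */
bool model_ratelimit_allow(struct model_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;   /* open a new window */
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;      /* suppressed */
	rs->printed++;
	return true;
}

/* Counts how many of `attempts` messages, spaced `tick_step` apart,
 * actually get through the limiter. */
int model_count_emitted(int attempts, long tick_step)
{
	struct model_ratelimit rs = { .interval = 5000, .burst = 10 };
	int emitted = 0;

	for (int i = 0; i < attempts; i++)
		if (model_ratelimit_allow(&rs, (long)i * tick_step))
			emitted++;
	return emitted;
}

void model_ratelimit_selfcheck(void)
{
	assert(model_count_emitted(1000, 1) == 10);  /* one window */
	assert(model_count_emitted(1000, 10) == 20); /* two windows */
}
```

A thousand back-to-back parse failures therefore produce ten log lines in this model instead of a thousand.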
 sound/soc/codecs/hdac_hdmi.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sound/soc/codecs/hdac_hdmi.c b/sound/soc/codecs/hdac_hdmi.c
index 78d5b4d31bb6..71d13b413418 100644
--- a/sound/soc/codecs/hdac_hdmi.c
+++ b/sound/soc/codecs/hdac_hdmi.c
@@ -1231,7 +1231,8 @@ static int hdac_hdmi_parse_eld(struct hdac_device *hdev,
 						>> DRM_ELD_VER_SHIFT;

 	if (ver != ELD_VER_CEA_861D && ver != ELD_VER_PARTIAL) {
-		dev_err(&hdev->dev, "HDMI: Unknown ELD version %d\n", ver);
+		dev_err_ratelimited(&hdev->dev,
+				    "HDMI: Unknown ELD version %d\n", ver);
 		return -EINVAL;
 	}

@@ -1239,7 +1240,8 @@ static int hdac_hdmi_parse_eld(struct hdac_device *hdev,
 		DRM_ELD_MNL_MASK) >> DRM_ELD_MNL_SHIFT;

 	if (mnl > ELD_MAX_MNL) {
-		dev_err(&hdev->dev, "HDMI: MNL Invalid %d\n", mnl);
+		dev_err_ratelimited(&hdev->dev,
+				    "HDMI: MNL Invalid %d\n", mnl);
 		return -EINVAL;
 	}

@@ -1293,8 +1295,8 @@ static void hdac_hdmi_present_sense(struct hdac_hdmi_pin *pin,

 	if (!port->eld.monitor_present || !port->eld.eld_valid) {

-		dev_err(&hdev->dev, "%s: disconnect for pin:port %d:%d\n",
-						__func__, pin->nid, port->id);
+		dev_dbg(&hdev->dev, "%s: disconnect for pin:port %d:%d\n",
+			__func__, pin->nid, port->id);

 		/*
 		 * PCMs are not registered during device probe, so don't
From: Alok Tiwari alok.a.tiwari@oracle.com
[ Upstream commit 87aafc8580acf87fcaf1a7e30ed858d8c8d37d81 ]
The code mistakenly used a hardcoded index (codec[1]) instead of iterating over the codec array using the loop variable i. Use codec[i] instead of codec[1] to match the loop iteration.
Signed-off-by: Alok Tiwari alok.a.tiwari@oracle.com Link: https://patch.msgid.link/20250621185233.4081094-1-alok.a.tiwari@oracle.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees. Here's the extensive explanation:
## Bug Analysis
The bug is a clear coding error in the ICH4 codec configuration code within the ALSA intel8x0 driver. Looking at the code change:
```c
-	tmp |= chip->ac97_sdin[pcm->r[0].codec[1]->num] << ICH_DI2L_SHIFT;
+	tmp |= chip->ac97_sdin[pcm->r[0].codec[i]->num] << ICH_DI2L_SHIFT;
```

The issue is in a loop that iterates from `i = 1` to `i < 4` (in the hunk at line 2285 of `sound/pci/intel8x0.c`), checking each codec slot. However, the original code hardcoded `codec[1]` instead of using the loop variable `codec[i]`. This means:

1. **The loop was pointless** - it would always use codec[1] regardless of which iteration found a valid codec
2. **Incorrect codec configuration** - If codec[2] or codec[3] were the valid codec (and codec[1] was NULL), the code would still try to access codec[1], potentially causing:
   - Null pointer dereference if codec[1] is NULL
   - Wrong codec configuration if codec[1] exists but isn't the intended one
## Impact Assessment
This bug affects:
- **Hardware**: Intel ICH4 (Intel I/O Controller Hub 4) chipsets only
- **Functionality**: AC'97 codec configuration for multiple SDIN (Serial Data In) paths
- **When triggered**: When using ICH4 with multiple codecs where the second valid codec is not in slot 1
## Backport Suitability
This commit meets all criteria for stable backporting:
1. **Fixes a real bug**: Clear array indexing error that can cause incorrect hardware configuration or crashes
2. **Small and contained**: Single line change, exactly 1 insertion and 1 deletion
3. **No side effects**: The fix simply corrects the loop to work as originally intended
4. **No architectural changes**: Pure bug fix with no API/ABI changes
5. **Critical subsystem**: ALSA sound drivers are important for user experience
6. **Long-standing bug**: The code has been incorrect since the initial git import (2005), making it important to fix in all maintained kernels
7. **Clear fix**: The correction is obvious - using the loop variable instead of a hardcoded index
The bug has existed since the kernel's initial git import in 2005 (commit 1da177e4c3f4), meaning it affects all kernel versions and should be backported to all stable trees that still receive updates.
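The corrected lookup can be shown in isolation with a userspace sketch. The `model_codec` struct and function name are hypothetical; the loop bounds and the "first populated slot wins" behavior follow the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the AC'97 codec object; only `num` matters here. */
struct model_codec {
	int num;
};

/* Returns the num of the first populated slot in 1..3, or -1 if none.
 * Indexing with i (not a hardcoded 1) is the whole fix. */
int model_second_codec_num(struct model_codec *codec[4])
{
	for (int i = 1; i < 4; i++) {
		if (codec[i])
			return codec[i]->num;
	}
	return -1;
}

void model_codec_selfcheck(void)
{
	struct model_codec primary = { 0 }, secondary = { 2 };
	struct model_codec *slots[4] = { &primary, NULL, &secondary, NULL };

	/* Slot 1 is empty; the buggy codec[1] access would dereference NULL. */
	assert(model_second_codec_num(slots) == 2);
}
```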
 sound/pci/intel8x0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
index 5150e8d38975..87a23a4660e8 100644
--- a/sound/pci/intel8x0.c
+++ b/sound/pci/intel8x0.c
@@ -2285,7 +2285,7 @@ static int snd_intel8x0_mixer(struct intel8x0 *chip, int ac97_clock,
 		tmp |= chip->ac97_sdin[0] << ICH_DI1L_SHIFT;
 		for (i = 1; i < 4; i++) {
 			if (pcm->r[0].codec[i]) {
-				tmp |= chip->ac97_sdin[pcm->r[0].codec[1]->num] << ICH_DI2L_SHIFT;
+				tmp |= chip->ac97_sdin[pcm->r[0].codec[i]->num] << ICH_DI2L_SHIFT;
 				break;
 			}
 		}
From: Oliver Neukum oneukum@suse.com
[ Upstream commit 503bbde34cc3dd2acd231f277ba70c3f9ed22e59 ]
Checking for the endpoint type is no reason for a WARN, as that can cause a reboot. A driver not checking the endpoint type must not cause a reboot, as there is just no point in this. We cannot prevent a device from doing something incorrect as a reaction to a transfer. Hence warning for a mere assumption being wrong is not sensible.
Signed-off-by: Oliver Neukum oneukum@suse.com Acked-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/20250612122149.2559724-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here is my determination:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real issue that affects users**: The commit addresses a problem where a `dev_WARN()` can cause system reboots when `panic_on_warn` is enabled. This is a serious issue because a driver bug (not checking endpoint types properly) should not be able to crash the entire system.
2. **Small and contained fix**: The change is minimal - it only replaces `dev_WARN()` with `dev_warn_once()` in the pipe-type check of `usb_submit_urb()` in drivers/usb/core/urb.c (the hunk at line 490). This is a one-line change that doesn't affect any other functionality.

3. **Clear side effects**: The only behavioral change is that:
   - The warning will no longer trigger a kernel panic when `panic_on_warn` is set
   - The warning will only be printed once instead of potentially multiple times
   - No backtrace will be generated
4. **No architectural changes**: This is a simple logging level change that doesn't modify any USB subsystem architecture or functionality.
5. **Affects critical kernel subsystem**: While USB is a critical subsystem, this change actually makes it more stable by preventing potential system crashes.
6. **Follows stable tree rules**: This is clearly a bugfix that improves system stability. The commit message explicitly states that "A driver not checking the endpoint type must not cause a reboot" - this is a stability improvement that prevents denial-of-service scenarios.
7. **Similar fixes in the kernel**: There's precedent for this type of fix, as shown by commit 281cb9d65a95 ("bnxt_en: Make PTP timestamp HWRM more silent") which made a similar conversion from `netdev_WARN_ONCE()` to `netdev_warn_once()` for the same reason.
The key insight from the code is that `dev_WARN()` calls `WARN()` which can trigger a kernel panic if `panic_on_warn` is set. This means a malicious or buggy USB device could potentially crash the system just by triggering this warning. Converting to `dev_warn_once()` maintains the diagnostic value while removing the crash risk.
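The "log once, never abort" behavior can be modeled in userspace as follows. This is a hypothetical sketch, not the kernel macro: `dev_warn_once()` keeps its "already warned" state per call site, whereas this model keeps one flag per function for simplicity, and all `model_*` names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int model_messages_logged;

/* Logs the message the first time only and always returns to the caller;
 * unlike a WARN()-style macro there is no backtrace and no panic path. */
void model_warn_once(const char *msg)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	model_messages_logged++;
	fprintf(stderr, "warning: %s\n", msg);
}

int model_logged_count(void)
{
	return model_messages_logged;
}

void model_warn_selfcheck(void)
{
	model_warn_once("BOGUS urb xfer");
	model_warn_once("BOGUS urb xfer");
	assert(model_logged_count() == 1); /* second call is silent */
}
```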
 drivers/usb/core/urb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 850d0fffe1c6..e60f4ef06e3d 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -490,7 +490,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)

 	/* Check that the pipe's type matches the endpoint's type */
 	if (usb_pipe_type_check(urb->dev, urb->pipe))
-		dev_WARN(&dev->dev, "BOGUS urb xfer, pipe %x != type %x\n",
+		dev_warn_once(&dev->dev, "BOGUS urb xfer, pipe %x != type %x\n",
 			usb_pipetype(urb->pipe), pipetypes[xfertype]);

 	/* Check against a simple/standard policy */
From: Yongbo Zhang giraffesnn123@gmail.com
[ Upstream commit 1c2d81bded1993bb2c7125a911db63612cdc8d40 ]
When the gpio irqchip is connected to a slow bus (e.g., i2c bus or virtio bus), calling disable_irq_nosync() in the top-half ISR handler will trigger the following kernel BUG:
BUG: scheduling while atomic: RenderEngine/253/0x00010002
...
Call trace:
dump_backtrace+0x0/0x1c8
show_stack+0x1c/0x2c
dump_stack_lvl+0xdc/0x12c
dump_stack+0x1c/0x64
__schedule_bug+0x64/0x80
schedule_debug+0x98/0x118
__schedule+0x68/0x704
schedule+0xa0/0xe8
schedule_timeout+0x38/0x124
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
_virtio_gpio_req+0xf8/0x198
virtio_gpio_irq_bus_sync_unlock+0x94/0xf0
__irq_put_desc_unlock+0x50/0x54
disable_irq_nosync+0x64/0x94
fusb302_irq_intn+0x24/0x84
__handle_irq_event_percpu+0x84/0x278
handle_irq_event+0x64/0x14c
handle_level_irq+0x134/0x1d4
generic_handle_domain_irq+0x40/0x68
virtio_gpio_event_vq+0xb0/0x130
vring_interrupt+0x7c/0x90
vm_interrupt+0x88/0xd8
__handle_irq_event_percpu+0x84/0x278
handle_irq_event+0x64/0x14c
handle_fasteoi_irq+0x110/0x210
__handle_domain_irq+0x80/0xd0
gic_handle_irq+0x78/0x154
el0_irq_naked+0x60/0x6c
This patch replaces request_irq() with devm_request_threaded_irq() to avoid the use of disable_irq_nosync().
Signed-off-by: Yongbo Zhang giraffesnn123@gmail.com Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20250526043433.673097-1-giraffesnn123@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a critical "scheduling while atomic" bug that occurs when the fusb302 USB Type-C controller uses GPIO interrupts connected through slow buses (like I2C or virtio). The bug manifests as a kernel BUG whose call trace shows `__schedule_bug` being triggered from within an atomic context.
## Technical Details of the Bug
1. **Root Cause**: The fusb302 driver's interrupt handler (`fusb302_irq_intn`) calls `disable_irq_nosync()` from the top-half ISR context (lines 1480-1481 in the original code). When the GPIO controller is connected via a slow bus like virtio-gpio, the `disable_irq_nosync()` operation requires bus transactions that can sleep, which is forbidden in atomic/interrupt context.
2. **The Problem Flow**:
   - Hardware interrupt occurs → `fusb302_irq_intn` ISR runs in atomic context
   - ISR calls `disable_irq_nosync(chip->gpio_int_n_irq)`
   - virtio-gpio's `virtio_gpio_irq_bus_sync_unlock()` needs to communicate over virtio bus
   - This requires `wait_for_completion()` which can sleep
   - Sleeping in atomic context triggers the kernel BUG
## The Fix
The fix replaces `request_irq()` with `devm_request_threaded_irq()` and removes the problematic `disable_irq_nosync()`/`enable_irq()` pattern:
1. **Before**: Used a regular IRQ handler that disabled the interrupt in the top half, scheduled work, then re-enabled it in the bottom half
2. **After**: Uses a threaded IRQ handler, which naturally handles the interrupt masking/unmasking without explicit calls

Key changes (one per hunk in the diff):
- `fusb302_probe()`: Changed from `request_irq()` to `devm_request_threaded_irq()` with a NULL top-half handler
- `fusb302_irq_intn()`: Removed the `disable_irq_nosync()` call from the ISR
- `fusb302_irq_work()`: Removed the corresponding `enable_irq()` call from the work function
- `fusb302_remove()`: Removed the `free_irq()` call (handled by devm_*)
## Stable Backport Criteria Evaluation
1. **Fixes a bug affecting users**: YES - This fixes a kernel BUG that causes system instability
2. **Relatively small and contained**: YES - 4 insertions and 8 deletions, confined to one driver file
3. **Clear side effects**: NO - The fix is straightforward with no architectural changes
4. **Major architectural changes**: NO - Simply changes IRQ registration method
5. **Critical kernel subsystems**: NO - Limited to USB Type-C controller driver
6. **Stable tree mention**: NO - But follows stable tree rules perfectly
7. **Follows stable rules**: YES - Important bugfix with minimal risk
## Additional Considerations
- The bug only manifests with specific hardware configurations (virtio-gpio or slow I2C GPIO expanders)
- The fix is a standard pattern used throughout the kernel for similar issues
- Using `devm_request_threaded_irq()` is actually cleaner and provides automatic cleanup
- No functional changes to the driver's behavior, only fixes the atomic scheduling violation
- Similar fixes have been applied to other drivers facing the same issue pattern
This is an ideal stable backport candidate: it fixes a real bug that causes kernel panics, the fix is minimal and well-understood, and there's virtually no risk of regression.
 drivers/usb/typec/tcpm/fusb302.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/typec/tcpm/fusb302.c b/drivers/usb/typec/tcpm/fusb302.c
index 5e661bae3997..c1200d0412b7 100644
--- a/drivers/usb/typec/tcpm/fusb302.c
+++ b/drivers/usb/typec/tcpm/fusb302.c
@@ -1470,9 +1470,6 @@ static irqreturn_t fusb302_irq_intn(int irq, void *dev_id)
 	struct fusb302_chip *chip = dev_id;
 	unsigned long flags;

-	/* Disable our level triggered IRQ until our irq_work has cleared it */
-	disable_irq_nosync(chip->gpio_int_n_irq);
-
 	spin_lock_irqsave(&chip->irq_lock, flags);
 	if (chip->irq_suspended)
 		chip->irq_while_suspended = true;
@@ -1615,7 +1612,6 @@ static void fusb302_irq_work(struct work_struct *work)
 	}
 done:
 	mutex_unlock(&chip->lock);
-	enable_irq(chip->gpio_int_n_irq);
 }

 static int init_gpio(struct fusb302_chip *chip)
@@ -1755,9 +1751,10 @@ static int fusb302_probe(struct i2c_client *client,
 		goto destroy_workqueue;
 	}

-	ret = request_irq(chip->gpio_int_n_irq, fusb302_irq_intn,
-			  IRQF_ONESHOT | IRQF_TRIGGER_LOW,
-			  "fsc_interrupt_int_n", chip);
+	ret = devm_request_threaded_irq(dev, chip->gpio_int_n_irq,
+					NULL, fusb302_irq_intn,
+					IRQF_ONESHOT | IRQF_TRIGGER_LOW,
+					"fsc_interrupt_int_n", chip);
 	if (ret < 0) {
 		dev_err(dev, "cannot request IRQ for GPIO Int_N, ret=%d", ret);
 		goto tcpm_unregister_port;
@@ -1782,7 +1779,6 @@ static int fusb302_remove(struct i2c_client *client)
 	struct fusb302_chip *chip = i2c_get_clientdata(client);

 	disable_irq_wake(chip->gpio_int_n_irq);
-	free_irq(chip->gpio_int_n_irq, chip);
 	cancel_work_sync(&chip->irq_work);
 	cancel_delayed_work_sync(&chip->bc_lvl_handler);
 	tcpm_unregister_port(chip->tcpm_port);
From: "Gautham R. Shenoy" gautham.shenoy@amd.com
[ Upstream commit cda7ac8ce7de84cf32a3871ba5f318aa3b79381e ]
In the function mperf_start(), mperf_monitor snapshots the time, tsc and finally the aperf,mperf MSRs. However, this order of snapshotting is reversed in mperf_stop(). As a result, the C0 residency (which is computed as delta_mperf * 100 / delta_tsc) is under-reported on CPUs that are 100% busy.
Fix this by snapshotting time, tsc and then aperf,mperf in mperf_stop() in the same order as in mperf_start().
Link: https://lore.kernel.org/r/20250612122355.19629-2-gautham.shenoy@amd.com Signed-off-by: Gautham R. Shenoy gautham.shenoy@amd.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Clear bug fix with user impact**: The commit fixes a calculation bug in the cpupower tool where C0 residency (processor active time) is under-reported on 100% busy CPUs. This directly affects users monitoring CPU performance and power states.
2. **Simple and contained fix**: The change is minimal - it only reorders three function calls in `mperf_stop()` to match the order in `mperf_start()`:
   - Before: `mperf_measure_stats()` → `mperf_get_tsc()` → `clock_gettime()`
   - After: `clock_gettime()` → `mperf_get_tsc()` → `mperf_measure_stats()`
3. **Low risk of regression**: The fix simply ensures consistent snapshot ordering between start and stop measurements. This is a logical correction that aligns the measurement methodology without introducing new functionality.
4. **Clear root cause**: The bug occurs because the reversed snapshot order causes timing skew in the C0 residency calculation (`delta_mperf * 100 / delta_tsc`). When snapshots are taken in different orders at start vs stop, the time windows don't align properly, leading to incorrect calculations.
5. **Affects monitoring tool accuracy**: While this is a userspace tool bug rather than a kernel bug, cpupower is an important diagnostic tool shipped with the kernel. Incorrect C0 residency reporting could mislead system administrators and developers analyzing CPU performance.
6. **No architectural changes**: The fix doesn't change any data structures, APIs, or fundamental behavior - it simply corrects the measurement sequence.
The commit meets stable tree criteria as it fixes a real bug affecting users, is minimal in scope, and has very low risk of introducing new issues.
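A small worked model shows why the reversed order under-reports a fully busy CPU. The sketch is a userspace illustration with hypothetical `model_*` names. It assumes, for simplicity, that mperf and tsc advance at the same rate on a 100% busy CPU and that each snapshot is taken a fixed `gap` of ticks after the previous one; only the formula `delta_mperf * 100 / delta_tsc` comes from the commit message.

```c
#include <assert.h>

/* C0 residency in percent, using the formula quoted in the commit message. */
unsigned long long model_c0_percent(unsigned long long delta_mperf,
				    unsigned long long delta_tsc)
{
	assert(delta_tsc != 0);
	return delta_mperf * 100 / delta_tsc;
}

/* Models a fully busy CPU. At start the tsc is read first and mperf `gap`
 * ticks later. At stop the order either matches the start (fixed code) or
 * is reversed (old code). */
unsigned long long model_busy_c0(unsigned long long window,
				 unsigned long long gap,
				 int stop_matches_start)
{
	unsigned long long tsc_start = 0;
	unsigned long long mperf_start = gap;
	unsigned long long tsc_end, mperf_end;

	if (stop_matches_start) {
		tsc_end = window;
		mperf_end = window + gap;
	} else {
		mperf_end = window;
		tsc_end = window + gap;
	}
	return model_c0_percent(mperf_end - mperf_start, tsc_end - tsc_start);
}

void model_mperf_selfcheck(void)
{
	/* Reversed stop order: 990 * 100 / 1010 = 98 (integer division). */
	assert(model_busy_c0(1000, 10, 0) == 98);
	/* Matching order: the gaps cancel, 1000 * 100 / 1000 = 100. */
	assert(model_busy_c0(1000, 10, 1) == 100);
}
```

With matching order the snapshot gap appears on both sides of each delta and cancels; with reversed order it shrinks the mperf delta and stretches the tsc delta at the same time.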
 tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
index 0e8c0b879d4b..09c606760464 100644
--- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
+++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
@@ -201,9 +201,9 @@ static int mperf_stop(void)
 	int cpu;

 	for (cpu = 0; cpu < cpu_count; cpu++) {
-		mperf_measure_stats(cpu);
-		mperf_get_tsc(&tsc_at_measure_end[cpu]);
 		clock_gettime(CLOCK_REALTIME, &time_end[cpu]);
+		mperf_get_tsc(&tsc_at_measure_end[cpu]);
+		mperf_measure_stats(cpu);
 	}

 	return 0;
From: Kees Cook kees@kernel.org
[ Upstream commit 6418a8504187dc7f5b6f9d0649c03e362cb0664b ]
When KCOV is enabled all functions get instrumented, unless the __no_sanitize_coverage attribute is used. To prepare for __no_sanitize_coverage being applied to __init functions[1], we have to handle differences in how GCC's inline optimizations get resolved. For thinkpad_acpi routines, this means forcing two functions to be inline with __always_inline.
Link: https://lore.kernel.org/lkml/20250523043935.2009972-11-kees@kernel.org/ [1]
Signed-off-by: Kees Cook kees@kernel.org
Link: https://lore.kernel.org/r/20250529181831.work.439-kees@kernel.org
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Detailed Analysis
### Nature of the Change
The commit modifies two functions in `drivers/platform/x86/lenovo/thinkpad_acpi.c`:
- `tpacpi_is_lenovo()`: Changed from `static inline` to `static __always_inline`
- `tpacpi_is_ibm()`: Changed from `static inline` to `static __always_inline`

Both functions are marked with `__init` and are simple one-line functions that check the vendor ID.
### Root Cause
This is a build fix addressing a subtle interaction between:
1. KCOV kernel coverage instrumentation
2. GCC's inline optimization heuristics
3. Linux kernel's `__init` section handling

When KCOV is enabled, all functions get instrumented unless marked with `__no_sanitize_coverage`. The commit message indicates this is preparation for applying `__no_sanitize_coverage` to `__init` functions. The problem occurs because:
1. With KCOV instrumentation, GCC may inline these functions into `__init` callers
2. Without KCOV instrumentation (when `__no_sanitize_coverage` is applied), GCC's heuristics change and it may decide NOT to inline them
3. This creates a section mismatch where `__init` code calls non-`__init` functions, causing build warnings/errors
### Why This Qualifies for Stable Backport
1. **Fixes a Real Bug**: This addresses legitimate build failures when `CONFIG_KCOV=y` is enabled, which affects:
   - Kernel developers doing coverage testing
   - CI/CD systems running kernel tests
   - Distribution builders enabling KCOV for testing
2. **Minimal Risk**: The change is extremely conservative:
   - Only changes inline hints from `inline` to `__always_inline`
   - No functional changes whatsoever
   - Affects only two simple getter functions
   - Cannot introduce runtime regressions
3. **Small and Contained**: The patch touches only 2 lines in a single file, making it easy to review and backport
4. **Part of Broader Fix**: This is part of a kernel-wide effort to fix KCOV-related build issues, with similar fixes across multiple architectures and subsystems
5. **Build Infrastructure**: Stable kernels need to maintain buildability with various configurations, including KCOV-enabled builds for testing
### Specific Code Impact
Looking at the changed functions:
```c
-static inline bool __pure __init tpacpi_is_lenovo(void)
+static __always_inline bool __pure __init tpacpi_is_lenovo(void)
 {
 	return thinkpad_id.vendor == PCI_VENDOR_ID_LENOVO;
 }

-static inline bool __pure __init tpacpi_is_ibm(void)
+static __always_inline bool __pure __init tpacpi_is_ibm(void)
 {
 	return thinkpad_id.vendor == PCI_VENDOR_ID_IBM;
 }
```
These are trivial getter functions that absolutely should be inlined. Using `__always_inline` ensures consistent behavior regardless of KCOV configuration, preventing section mismatch warnings.
### Conclusion
This is a textbook example of a stable-appropriate fix: it solves a real build problem, has zero functional impact, is minimal in scope, and has essentially no risk of causing regressions. It should be backported to stable kernels that support KCOV (4.6+) to maintain build compatibility with coverage testing configurations.
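As a rough userspace illustration of the difference between the two inline hints, the sketch below defines a stand-in for the kernel's `__always_inline` and applies it to two one-line getters. It assumes GCC or Clang; `my_always_inline`, `vendor_id`, `is_lenovo()` and `is_ibm()` are hypothetical names, and the section-mismatch effect itself cannot be reproduced outside a kernel build.

```c
#include <stdbool.h>

/*
 * Stand-in for the kernel's __always_inline (GCC/Clang only).
 * Plain "inline" is a hint the compiler may ignore; the attribute
 * makes inlining mandatory regardless of optimization heuristics.
 */
#define my_always_inline inline __attribute__((__always_inline__))

#define VENDOR_LENOVO 0x17aa	/* PCI vendor ID of Lenovo */
#define VENDOR_IBM    0x1014	/* PCI vendor ID of IBM */

/* Hypothetical stand-in for thinkpad_id.vendor. */
static unsigned int vendor_id = VENDOR_LENOVO;

static my_always_inline bool is_lenovo(void)
{
	return vendor_id == VENDOR_LENOVO;
}

static my_always_inline bool is_ibm(void)
{
	return vendor_id == VENDOR_IBM;
}
```

Because the bodies are always merged into their callers, no standalone copy of either helper is emitted whose section placement could differ from the caller's.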
 drivers/platform/x86/thinkpad_acpi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index 9eb74d9e1519..e480fff7142a 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -514,12 +514,12 @@ static unsigned long __init tpacpi_check_quirks(
 	return 0;
 }

-static inline bool __pure __init tpacpi_is_lenovo(void)
+static __always_inline bool __pure __init tpacpi_is_lenovo(void)
 {
 	return thinkpad_id.vendor == PCI_VENDOR_ID_LENOVO;
 }

-static inline bool __pure __init tpacpi_is_ibm(void)
+static __always_inline bool __pure __init tpacpi_is_ibm(void)
 {
 	return thinkpad_id.vendor == PCI_VENDOR_ID_IBM;
 }
From: Lucy Thrun lucy.thrun@digital-rabbithole.de
[ Upstream commit a409c60111e6bb98fcabab2aeaa069daa9434ca0 ]
The 'sprintf' call in 'add_tuning_control' may exceed the 44-byte buffer if either string argument is too long. This triggers a compiler warning. Replaced 'sprintf' with 'snprintf' to limit string lengths to prevent overflow.
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202506100642.95jpuMY1-lkp@intel.com/
Signed-off-by: Lucy Thrun lucy.thrun@digital-rabbithole.de
Link: https://patch.msgid.link/20250610175012.918-3-lucy.thrun@digital-rabbithole....
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Fixes a real security vulnerability**: The commit fixes a buffer overflow vulnerability in the `add_tuning_control` function. The sprintf call could overflow the 44-byte `namestr` buffer (defined as `char namestr[SNDRV_CTL_ELEM_ID_NAME_MAXLEN]` where `SNDRV_CTL_ELEM_ID_NAME_MAXLEN` is 44).
2. **Long-standing bug**: The vulnerable code was introduced in 2012 (commit 44f0c9782cc6a), meaning this buffer overflow has existed in the kernel for over a decade, affecting many stable kernel versions.
3. **Simple and contained fix**: The fix is minimal - it simply replaces `sprintf` with `snprintf` on a single line:
```c
-	sprintf(namestr, "%s %s Volume", name, dirstr[dir]);
+	snprintf(namestr, sizeof(namestr), "%s %s Volume", name, dirstr[dir]);
```
4. **Low risk of regression**: The change is straightforward and only adds bounds checking. It doesn't change any logic or behavior when strings fit within the buffer.
5. **Detected by automated testing**: The issue was caught by the kernel test robot, indicating it's a real compiler warning that should be addressed.
6. **Latent overflow risk**: The current tuning control names, such as "Wedge Angle", "SVM Level", and "EQ Band0-9", are short, so combining them with "Playback" or "Capture" and " Volume" does not overflow the 44-byte buffer today. For example: "EQ Band9" (8) + " " (1) + "Playback" (8) + " Volume" (7) = 24 bytes (25 with the terminating NUL), which is safe. The overflow would only become reachable if longer names were added in the future.
7. **Affects user-facing functionality**: This is in the ALSA HD Audio driver for Creative CA0132 sound cards, which are used by end users, making it important to fix in stable kernels.
The fix follows stable kernel rules perfectly: it's a minimal change that fixes an important bug without introducing new features or architectural changes.
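The bounded formatting described above can be tried out in plain C. The sketch below is not the driver code: `NAME_MAXLEN` mirrors the 44-byte `SNDRV_CTL_ELEM_ID_NAME_MAXLEN` mentioned in the analysis, and `build_ctl_name()` is a hypothetical helper that additionally reports truncation, which the actual patch does not do.

```c
#include <stdio.h>

#define NAME_MAXLEN 44	/* mirrors SNDRV_CTL_ELEM_ID_NAME_MAXLEN from the text */

/*
 * Format "<name> <dir> Volume" into out, never writing more than
 * NAME_MAXLEN bytes including the terminating NUL.
 * Returns 1 if the whole string fit, 0 if it had to be truncated.
 */
static int build_ctl_name(char out[NAME_MAXLEN], const char *name,
			  const char *dir)
{
	/* snprintf returns the length the full string would have had. */
	int needed = snprintf(out, NAME_MAXLEN, "%s %s Volume", name, dir);

	return needed >= 0 && needed < NAME_MAXLEN;
}
```

With `sprintf`, an over-long `name` would write past the end of the buffer; with `snprintf`, the result is cut off at 43 characters plus the NUL instead.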
 sound/pci/hda/patch_ca0132.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c
index 40f50571ad63..dee38dd872b5 100644
--- a/sound/pci/hda/patch_ca0132.c
+++ b/sound/pci/hda/patch_ca0132.c
@@ -4014,7 +4014,7 @@ static int add_tuning_control(struct hda_codec *codec,
 	}
 	knew.private_value =
 		HDA_COMPOSE_AMP_VAL(nid, 1, 0, type);
-	sprintf(namestr, "%s %s Volume", name, dirstr[dir]);
+	snprintf(namestr, sizeof(namestr), "%s %s Volume", name, dirstr[dir]);
 	return snd_hda_ctl_add(codec, nid, snd_ctl_new1(&knew, codec));
 }
From: Cristian Ciocaltea cristian.ciocaltea@collabora.com
[ Upstream commit fd3ab72e42e9871a9902b945a2bf8bb87b49c718 ]
Fix all macro related issues identified by checkpatch.pl:
CHECK: Macro argument 'x' may be better as '(x)' to avoid precedence issues
Signed-off-by: Cristian Ciocaltea cristian.ciocaltea@collabora.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-3-1a821463b632@coll...
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees. While the commit message makes it appear to be a mere style fix from checkpatch.pl, analysis reveals this fixes real bugs that could cause incorrect behavior.
**Critical Analysis:**
1. **Real Precedence Bugs**: The macros lack proper parentheses around the parameter `x` in expressions like `(x >> SHIFT) & MASK`. Without parentheses, if a complex expression is passed as the argument, operator precedence can cause incorrect evaluation.
2. **Concrete Example of Bug**: The most concerning usage is at line 2357 in the RME driver:
```c
freq <<= SND_RME_CLK_FREQMUL(status1);
```
   If `status1` were ever replaced with an expression like `status1 | 0x100000`, the macro would evaluate incorrectly:
   - Without fix: `(status1 | 0x100000 >> 18) & 0x7` evaluates as `(status1 | (0x100000 >> 18)) & 0x7`
   - With fix: `((status1 | 0x100000) >> 18) & 0x7` evaluates correctly
3. **Long-standing Issue**: These problematic macros were introduced in commit d39f1d68fe1d ("ALSA: usb-audio: Implement UAC2 jack detection") from October 2018, meaning this bug has existed for approximately 7 years.
4. **Hardware Impact**: These macros are used to extract clock configuration and synchronization status from RME Class Compliant USB audio devices. Incorrect calculations could lead to:
   - Wrong frequency calculations
   - Incorrect clock source detection
   - Misreported synchronization status
5. **Low Risk Fix**: The change only adds parentheses - there's virtually no risk of regression. The fix is minimal, obvious, and correct.
6. **Stable Criteria Met**: This satisfies stable kernel requirements:
   - Fixes a real bug (not just style)
   - Small and contained change
   - Cannot introduce new issues
   - Affects hardware functionality
   - Has been tested (merged upstream)
The commit should be backported to all stable trees containing the original problematic code from 2018 onwards.
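The precedence hazard discussed in point 2 can be checked with a small standalone program. The macro names below are hypothetical shortened versions of the `SND_RME_CLK_FREQMUL` pair before and after the patch; only the shift (18) and mask (0x7) values are taken from the diff.

```c
#define FREQMUL_SHIFT 18
#define FREQMUL_MASK  0x7

/* Pre-fix form: the argument is substituted without parentheses. */
#define FREQMUL_UNSAFE(x) ((x >> FREQMUL_SHIFT) & FREQMUL_MASK)
/* Post-fix form: the argument is parenthesized before shifting. */
#define FREQMUL_SAFE(x)   (((x) >> FREQMUL_SHIFT) & FREQMUL_MASK)

/* Expands to (status1 | 0x100000 >> 18) & 0x7: only the constant is shifted. */
static unsigned int freqmul_unsafe(unsigned int status1)
{
	return FREQMUL_UNSAFE(status1 | 0x100000);
}

/* Expands to ((status1 | 0x100000) >> 18) & 0x7: the whole value is shifted. */
static unsigned int freqmul_safe(unsigned int status1)
{
	return FREQMUL_SAFE(status1 | 0x100000);
}
```

For `status1 = 0x40000`, the safe form yields 5 (bits 18 and 20 both land in the field), while the unsafe form yields 4, because `0x100000 >> 18` is evaluated first and the unshifted `status1` bits are then masked away. With a plain variable as the argument, as in the current driver, both forms give the same result.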
 sound/usb/mixer_quirks.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c
index e752b6bb03ec..04267d4e3ecf 100644
--- a/sound/usb/mixer_quirks.c
+++ b/sound/usb/mixer_quirks.c
@@ -1990,15 +1990,15 @@ static int dell_dock_mixer_init(struct usb_mixer_interface *mixer)
 #define SND_RME_CLK_FREQMUL_SHIFT 18
 #define SND_RME_CLK_FREQMUL_MASK 0x7
 #define SND_RME_CLK_SYSTEM(x) \
-	((x >> SND_RME_CLK_SYSTEM_SHIFT) & SND_RME_CLK_SYSTEM_MASK)
+	(((x) >> SND_RME_CLK_SYSTEM_SHIFT) & SND_RME_CLK_SYSTEM_MASK)
 #define SND_RME_CLK_AES(x) \
-	((x >> SND_RME_CLK_AES_SHIFT) & SND_RME_CLK_AES_SPDIF_MASK)
+	(((x) >> SND_RME_CLK_AES_SHIFT) & SND_RME_CLK_AES_SPDIF_MASK)
 #define SND_RME_CLK_SPDIF(x) \
-	((x >> SND_RME_CLK_SPDIF_SHIFT) & SND_RME_CLK_AES_SPDIF_MASK)
+	(((x) >> SND_RME_CLK_SPDIF_SHIFT) & SND_RME_CLK_AES_SPDIF_MASK)
 #define SND_RME_CLK_SYNC(x) \
-	((x >> SND_RME_CLK_SYNC_SHIFT) & SND_RME_CLK_SYNC_MASK)
+	(((x) >> SND_RME_CLK_SYNC_SHIFT) & SND_RME_CLK_SYNC_MASK)
 #define SND_RME_CLK_FREQMUL(x) \
-	((x >> SND_RME_CLK_FREQMUL_SHIFT) & SND_RME_CLK_FREQMUL_MASK)
+	(((x) >> SND_RME_CLK_FREQMUL_SHIFT) & SND_RME_CLK_FREQMUL_MASK)
 #define SND_RME_CLK_AES_LOCK 0x1
 #define SND_RME_CLK_AES_SYNC 0x4
 #define SND_RME_CLK_SPDIF_LOCK 0x2
@@ -2007,9 +2007,9 @@ static int dell_dock_mixer_init(struct usb_mixer_interface *mixer)
 #define SND_RME_SPDIF_FORMAT_SHIFT 5
 #define SND_RME_BINARY_MASK 0x1
 #define SND_RME_SPDIF_IF(x) \
-	((x >> SND_RME_SPDIF_IF_SHIFT) & SND_RME_BINARY_MASK)
+	(((x) >> SND_RME_SPDIF_IF_SHIFT) & SND_RME_BINARY_MASK)
 #define SND_RME_SPDIF_FORMAT(x) \
-	((x >> SND_RME_SPDIF_FORMAT_SHIFT) & SND_RME_BINARY_MASK)
+	(((x) >> SND_RME_SPDIF_FORMAT_SHIFT) & SND_RME_BINARY_MASK)

 static const u32 snd_rme_rate_table[] = {
 	32000, 44100, 48000, 50000,
From: Xinxin Wan xinxin.wan@intel.com
[ Upstream commit 19f971057b2d7b99c80530ec1052b45de236a8da ]
To be more resilient to codec-detection failures when the hardware powers on slowly, add retry mechanism to the device verification check. Similar pattern is found throughout a number of Realtek codecs. Our tests show that 60ms delay is sufficient to address readiness issues on rt5640 chip.
Reviewed-by: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com
Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com
Signed-off-by: Xinxin Wan xinxin.wan@intel.com
Signed-off-by: Cezary Rojewski cezary.rojewski@intel.com
Link: https://patch.msgid.link/20250530142120.2944095-3-cezary.rojewski@intel.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my detailed analysis:
## Analysis of the Commit
### 1. Bug Fix Nature
This commit fixes a real-world hardware detection issue where the RT5640 codec device ID verification fails on systems where the hardware powers on slowly. The failure results in the driver returning `-ENODEV` and the audio codec not being initialized, causing complete audio failure.
### 2. Small and Contained Change
The fix is minimal and highly contained:
- Adds only 5 lines (a four-line retry block plus one blank line)
- The change is isolated to the probe function's device ID verification logic
- No architectural changes or new features introduced
- The retry logic is simple: if the initial ID read doesn't match, wait 60-100ms and retry once
### 3. Clear Purpose with No Major Side Effects
The code change:
```c
regmap_read(rt5640->regmap, RT5640_VENDOR_ID2, &val);
+if (val != RT5640_DEVICE_ID) {
+	usleep_range(60000, 100000);
+	regmap_read(rt5640->regmap, RT5640_VENDOR_ID2, &val);
+}
```
- Only adds a single retry with delay if the first read fails
- The delay (60-100ms) is reasonable for probe-time initialization
- No impact on normal operation when hardware is ready
### 4. Follows Established Pattern
The commit message mentions "Similar pattern is found throughout a number of Realtek codecs," and the probe function already contains a 400ms delay after enabling LDO1 (line 3004), showing that power-on delays are expected for this hardware.
### 5. Minimal Risk of Regression
- The retry only occurs when device ID verification would have failed anyway
- In the worst case, it adds a 60-100ms delay to failed probe attempts
- Systems where the codec is ready immediately continue to work without delay
- The pattern is conservative: only one retry attempt
### 6. Important User Impact
Without this fix, affected systems experience:
- Complete audio failure on boot
- No audio codec detection
- Requires manual intervention or multiple reboots to get audio working
### 7. Stable Tree Suitability
This fix meets all stable kernel criteria:
- Fixes a real bug affecting users
- Minimal change with clear purpose
- No new features or APIs
- Low risk of introducing new issues
- Self-contained within a single driver
The commit represents a classic hardware quirk fix that improves reliability without adding complexity, making it an ideal candidate for stable backport.
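The retry-once pattern can be exercised outside the kernel with the register read and the delay replaced by callbacks. Everything in this sketch is an assumption made for illustration: `verify_device_id()`, the callback types, `struct fake_codec` and the `DEVICE_ID` value are hypothetical stand-ins for `regmap_read()`, `usleep_range()` and `RT5640_DEVICE_ID`.

```c
#include <errno.h>

#define DEVICE_ID 0x6231	/* illustrative expected ID value */

/* Hypothetical callbacks standing in for regmap_read() and usleep_range(). */
typedef unsigned int (*read_id_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int min_us, unsigned int max_us);

/* Read the ID; on mismatch wait 60-100 ms and read exactly once more. */
static int verify_device_id(read_id_fn read_id, delay_fn delay, void *ctx)
{
	unsigned int val = read_id(ctx);

	if (val != DEVICE_ID) {
		delay(60000, 100000);
		val = read_id(ctx);
	}
	return val == DEVICE_ID ? 0 : -ENODEV;
}

/* Simulated codec that answers correctly only after ready_after reads. */
struct fake_codec {
	int reads;
	int ready_after;
};

static unsigned int fake_read_id(void *ctx)
{
	struct fake_codec *c = ctx;

	c->reads++;
	return c->reads > c->ready_after ? DEVICE_ID : 0;
}

/* No-op delay so the sketch runs instantly. */
static void fake_delay(unsigned int min_us, unsigned int max_us)
{
	(void)min_us;
	(void)max_us;
}
```

A codec that is ready immediately is read once and never delayed; one that needs a single extra read succeeds on the retry; one that never answers still fails with `-ENODEV` after two reads.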
 sound/soc/codecs/rt5640.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sound/soc/codecs/rt5640.c b/sound/soc/codecs/rt5640.c
index 3bc63fbcb188..e290a83c20af 100644
--- a/sound/soc/codecs/rt5640.c
+++ b/sound/soc/codecs/rt5640.c
@@ -2810,6 +2810,11 @@ static int rt5640_i2c_probe(struct i2c_client *i2c,
 	}

 	regmap_read(rt5640->regmap, RT5640_VENDOR_ID2, &val);
+	if (val != RT5640_DEVICE_ID) {
+		usleep_range(60000, 100000);
+		regmap_read(rt5640->regmap, RT5640_VENDOR_ID2, &val);
+	}
+
 	if (val != RT5640_DEVICE_ID) {
 		dev_err(&i2c->dev,
 			"Device with ID register %#x is not rt5640/39\n", val);