From: "Rafael J. Wysocki" rafael.j.wysocki@intel.com
[ Upstream commit de1675de39aa945bad5937d1fde4df3682670639 ]
Revert commit 234f71555019 ("ACPI: battery: negate current when discharging") breaks not one but several userspace implementations of battery monitoring: Steam and MangoHud. Perhaps it breaks more, but those are the two that have been tested.
Reported-by: Matthew Schwartz matthew.schwartz@linux.dev Closes: https://lore.kernel.org/linux-acpi/87C1B2AF-D430-4568-B620-14B941A8ABA4@linu... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
**YES**, this commit should be backported to stable kernel trees.
## Extensive Explanation:
### 1. **Critical Userspace Regression Fix** This revert fixes a critical regression that breaks userspace applications. The commit message explicitly states that the original change "breaks not one but several userspace implementations of battery monitoring: Steam and MangoHud." This is a clear violation of the kernel's "no userspace regressions" rule, which is one of the most fundamental principles in kernel development.
### 2. **Code Analysis Shows ABI Change** Looking at the code changes: ```c // The problematic code that was added (and is now being removed): if (psp == POWER_SUPPLY_PROP_CURRENT_NOW && (battery->state & ACPI_BATTERY_STATE_DISCHARGING) && acpi_battery_handle_discharging(battery) == POWER_SUPPLY_STATUS_DISCHARGING) val->intval = -val->intval; ```
This code was changing the sign of the current value when discharging, which altered the userspace-visible ABI. Even if the original behavior didn't match documentation, changing it broke existing applications that depended on the long-standing behavior.
### 3. **Simple and Safe Revert** The commit is a clean revert that: - Removes the problematic sign negation logic - Restores the original behavior that userspace applications expect - Is self-contained to the `acpi_battery_get_property()` function - Has minimal risk of introducing new issues
### 4. **Meets Stable Kernel Criteria** According to stable kernel rules, this commit qualifies because it: - **Fixes a real bug** that users are hitting (broken battery monitoring in Steam and MangoHud) - **Is obviously correct** (it's a revert of a change that broke userspace) - **Is small and contained** (only affects the ACPI battery property reporting) - **Has been tested** (the reporter confirmed the issue) - **Fixes a regression** (restores previous working behavior)
### 5. **Comparison to Similar Commits** Looking at the historical commits marked with "Backport Status: YES", they share similar characteristics: - Fix actual problems affecting users - Are relatively small and contained changes - Don't introduce new features or architectural changes - Focus on fixing specific issues
This revert fits perfectly into this pattern as it's fixing a specific userspace compatibility issue without introducing any new functionality.
### 6. **Timing Considerations** The original problematic commit (234f71555019) was from May 8, 2025, and this revert is from July 3, 2025. If the original commit made it into any stable releases, those stable trees would need this revert to prevent userspace breakage.
### Conclusion: This is a textbook example of a commit that should be backported to stable kernels. It fixes a userspace regression, is minimal in scope, carries very low risk, and restores previously working behavior that applications depend on. The fact that it's a revert of a problematic change makes it even more suitable for stable backporting.
drivers/acpi/battery.c | 19 +++---------------- 1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c index 2f188a734a0c5..5a4e022662417 100644 --- a/drivers/acpi/battery.c +++ b/drivers/acpi/battery.c @@ -241,23 +241,10 @@ static int acpi_battery_get_property(struct power_supply *psy, break; case POWER_SUPPLY_PROP_CURRENT_NOW: case POWER_SUPPLY_PROP_POWER_NOW: - if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN) { + if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN) ret = -ENODEV; - break; - } - - val->intval = battery->rate_now * 1000; - /* - * When discharging, the current should be reported as a - * negative number as per the power supply class interface - * definition. - */ - if (psp == POWER_SUPPLY_PROP_CURRENT_NOW && - (battery->state & ACPI_BATTERY_STATE_DISCHARGING) && - acpi_battery_handle_discharging(battery) - == POWER_SUPPLY_STATUS_DISCHARGING) - val->intval = -val->intval; - + else + val->intval = battery->rate_now * 1000; break; case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN: case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN:
From: Alessandro Carminati acarmina@redhat.com
[ Upstream commit ca46946a482238b0cdea459fb82fc837fb36260e ]
Failing to reset coupling_desc.n_coupled after freeing coupled_rdevs can lead to NULL pointer dereference when regulators are accessed post-unbind.
This can happen during runtime PM or other regulator operations that rely on coupling metadata.
For example, on ridesx4, unbinding the 'reg-dummy' platform device triggers a panic in regulator_lock_recursive() due to stale coupling state.
Ensure n_coupled is set to 0 to prevent access to invalid pointers.
Signed-off-by: Alessandro Carminati acarmina@redhat.com Link: https://patch.msgid.link/20250626083809.314842-1-acarmina@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
Now I understand the bug. After unbind, the regulator_dev structure may still exist and be accessed (e.g., during runtime PM operations). If n_coupled is not reset to 0, code like regulator_lock_recursive() will try to iterate through n_coupled entries in the coupled_rdevs array, but that array has been freed and set to NULL, causing a NULL pointer dereference.
**YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **It fixes a real NULL pointer dereference bug**: The commit addresses a crash that occurs when regulators are accessed after unbind. The issue is in `regulator_lock_recursive()` at line 326-327 where it iterates through `n_coupled` entries in the `coupled_rdevs` array: ```c for (i = 0; i < rdev->coupling_desc.n_coupled; i++) { c_rdev = rdev->coupling_desc.coupled_rdevs[i]; ``` If `n_coupled > 0` but `coupled_rdevs` has been freed (set to NULL), this causes a NULL pointer dereference.
2. **The fix is minimal and safe**: The patch adds just one line: ```c rdev->coupling_desc.n_coupled = 0; ``` This ensures that after freeing the coupling data, the count is also reset, preventing any code from trying to access the freed array.
3. **It affects a critical subsystem**: The regulator framework is essential for power management, and crashes in this subsystem can cause system instability or complete failure.
4. **The bug can be triggered during normal operations**: The commit message mentions this happens during runtime PM or other regulator operations, which are common scenarios, not edge cases.
5. **Similar to other backported fixes**: Looking at the historical commits, we see that similar coupling-related fixes have been backported: - "regulator: core: Release coupled_rdevs on regulator_init_coupling() error" (backported) - "regulator: da9063: fix null pointer deref with partial DT config" (backported) These precedents show that NULL pointer fixes in the regulator subsystem are considered important for stable trees.
6. **Clear reproducer**: The commit mentions a specific platform (ridesx4) where unbinding the 'reg-dummy' platform device triggers the panic, indicating this is a reproducible issue.
The fix follows the stable kernel rules: it's a small, contained fix for an important bug with minimal risk of regression.
drivers/regulator/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 29c9171e923a2..7e6ff7e72784b 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -5423,6 +5423,7 @@ static void regulator_remove_coupling(struct regulator_dev *rdev) ERR_PTR(err)); }
+ rdev->coupling_desc.n_coupled = 0; kfree(rdev->coupling_desc.coupled_rdevs); rdev->coupling_desc.coupled_rdevs = NULL; }
From: Maor Gottlieb maorg@nvidia.com
[ Upstream commit 333e4d79316c9ed5877d7aac8b8ed22efc74e96d ]
The GID cache warning messages can flood the kernel log when there are multiple failed attempts to add GIDs. This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table.
Change pr_warn to pr_warn_ratelimited to prevent log flooding while still maintaining visibility of the issue.
Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.17500623... Signed-off-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org ---
**YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Key Reasons for Backporting:
### 1. **Fixes a Real-World Bug** The commit addresses a concrete problem where warning messages can flood the kernel log. This is not a theoretical issue - it occurs when users create many virtual interfaces without sufficient GID table space. This is a common scenario in production environments with RDMA/InfiniBand deployments.
### 2. **Small and Contained Change** The change is minimal - only replacing `pr_warn()` with `pr_warn_ratelimited()` on lines 585-586. This is exactly 2 lines of functional change with no architectural modifications or new features added.
### 3. **Low Risk of Regression** - The change only affects logging behavior, not functional behavior - `pr_warn_ratelimited()` is a well-established kernel API that behaves identically to `pr_warn()` except for rate limiting - No data structures are modified - No locking changes - No protocol or API changes
### 4. **Follows Stable Kernel Rules** According to stable kernel rules, this fix qualifies because: - It fixes a bug that bothers people (kernel log flooding) - It's a trivial change (2 lines) - It has been tested in mainline - It doesn't add new features
### 5. **Similar to Historical Backports** Looking at the similar commits provided: - Similar Commit #1 (backported): Fixed a hardware limitation issue with clear bounds - Similar Commit #2 (NOT backported): Added rate limiting to ipoib_warn but was more invasive with macro changes - This commit is more like #1 - a targeted fix for a specific issue
### 6. **Production Impact** Log flooding can have serious production impacts: - Fills up `/var/log` partitions - Makes it difficult to find other important kernel messages - Can impact system performance due to excessive logging - Can trigger log rotation issues
### 7. **Context-Specific Analysis** The error path where this warning occurs (line 571: `ret = -ENOSPC`) specifically handles the case when the GID table is full. In environments with many virtual interfaces, this condition can be hit repeatedly in quick succession, causing the flood described in the commit message.
The commit message clearly states this is fixing an observed problem: "This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table."
This is a perfect example of a stable-appropriate fix: minimal change, fixes a real bug, no risk of functional regression, and improves system stability in production environments.
drivers/infiniband/core/cache.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 873988e5c5280..0023aad0e7e43 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -582,8 +582,8 @@ static int __ib_cache_gid_add(struct ib_device *ib_dev, u32 port, out_unlock: mutex_unlock(&table->lock); if (ret) - pr_warn("%s: unable to add gid %pI6 error=%d\n", - __func__, gid->raw, ret); + pr_warn_ratelimited("%s: unable to add gid %pI6 error=%d\n", + __func__, gid->raw, ret); return ret; }
linux-stable-mirror@lists.linaro.org