From: "Chia-Lin Kao (AceLan)" acelan.kao@canonical.com
[ Upstream commit b169e1733cadb614e87f69d7a5ae1b186c50d313 ]
Dell Pro Rugged 10/12 tablets have a reliable VGBS method. If VGBS is not called on boot, the on-screen keyboard won't appear if the device is booted without a keyboard.
Call VGBS on boot on these devices to get the initial state of SW_TABLET_MODE in a reliable way.
Signed-off-by: Chia-Lin Kao (AceLan) acelan.kao@canonical.com
Reviewed-by: Hans de Goede johannes.goede@oss.qualcomm.com
Link: https://patch.msgid.link/20251127070407.656463-1-acelan.kao@canonical.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Summary Analysis
### What This Commit Does

This commit adds two Dell tablet models (Dell Pro Rugged 10 Tablet RA00260 and Dell Pro Rugged 12 Tablet RA02260) to the `dmi_vgbs_allow_list` DMI quirk table. When a device matches this list, the driver calls the VGBS ACPI method at probe time to properly detect the initial SW_TABLET_MODE state.
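The allow-list lookup described above can be modelled in userspace. The following is a minimal sketch, not the driver's code: `struct quirk_entry` and `on_allow_list()` are hypothetical stand-ins for `struct dmi_system_id` and the kernel's DMI table check, and only the two entries added by this patch are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct dmi_system_id: one vendor/product pair. */
struct quirk_entry {
	const char *vendor;
	const char *product;
};

/* Terminated by an all-NULL entry, like the kernel's "{ }" sentinel. */
static const struct quirk_entry vgbs_allow_list[] = {
	{ "Dell Inc.", "Dell Pro Rugged 10 Tablet RA00260" },
	{ "Dell Inc.", "Dell Pro Rugged 12 Tablet RA02260" },
	{ NULL, NULL },
};

/* DMI_MATCH() is a substring match, so strstr() models it here. */
static bool on_allow_list(const char *vendor, const char *product)
{
	const struct quirk_entry *e;

	assert(vendor && product);
	for (e = vgbs_allow_list; e->vendor; e++) {
		if (strstr(vendor, e->vendor) && strstr(product, e->product))
			return true;
	}
	return false;
}
```

A machine whose strings match neither entry is rejected by the lookup, which is why the change cannot alter behaviour on other hardware.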
### The Bug Being Fixed

Without this quirk, on these Dell tablets:

- The VGBS method isn't called at boot
- The kernel doesn't know the device is in tablet mode
- The on-screen keyboard won't appear if the device is booted without a physical keyboard attached
- This makes the device difficult/impossible to use in tablet-only mode

### Classification: Hardware Quirk/Workaround

This falls squarely into the **QUIRKS and WORKAROUNDS** category which is explicitly allowed for stable backporting:

- It's a DMI-based allowlist entry for specific hardware models
- The pattern is identical to existing entries (HP Spectre, Microsoft Surface Go, HP Dragonfly G2)
- The mechanism has been stable since Linux 5.10

### Risk Assessment

**Risk Level: EXTREMELY LOW**

- **Lines changed**: ~12 lines (just two DMI match entries)
- **Files touched**: 1 file
- **Scope**: Only affects Dell Pro Rugged 10/12 tablets
- **Cannot regress other hardware**: DMI matching is device-specific
- **Pattern proven**: Same structure as existing entries that have worked for years

### Stability Indicators

- **Reviewed-by:** Hans de Goede (well-known x86 platform maintainer)
- **Reviewed-by:** Ilpo Järvinen (Intel platform maintainer)
- The `dmi_vgbs_allow_list` infrastructure has existed since v5.10-rc1 (commit 537b0dd4729e7)
### Stable Tree Criteria Assessment

| Criterion | Status |
|-----------|--------|
| Obviously correct | ✅ Yes - trivial data addition |
| Fixes real bug | ✅ Yes - on-screen keyboard not working |
| Small and contained | ✅ Yes - ~12 lines in 1 file |
| No new features | ✅ Yes - enables existing functionality |
| Applies cleanly | ✅ Yes - simple addition to allowlist |
### Concerns

1. **No explicit "Cc: stable" tag** - However, DMI quirk additions are commonly appropriate for stable even without explicit tags
2. **No "Fixes:" tag** - This is adding new device support rather than fixing a regression in existing code

### Verdict

This commit is appropriate for stable backporting because:

1. It's a minimal, surgical hardware quirk that only affects two specific Dell tablet models
2. It fixes a real usability issue for affected users (tablet becomes difficult to use without keyboard)
3. Zero risk of regression for any other hardware
4. The pattern exactly matches existing stable entries
5. The infrastructure has been stable since v5.10, available in all current LTS kernels
6. Well-reviewed by experienced platform maintainers
The lack of explicit stable tags is not disqualifying - this is a textbook example of a hardware quirk addition that should go to stable trees to enable proper device functionality.
**YES**
 drivers/platform/x86/intel/hid.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/platform/x86/intel/hid.c b/drivers/platform/x86/intel/hid.c
index 9c07a7faf18fe..560cc063198e1 100644
--- a/drivers/platform/x86/intel/hid.c
+++ b/drivers/platform/x86/intel/hid.c
@@ -177,6 +177,18 @@ static const struct dmi_system_id dmi_vgbs_allow_list[] = {
 			DMI_MATCH(DMI_PRODUCT_NAME, "HP Elite Dragonfly G2 Notebook PC"),
 		},
 	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "Dell Pro Rugged 10 Tablet RA00260"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "Dell Pro Rugged 12 Tablet RA02260"),
+		},
+	},
 	{ }
 };
From: Daniel Wagner wagi@kernel.org
[ Upstream commit b71cbcf7d170e51148d5467820ae8a72febcb651 ]
nvme_fc_ctrl_put can acquire the rport lock when freeing the ctrl object:
    nvme_fc_ctrl_put
      nvme_fc_ctrl_free
        spin_lock_irqsave(rport->lock)
Thus we can't hold the rport lock when calling nvme_fc_ctrl_put.
Justin suggested using the safe list iterator variant because nvme_fc_ctrl_put will also modify the rport->list.
Cc: Justin Tee justin.tee@broadcom.com
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Daniel Wagner wagi@kernel.org
Signed-off-by: Keith Busch kbusch@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
The affected function was introduced in v5.8-rc1 (`14fd1e98afafc`), meaning this deadlock bug has existed since **Linux 5.8** and affects all stable kernels from 5.8 onwards (5.10.y, 5.15.y, 6.1.y, 6.6.y, etc.).
### SUMMARY
**What the commit fixes:** A **deadlock bug** in the NVMe-FC (Fibre Channel) driver where `nvme_fc_match_disconn_ls()` holds `rport->lock` while calling `nvme_fc_ctrl_put()`. When the reference count hits zero, `nvme_fc_ctrl_free()` tries to acquire the same lock, causing a deadlock.
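The locking pattern can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the driver's code: `struct toy_lock`, `ctrl_put()`, `ctrl_free()` and `match_assoc()` are hypothetical names, the lock is a flag whose `assert()` fires on re-acquisition (standing in for the deadlock), and the list is a plain singly linked list instead of the kernel's `list_head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: acquiring it while held models the deadlock. */
struct toy_lock { bool held; };

static void lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct rport;

struct ctrl {
	int refs;
	int assoc_id;
	struct ctrl *next;
	struct rport *rport;
};

struct rport {
	struct toy_lock lock;
	struct ctrl *ctrls;	/* list protected by lock */
	int freed;		/* count of controllers unlinked so far */
};

/* Like nvme_fc_ctrl_free() in the commit message: takes the rport lock. */
static void ctrl_free(struct ctrl *c)
{
	struct rport *r = c->rport;
	struct ctrl **pp;

	lock_acquire(&r->lock);
	for (pp = &r->ctrls; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			break;
		}
	}
	r->freed++;
	lock_release(&r->lock);
}

static void ctrl_put(struct ctrl *c)
{
	if (--c->refs == 0)
		ctrl_free(c);
}

/* Fixed pattern: save the next pointer, drop the lock around the put. */
static struct ctrl *match_assoc(struct rport *r, int assoc_id)
{
	struct ctrl *c, *tmp, *ret = NULL;

	lock_acquire(&r->lock);
	for (c = r->ctrls; c; c = tmp) {
		tmp = c->next;		/* the "_safe" part of the iterator */
		if (c->refs == 0)
			continue;	/* models a failed get */
		c->refs++;
		if (c->assoc_id == assoc_id) {
			ret = c;	/* keep the reference for the caller */
			break;
		}
		lock_release(&r->lock);
		ctrl_put(c);		/* may call ctrl_free(), which locks */
		lock_acquire(&r->lock);
	}
	lock_release(&r->lock);
	return ret;
}
```

If `ctrl_put()` were called with the lock still held and the count reached zero, the `assert()` in `lock_acquire()` would fire, which is the model's equivalent of the reported deadlock.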
**Stable kernel criteria:**

| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Uses canonical lock drop/reacquire pattern |
| Fixes real bug | ✅ Deadlock - system hang |
| Important issue | ✅ Deadlocks in storage paths are critical |
| Small and contained | ✅ ~6 lines in one function |
| No new features | ✅ Pure bug fix |
| Expert reviewed | ✅ Christoph Hellwig |
**Risk vs Benefit:**

- **Risk:** LOW - The fix uses a well-established kernel pattern (`list_for_each_entry_safe` + lock release/reacquire)
- **Benefit:** HIGH - Prevents deadlock in NVMe-FC storage driver used in enterprise environments

**Concerns:**

- No explicit `Cc: stable` tag, but this is not required for obvious bug fixes
- No `Fixes:` tag, but we've identified the bug exists since v5.8
- The fix should apply cleanly to any kernel with the affected function (5.8+)
### CONCLUSION
This commit fixes a clear deadlock bug in the NVMe-FC driver that has existed since Linux 5.8. The fix is:

- Small and surgical (only ~6 lines changed)
- Uses well-understood, standard kernel locking patterns
- Has been reviewed by a respected kernel developer (Christoph Hellwig)
- Signed off by the NVMe maintainer (Keith Busch)
- Affects enterprise storage users who rely on NVMe over Fibre Channel
Deadlocks in storage drivers are serious issues that warrant stable backporting. The minimal scope and established fix pattern make this a low-risk, high-value backport.
**YES**
 drivers/nvme/host/fc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 2c903729b0b90..8324230c53719 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1468,14 +1468,14 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport,
 {
 	struct fcnvme_ls_disconnect_assoc_rqst *rqst =
 					&lsop->rqstbuf->rq_dis_assoc;
-	struct nvme_fc_ctrl *ctrl, *ret = NULL;
+	struct nvme_fc_ctrl *ctrl, *tmp, *ret = NULL;
 	struct nvmefc_ls_rcv_op *oldls = NULL;
 	u64 association_id = be64_to_cpu(rqst->associd.association_id);
 	unsigned long flags;

 	spin_lock_irqsave(&rport->lock, flags);

-	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+	list_for_each_entry_safe(ctrl, tmp, &rport->ctrl_list, ctrl_list) {
 		if (!nvme_fc_ctrl_get(ctrl))
 			continue;
 		spin_lock(&ctrl->lock);
@@ -1488,7 +1488,9 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport,
 		if (ret)
 			/* leave the ctrl get reference */
 			break;
+		spin_unlock_irqrestore(&rport->lock, flags);
 		nvme_fc_ctrl_put(ctrl);
+		spin_lock_irqsave(&rport->lock, flags);
 	}

 	spin_unlock_irqrestore(&rport->lock, flags);
From: Pei Xiao xiaopei01@kylinos.cn
[ Upstream commit 541dfb49dcb80c2509e030842de77adfb77820f5 ]
./drivers/hwmon/emc2305.c:597:4-15: ERROR: probable double put
Device node iterators put the previous value of the index variable, so an explicit put causes a double put.
Signed-off-by: Pei Xiao xiaopei01@kylinos.cn
Link: https://lore.kernel.org/r/tencent_CD373F952BE48697C949E39CB5EB77841D06@qq.co...
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Analysis Summary
### What the Bug Is

This commit fixes a **double reference count decrement (double put)** on a device_node structure in the `emc2305_probe_childs_from_dt()` function.

**Technical explanation:**

1. The `for_each_child_of_node()` macro uses `of_get_next_child()` internally for iteration
2. As confirmed in the kernel documentation, `of_get_next_child()` "**Decrements the refcount of prev**" automatically when advancing to the next child
3. The buggy code explicitly calls `of_node_put(child)` before `continue`
4. When the loop continues, `of_get_next_child()` decrements the refcount again → **double put**
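The four steps above can be reproduced with a toy reference-counting model. This is a sketch, not kernel code: `struct node`, `next_child()` and `walk_children()` are hypothetical stand-ins for `struct device_node`, `of_get_next_child()` and the driver's loop, and the `explicit_put` flag selects the buggy variant.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int refcount;
	struct node *sibling;
};

struct parent {
	struct node *first_child;
};

/* Models of_get_next_child(): takes a ref on next, drops the ref on prev. */
static struct node *next_child(struct parent *p, struct node *prev)
{
	struct node *next = prev ? prev->sibling : p->first_child;

	if (next)
		next->refcount++;
	if (prev)
		prev->refcount--;
	return next;
}

/* explicit_put != 0 models the of_node_put() that the patch removes. */
static int walk_children(struct parent *p, int explicit_put)
{
	struct node *child;
	int visited = 0;

	assert(p != NULL);
	for (child = next_child(p, NULL); child; child = next_child(p, child)) {
		visited++;
		if (explicit_put)
			child->refcount--;	/* second put follows in next_child() */
	}
	return visited;
}
```

With `explicit_put` set, every visited node ends one reference below where it started, which is the underflow the Coccinelle warning points at.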
**Impact:** Double put causes reference count underflow which can lead to:

- Use-after-free vulnerabilities
- Memory corruption
- Kernel crashes/instability
### Stable Tree Criteria Evaluation
| Criteria | Met? | Details |
|----------|------|---------|
| Obviously correct | ✅ | Standard DT iterator pattern fix |
| Fixes real bug | ✅ | Reference counting bug confirmed |
| Important issue | ✅ | Potential UAF/memory corruption |
| Small and contained | ✅ | Removes 4 lines in one function |
| No new features | ✅ | Pure bug fix |
| Tested | ✅ | Signed-off by hwmon maintainer |
### Risk Assessment

- **Risk**: Very low - the fix simply removes incorrect `of_node_put()` calls
- **Scope**: Single function, single driver (emc2305 hwmon)
- **Backport complexity**: None - straightforward removal of lines

### Version Analysis

The buggy code was introduced in commit `2ed4db7a1d07b` which first appeared in **v6.17-rc1**. This means:

- Only kernels 6.17+ have this bug
- Older stable trees (6.12.y, 6.6.y, 6.1.y, etc.) do **NOT** have this code

### Concerns

- No explicit "Cc: stable@vger.kernel.org" tag
- No "Fixes:" tag pointing to the introducing commit
- However, the bug and fix are clearly documented and understood

### Verdict

This is a legitimate bug fix that corrects an obvious reference counting error. The fix is:

- Trivially correct (well-known DT iterator pattern)
- Very low risk
- Fixes a real bug that can cause memory corruption
While the affected code only exists in 6.17+, this is still a valid stable backport candidate for the 6.17.y stable branch and should be backported to ensure stable users don't hit this reference counting bug.
**YES**
 drivers/hwmon/emc2305.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/hwmon/emc2305.c b/drivers/hwmon/emc2305.c
index 84cb9b72cb6c2..ceae96c07ac45 100644
--- a/drivers/hwmon/emc2305.c
+++ b/drivers/hwmon/emc2305.c
@@ -593,10 +593,8 @@ static int emc2305_probe_childs_from_dt(struct device *dev)
 	for_each_child_of_node(dev->of_node, child) {
 		if (of_property_present(child, "reg")) {
 			ret = emc2305_of_parse_pwm_child(dev, child, data);
-			if (ret) {
-				of_node_put(child);
+			if (ret)
 				continue;
-			}
 			count++;
 		}
 	}
From: "Derek J. Clark" derekjohn.clark@gmail.com
[ Upstream commit 55715d7ad5e772d621c3201da3895f250591bce8 ]
Add Legion Go 2 SKUs to the Extreme Mode quirks table.
Signed-off-by: Derek J. Clark derekjohn.clark@gmail.com
Reviewed-by: Armin Wolf W_Armin@gmx.de
Reviewed-by: Mark Pearson mpearson-lenovo@squebb.ca
Link: https://patch.msgid.link/20251127151605.1018026-4-derekjohn.clark@gmail.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Analysis of Commit: platform/x86: wmi-gamezone: Add Legion Go 2 Quirks
### 1. COMMIT MESSAGE ANALYSIS
**Subject:** Adds Legion Go 2 SKUs to the Extreme Mode quirks table
**Tags present:**

- Multiple Reviewed-by tags (3 reviewers: Armin Wolf, Mark Pearson, Ilpo Järvinen)
- Signed-off-by tags

**Tags absent:**

- No `Cc: stable@vger.kernel.org`
- No `Fixes:` tag
### 2. CODE CHANGE ANALYSIS
The change is minimal and mechanical:

- Adds two new DMI entries to the existing `fwbug_list[]` table
- New entries: "Legion Go 8ASP2" and "Legion Go 8AHP2" (Legion Go 2 variants)
- Both use the same `&quirk_no_extreme_bug` quirk as existing Legion Go devices
- Also removes a stray blank line (cleanup)
The structure is identical to existing entries - DMI vendor/product matching to apply a known quirk.
### 3. CLASSIFICATION: QUIRK/DEVICE-ID ADDITION
This falls into **two explicit exception categories** for stable:
1. **Device ID Addition:** Adding DMI identifiers to an existing driver to enable hardware support
2. **Hardware Quirk:** The `quirk_no_extreme_bug` works around firmware bugs where devices falsely report extreme thermal mode support
Without this quirk, the driver would attempt to enable "extreme mode" on Legion Go 2 devices that have incomplete BIOS implementations, potentially causing thermal management issues.
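How a table entry with `.driver_data` turns into a behaviour change can be sketched in userspace. The code below is an illustrative model only: `struct quirk`, `struct fwbug_entry`, `find_quirk()` and `extreme_mode_usable()` are hypothetical names, the table holds just the two entries from this patch, and the fallback for unmatched machines (trust the firmware's claim) is an assumption rather than something the patch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's quirk data. */
struct quirk {
	bool extreme_supported;
};

static const struct quirk quirk_no_extreme_bug = { .extreme_supported = false };

/* Hypothetical stand-in for struct dmi_system_id with .driver_data. */
struct fwbug_entry {
	const char *vendor;
	const char *product_version;
	const struct quirk *data;
};

static const struct fwbug_entry fwbug_list[] = {
	{ "LENOVO", "Legion Go 8ASP2", &quirk_no_extreme_bug },
	{ "LENOVO", "Legion Go 8AHP2", &quirk_no_extreme_bug },
	{ NULL, NULL, NULL },
};

/* Return the quirk of the first matching entry, or NULL if none matches. */
static const struct quirk *find_quirk(const char *vendor, const char *version)
{
	const struct fwbug_entry *e;

	assert(vendor && version);
	for (e = fwbug_list; e->vendor; e++) {
		if (strstr(vendor, e->vendor) &&
		    strstr(version, e->product_version))
			return e->data;
	}
	return NULL;
}

/* Assumption: without a quirk, the firmware's claim is taken at face value. */
static bool extreme_mode_usable(const char *vendor, const char *version,
				bool firmware_claims_support)
{
	const struct quirk *q = find_quirk(vendor, version);

	if (q && !q->extreme_supported)
		return false;
	return firmware_claims_support;
}
```

In this model a listed machine never gets extreme mode even when its firmware advertises it, while unlisted machines are unaffected.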
### 4. SCOPE AND RISK ASSESSMENT
| Metric | Value |
|--------|-------|
| Lines added | ~14 (two DMI table entries) |
| Files changed | 1 |
| Complexity | Very low |
| Risk | Minimal |

**Risk analysis:**

- Change only affects Legion Go 2 hardware (DMI matching ensures isolation)
- Uses exact same quirk mechanism proven with existing Legion Go devices
- No new code paths introduced
- Pattern identical to existing well-tested entries
### 5. USER IMPACT
**Affected users:** Legion Go 2 (8ASP2/8AHP2) owners
**Without this fix:** These devices might have their thermal profiles/extreme mode misconfigured due to firmware bugs, potentially causing:

- Unexpected platform profile behavior
- Incorrect thermal mode settings
**Severity:** Moderate - hardware usability issue
### 6. STABILITY INDICATORS
- **3 Reviewed-by tags** from different reviewers (strong review coverage)
- Pattern is well-established in the driver
- Mechanical, predictable change
### 7. DEPENDENCY CHECK
The wmi-gamezone driver needs to exist in the target stable tree. This is a relatively new driver (for Legion Go devices released ~2023), so it may only exist in recent stable branches (6.6+). If the driver doesn't exist in older stables, the patch simply won't apply.
### DECISION RATIONALE
**Arguments FOR backporting:**

1. Classic quirk addition - explicitly allowed exception in stable rules
2. Equivalent to device ID addition for new hardware SKUs
3. Very small, surgical change with minimal risk
4. Uses existing infrastructure and proven quirk
5. Well-reviewed (3 reviewers)
6. Fixes real hardware behavior issues (firmware bugs)
7. DMI matching isolates impact to specific hardware only

**Arguments AGAINST:**

1. No explicit `Cc: stable` tag from maintainer
2. Adds support for new hardware (could be viewed as feature)
3. Driver may not exist in older stable trees
**Conclusion:**
This commit is a textbook example of a hardware quirk addition that's appropriate for stable backporting. The stable kernel documentation explicitly allows:

- Adding device IDs to existing drivers
- Adding hardware quirks/workarounds for buggy devices
The change is small (~14 lines), low risk (DMI-isolated to specific hardware), uses existing proven mechanisms, and fixes real hardware issues. The lack of an explicit stable tag isn't disqualifying - many valid stable patches don't include it. Stable maintainers can determine applicability based on whether the driver exists in their trees.
**YES**
 drivers/platform/x86/lenovo/wmi-gamezone.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/lenovo/wmi-gamezone.c b/drivers/platform/x86/lenovo/wmi-gamezone.c
index 0eb7fe8222f4a..b26806b37d960 100644
--- a/drivers/platform/x86/lenovo/wmi-gamezone.c
+++ b/drivers/platform/x86/lenovo/wmi-gamezone.c
@@ -274,8 +274,23 @@ static const struct dmi_system_id fwbug_list[] = {
 		},
 		.driver_data = &quirk_no_extreme_bug,
 	},
+	{
+		.ident = "Legion Go 8ASP2",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_VERSION, "Legion Go 8ASP2"),
+		},
+		.driver_data = &quirk_no_extreme_bug,
+	},
+	{
+		.ident = "Legion Go 8AHP2",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_VERSION, "Legion Go 8AHP2"),
+		},
+		.driver_data = &quirk_no_extreme_bug,
+	},
 	{},
-
 };

 /**
From: Pei Xiao xiaopei01@kylinos.cn
[ Upstream commit 4910da6b36b122db50a27fabf6ab7f8611b60bf8 ]
The for_each_child_of_node() macro automatically manages device node reference counts during normal iteration. However, when breaking out of the loop early with return, the current iteration's node is not automatically released, leading to a reference count leak.
Fix this by adding of_node_put(child) before returning from the loop when emc2305_set_single_tz() fails.
This issue could lead to memory leaks over multiple probe cycles.
Signed-off-by: Pei Xiao xiaopei01@kylinos.cn
Link: https://lore.kernel.org/r/tencent_5CDC08544C901D5ECA270573D5AEE3117108@qq.co...
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Analysis of Commit: hwmon: (emc2305) fix device node refcount leak in error path
### 1. COMMIT MESSAGE ANALYSIS
**Subject:** Clear "fix" language indicates a bug fix targeting a "refcount leak" in an "error path"
**Body explanation:**

- The `for_each_child_of_node()` macro manages device node reference counts internally
- When breaking out of the loop early with `return`, the current node's reference is NOT automatically released
- This causes a reference count leak leading to memory leaks over multiple probe cycles

**Tags present:**

- `Signed-off-by` from author and maintainer (Guenter Roeck)
- `Link:` to mailing list discussion

**Tags missing:**

- No `Cc: stable@vger.kernel.org`
- No `Fixes:` tag
### 2. CODE CHANGE ANALYSIS
The fix is extremely small and surgical:
```diff
 for_each_child_of_node(dev->of_node, child) {
 	ret = emc2305_set_single_tz(dev, child, i);
-	if (ret != 0)
+	if (ret != 0) {
+		of_node_put(child);
 		return ret;
+	}
 	i++;
 }
```

**Technical mechanism:**

- `for_each_child_of_node()` calls `of_node_get()` on each child internally
- On normal loop completion, the macro decrements the refcount
- On early exit (return/break), the caller must manually call `of_node_put()` to release the reference
- Without this, each failed probe leaves an unreleased reference → memory leak
**Root cause:** Missing required cleanup call when breaking out of device tree iterator macro
**Why fix is correct:** This is the standard, well-documented pattern in the Linux kernel for handling early exits from `for_each_child_of_node()`. Adding `of_node_put(child)` before return is textbook correct.
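The early-exit leak can be shown with a small userspace model. This is a sketch under stated assumptions: `struct node`, `next_child()` and `setup_children()` are hypothetical stand-ins for `struct device_node`, the iterator behind `for_each_child_of_node()`, and the probe loop; the `fails` field plays the role of `emc2305_set_single_tz()` returning an error.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int refcount;
	struct node *sibling;
	int fails;	/* non-zero: per-child setup reports an error */
};

/* Iterator step: takes a ref on the next child, drops the ref on prev. */
static struct node *next_child(struct node *first, struct node *prev)
{
	struct node *next = prev ? prev->sibling : first;

	if (next)
		next->refcount++;
	if (prev)
		prev->refcount--;
	return next;
}

/* put_on_error != 0 models the of_node_put() added by the patch. */
static int setup_children(struct node *first, int put_on_error)
{
	struct node *child;

	assert(first != NULL);
	for (child = next_child(first, NULL); child;
	     child = next_child(first, child)) {
		if (child->fails) {
			if (put_on_error)
				child->refcount--;	/* release the held ref */
			return -1;			/* iterator never runs again */
		}
	}
	return 0;
}
```

Returning without the put leaves the failing child one reference above its baseline, and each further failed probe would add another.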
### 3. CLASSIFICATION
- **Bug type:** Resource leak (reference count / memory leak)
- **Category:** Standard bug fix
- **Security:** Not a security issue
- **New features:** None
### 4. SCOPE AND RISK ASSESSMENT
| Metric | Value |
|--------|-------|
| Lines changed | 3 (effectively +1 functional line) |
| Files touched | 1 |
| Complexity | Very low |
| Risk | Near zero |

**Risk analysis:**

- The fix only adds a cleanup call in an error path that already returns immediately
- Cannot possibly affect normal operation
- The `of_node_put()` function is well-tested core kernel infrastructure
- Adding required cleanup where it was missing cannot cause regression
### 5. USER IMPACT
**Affected users:** Those with EMC2305 fan controller hardware using device tree
**Trigger conditions:**

1. Device must have child nodes in device tree
2. `emc2305_set_single_tz()` must fail
3. Must happen repeatedly over time
**Severity:** Low to medium - memory leak that accumulates over multiple failed probe cycles. Not a crash or security issue, but can eventually exhaust memory on long-running systems.
### 6. STABILITY INDICATORS
- Accepted by hwmon subsystem maintainer (Guenter Roeck)
- Simple, well-understood fix pattern
- No complex interactions possible
### 7. DEPENDENCY CHECK
- No dependencies on other commits
- `for_each_child_of_node()` and `of_node_put()` are long-standing kernel APIs
- The emc2305 driver must exist in the target stable tree
---
## Summary
**What the commit fixes:** A device node reference count leak in the emc2305 hwmon driver that occurs when `emc2305_set_single_tz()` fails during probe. This can lead to memory leaks over multiple probe cycles.
**Stable kernel rules assessment:**

1. ✅ **Obviously correct:** Standard kernel pattern, textbook fix
2. ✅ **Fixes real bug:** Yes, reference count leak causing memory leak
3. ⚠️ **Important issue:** Moderate severity - memory leak in error path
4. ✅ **Small and contained:** 3 lines changed in 1 file
5. ✅ **No new features:** Pure bug fix
6. ✅ **Clean application:** Should apply cleanly

**Risk vs Benefit:**

- **Risk:** Essentially zero - adds required cleanup in error path
- **Benefit:** Prevents memory leak on systems using this hardware

**Concerns:**

- No explicit `Cc: stable` tag from maintainer
- Bug requires specific error condition to trigger
- Affects only specific hardware
**Verdict:** Despite the lack of explicit stable tagging, this fix is a textbook example of a safe backport candidate. The fix is trivially correct, follows a well-established kernel pattern, has zero risk of regression, and fixes a real (if low-severity) resource leak. Similar `for_each_child_of_node()` refcount leak fixes are regularly backported to stable trees.
**YES**
 drivers/hwmon/emc2305.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/emc2305.c b/drivers/hwmon/emc2305.c
index 60809289f8169..84cb9b72cb6c2 100644
--- a/drivers/hwmon/emc2305.c
+++ b/drivers/hwmon/emc2305.c
@@ -685,8 +685,10 @@ static int emc2305_probe(struct i2c_client *client)
 		i = 0;
 		for_each_child_of_node(dev->of_node, child) {
 			ret = emc2305_set_single_tz(dev, child, i);
-			if (ret != 0)
+			if (ret != 0) {
+				of_node_put(child);
 				return ret;
+			}
 			i++;
 		}
 	} else {
From: Gregory CLEMENT gregory.clement@bootlin.com
[ Upstream commit 36dac9a3dda1f2bae343191bc16b910c603cac25 ]
Since commit e424054000878 ("MIPS: Tracing: Reduce the overhead of dynamic Function Tracer"), the macro UASM_i_LA_mostly has been used, and this macro can generate more than 2 instructions. At the same time, the code in ftrace assumes that no more than 2 instructions can be generated, which is why it stores them in an int[2] array. However, as previously noted, the macro UASM_i_LA_mostly (and now UASM_i_LA) causes a buffer overflow when _mcount is beyond 32 bits. This leads to corruption of the variables located in the __read_mostly section.
This corruption was observed because the variable __cpu_primary_thread_mask was corrupted, causing a hang very early during boot.
This fix prevents the corruption by avoiding the generation of instructions if they could exceed 2 instructions in length. Fortunately, insn_la_mcount is only used if the instrumented code is located outside the kernel code section, so dynamic ftrace can still be used, albeit in a more limited scope. This is still preferable to corrupting memory and/or crashing the kernel.
Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com
Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## COMPREHENSIVE ANALYSIS
### 1. COMMIT MESSAGE ANALYSIS

- **Subject**: Contains "Fix" and "memory corruption" - strong indicators of a serious bug fix
- **Body**: Clearly explains the buffer overflow mechanism and observed real-world impact
- **References**: Cites commit e424054000878 as the source of the bug (introduced in 2010, kernel 2.6.35)
- **No explicit Cc: stable tag**, but the nature of the fix makes it appropriate for stable

### 2. TECHNICAL ROOT CAUSE

The bug is in `ftrace_dyn_arch_init_insns()`:

1. **Buffer size**: `insn_la_mcount[2]` is declared as a 2-element array (8 bytes)
2. **UASM_i_LA behavior**: When address is NOT in 32-bit compat space, it can generate up to 5-6 instructions:
   - `lui` (always)
   - `daddiu` (conditional)
   - `dsll` (conditional)
   - `daddiu` (conditional)
   - `dsll` (conditional)
   - final `daddiu`/`addiu`
3. **Overflow**: Writing more than 2 instructions overwrites adjacent `__read_mostly` variables
4. **Observed impact**: Corruption of `__cpu_primary_thread_mask` causing boot hang
### 3. FIX MECHANISM

The fix adds two defensive checks:

1. **In `ftrace_dyn_arch_init_insns()`**: Only generate instructions if `uasm_in_compat_space_p(MCOUNT_ADDR)` - otherwise warn and skip
2. **In `ftrace_make_call()`**: Return `-EFAULT` if `insn_la_mcount` would be needed but wasn't generated
This gracefully degrades functionality rather than corrupting memory.
### 4. STABLE KERNEL CRITERIA ASSESSMENT
| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Simple defensive check before buffer write |
| Fixes real bug | ✅ Memory corruption causing boot hang |
| Important issue | ✅ System crash/hang - very severe |
| Small and contained | ✅ Single file, ~30 lines changed |
| No new features | ✅ Actually reduces functionality in edge cases |
| No new APIs | ✅ Purely internal change |

### 5. DEPENDENCY CHECK

- **`uasm_in_compat_space_p()`**: Exists since kernel 2.6.x (commit e30ec4525d473)
- **Bug source commit**: e424054000878 from 2010 (2.6.35)
- **Dependencies**: None - fix is self-contained
### 6. RISK vs BENEFIT
**Risk**: Very LOW

- Defensive check - prevents execution rather than changing behavior
- Worst case: ftrace doesn't work for code outside kernel text on 64-bit MIPS with addresses beyond 32 bits
- Architecture-specific (MIPS only)

**Benefit**: HIGH

- Prevents memory corruption that causes boot hangs
- Bug has existed since 2010 - affects all stable kernels
- Observable real-world failure
### 7. USER IMPACT
- **Affected users**: MIPS 64-bit users with kernel loaded at addresses beyond 32 bits
- **Severity**: Critical - boot hang due to memory corruption
- **Reproducibility**: Deterministic when conditions are met (not a race)
### 8. CONCERNS
- **No explicit Cc: stable@vger.kernel.org tag**: However, the commit clearly fixes a serious memory corruption bug
- **Partial functionality loss**: Some ftrace capabilities reduced for 64-bit addresses, but this is far preferable to memory corruption
### CONCLUSION
This commit is an excellent stable backport candidate:

1. **Fixes a serious bug**: Memory corruption causing system boot hangs
2. **Minimal risk**: Defensive check that gracefully degrades functionality
3. **Self-contained**: Single file change with no dependencies on new code
4. **Long-standing bug**: Affects all kernels since 2.6.35
5. **MIPS maintainer signed-off**: Thomas Bogendoerfer approved the fix
The fix is small, surgical, and meets all stable kernel criteria. The trade-off (reduced ftrace functionality in edge cases vs memory corruption) strongly favors the fix.
**YES**
 arch/mips/kernel/ftrace.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index f39e85fd58fa9..b15615b285690 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -54,10 +54,20 @@ static inline void ftrace_dyn_arch_init_insns(void)
 	u32 *buf;
 	unsigned int v1;

-	/* la v1, _mcount */
-	v1 = 3;
-	buf = (u32 *)&insn_la_mcount[0];
-	UASM_i_LA(&buf, v1, MCOUNT_ADDR);
+	/* If we are not in compat space, the number of generated
+	 * instructions will exceed the maximum expected limit of 2.
+	 * To prevent buffer overflow, we avoid generating them.
+	 * insn_la_mcount will not be used later in ftrace_make_call.
+	 */
+	if (uasm_in_compat_space_p(MCOUNT_ADDR)) {
+		/* la v1, _mcount */
+		v1 = 3;
+		buf = (u32 *)&insn_la_mcount[0];
+		UASM_i_LA(&buf, v1, MCOUNT_ADDR);
+	} else {
+		pr_warn("ftrace: mcount address beyond 32 bits is not supported (%lX)\n",
+			MCOUNT_ADDR);
+	}

 	/* jal (ftrace_caller + 8), jump over the first two instruction */
 	buf = (u32 *)&insn_jal_ftrace_caller;
@@ -189,6 +199,13 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned int new;
 	unsigned long ip = rec->ip;

+	/* When the code to patch does not belong to the kernel code
+	 * space, we must use insn_la_mcount. However, if MCOUNT_ADDR
+	 * is not in compat space, insn_la_mcount is not usable.
+	 */
+	if (!core_kernel_text(ip) && !uasm_in_compat_space_p(MCOUNT_ADDR))
+		return -EFAULT;
+
 	new = core_kernel_text(ip) ? insn_jal_ftrace_caller : insn_la_mcount[0];

 #ifdef CONFIG_64BIT
From: Justin Tee justintee8345@gmail.com
[ Upstream commit 13989207ee29c40501e719512e8dc90768325895 ]
With authentication, in addition to EKEYREJECTED there is also no point in retrying reconnects when status is ENOKEY. Thus, add -ENOKEY as another criteria to determine when to stop retries.
Cc: Daniel Wagner wagi@kernel.org
Cc: Hannes Reinecke hare@suse.de
Closes: https://lore.kernel.org/linux-nvme/20250829-nvme-fc-sync-v3-0-d69c87e63aee@k...
Signed-off-by: Justin Tee justintee8345@gmail.com
Tested-by: Daniel Wagner wagi@kernel.org
Reviewed-by: Daniel Wagner wagi@kernel.org
Reviewed-by: Hannes Reinecke hare@suse.de
Signed-off-by: Keith Busch kbusch@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## Comprehensive Analysis
### 1. COMMIT MESSAGE ANALYSIS

- **Purpose**: Add `-ENOKEY` to prevent pointless reconnect retries when authentication key is missing
- **Tags**: `Tested-by` (Daniel Wagner), `Reviewed-by` (Daniel Wagner, Hannes Reinecke), `Closes:` (lore link)
- **Missing**: No explicit `Cc: stable@vger.kernel.org` or `Fixes:` tag
- **Maintainer signoff**: Keith Busch (NVMe maintainer)
### 2. CODE CHANGE ANALYSIS
The change is minimal - single line modification:

```diff
-	if (status == -EKEYREJECTED)
+	if (status == -EKEYREJECTED || status == -ENOKEY)
```

**Where `-ENOKEY` is returned:**

- `drivers/nvme/host/auth.c:720` - No session key negotiated
- `drivers/nvme/host/auth.c:973` - No host key (`ctrl->host_key` is NULL)
- `drivers/nvme/host/auth.c:978` - Controller key configured but invalid
- `drivers/nvme/host/tcp.c:1698,2080,2112,2121` - Various TLS/PSK key failures
All these represent "key does not exist" scenarios where retrying cannot help.
**Function impact:** `nvmf_should_reconnect()` is called by all three NVMe fabric transports (TCP, FC, RDMA) via `nvme_tcp_reconnect_or_remove()`, `nvme_fc_reconnect_or_delete()`, and `nvme_rdma_reconnect_or_remove()`.
### 3. CLASSIFICATION

- **Bug fix**: Yes - fixes futile retry behavior
- **New feature**: No - extends existing error handling pattern
- **Follows established pattern**: The `-EKEYREJECTED` check was added in v6.10 (commit 0e34bd9605f6c) with identical logic

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed**: 1
- **Files touched**: 1
- **Complexity**: Trivial
- **Risk**: Extremely low - change only affects reconnect decision for an already-failed authentication
- **Regression potential**: Near zero - the code path only executes when authentication already failed

### 5. USER IMPACT

- **Who is affected**: Users of NVMe Fabrics (TCP/RDMA/FC) with authentication enabled
- **Severity without fix**: Wasteful reconnect retries, potential log spam, resource consumption
- **Not a crash/data corruption**: This is a behavioral improvement, not a critical fix

### 6. STABILITY INDICATORS

- Tested by Daniel Wagner (NVMe developer)
- Reviewed by Daniel Wagner and Hannes Reinecke (both storage/NVMe experts)
- Clean, simple change with clear semantics

### 7. DEPENDENCY CHECK

- Requires commit `0e34bd9605f6c` ("nvme: do not retry authentication failures") from v6.10
- NVMe authentication feature itself was added in v6.1 (`f50fff73d620c`)
- Backport applies cleanly to trees with the `-EKEYREJECTED` check
### Decision Rationale
**Pros for backporting:**

- Trivial one-line change with zero regression risk
- Fixes real wasteful behavior (pointless retries that can never succeed)
- Follows existing code pattern already established
- Reviewed and tested by domain experts
- Semantically correct: `-ENOKEY` means "no key available" - retry won't create one

**Cons for backporting:**

- No explicit `Cc: stable@vger.kernel.org` tag from maintainers
- Not a crash, security bug, or data corruption fix
- NVMe authentication is a relatively niche feature
- Bug impact is resource waste, not functional failure
**Conclusion:** This is a low-risk, obviously correct bug fix that prevents wasteful behavior. While it lacks explicit stable tags and isn't fixing a critical bug, the change is so simple and safe that the benefit-to-risk ratio strongly favors inclusion. The fix completes the authentication error handling that was started with the `-EKEYREJECTED` check, making it a natural complement to that existing code. Stable kernel users with NVMe authentication would benefit from not having pointless reconnection storms when their keys are missing.
**YES**
 drivers/nvme/host/fabrics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2e58a7ce10905..55a8afd2efd50 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -592,7 +592,7 @@ bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status)
 	if (status > 0 && (status & NVME_STATUS_DNR))
 		return false;
 
-	if (status == -EKEYREJECTED)
+	if (status == -EKEYREJECTED || status == -ENOKEY)
 		return false;
 
 	if (ctrl->opts->max_reconnects == -1 ||
From: Jinhui Guo guojinhui.liam@bytedance.com
[ Upstream commit d3429178ee51dd7155445d15a5ab87a45fae3c73 ]
When probing the I2C master, disable SMBus interrupts to prevent storms caused by broken firmware mis-configuring IC_SMBUS=1; the handler never services them and a mis-configured SMBUS Master extend-clock timeout or SMBUS Slave extend-clock timeout can flood the CPU.
Signed-off-by: Jinhui Guo guojinhui.liam@bytedance.com
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Acked-by: Mika Westerberg mika.westerberg@linux.intel.com
Signed-off-by: Andi Shyti andi.shyti@kernel.org
Link: https://lore.kernel.org/r/20251021075714.3712-2-guojinhui.liam@bytedance.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
## ANALYSIS SUMMARY
### Problem Being Fixed

This commit fixes an **interrupt storm** caused by misconfigured firmware that leaves `IC_SMBUS=1` in the DesignWare I2C controller. When this happens:

1. SMBus-related interrupts (Master/Slave extend-clock timeout) get enabled
2. The driver has NO handler for these interrupts
3. Unserviced interrupts fire continuously, flooding the CPU
4. This effectively causes a denial-of-service condition
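The storm mechanism in the steps above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not driver code: the bit layout, `struct sketch_irq`, and every function name are hypothetical, and the model assumes a level-triggered line that stays asserted while any unmasked source remains latched.

```c
#include <stdint.h>

/* Hypothetical bit layout for this model; not the DesignWare register map. */
#define SRC_I2C   (1u << 0)	/* source the handler knows how to clear */
#define SRC_SMBUS (1u << 1)	/* source the handler never looks at */

struct sketch_irq {
	uint32_t raw;	/* latched interrupt sources */
	uint32_t mask;	/* 1 = source may assert the IRQ line */
};

/* The line stays asserted while any unmasked source is latched. */
static int line_asserted(const struct sketch_irq *c)
{
	return (c->raw & c->mask) != 0;
}

/* The handler services only the I2C source and leaves SMBus latched. */
static void handler(struct sketch_irq *c)
{
	c->raw &= ~SRC_I2C;
}

/* Count handler invocations, giving up after `limit` to represent a storm. */
static unsigned int run_until_quiet(struct sketch_irq *c, unsigned int limit)
{
	unsigned int calls = 0;

	while (line_asserted(c) && calls < limit) {
		handler(c);
		calls++;
	}
	return calls;
}
```

When the SMBus bit is set in `mask`, `run_until_quiet()` only stops at `limit`, because the handler never clears that source. Clearing the bit from `mask` lets the line drop after a single handler call.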
### Code Change Assessment
**Header file (`i2c-designware-core.h`):**

- Adds one register offset: `#define DW_IC_SMBUS_INTR_MASK 0xcc`

**Master file (`i2c-designware-master.c`):**

- In `i2c_dw_init_master()`, after disabling the adapter, writes 0 to the SMBus interrupt mask register to disable all SMBus interrupts
The fix is **defensive programming**: it proactively masks interrupts the driver doesn't use, regardless of what firmware may have configured.
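To make the "regardless of what firmware may have configured" point concrete, here is a minimal sketch that replaces the MMIO regmap with an in-memory array. Only the `DW_IC_SMBUS_INTR_MASK` offset comes from the patch; `struct sketch_regs`, `SKETCH_REG_SPACE`, and the `sketch_*` helpers are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define DW_IC_SMBUS_INTR_MASK 0xcc	/* offset taken from the patch */
#define SKETCH_REG_SPACE      0x100	/* hypothetical register window size */

/* Hypothetical in-memory register file standing in for the MMIO regmap. */
struct sketch_regs {
	uint32_t word[SKETCH_REG_SPACE / 4];
};

static void sketch_write(struct sketch_regs *r, unsigned int off, uint32_t val)
{
	r->word[off / 4] = val;
}

static uint32_t sketch_read(const struct sketch_regs *r, unsigned int off)
{
	return r->word[off / 4];
}

/*
 * Probe-time step: clear every SMBus interrupt enable, whatever
 * firmware left behind, before the rest of the setup runs.
 */
static void sketch_init_master(struct sketch_regs *r)
{
	sketch_write(r, DW_IC_SMBUS_INTR_MASK, 0);
}
```

The write is unconditional: the sketch never reads the old value first, so the outcome is the same whether firmware left the register at zero or with enable bits set.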
### Stable Criteria Evaluation
| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Yes - writing 0 to mask unused interrupts is safe |
| Fixes real bug | ✅ Yes - interrupt storms are real and severe |
| Important issue | ✅ Yes - system can become unresponsive |
| Small and contained | ✅ Yes - 2 lines of actual code |
| No new features | ✅ Correct - purely defensive |
| Clean backport | ✅ Yes - minimal, self-contained change |
### Risk vs Benefit
**Benefits:**

- Prevents interrupt storms that can make systems unusable
- ByteDance reporting this suggests real-world occurrence in production
- The i2c-designware driver is widely used (Intel SoCs, embedded systems)

**Risks:**

- Nearly zero - only disables interrupts the driver never uses
- Writing 0 to an interrupt mask register is inherently safe
- The function structure (`i2c_dw_init_master`) has been stable

### Stability Indicators

- **Reviewed-by:** Andy Shevchenko (well-known kernel developer)
- **Acked-by:** Mika Westerberg (Intel/I2C maintainer)
- Well-commented explaining the rationale

### Missing Tags Analysis

- **No `Cc: stable`** - not disqualifying; many important fixes lack this
- **No `Fixes:` tag** - the bug likely existed since driver creation, not a recent regression

### Category

This falls into the **hardware quirk/workaround** exception category - defensive code to handle broken/misconfigured firmware, which is explicitly allowed in stable trees.

### Dependency Check

- No dependencies on other commits
- The `i2c_dw_init_master()` function exists in all stable trees with the same structure
- Should apply cleanly to stable kernels
---
## Conclusion
This commit should be backported to stable kernels because:
1. **Severity**: Interrupt storms can make systems completely unresponsive
2. **Fix quality**: Minimal, surgical, well-reviewed change
3. **Risk**: Effectively zero - disabling unused interrupts cannot cause regression
4. **Scope**: Affects a widely-used I2C controller driver
5. **Real-world impact**: Reported by ByteDance, indicating production occurrence
6. **Category**: Hardware workaround - appropriate for stable
The fix is a textbook example of defensive programming that protects against broken firmware while having no downside.
**YES**
 drivers/i2c/busses/i2c-designware-core.h   | 1 +
 drivers/i2c/busses/i2c-designware-master.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
index 347843b4f5dd7..436555543c79d 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -78,6 +78,7 @@
 #define DW_IC_TX_ABRT_SOURCE			0x80
 #define DW_IC_ENABLE_STATUS			0x9c
 #define DW_IC_CLR_RESTART_DET			0xa8
+#define DW_IC_SMBUS_INTR_MASK			0xcc
 #define DW_IC_COMP_PARAM_1			0xf4
 #define DW_IC_COMP_VERSION			0xf8
 #define DW_IC_SDA_HOLD_MIN_VERS			0x3131312A /* "111*" == v1.11* */
diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
index 41e9b5ecad201..45bfca05bb300 100644
--- a/drivers/i2c/busses/i2c-designware-master.c
+++ b/drivers/i2c/busses/i2c-designware-master.c
@@ -220,6 +220,13 @@ static int i2c_dw_init_master(struct dw_i2c_dev *dev)
 	/* Disable the adapter */
 	__i2c_dw_disable(dev);
 
+	/*
+	 * Mask SMBus interrupts to block storms from broken
+	 * firmware that leaves IC_SMBUS=1; the handler never
+	 * services them.
+	 */
+	regmap_write(dev->map, DW_IC_SMBUS_INTR_MASK, 0);
+
 	/* Write standard speed timing parameters */
 	regmap_write(dev->map, DW_IC_SS_SCL_HCNT, dev->ss_hcnt);
 	regmap_write(dev->map, DW_IC_SS_SCL_LCNT, dev->ss_lcnt);