From: Seunghun Han kkamagui@gmail.com
[ Upstream commit 156fd20a41e776bbf334bd5e45c4f78dfc90ce1c ]
ACPICA commit 987a3b5cf7175916e2a4b6ea5b8e70f830dfe732
I found an ACPI cache leak in the case where ACPI terminates early and boot continues.
When early termination occurs due to a malicious ACPI table, the Linux kernel terminates the ACPI function and continues the boot process. While the kernel terminates the ACPI function, kmem_cache_destroy() reports an Acpi-Operand cache leak.
Boot log of ACPI operand cache leak is as follows:
[ 0.585957] ACPI: Added _OSI(Module Device)
[ 0.587218] ACPI: Added _OSI(Processor Device)
[ 0.588530] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.589790] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.591534] ACPI Error: Illegal I/O port address/length above 64K: C806E00000004002/0x2 (20170303/hwvalid-155)
[ 0.594351] ACPI Exception: AE_LIMIT, Unable to initialize fixed events (20170303/evevent-88)
[ 0.597858] ACPI: Unable to start the ACPI Interpreter
[ 0.599162] ACPI Error: Could not remove SCI handler (20170303/evmisc-281)
[ 0.601836] kmem_cache_destroy Acpi-Operand: Slab cache still has objects
[ 0.603556] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 #26
[ 0.605159] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 0.609177] Call Trace:
[ 0.610063] ? dump_stack+0x5c/0x81
[ 0.611118] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.612632] ? acpi_sleep_proc_init+0x27/0x27
[ 0.613906] ? acpi_os_delete_cache+0xa/0x10
[ 0.617986] ? acpi_ut_delete_caches+0x3f/0x7b
[ 0.619293] ? acpi_terminate+0xa/0x14
[ 0.620394] ? acpi_init+0x2af/0x34f
[ 0.621616] ? __class_create+0x4c/0x80
[ 0.623412] ? video_setup+0x7f/0x7f
[ 0.624585] ? acpi_sleep_proc_init+0x27/0x27
[ 0.625861] ? do_one_initcall+0x4e/0x1a0
[ 0.627513] ? kernel_init_freeable+0x19e/0x21f
[ 0.628972] ? rest_init+0x80/0x80
[ 0.630043] ? kernel_init+0xa/0x100
[ 0.631084] ? ret_from_fork+0x25/0x30
[ 0.633343] vgaarb: loaded
[ 0.635036] EDAC MC: Ver: 3.0.0
[ 0.638601] PCI: Probing PCI hardware
[ 0.639833] PCI host bridge to bus 0000:00
[ 0.641031] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
... Continue to boot and log is omitted ...
I analyzed this memory leak in detail and found that acpi_ds_obj_stack_pop_and_delete() miscalculated the top of the stack. acpi_ds_obj_stack_push() uses walk_state->operand_index as the start position of the top, but acpi_ds_obj_stack_pop_and_delete() assumes the operands start at index 0. This mismatch causes the ACPI operand memory leak.
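The index mismatch described above can be illustrated with a toy model. This is a minimal sketch, not ACPICA code: `toy_walk_state`, `toy_push()`, `toy_pop_and_delete()` and `toy_leaked_after_cleanup()` are hypothetical names, and the cleanup routine is deliberately simplified so that it only frees slots counted from index 0.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_SIZE 8

/* Hypothetical, simplified stand-in for struct acpi_walk_state. */
struct toy_walk_state {
	void *operands[STACK_SIZE];
	unsigned int num_operands;	/* next free slot, i.e. top of the stack */
	unsigned int live_objects;	/* objects allocated and not yet freed */
};

/* Push one freshly allocated object at the current top of the stack. */
static int toy_push(struct toy_walk_state *ws)
{
	void *obj;

	if (ws->num_operands >= STACK_SIZE)
		return -1;
	obj = malloc(1);
	if (!obj)
		return -1;
	ws->operands[ws->num_operands++] = obj;
	ws->live_objects++;
	return 0;
}

/* Free slots pop_count-1 .. 0, i.e. treat index 0 as the base of the stack. */
static void toy_pop_and_delete(struct toy_walk_state *ws, unsigned int pop_count)
{
	unsigned int i;

	if (pop_count > STACK_SIZE)
		pop_count = STACK_SIZE;
	for (i = pop_count; i-- > 0;) {
		if (ws->operands[i]) {
			free(ws->operands[i]);
			ws->operands[i] = NULL;
			ws->live_objects--;
		}
	}
}

/*
 * Start with `prev` operands, push `pushed` more, then clean up with
 * `pop_count`. Returns how many objects the cleanup left behind.
 */
static unsigned int toy_leaked_after_cleanup(unsigned int prev,
					     unsigned int pushed,
					     unsigned int pop_count)
{
	struct toy_walk_state ws;
	unsigned int i, leaked;

	memset(&ws, 0, sizeof(ws));
	for (i = 0; i < prev + pushed; i++)
		toy_push(&ws);
	toy_pop_and_delete(&ws, pop_count);
	leaked = ws.live_objects;
	toy_pop_and_delete(&ws, STACK_SIZE);	/* release the rest of the demo */
	return leaked;
}
```

With two pre-existing operands and three new ones, passing only the number of new arguments (3) leaves two objects allocated, while passing the full top index (5) leaves none. That is the toy analogue of passing a count that accounts for the operands already on the stack.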
This cache leak causes a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR.
I made a patch to fix ACPI operand cache leak.
Link: https://github.com/acpica/acpica/commit/987a3b5c
Signed-off-by: Seunghun Han kkamagui@gmail.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Link: https://patch.msgid.link/4999480.31r3eYUQgx@rjwysocki.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
**YES** This commit should be backported to stable kernel trees based on the following analysis:

## Analysis of the Commit

**1. Bug Fix Nature**: This commit fixes a clear memory leak bug in the ACPICA subsystem. The issue occurs when `acpi_ds_create_operands()` encounters an error during operand creation and needs to clean up allocated operands.

**2. Root Cause**: The core issue is a mismatch between how operands are pushed and popped from the stack:
- `acpi_ds_obj_stack_push()` uses `walk_state->operand_index` to track the current position
- `acpi_ds_obj_stack_pop_and_delete()` incorrectly assumes operands start at index 0
- This causes incomplete cleanup when errors occur, leading to memory leaks

**3. Code Analysis**: The fix correctly addresses the issue by:
- Tracking the original operand count (`prev_num_operands`)
- Calculating the correct number of operands to clean up (`new_num_operands`)
- Setting `walk_state->num_operands = i` before cleanup to ensure proper indexing
- Restoring the original operand count after cleanup

**4. Security Implications**: The commit message explicitly mentions this is a security issue - the memory leak can expose kernel function addresses in older kernels (≤4.9), potentially defeating ASLR. This elevates the importance beyond a simple memory leak.

**5. Pattern Consistency**: This follows the exact same pattern as other ACPICA operand cache leak fixes that were successfully backported (commits `97f3c0a4b057`, `2915f16bdce2`, etc.), all authored by the same developer addressing similar issues.

**6. Stability Characteristics**:
- **Small and contained**: Only 8 lines of actual logic changes
- **Clear error path**: Only affects cleanup during error conditions
- **No behavioral changes**: No impact on normal operation
- **Low regression risk**: Only modifies error handling paths

**7. Backport Suitability Criteria**:
- ✅ Fixes important bug affecting users (memory leak)
- ✅ Small, contained fix with minimal risk
- ✅ No architectural changes
- ✅ Addresses security concern (ASLR bypass potential)
- ✅ Follows established patterns for ACPICA fixes
- ✅ Critical kernel subsystem (ACPI)

**8. Historical Precedent**: The similar commits in the reference examples (Similar Commits #1, #2, #3) all have "Backport Status: YES" and address nearly identical ACPICA operand cache leak issues with similar fixes and commit message patterns.

This commit meets all criteria for stable tree backporting: it's an important bugfix, has minimal regression risk, affects a critical subsystem, and follows established patterns for similar fixes that were previously backported.
 drivers/acpi/acpica/dsutils.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpica/dsutils.c b/drivers/acpi/acpica/dsutils.c
index fb9ed5e1da89d..2bdae8a25e084 100644
--- a/drivers/acpi/acpica/dsutils.c
+++ b/drivers/acpi/acpica/dsutils.c
@@ -668,6 +668,8 @@ acpi_ds_create_operands(struct acpi_walk_state *walk_state,
 	union acpi_parse_object *arguments[ACPI_OBJ_NUM_OPERANDS];
 	u32 arg_count = 0;
 	u32 index = walk_state->num_operands;
+	u32 prev_num_operands = walk_state->num_operands;
+	u32 new_num_operands;
 	u32 i;

 	ACPI_FUNCTION_TRACE_PTR(ds_create_operands, first_arg);
@@ -696,6 +698,7 @@ acpi_ds_create_operands(struct acpi_walk_state *walk_state,

 	/* Create the interpreter arguments, in reverse order */

+	new_num_operands = index;
 	index--;
 	for (i = 0; i < arg_count; i++) {
 		arg = arguments[index];
@@ -720,7 +723,11 @@ acpi_ds_create_operands(struct acpi_walk_state *walk_state,
 	 * pop everything off of the operand stack and delete those
 	 * objects
 	 */
-	acpi_ds_obj_stack_pop_and_delete(arg_count, walk_state);
+	walk_state->num_operands = i;
+	acpi_ds_obj_stack_pop_and_delete(new_num_operands, walk_state);
+
+	/* Restore operand count */
+	walk_state->num_operands = prev_num_operands;

 	ACPI_EXCEPTION((AE_INFO, status, "While creating Arg %u", index));
 	return_ACPI_STATUS(status);
From: Ahmed Salem x0rw3ll@gmail.com
[ Upstream commit 64b9dfd0776e9c38d733094859a09f13282ce6f8 ]
ACPICA commit 8b83a8d88dfec59ea147fad35fc6deea8859c58c
ap_get_table_length() checks if tables are valid by calling ap_is_valid_header(). The latter then calls ACPI_VALIDATE_RSDP_SIG(Table->Signature).
ap_is_valid_header() accepts struct acpi_table_header as an argument, so the signature size is always fixed to 4 bytes.
The problem is when the string comparison is between ACPI-defined table signature and ACPI_SIG_RSDP. Common ACPI table header specifies the Signature field to be 4 bytes long[1], with the exception of the RSDP structure whose signature is 8 bytes long "RSD PTR " (including the trailing blank character)[2]. Calling strncmp(sig, rsdp_sig, 8) would then result in a sequence overread[3] as sig would be smaller (4 bytes) than the specified bound (8 bytes).
As a workaround, pass the bound conditionally based on the size of the signature being passed.
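The conditional bound can be sketched outside the kernel as follows. This is an illustration under assumptions, not the ACPICA header: `toy_table_header`, `toy_rsdp`, `TOY_VALIDATE_RSDP_SIG` and the two helper functions are hypothetical names, and the macro relies on being handed the signature array itself so that `sizeof` reports the field size.

```c
#include <string.h>

#define TOY_SIG_RSDP "RSD PTR "	/* 8-byte RSDP signature, trailing blank included */
#define TOY_NAMESEG_SIZE 4	/* size of an ordinary table signature */

/* Hypothetical stand-ins for the common table header and the RSDP. */
struct toy_table_header { char signature[4]; };
struct toy_rsdp { char signature[8]; };

/*
 * Choose the strncmp() bound from the size of the argument expression.
 * Pass the signature array itself: sizeof() on a plain pointer would
 * report the pointer size, not the size of the field it points to.
 */
#define TOY_VALIDATE_RSDP_SIG(a) \
	(!strncmp((const char *)(a), TOY_SIG_RSDP, \
		  (sizeof(a) < 8) ? TOY_NAMESEG_SIZE : 8))

/* Build a 4-byte header signature from sig4 and test it (reads 4 bytes). */
static int toy_header_is_rsdp(const char *sig4)
{
	struct toy_table_header hdr;

	memcpy(hdr.signature, sig4, sizeof(hdr.signature));
	return TOY_VALIDATE_RSDP_SIG(hdr.signature);
}

/* Build an 8-byte RSDP signature from sig8 and test it (reads 8 bytes). */
static int toy_rsdp_is_valid(const char *sig8)
{
	struct toy_rsdp rsdp;

	memcpy(rsdp.signature, sig8, sizeof(rsdp.signature));
	return TOY_VALIDATE_RSDP_SIG(rsdp.signature);
}
```

Note that with the 4-byte bound only the prefix "RSD " is compared, so a 4-byte field holding that prefix is accepted. The sketch only shows that the comparison never reads past the field it was given.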
Link: https://uefi.org/specs/ACPI/6.5_A/05_ACPI_Software_Programming_Model.html#sy... [1]
Link: https://uefi.org/specs/ACPI/6.5_A/05_ACPI_Software_Programming_Model.html#ro... [2]
Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstringop-over... [3]
Link: https://github.com/acpica/acpica/commit/8b83a8d8
Signed-off-by: Ahmed Salem x0rw3ll@gmail.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Link: https://patch.msgid.link/2248233.Mh6RI2rZIc@rjwysocki.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
**YES** This commit should be backported to stable kernel trees.

## Detailed Analysis:

### The Problem

The commit fixes a **buffer overread vulnerability** in the `ACPI_VALIDATE_RSDP_SIG` macro. The issue occurs when:

1. **Context**: The macro `ACPI_VALIDATE_RSDP_SIG(a)` calls `strncmp(ACPI_CAST_PTR(char, (a)), ACPI_SIG_RSDP, 8)`
2. **Problem**: When `a` points to a standard ACPI table header (4 bytes signature), but the comparison tries to read 8 bytes for RSDP validation ("RSD PTR "), it reads 4 bytes beyond the allocated signature field
3. **Impact**: This causes a sequence overread that can be detected by tools like GCC's `-Wstringop-overread` and security-focused architectures like CHERI/Morello

### The Fix

The fix changes line 530 in `include/acpi/actypes.h`:

```c
// Before:
#define ACPI_VALIDATE_RSDP_SIG(a) (!strncmp (ACPI_CAST_PTR (char, (a)), ACPI_SIG_RSDP, 8))

// After:
#define ACPI_VALIDATE_RSDP_SIG(a) (!strncmp (ACPI_CAST_PTR (char, (a)), ACPI_SIG_RSDP, (sizeof(a) < 8) ? ACPI_NAMESEG_SIZE : 8))
```

This conditionally limits the comparison to `ACPI_NAMESEG_SIZE` (4 bytes) when the source is smaller than 8 bytes, preventing the overread.

### Why This Should Be Backported:

1. **Security Fix**: This addresses a buffer overread that could be exploited or cause crashes on security-hardened systems
2. **Small & Contained**: The fix is a single line change to a macro definition with minimal risk
3. **Widespread Usage**: The macro is used in multiple critical ACPI code paths:
   - `drivers/acpi/acpica/tbprint.c` (kernel ACPI table printing)
   - `tools/power/acpi/tools/acpidump/apdump.c` (ACPI debugging tools)
   - Various other ACPI validation functions
4. **No Architectural Changes**: This doesn't change functionality, just prevents unsafe memory access
5. **Compiler Warning Fix**: Resolves build warnings with newer GCC versions using `-Wstringop-overread`

### Comparison with Historical Commits:

- **Similar to Commit #3 (YES)**: Like the "disable -Wstringop-truncation" commit, this fixes compiler warnings related to string operations, but this one actually fixes the underlying safety issue rather than just suppressing warnings
- **Unlike Commits #1,#2,#4,#5 (NO)**: Those were either pure code cleanup, debugging improvements, or infrastructure changes without security implications

### Risk Assessment:

- **Minimal Risk**: The fix only affects boundary condition handling in string comparison
- **No Functional Impact**: Normal RSDP validation still works correctly for valid 8-byte signatures
- **Improves Safety**: Prevents potential memory safety violations on hardened systems

This is exactly the type of small, safe, security-focused bugfix that stable trees are designed for.
 include/acpi/actypes.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h
index ff5fecff51167..f931312cf51a1 100644
--- a/include/acpi/actypes.h
+++ b/include/acpi/actypes.h
@@ -524,7 +524,7 @@ typedef u64 acpi_integer;

 /* Support for the special RSDP signature (8 characters) */

-#define ACPI_VALIDATE_RSDP_SIG(a) (!strncmp (ACPI_CAST_PTR (char, (a)), ACPI_SIG_RSDP, 8))
+#define ACPI_VALIDATE_RSDP_SIG(a) (!strncmp (ACPI_CAST_PTR (char, (a)), ACPI_SIG_RSDP, (sizeof(a) < 8) ? ACPI_NAMESEG_SIZE : 8))
 #define ACPI_MAKE_RSDP_SIG(dest) (memcpy (ACPI_CAST_PTR (char, (dest)), ACPI_SIG_RSDP, 8))

 /* Support for OEMx signature (x can be any character) */
From: Seunghun Han kkamagui@gmail.com
[ Upstream commit bed18f0bdcd6737a938264a59d67923688696fc4 ]
ACPICA commit 8829e70e1360c81e7a5a901b5d4f48330e021ea5
I'm Seunghun Han, and I work for the National Security Research Institute of South Korea.
I have been researching ACPI and found an ACPI cache leak in ACPI early abort cases.
Boot log of ACPI cache leak is as follows:
[ 0.352414] ACPI: Added _OSI(Module Device)
[ 0.353182] ACPI: Added _OSI(Processor Device)
[ 0.353182] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.353182] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.356028] ACPI: Unable to start the ACPI Interpreter
[ 0.356799] ACPI Error: Could not remove SCI handler (20170303/evmisc-281)
[ 0.360215] kmem_cache_destroy Acpi-State: Slab cache still has objects
[ 0.360648] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #10
[ 0.361273] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 0.361873] Call Trace:
[ 0.362243] ? dump_stack+0x5c/0x81
[ 0.362591] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.362944] ? acpi_sleep_proc_init+0x27/0x27
[ 0.363296] ? acpi_os_delete_cache+0xa/0x10
[ 0.363646] ? acpi_ut_delete_caches+0x6d/0x7b
[ 0.364000] ? acpi_terminate+0xa/0x14
[ 0.364000] ? acpi_init+0x2af/0x34f
[ 0.364000] ? __class_create+0x4c/0x80
[ 0.364000] ? video_setup+0x7f/0x7f
[ 0.364000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.364000] ? do_one_initcall+0x4e/0x1a0
[ 0.364000] ? kernel_init_freeable+0x189/0x20a
[ 0.364000] ? rest_init+0xc0/0xc0
[ 0.364000] ? kernel_init+0xa/0x100
[ 0.364000] ? ret_from_fork+0x25/0x30
I analyzed this memory leak in detail. I found that the “Acpi-State” cache and the “Acpi-Parse” cache were merged because their cache objects had the same size.
Using the SLAB_NEVER_MERGE flag in kmem_cache_create(), I finally found that the “Acpi-Parse” cache and the “Acpi-ParseExt” cache were the ones being leaked.
Real ACPI cache leak point is as follows:
[ 0.360101] ACPI: Added _OSI(Module Device)
[ 0.360101] ACPI: Added _OSI(Processor Device)
[ 0.360101] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.361043] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.364016] ACPI: Unable to start the ACPI Interpreter
[ 0.365061] ACPI Error: Could not remove SCI handler (20170303/evmisc-281)
[ 0.368174] kmem_cache_destroy Acpi-Parse: Slab cache still has objects
[ 0.369332] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8
[ 0.371256] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 0.372000] Call Trace:
[ 0.372000] ? dump_stack+0x5c/0x81
[ 0.372000] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.372000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.372000] ? acpi_os_delete_cache+0xa/0x10
[ 0.372000] ? acpi_ut_delete_caches+0x56/0x7b
[ 0.372000] ? acpi_terminate+0xa/0x14
[ 0.372000] ? acpi_init+0x2af/0x34f
[ 0.372000] ? __class_create+0x4c/0x80
[ 0.372000] ? video_setup+0x7f/0x7f
[ 0.372000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.372000] ? do_one_initcall+0x4e/0x1a0
[ 0.372000] ? kernel_init_freeable+0x189/0x20a
[ 0.372000] ? rest_init+0xc0/0xc0
[ 0.372000] ? kernel_init+0xa/0x100
[ 0.372000] ? ret_from_fork+0x25/0x30
[ 0.388039] kmem_cache_destroy Acpi-ParseExt: Slab cache still has objects
[ 0.389063] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8
[ 0.390557] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 0.392000] Call Trace:
[ 0.392000] ? dump_stack+0x5c/0x81
[ 0.392000] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.392000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.392000] ? acpi_os_delete_cache+0xa/0x10
[ 0.392000] ? acpi_ut_delete_caches+0x6d/0x7b
[ 0.392000] ? acpi_terminate+0xa/0x14
[ 0.392000] ? acpi_init+0x2af/0x34f
[ 0.392000] ? __class_create+0x4c/0x80
[ 0.392000] ? video_setup+0x7f/0x7f
[ 0.392000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.392000] ? do_one_initcall+0x4e/0x1a0
[ 0.392000] ? kernel_init_freeable+0x189/0x20a
[ 0.392000] ? rest_init+0xc0/0xc0
[ 0.392000] ? kernel_init+0xa/0x100
[ 0.392000] ? ret_from_fork+0x25/0x30
When an early abort occurs due to invalid ACPI information, the Linux kernel terminates ACPI by calling acpi_terminate(). That function calls acpi_ut_delete_caches() to delete the local caches (acpi_gbl_namespace_cache, state_cache, operand_cache, ps_node_cache, ps_node_ext_cache).
However, the deletion code in acpi_ut_delete_caches() only destroys the slab caches with kmem_cache_destroy(), so the cache objects must be flushed before acpi_ut_delete_caches() is called.
The "Acpi-Parse" and "Acpi-ParseExt" caches are used in the AML parse function acpi_ps_parse_loop(). That function should complete all ops using acpi_ps_complete_final_op() when an error occurs due to invalid AML code. However, the current implementation of acpi_ps_complete_final_op() does not complete all ops when it encounters certain errors, and this causes the cache leak.
This cache leak has a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR.
To fix ACPI cache leak for enhancing security, I made a patch to complete all ops unconditionally for acpi_ps_complete_final_op() function.
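The idea of completing every op unconditionally while still reporting the first error can be sketched as follows. This is a simplified model under assumptions, not the ACPICA implementation: `toy_op`, `toy_status`, `toy_complete_op()`, `toy_complete_all()` and the demo helper are hypothetical names, and the parent pointer stands in for popping the parser scope.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not the ACPICA values). */
enum toy_status { TOY_OK = 0, TOY_ERROR = 1 };

struct toy_op {
	struct toy_op *parent;
	int fails_on_complete;	/* simulate an error while completing this op */
	int completed;		/* set once the op has been released */
};

/* Release one op; the op is released even when an error is reported. */
static enum toy_status toy_complete_op(struct toy_op *op)
{
	op->completed = 1;
	return op->fails_on_complete ? TOY_ERROR : TOY_OK;
}

/*
 * Walk from op up to the root and complete every op. An error never ends
 * the walk early; only the first error is remembered and returned.
 */
static enum toy_status toy_complete_all(struct toy_op *op,
					enum toy_status status)
{
	enum toy_status return_status = status;

	while (op) {
		enum toy_status s = toy_complete_op(op);

		if (s != TOY_OK && return_status == TOY_OK)
			return_status = s;
		op = op->parent;
	}
	return return_status;
}

/* Demo: a chain of three ops whose middle op fails during completion. */
static int toy_count_completed_with_middle_failure(enum toy_status *out)
{
	struct toy_op root = { NULL, 0, 0 };
	struct toy_op mid = { &root, 1, 0 };
	struct toy_op leaf = { &mid, 0, 0 };

	*out = toy_complete_all(&leaf, TOY_OK);
	return leaf.completed + mid.completed + root.completed;
}
```

In this model all three ops are completed and the error from the middle op is still returned, whereas an early return at the first failure would have left the root op allocated.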
I hope that this patch improves the security of Linux kernel.
Thank you.
Link: https://github.com/acpica/acpica/commit/8829e70e
Signed-off-by: Seunghun Han kkamagui@gmail.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Link: https://patch.msgid.link/2363774.ElGaqSPkdT@rjwysocki.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
**YES** This commit should be backported to stable kernel trees based on comprehensive analysis of both the commit message and code changes:

## Security Impact Analysis

The commit fixes a critical **memory cache leak that poses a security threat**:
- The leak exposes kernel function memory locations in stack dumps on older kernels (≤ 4.9)
- This information can be exploited to neutralize **kernel ASLR** (Address Space Layout Randomization)
- The researcher specifically mentions this creates a security vulnerability that malicious users could exploit

## Bug Analysis

The commit addresses **"Acpi-Parse" and "Acpi-ParseExt" cache leaks** that occur during ACPI early abort scenarios:

1. **Root Cause**: In `acpi_ps_complete_final_op()`, when errors occur during AML parsing, the function would exit early without properly cleaning up all parse operations
2. **Leak Mechanism**: The original code had multiple early return paths that bypassed the cleanup loop, leaving parse cache objects unfreed
3. **Trigger Conditions**: Invalid/malicious ACPI tables causing parse errors

## Code Changes Analysis

The fix in `drivers/acpi/acpica/psobject.c` is **well-contained and minimal**:

**Key changes:**
1. **Eliminates early returns**: Replaces immediate `return_ACPI_STATUS()` calls with flag setting (`ascending = FALSE`)
2. **Ensures complete cleanup**: All ops are now processed through `acpi_ps_complete_this_op()` unconditionally
3. **Preserves error reporting**: Uses `return_status` to track the first/most important error while continuing cleanup
4. **Maintains logic flow**: The cleanup loop now always runs to completion, preventing cache leaks

**Specific improvements:**
- Lines 674-701: Instead of immediate return on `AE_CTRL_TERMINATE`, sets `ascending = FALSE` and continues
- Lines 704-712: On parse failures, sets `ascending = FALSE` and continues instead of returning early
- Lines 715-719: Always calls `acpi_ps_complete_this_op()` and tracks errors without early exit

## Backport Suitability Assessment

**✅ Meets stable tree criteria:**
1. **Important bug fix**: Fixes memory leaks with security implications
2. **Minimal risk**: Changes are confined to error handling paths in ACPI parser
3. **Small and contained**: Only 50 lines changed in a single function
4. **No architectural changes**: Preserves existing API and behavior
5. **Clear regression prevention**: Ensures proper resource cleanup in error cases

**✅ Historical precedent**: The kernel tree shows **multiple similar ACPI cache leak fixes by the same security researcher** (Seunghun Han) that were backported:
- `3b2d69114fef` - "ACPICA: Namespace: fix operand cache leak"
- `97f3c0a4b057` - "ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c"

**✅ Risk assessment:**
- **Low regression risk**: Only affects error paths during ACPI parse failures
- **No functional changes**: Normal ACPI parsing behavior unchanged
- **Defensive programming**: Makes cleanup more robust without changing success paths

The commit represents a textbook example of a stable-tree appropriate fix: it addresses an important security-related memory leak with minimal, well-contained changes that improve robustness without introducing new functionality or architectural modifications.
 drivers/acpi/acpica/psobject.c | 52 ++++++++++------------------------
 1 file changed, 15 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/acpica/psobject.c b/drivers/acpi/acpica/psobject.c
index 98e5c7400e547..3ea26bbd534df 100644
--- a/drivers/acpi/acpica/psobject.c
+++ b/drivers/acpi/acpica/psobject.c
@@ -639,7 +639,8 @@ acpi_status
 acpi_ps_complete_final_op(struct acpi_walk_state *walk_state,
 			  union acpi_parse_object *op, acpi_status status)
 {
-	acpi_status status2;
+	acpi_status return_status = status;
+	u8 ascending = TRUE;

 	ACPI_FUNCTION_TRACE_PTR(ps_complete_final_op, walk_state);

@@ -653,7 +654,7 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state,
 			  op));
 	do {
 		if (op) {
-			if (walk_state->ascending_callback != NULL) {
+			if (ascending && walk_state->ascending_callback != NULL) {
 				walk_state->op = op;
 				walk_state->op_info =
 				    acpi_ps_get_opcode_info(op->common.
@@ -675,49 +676,26 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state,
 				}

 				if (status == AE_CTRL_TERMINATE) {
-					status = AE_OK;
-
-					/* Clean up */
-					do {
-						if (op) {
-							status2 =
-							    acpi_ps_complete_this_op
-							    (walk_state, op);
-							if (ACPI_FAILURE
-							    (status2)) {
-								return_ACPI_STATUS
-								    (status2);
-							}
-						}
-
-						acpi_ps_pop_scope(&
-								  (walk_state->
-								   parser_state),
-								  &op,
-								  &walk_state->
-								  arg_types,
-								  &walk_state->
-								  arg_count);
-
-					} while (op);
-
-					return_ACPI_STATUS(status);
+					ascending = FALSE;
+					return_status = AE_CTRL_TERMINATE;
 				}

 				else if (ACPI_FAILURE(status)) {

 					/* First error is most important */

-					(void)
-					    acpi_ps_complete_this_op(walk_state,
-								     op);
-					return_ACPI_STATUS(status);
+					ascending = FALSE;
+					return_status = status;
 				}
 			}

-			status2 = acpi_ps_complete_this_op(walk_state, op);
-			if (ACPI_FAILURE(status2)) {
-				return_ACPI_STATUS(status2);
+			status = acpi_ps_complete_this_op(walk_state, op);
+			if (ACPI_FAILURE(status)) {
+				ascending = FALSE;
+				if (ACPI_SUCCESS(return_status) ||
+				    return_status == AE_CTRL_TERMINATE) {
+					return_status = status;
+				}
 			}
 		}

@@ -727,5 +705,5 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state,

 	} while (op);

-	return_ACPI_STATUS(status);
+	return_ACPI_STATUS(return_status);
 }
From: Jerry Lv Jerry.Lv@axis.com
[ Upstream commit f16d9fb6cf03fdbdefa41a8b32ba1e57afb7ae3d ]
Multiple applications may access the battery gauge at the same time, so the gauge may be busy and EBUSY will be returned. The driver will set a flag to record the EBUSY state, and this flag will be kept until the next periodic update. When this flag is set, bq27xxx_battery_get_property() will just return ENODEV until the flag is updated.
Even if the gauge was busy during the last access attempt, returning ENODEV is not ideal and can cause confusion in the application layer.
Instead, retry the I2C access so that the flag is updated as expected, since the gauge typically recovers from the busy state within a few milliseconds. If the access still fails, the real error code is returned instead of ENODEV (as suggested by Pali Rohár).
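The bounded retry described above can be sketched in plain C. This is a minimal userspace model under assumptions: `toy_transfer_fn`, `toy_gauge`, `toy_gauge_transfer()`, `toy_transfer_with_retry()` and `toy_read_gauge()` are hypothetical names, the transfer is simulated, and the roughly 10 ms sleep between attempts is only indicated by a comment.

```c
#include <errno.h>

#define TOY_MAX_ATTEMPTS 3

/* Hypothetical transfer callback: >= 0 on success, negative errno on failure. */
typedef int (*toy_transfer_fn)(void *ctx);

/* Retry only on -EBUSY, and at most TOY_MAX_ATTEMPTS attempts in total. */
static int toy_transfer_with_retry(toy_transfer_fn transfer, void *ctx)
{
	int attempt = 0;
	int ret;

	do {
		ret = transfer(ctx);
		if (ret == -EBUSY && ++attempt < TOY_MAX_ATTEMPTS) {
			/* a real driver would sleep about 10 ms here */
			continue;
		}
		break;
	} while (1);

	return ret;
}

/* Simulated gauge that reports busy for the first busy_remaining calls. */
struct toy_gauge {
	int busy_remaining;
	int calls;
};

static int toy_gauge_transfer(void *ctx)
{
	struct toy_gauge *gauge = ctx;

	gauge->calls++;
	if (gauge->busy_remaining > 0) {
		gauge->busy_remaining--;
		return -EBUSY;
	}
	return 2;	/* pretend two messages were transferred */
}

/* Read from a gauge that stays busy for busy_times consecutive calls. */
static int toy_read_gauge(int busy_times)
{
	struct toy_gauge gauge = { busy_times, 0 };

	return toy_transfer_with_retry(toy_gauge_transfer, &gauge);
}
```

A gauge that is busy for two calls succeeds on the third attempt, while one that is busy for three calls returns -EBUSY to the caller, so the real error code is what propagates once the attempts are exhausted.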
Reviewed-by: Pali Rohár pali@kernel.org
Signed-off-by: Jerry Lv Jerry.Lv@axis.com
Link: https://lore.kernel.org/r/20250415-foo-fix-v2-1-5b45a395e4cc@axis.com
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
**YES** This commit should be backported to stable kernel trees.

## Detailed Analysis:

### 1. **User-Visible Bug Fix** ✅

The commit fixes a clear user-visible issue where multiple applications accessing the battery gauge simultaneously receive confusing `-ENODEV` errors instead of actual battery data or proper error codes. This directly impacts battery monitoring applications and system power management tools.

### 2. **Code Changes Analysis**

**bq27xxx_battery.c change:**

```c
-return -ENODEV;
+return di->cache.flags;
```

This single-line change stops masking real error codes with generic `-ENODEV`, allowing applications to see the actual error (like `-EBUSY`) and handle it appropriately.

**bq27xxx_battery_i2c.c changes:**
- Adds retry logic with up to 3 attempts when encountering `-EBUSY`
- Uses standard `usleep_range(10000, 11000)` for 10ms delays
- Implements proper loop structure with explicit break conditions

### 3. **Small and Contained** ✅

The changes are minimal and focused:
- No architectural modifications
- No changes to driver interfaces or APIs
- Limited to error handling improvement within the same subsystem

### 4. **Low Regression Risk** ✅

- Uses established kernel patterns (`usleep_range`, retry counters)
- No timing changes to critical paths
- Battery gauges typically recover from busy state within milliseconds
- Maintains backward compatibility

### 5. **Historical Precedent** ✅

Analysis of similar bq27xxx commits shows consistent backporting:
- "Fix race condition" (Similar Commit #4): **YES** - Similar I2C access improvement
- "After charger plug in/out wait 0.5s" (Similar Commit #5): **YES** - Similar stability fix
- "make status more robust" (Similar Commit #2): **YES** - Similar robustness improvement

### 6. **Follows Kernel Conventions** ✅

- Standard I2C retry mechanisms are common in the kernel
- Proper error code propagation instead of masking
- Code reviewed by subsystem maintainer (Pali Rohár)
- Uses kernel-standard delay functions

### 7. **System Impact**

**Improves stability** by:
- Preventing userspace confusion from misleading error codes
- Gracefully handling concurrent access scenarios
- Better error reporting for debugging
- No negative side effects identified

### 8. **Risk Assessment**

**Very Low Risk:**
- No memory management changes
- No locking mechanism modifications
- Standard retry pattern with bounded attempts
- Preserves all existing functionality

This commit represents an ideal stable backport candidate: it fixes a real user-visible bug with minimal, well-understood code changes that follow established kernel patterns and have strong historical precedent for backporting in this driver subsystem.
 drivers/power/supply/bq27xxx_battery.c     |  2 +-
 drivers/power/supply/bq27xxx_battery_i2c.c | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index e6c4dfdc58c47..1cfec675f82f3 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -1780,7 +1780,7 @@ static int bq27xxx_battery_get_property(struct power_supply *psy,
 	mutex_unlock(&di->lock);

 	if (psp != POWER_SUPPLY_PROP_PRESENT && di->cache.flags < 0)
-		return -ENODEV;
+		return di->cache.flags;

 	switch (psp) {
 	case POWER_SUPPLY_PROP_STATUS:
diff --git a/drivers/power/supply/bq27xxx_battery_i2c.c b/drivers/power/supply/bq27xxx_battery_i2c.c
index 08c7e2b4155ad..bf235d0a96033 100644
--- a/drivers/power/supply/bq27xxx_battery_i2c.c
+++ b/drivers/power/supply/bq27xxx_battery_i2c.c
@@ -14,6 +14,7 @@
  * GNU General Public License for more details.
  */

+#include <linux/delay.h>
 #include <linux/i2c.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
@@ -40,6 +41,7 @@ static int bq27xxx_battery_i2c_read(struct bq27xxx_device_info *di, u8 reg,
 	struct i2c_msg msg[2];
 	u8 data[2];
 	int ret;
+	int retry = 0;

 	if (!client->adapter)
 		return -ENODEV;
@@ -56,7 +58,16 @@ static int bq27xxx_battery_i2c_read(struct bq27xxx_device_info *di, u8 reg,
 	else
 		msg[1].len = 2;

-	ret = i2c_transfer(client->adapter, msg, ARRAY_SIZE(msg));
+	do {
+		ret = i2c_transfer(client->adapter, msg, ARRAY_SIZE(msg));
+		if (ret == -EBUSY && ++retry < 3) {
+			/* sleep 10 milliseconds when busy */
+			usleep_range(10000, 11000);
+			continue;
+		}
+		break;
+	} while (1);
+
 	if (ret < 0)
 		return ret;

From: Charan Teja Kalla quic_charante@quicinc.com
[ Upstream commit 40d3b40dce375d6f1c1dbf08d79eed3aed6c691d ]
pm_runtime_put_autosuspend() schedules a hrtimer to expire at "dev->power.timer_expires". If the hrtimer's callback, pm_suspend_timer_fn(), observes that the current time equals "dev->power.timer_expires", it unexpectedly bails out instead of proceeding with runtime suspend.
pm_suspend_timer_fn():
 if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
	dev->power.timer_expires = 0;
	rpm_suspend(..)
 }
Additionally, as ->timer_expires is not cleared, all the future auto suspend requests will not schedule hrtimer to perform auto suspend.
rpm_suspend():
 if ((rpmflags & RPM_AUTO) &&...) {
	if (!(dev->power.timer_expires && ...) { <-- this will fail.
		hrtimer_start_range_ns(&dev->power.suspend_timer,...);
	}
 }
Fix this by also checking whether the current time has reached the set expiration.
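The boundary case can be checked in isolation with a small sketch. The two helpers below are hypothetical names that only model the comparison, with plain integers standing in for the values of `dev->power.timer_expires` and `ktime_get_mono_fast_ns()`.

```c
#include <stdint.h>

/* Old check: suspend only when the expiry time is strictly in the past. */
static int toy_should_suspend_strict(uint64_t expires, uint64_t now)
{
	return expires > 0 && expires < now;
}

/* Fixed check: also suspend when the current time equals the expiry time. */
static int toy_should_suspend_fixed(uint64_t expires, uint64_t now)
{
	return expires > 0 && expires <= now;
}
```

When `now` equals `expires`, the strict version declines to suspend while the fixed version proceeds. Both still decline when the timer fired early or when no expiry is set.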
Co-developed-by: Patrick Daly quic_pdaly@quicinc.com
Signed-off-by: Patrick Daly quic_pdaly@quicinc.com
Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com
Link: https://patch.msgid.link/20250515064125.1211561-1-quic_charante@quicinc.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
Now, based on my analysis, let me provide the decision:

**YES** This commit should be backported to stable kernel trees. Here's my extensive analysis:

## Bug Analysis

This is a clear **logical bug fix** in the PM runtime autosuspend mechanism that affects device power management. The issue is in the `pm_suspend_timer_fn()` function at `drivers/base/power/runtime.c:1006`.

## The Problem

The current code uses a strict less-than comparison:

```c
if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
```

However, this creates a race condition where:

1. When `pm_runtime_put_autosuspend()` schedules an hrtimer to expire at exactly `dev->power.timer_expires`
2. If the timer callback `pm_suspend_timer_fn()` observes that the current time **equals** `dev->power.timer_expires` (not less than), it incorrectly bails out
3. The timer expires but the device is not suspended
4. Since `timer_expires` is not cleared, all future auto-suspend requests fail because the condition `if (!(dev->power.timer_expires && ...))` in `rpm_suspend()` (line 596-597) will always be false

## The Fix

The fix correctly changes the comparison to `<=` (less than or equal):

```c
if (expires > 0 && expires <= ktime_get_mono_fast_ns()) {
```

This ensures that the suspend is triggered when the current time equals or exceeds the expiration time.

## Why This Should Be Backported

1. **Fixes a Real User-Affecting Bug**: Devices may fail to auto-suspend, leading to increased power consumption and potential battery drain on mobile devices.
2. **Minimal Risk**: This is a one-character change (`<` to `<=`) that fixes a clear logical error. The change is extremely contained and low-risk.
3. **Critical Subsystem**: PM runtime is a critical kernel subsystem affecting all device power management. A failure here can impact system-wide power efficiency.
4. **No Architectural Changes**: This doesn't introduce new features or change architecture - it simply fixes incorrect logic.
5. **Small and Contained**: The fix is confined to a single comparison operator in one function.
6. **Clear Co-authors**: The commit shows collaboration between Patrick Daly and Charan Teja Kalla from Qualcomm, indicating this was found in real-world usage.

## Comparison with Similar Commits

Looking at the historical examples:
- **Similar Commit #5 (Status: YES)**: Also a PM runtime timer fix with data type correction - this had similar characteristics and was backported
- The current commit follows the same pattern: critical PM runtime bug fix with minimal, contained changes

## Stable Tree Criteria Met

- ✅ Fixes important bug affecting users
- ✅ Relatively small and contained fix
- ✅ No clear side effects beyond fixing the issue
- ✅ No major architectural changes
- ✅ Touches critical kernel subsystem (appropriately)
- ✅ Follows stable tree rules (important bugfix, minimal risk)
- ✅ Has clear explanation of the problem and solution

This commit perfectly fits the stable backport criteria and should be included in stable kernel trees to ensure proper device power management functionality across all kernel versions.
 drivers/base/power/runtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index d301a6de762df..7fa231076ad5f 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -982,7 +982,7 @@ static enum hrtimer_restart pm_suspend_timer_fn(struct hrtimer *timer)
 	 * If 'expires' is after the current time, we've been called
 	 * too early.
 	 */
-	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
+	if (expires > 0 && expires <= ktime_get_mono_fast_ns()) {
 		dev->power.timer_expires = 0;
 		rpm_suspend(dev, dev->power.timer_autosuspends ?
 		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
From: Peter Marheine pmarheine@chromium.org
[ Upstream commit 234f71555019d308c6bc6f98c78c5551cb8cd56a ]
The ACPI specification requires that battery rate is always positive, but the kernel ABI for POWER_SUPPLY_PROP_CURRENT_NOW (Documentation/ABI/testing/sysfs-class-power) specifies that it should be negative when a battery is discharging. When reporting CURRENT_NOW, massage the value to match the documented ABI.
This only changes the sign of `current_now` and not `power_now`, because the documentation doesn't describe any particular meaning for `power_now`. Leaving `power_now` unchanged is therefore less likely to confuse userspace unnecessarily, whereas becoming consistent with the documented ABI is worth potentially confusing clients that read `current_now`.
Signed-off-by: Peter Marheine pmarheine@chromium.org
Link: https://patch.msgid.link/20250508024146.1436129-1-pmarheine@chromium.org
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
Based on my analysis of the commit and examination of the kernel codebase, I can now provide my assessment:

**YES**

This commit should be backported to stable kernel trees because:

1. **Fixes a real user-facing bug**: The commit addresses an ABI inconsistency where ACPI battery drivers were reporting current_now with the wrong sign. According to the kernel's power supply ABI documentation (Documentation/ABI/testing/sysfs-class-power), `current_now` should be negative when discharging and positive when charging, but ACPI battery drivers were reporting positive values always.
2. **Small and contained fix**: The change is minimal and well-contained. It only affects the `POWER_SUPPLY_PROP_CURRENT_NOW` case in `acpi_battery_get_property()` function in `drivers/acpi/battery.c`. The code adds a simple check to negate the current value only when:
   - Property requested is `POWER_SUPPLY_PROP_CURRENT_NOW` (not power_now)
   - Battery is in discharging state
   - The `acpi_battery_handle_discharging()` function confirms it's actually discharging
3. **Follows stable rules**: This is an important bugfix that corrects userspace-visible behavior to match documented ABI. Applications and battery monitoring tools rely on the documented behavior that negative current indicates discharging.
4. **Minimal regression risk**: The change is very conservative:
   - Only affects `current_now`, not `power_now` (as noted in commit message)
   - Uses existing `acpi_battery_handle_discharging()` logic to double-check the discharging state
   - Leaves all other battery properties unchanged
5. **Similar pattern in similar drivers**: From the historical examples provided, commits like "power: supply: bq27xxx: fix polarity of current_now" and "power: supply: axp20x_battery: properly report current when discharging" were backported with YES status for exactly the same type of issue - fixing current sign during discharge.
6. **Affects critical subsystem**: Battery reporting is crucial for power management, and incorrect current direction can confuse userspace tools and potentially impact power management decisions.

The commit carefully addresses the ABI compliance issue while minimizing risk by only changing the sign for `current_now` during confirmed discharging states, making it an ideal candidate for stable backporting.
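The sign rule described above can be sketched in isolation. This is a simplified userspace model, not the driver code: `enum rate_prop`, `report_rate()` and the single `discharging` flag are hypothetical, the flag collapses the two discharging checks made in the patch into one boolean, and the `ACPI_BATTERY_VALUE_UNKNOWN` error path is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two power supply properties involved. */
enum rate_prop {
	RATE_PROP_CURRENT_NOW,
	RATE_PROP_POWER_NOW,
};

/*
 * Models the value reported for current_now / power_now.
 * 'rate_now' is the always-positive rate obtained from ACPI; it is
 * scaled by 1000 as in the patch. 'discharging' is an assumption that
 * stands for "state has the discharging bit set and the status helper
 * agrees that the battery is discharging".
 */
int report_rate(enum rate_prop psp, int rate_now, bool discharging)
{
	int val = rate_now * 1000;

	/* Only current_now is negated; power_now keeps its sign. */
	if (psp == RATE_PROP_CURRENT_NOW && discharging)
		val = -val;

	return val;
}
```

With this model, a discharging battery with `rate_now` of 1500 yields -1500000 for `current_now` and 1500000 for `power_now`, while a battery that is not discharging yields 1500000 for both.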
 drivers/acpi/battery.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index cf853e985d6d9..a5e120eca7f33 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -266,10 +266,23 @@ static int acpi_battery_get_property(struct power_supply *psy,
 		break;
 	case POWER_SUPPLY_PROP_CURRENT_NOW:
 	case POWER_SUPPLY_PROP_POWER_NOW:
-		if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN)
+		if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN) {
 			ret = -ENODEV;
-		else
-			val->intval = battery->rate_now * 1000;
+			break;
+		}
+
+		val->intval = battery->rate_now * 1000;
+		/*
+		 * When discharging, the current should be reported as a
+		 * negative number as per the power supply class interface
+		 * definition.
+		 */
+		if (psp == POWER_SUPPLY_PROP_CURRENT_NOW &&
+		    (battery->state & ACPI_BATTERY_STATE_DISCHARGING) &&
+		    acpi_battery_handle_discharging(battery)
+				== POWER_SUPPLY_STATUS_DISCHARGING)
+			val->intval = -val->intval;
+
 		break;
 	case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN:
 	case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN:
linux-stable-mirror@lists.linaro.org