From: Heiko Carstens <hca@linux.ibm.com>
[ Upstream commit 11709abccf93b08adde95ef313c300b0d4bc28f1 ]
Kernel accesses to user space pages that are not exported, when made in atomic context, incorrectly try to resolve the page fault. With debug options enabled, call traces like this can be seen:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1523
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 419074, name: qemu-system-s39
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000383ea47cfa2>] copy_page_from_iter_atomic+0xa2/0x8a0
CPU: 12 UID: 0 PID: 419074 Comm: qemu-system-s39 Tainted: G W 6.16.0-20250531.rc0.git0.69b3a602feac.63.fc42.s390x+debug #1 PREEMPT
Tainted: [W]=WARN
Hardware name: IBM 3931 A01 703 (LPAR)
Call Trace:
 [<00000383e990d282>] dump_stack_lvl+0xa2/0xe8
 [<00000383e99bf152>] __might_resched+0x292/0x2d0
 [<00000383eaa7c374>] down_read+0x34/0x2d0
 [<00000383e99432f8>] do_secure_storage_access+0x108/0x360
 [<00000383eaa724b0>] __do_pgm_check+0x130/0x220
 [<00000383eaa842e4>] pgm_check_handler+0x114/0x160
 [<00000383ea47d028>] copy_page_from_iter_atomic+0x128/0x8a0
([<00000383ea47d016>] copy_page_from_iter_atomic+0x116/0x8a0)
 [<00000383e9c45eae>] generic_perform_write+0x16e/0x310
 [<00000383e9eb87f4>] ext4_buffered_write_iter+0x84/0x160
 [<00000383e9da0de4>] vfs_write+0x1c4/0x460
 [<00000383e9da123c>] ksys_write+0x7c/0x100
 [<00000383eaa7284e>] __do_syscall+0x15e/0x280
 [<00000383eaa8417e>] system_call+0x6e/0x90
INFO: lockdep is turned off.
It is not allowed to take the mmap_lock while in atomic context. Therefore handle such a secure storage access fault as if the accessed page were not mapped: the uaccess function will return -EFAULT, and the caller has to deal with this. Usually this means that the access is retried in process context, which allows the page fault to be resolved (or, in this case, the page to be exported).
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250603134936.1314139-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Based on my comprehensive analysis of the commit and the surrounding kernel context, here is my assessment:
**YES**
This commit should be backported to stable kernel trees. Here's my extensive explanation:
## Why This Should Be Backported
### 1. **Critical Sleeping-in-Atomic Bug Fix**
The commit addresses a fundamental kernel correctness issue: `do_secure_storage_access()` took the mmap_lock via `mmap_read_lock()`, which may sleep, even when it was called in atomic context. The stack trace in the commit message clearly shows:
```
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1523
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 419074, name: qemu-system-s39
preempt_count: 1, expected: 0
Preemption disabled at:
[<00000383ea47cfa2>] copy_page_from_iter_atomic+0xa2/0x8a0
```
This shows the fault occurred during `copy_page_from_iter_atomic()`, which explicitly runs in atomic context with preemption disabled (`preempt_count: 1`).
### 2. **Minimal and Safe Code Change**
The fix is exactly 2 lines of code:

```c
+if (faulthandler_disabled())
+	return handle_fault_error_nolock(regs, 0);
```
This follows the **exact same pattern** already established in the same file at line 277 in `do_exception()`. The change is:

- **Consistent**: Uses the same `faulthandler_disabled()` check as other fault handlers
- **Safe**: Uses `handle_fault_error_nolock()` which is designed for atomic contexts
- **Non-invasive**: Doesn't change any existing logic paths, only adds an early return
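The guard can be modeled outside the kernel. The following is a minimal user-space sketch, not kernel code: `struct model_task` and the `model_*` functions are hypothetical stand-ins, and returning `-EFAULT` directly simplifies what `handle_fault_error_nolock()` does. The only kernel behavior it relies on is that `faulthandler_disabled()` is true when page faults are disabled or the task is in atomic context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for per-task kernel state. */
struct model_task {
	int preempt_count;      /* > 0 models in_atomic() */
	int pagefault_disabled; /* > 0 models pagefault_disabled() */
	int mmap_lock_taken;    /* how often the sleeping lock was taken */
};

/* Models faulthandler_disabled(): true when the handler must not sleep. */
static bool model_faulthandler_disabled(const struct model_task *t)
{
	return t->pagefault_disabled > 0 || t->preempt_count > 0;
}

/*
 * Models the patched branch of the handler: bail out before the lock.
 * Returns 0 if the fault was resolved, -EFAULT if it was refused.
 */
static int model_secure_storage_fault(struct model_task *t)
{
	assert(t != NULL);

	if (model_faulthandler_disabled(t))
		return -EFAULT; /* simplified handle_fault_error_nolock() */

	t->mmap_lock_taken++;   /* stands in for mmap_read_lock()/unlock */
	return 0;
}
```

In this model, a task with `preempt_count == 1` gets `-EFAULT` back and `mmap_lock_taken` stays unchanged, which is the property the patch establishes for the real handler.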
### 3. **Matches Established Kernel Patterns**
Looking at similar commits in my reference set, this matches the pattern of **Backport Status: YES** commits:
- **Similar to Reference Commit #2** (sja1105): Also fixed sleeping-in-atomic by using atomic-safe alternatives
- **Similar to Reference Commit #3** (PM domains): Also moved a potentially sleeping operation out of atomic context
- **Similar to Reference Commit #5** (RDMA/rxe): Also handled sleeping operations that were incorrectly called from atomic context
### 4. **Affects Critical Kernel Subsystem**
This bug affects **s390 memory management**, which is a critical kernel subsystem. The secure storage access functionality is used in:

- **IBM Z mainframes** with Secure Execution (Ultravisor)
- **KVM virtualization** environments
- **Enterprise workloads** running on s390 architecture
A sleeping-in-atomic bug in MM fault handling can cause system instability, deadlocks, or crashes.
### 5. **Production Impact Evidence**
The commit message shows the bug triggering in a QEMU process (`qemu-system-s39` is the task name truncated to the 15-character comm limit), so it can be hit by virtualization workloads. The trace itself comes from a debug build, but the call path is a realistic scenario:

```
copy_page_from_iter_atomic+0x128/0x8a0
generic_perform_write+0x16e/0x310
ext4_buffered_write_iter+0x84/0x160
vfs_write+0x1c4/0x460
```
This is a common buffered-write I/O path, and it can trigger a secure storage access fault when the user page being copied from has not been exported.
### 6. **Low Regression Risk**
The change has **minimal regression risk** because:

- **Fallback behavior**: It makes the code return `-EFAULT` instead of hanging/crashing
- **Retry mechanism**: The commit message explicitly states "Usually this means that the access is retried in process context"
- **Defensive programming**: Better to fail gracefully than crash the system
- **Existing pattern**: Uses the same error handling path already used elsewhere
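The caller-side fallback can also be sketched in user space. This is an illustrative model under assumptions, not the kernel's write path: `struct model_page` and the `model_*` helpers are hypothetical, and `model_fault_in()` stands in for whatever process-context step makes the page accessible again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one user page that may be inaccessible. */
struct model_page {
	bool accessible; /* false models a page that is not exported */
	char data[16];
};

/* Atomic-context copy: must not resolve faults; returns bytes copied. */
static size_t model_copy_atomic(char *dst, const struct model_page *src,
				size_t len)
{
	assert(len <= sizeof(src->data));

	if (!src->accessible)
		return 0; /* fault refused, nothing copied */
	memcpy(dst, src->data, len);
	return len;
}

/* Process-context step: allowed to sleep, so it can fix up the page. */
static void model_fault_in(struct model_page *src)
{
	src->accessible = true;
}

/* Caller loop: try the atomic copy, fault the page in on failure, retry. */
static size_t model_write(char *dst, struct model_page *src, size_t len,
			  int *retries)
{
	size_t copied;

	*retries = 0;
	while (len > 0 && (copied = model_copy_atomic(dst, src, len)) == 0) {
		model_fault_in(src);
		(*retries)++;
	}
	return len > 0 ? copied : 0;
}
```

The point of the model is that a refused atomic copy costs one extra iteration rather than an error for the user: the first attempt copies nothing, the page is made accessible outside the atomic section, and the second attempt succeeds.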
### 7. **Stable Tree Criteria Met**
This commit meets all stable tree criteria:

- ✅ **Important bugfix**: Fixes kernel crashes/hangs
- ✅ **Small and contained**: Only 2 lines added
- ✅ **No new features**: Pure bugfix
- ✅ **Minimal risk**: Uses established error handling patterns
- ✅ **Affects users**: Impacts virtualization workloads
### 8. **Architecture-Specific but Critical**
While this only affects s390 architecture, it's critical for that platform. s390 is used in enterprise environments where stability is paramount, and sleeping-in-atomic bugs can cause service outages.
### Conclusion
This is a textbook example of a commit that should be backported: it fixes a clear kernel correctness issue (sleeping while atomic), uses a minimal and safe fix that follows established patterns, has low regression risk, and affects production workloads. The fix brings `do_secure_storage_access()` in line with the same atomic context handling used throughout the rest of the kernel's fault handling code.
 arch/s390/mm/fault.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index da84ff6770dec..8b3f6dd00eab2 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -442,6 +442,8 @@ void do_secure_storage_access(struct pt_regs *regs)
 		if (rc)
 			BUG();
 	} else {
+		if (faulthandler_disabled())
+			return handle_fault_error_nolock(regs, 0);
 		mm = current->mm;
 		mmap_read_lock(mm);
 		vma = find_vma(mm, addr);