[PATCH AUTOSEL 6.17-6.6] nvme: fix admin request_queue lifetime

24 Nov 2025

From: Keith Busch <kbusch@kernel.org>

[ Upstream commit 03b3bcd319b3ab5182bc9aaa0421351572c78ac0 ]

The namespaces can access the controller's admin request_queue, and
stale references on the namespaces may exist after tearing down the
controller. Ensure the admin request_queue is active by moving the
controller's 'put' to after all controller references have been released,
so that no one can access the request_queue afterwards. This fixes a
reported use-after-free bug:
BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis
### 1. Commit Message and Bug Analysis
The commit "nvme: fix admin request_queue lifetime" addresses a critical
**Use-After-Free (UAF)** vulnerability in the NVMe subsystem. The commit
message includes a KASAN stack trace showing a crash in
`blk_queue_enter` triggered by `nvme_submit_user_cmd`.
- **The Issue:** A race condition exists during NVMe controller
  teardown. The admin request queue (`ctrl->admin_q`) is destroyed and
  its reference dropped early in the teardown process (inside
  `nvme_remove_admin_tag_set`). However, references to the controller
  object itself (`nvme_ctrl`) can persist, for example, if userspace
  holds open file descriptors to namespaces. If a user issues an admin
  command via ioctl after the queue is freed but before the controller
  is fully released, the kernel attempts to access freed memory,
  resulting in a panic.
- **Real-World Impact:** The bug was reported by Casey Chen of Pure
  Storage (`Reported-by:` tag) with a KASAN splat from a 6.13.2-based
  kernel, showing that it is triggerable by userspace operations.
### 2. Deep Code Research & History
- **Origin of Regression:** Detailed investigation reveals this issue
  stems from the v6.2 development cycle (specifically commit
  `2b3f056f72e5`, "blk-mq: move the call to blk_put_queue out of
  blk_mq_destroy_queue"). That change shifted the responsibility of
  dropping queue references to individual drivers. The NVMe driver
  placed this `blk_put_queue` call in `nvme_remove_admin_tag_set`, which
  proved to be too early in the lifecycle.
- **The Fix Mechanism:** The patch moves the
  `blk_put_queue(ctrl->admin_q)` call from `nvme_remove_admin_tag_set()`
  to `nvme_free_ctrl()`.
    - `nvme_free_ctrl()` is the `release` callback for the controller's
      device structure, so it runs only when the last reference to the
      controller is dropped.
    - With the put moved there, the `admin_q` memory stays allocated
      for as long as the controller structure exists, which closes the
      UAF window. The queue is still destroyed at the original point
      in teardown; only the final reference drop is deferred.
- **Code Context:** The change is minimal (removing 1 line, adding 2
  lines) and self-contained within `drivers/nvme/host/core.c`.
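
The ordering change described above can be illustrated with a small userspace model. This is only a sketch: every `model_*` name is hypothetical and stands in for the kernel object named in its comment, and the "clean refusal" for a destroyed but not yet freed queue is an assumption about how a dying queue rejects new requests, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
enum model_result { MODEL_OK, MODEL_REFUSED, MODEL_UAF };

struct model_queue {
    int refs;    /* models the request_queue reference count */
    bool dying;  /* set on destroy; new submissions are refused (assumed) */
    bool freed;  /* recorded instead of calling free(), so it can be observed */
};

struct model_ctrl {
    int refs;                   /* references on the controller device */
    struct model_queue *admin_q;
    bool put_queue_in_release;  /* true: patched ordering, false: old ordering */
};

static void model_queue_put(struct model_queue *q)
{
    if (--q->refs == 0)
        q->freed = true;
}

/* Models nvme_remove_admin_tag_set(): destroy, and in the old code also put. */
static void model_remove_admin_tag_set(struct model_ctrl *ctrl)
{
    ctrl->admin_q->dying = true;
    if (!ctrl->put_queue_in_release)
        model_queue_put(ctrl->admin_q);
}

/* Models dropping a controller reference; the body after the early return
 * plays the role of nvme_free_ctrl(), which runs on the last put only. */
static void model_ctrl_put(struct model_ctrl *ctrl)
{
    if (--ctrl->refs > 0)
        return;
    if (ctrl->put_queue_in_release && ctrl->admin_q)
        model_queue_put(ctrl->admin_q);
}

/* Models an admin ioctl issued through a stale namespace reference. */
static enum model_result model_submit_admin_cmd(const struct model_ctrl *ctrl)
{
    if (ctrl->admin_q->freed)
        return MODEL_UAF;      /* would read freed memory in the real driver */
    if (ctrl->admin_q->dying)
        return MODEL_REFUSED;  /* queue memory still valid, request rejected */
    return MODEL_OK;
}

/* Teardown with one stale namespace reference still held on the controller. */
static enum model_result model_scenario(bool patched)
{
    struct model_queue q = { .refs = 1 };
    struct model_ctrl ctrl = {
        .refs = 2,  /* one for the driver, one stale namespace reference */
        .admin_q = &q,
        .put_queue_in_release = patched,
    };

    model_remove_admin_tag_set(&ctrl);
    model_ctrl_put(&ctrl);  /* driver drops its reference; one remains */
    return model_submit_admin_cmd(&ctrl);
}
```

In this model, `model_scenario(false)` reaches the freed queue, while `model_scenario(true)` finds it destroyed but still allocated, because the queue reference is dropped only when the controller's own count reaches zero.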
### 3. Security Assessment
- **Severity: High.** Use-After-Free bugs are significant security
  vulnerabilities that can lead to data corruption, kernel panics, or
  potentially arbitrary code execution/privilege escalation.
- **Trigger:** The bug is reachable from userspace via standard ioctl
  paths (`nvme_user_cmd`), making it a viable target for local attackers
  or a source of instability during hotplug/shutdown events.
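
The ioctl involved can be read off the register dump in the trace, where `RSI` and `R13` both hold `00000000c0484e41`. The sketch below decodes that request number by hand, assuming the asm-generic ioctl bit layout used on x86-64; `decode_ioctl` and `struct ioctl_fields` are hypothetical helper names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed asm-generic layout of an ioctl request number:
 * bits 0-7 command number, 8-15 type, 16-29 argument size, 30-31 direction. */
struct ioctl_fields {
    unsigned nr;    /* command number within the type */
    unsigned type;  /* "magic" character identifying the driver */
    unsigned size;  /* size in bytes of the argument structure */
    unsigned dir;   /* 1 = write, 2 = read, 3 = read and write */
};

static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f = {
        .nr   = req & 0xff,
        .type = (req >> 8) & 0xff,
        .size = (req >> 16) & 0x3fff,
        .dir  = (req >> 30) & 0x3,
    };
    return f;
}
```

Applied to `0xc0484e41`, this gives type `'N'`, command number `0x41`, a 72-byte argument, and read/write direction. That appears to match the NVMe admin passthrough ioctl, which is consistent with `nvme_user_cmd` and `nvme_submit_user_cmd` in the call trace.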
### 4. Stable Kernel Rules Compliance
- **Fixes a real bug:** Yes, documented UAF with stack trace.
- **Obviously correct:** Yes, associating resource cleanup with the
  object destructor is the standard correct pattern for refcounted
  objects.
- **Small and contained:** Only modifies 3 lines in one file.
- **No new features:** Pure bug fix.
- **Mainline Status:** Reviewed by multiple senior subsystem maintainers
  (Hellwig, Reinecke, Lei, Kulkarni).
### 5. Risk Assessment
- **Regression Risk: Very Low.** The fix simply extends the lifetime of
  a memory object to match its owner. It does not change the logic of
  *how* the queue is used, only *when* it is freed. This is defensively
  safer than the current code.
- **Backportability:** The patch is queued for stable kernels 6.6.y
  through 6.17.y, all of which contain the regression (kernels >=
  v6.2). 6.1.y predates v6.2 and is not in the target range.
### Conclusion
This is a textbook candidate for stable backporting. It fixes a severe
memory safety bug (UAF) that causes crashes in production systems. The
fix is surgical, well-understood, and restores correct lifetime
management logic.
**YES**
drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5714d49932822..28c598008124c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4896,7 +4896,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
     */
    nvme_stop_keep_alive(ctrl);
    blk_mq_destroy_queue(ctrl->admin_q);
-	blk_put_queue(ctrl->admin_q);
    if (ctrl->ops->flags & NVME_F_FABRICS) {
    	blk_mq_destroy_queue(ctrl->fabrics_q);
    	blk_put_queue(ctrl->fabrics_q);
@@ -5040,6 +5039,8 @@ static void nvme_free_ctrl(struct device *dev)
    	container_of(dev, struct nvme_ctrl, ctrl_device);
    struct nvme_subsystem *subsys = ctrl->subsys;
+	if (ctrl->admin_q)
+		blk_put_queue(ctrl->admin_q);
    if (!subsys || ctrl->instance != subsys->instance)
    	ida_free(&nvme_instance_ida, ctrl->instance);
    nvme_free_cels(ctrl);
-- 
2.51.0



    
