From: Keith Busch kbusch@kernel.org
[ Upstream commit 03b3bcd319b3ab5182bc9aaa0421351572c78ac0 ]
The namespaces can access the controller's admin request_queue, and stale references on the namespaces may exist after tearing down the controller. Ensure the admin request_queue is active by moving the controller's 'put' to after all controller references have been released to ensure no one is can access the request_queue. This fixes a reported use-after-free bug:
BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0 Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287 CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G E 6.13.2-ga1582f1a031e #15 Tainted: [E]=UNSIGNED_MODULE Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 print_report+0xc4/0x620 ? _raw_spin_lock_irqsave+0x70/0xb0 ? _raw_read_unlock_irqrestore+0x30/0x30 ? blk_queue_enter+0x41c/0x4a0 kasan_report+0xab/0xe0 ? blk_queue_enter+0x41c/0x4a0 blk_queue_enter+0x41c/0x4a0 ? __irq_work_queue_local+0x75/0x1d0 ? blk_queue_start_drain+0x70/0x70 ? irq_work_queue+0x18/0x20 ? vprintk_emit.part.0+0x1cc/0x350 ? wake_up_klogd_work_func+0x60/0x60 blk_mq_alloc_request+0x2b7/0x6b0 ? __blk_mq_alloc_requests+0x1060/0x1060 ? __switch_to+0x5b7/0x1060 nvme_submit_user_cmd+0xa9/0x330 nvme_user_cmd.isra.0+0x240/0x3f0 ? force_sigsegv+0xe0/0xe0 ? nvme_user_cmd64+0x400/0x400 ? vfs_fileattr_set+0x9b0/0x9b0 ? cgroup_update_frozen_flag+0x24/0x1c0 ? cgroup_leave_frozen+0x204/0x330 ? nvme_ioctl+0x7c/0x2c0 blkdev_ioctl+0x1a8/0x4d0 ? blkdev_common_ioctl+0x1930/0x1930 ? fdget+0x54/0x380 __x64_sys_ioctl+0x129/0x190 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f765f703b0b Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000 R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003 R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60 </TASK>
Reported-by: Casey Chen cachen@purestorage.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Analysis
### 1. Commit Message and Bug Analysis The commit "nvme: fix admin request_queue lifetime" addresses a critical **Use-After-Free (UAF)** vulnerability in the NVMe subsystem. The commit message includes a KASAN stack trace showing a crash in `blk_queue_enter` triggered by `nvme_submit_user_cmd`.
- **The Issue:** A race condition exists during NVMe controller teardown. The admin request queue (`ctrl->admin_q`) is destroyed and its reference dropped early in the teardown process (inside `nvme_remove_admin_tag_set`). However, references to the controller object itself (`nvme_ctrl`) can persist, for example, if userspace holds open file descriptors to namespaces. If a user issues an admin command via ioctl after the queue is freed but before the controller is fully released, the kernel attempts to access freed memory, resulting in a panic. - **Real-World Impact:** This was reported by Pure Storage (`Reported- by: Casey Chen`) and verified with a KASAN splat, confirming it affects production environments and is triggerable by userspace operations.
### 2. Deep Code Research & History - **Origin of Regression:** Detailed investigation reveals this issue stems from the v6.2 development cycle (specifically commit `2b3f056f72e5`, "blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queue"). That change shifted the responsibility of dropping queue references to individual drivers. The NVMe driver placed this `blk_put_queue` call in `nvme_remove_admin_tag_set`, which proved to be too early in the lifecycle. - **The Fix Mechanism:** The patch moves the `blk_put_queue(ctrl->admin_q)` call from `nvme_remove_admin_tag_set()` to `nvme_free_ctrl()`. - `nvme_free_ctrl()` is the specific `release` callback for the controller's device structure. It is guaranteed to run **only** when the last reference to the controller is dropped. - By moving the cleanup here, the patch ensures the `admin_q` remains valid for exactly as long as the controller structure exists, perfectly aligning their lifetimes and eliminating the UAF window. - **Code Context:** The change is minimal (removing 1 line, adding 2 lines) and self-contained within `drivers/nvme/host/core.c`.
### 3. Security Assessment - **Severity: High.** Use-After-Free bugs are significant security vulnerabilities that can lead to data corruption, kernel panics, or potentially arbitrary code execution/privilege escalation. - **Trigger:** The bug is reachable from userspace via standard ioctl paths (`nvme_user_cmd`), making it a viable target for local attackers or a source of instability during hotplug/shutdown events.
### 4. Stable Kernel Rules Compliance - **Fixes a real bug:** Yes, documented UAF with stack trace. - **Obviously correct:** Yes, associating resource cleanup with the object destructor is the standard correct pattern for refcounted objects. - **Small and contained:** Only modifies 3 lines in one file. - **No new features:** Pure bug fix. - **Mainline Status:** Reviewed by multiple senior subsystem maintainers (Hellwig, Reinecke, Lei, Kulkarni).
### 5. Risk Assessment - **Regression Risk: Very Low.** The fix simply extends the lifetime of a memory object to match its owner. It does not change the logic of *how* the queue is used, only *when* it is freed. This is defensively safer than the current code. - **Backportability:** The patch applies cleanly to stable kernels 6.6.y, 6.1.y, and others where the regression exists (kernels >= v6.2).
### Conclusion This is a textbook candidate for stable backporting. It fixes a severe memory safety bug (UAF) that causes crashes in production systems. The fix is surgical, well-understood, and restores correct lifetime management logic.
**YES**
drivers/nvme/host/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 5714d49932822..28c598008124c 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4896,7 +4896,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl) */ nvme_stop_keep_alive(ctrl); blk_mq_destroy_queue(ctrl->admin_q); - blk_put_queue(ctrl->admin_q); if (ctrl->ops->flags & NVME_F_FABRICS) { blk_mq_destroy_queue(ctrl->fabrics_q); blk_put_queue(ctrl->fabrics_q); @@ -5040,6 +5039,8 @@ static void nvme_free_ctrl(struct device *dev) container_of(dev, struct nvme_ctrl, ctrl_device); struct nvme_subsystem *subsys = ctrl->subsys;
+ if (ctrl->admin_q) + blk_put_queue(ctrl->admin_q); if (!subsys || ctrl->instance != subsys->instance) ida_free(&nvme_instance_ida, ctrl->instance); nvme_free_cels(ctrl);