Fix the following use-after-free complaint triggered by blktests nvme/004:
BUG: KASAN: user-memory-access in blk_mq_complete_request_remote+0xac/0x350
Read of size 4 at addr 0000607bd1835943 by task kworker/13:1/460

Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop]
Call Trace:
 show_stack+0x52/0x58
 dump_stack_lvl+0x49/0x5e
 print_report.cold+0x36/0x1e2
 kasan_report+0xb9/0xf0
 __asan_load4+0x6b/0x80
 blk_mq_complete_request_remote+0xac/0x350
 nvme_loop_queue_response+0x1df/0x275 [nvme_loop]
 __nvmet_req_complete+0x132/0x4f0 [nvmet]
 nvmet_req_complete+0x15/0x40 [nvmet]
 nvmet_execute_io_connect+0x18a/0x1f0 [nvmet]
 nvme_loop_execute_work+0x20/0x30 [nvme_loop]
 process_one_work+0x56e/0xa70
 worker_thread+0x2d1/0x640
 kthread+0x183/0x1c0
 ret_from_fork+0x1f/0x30
Cc: stable@vger.kernel.org
Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/nvme/target/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index a1345790005f..7f4083cf953a 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -735,6 +735,8 @@ static void nvmet_set_error(struct nvmet_req *req, u16 status)
 
 static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 {
+	struct nvmet_ns *ns = req->ns;
+
 	if (!req->sq->sqhd_disabled)
 		nvmet_update_sq_head(req);
 	req->cqe->sq_id = cpu_to_le16(req->sq->qid);
@@ -745,9 +747,9 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 
 	trace_nvmet_req_complete(req);
 
-	if (req->ns)
-		nvmet_put_namespace(req->ns);
 	req->ops->queue_response(req);
+	if (ns)
+		nvmet_put_namespace(ns);
 }
 
 void nvmet_req_complete(struct nvmet_req *req, u16 status)
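To illustrate the rationale behind the reordering, here is a small, self-contained userspace analogue. All names below are invented for the example and this is not the kernel code: it only models the pattern of a reference-counted object whose final put frees a queue that the completion path still needs, so the put may only happen after the completion path is done.

/*
 * Userspace analogue of the ordering issue (all names invented for the
 * example; not kernel code): the object's final put frees the queue,
 * so the completion path must run before that put.
 * Build with: cc example.c
 */
#include <stdio.h>
#include <stdlib.h>

struct queue {
	int id;
};

struct ns {
	int refcount;
	struct queue *q;	/* freed when the last reference is dropped */
};

static void ns_put(struct ns *ns)
{
	if (--ns->refcount == 0) {
		free(ns->q);
		ns->q = NULL;
	}
}

static void queue_response(struct ns *ns)
{
	/* Dereferences the queue, so it must not run after the final put. */
	printf("completing on queue %d\n", ns->q->id);
}

static void req_complete(struct ns *ns)
{
	/*
	 * Buggy order:  ns_put(ns); queue_response(ns);
	 * If that put drops the last reference, queue_response() reads
	 * freed memory. The fixed order, as in the patch above, completes
	 * the request first and drops the reference afterwards.
	 */
	queue_response(ns);
	ns_put(ns);
}

int main(void)
{
	struct ns ns = { .refcount = 1, .q = malloc(sizeof(struct queue)) };

	ns.q->id = 1;
	req_complete(&ns);
	return 0;
}

With the shown ordering the program completes cleanly; swapping the two calls in req_complete() makes queue_response() read freed memory, which is the same ordering hazard the patch removes.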
On 8/13/22 00:03, Bart Van Assche wrote:
Fix the following use-after-free complaint triggered by blktests nvme/004:
[...]
@@ -745,9 +747,9 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 
 	trace_nvmet_req_complete(req);
 
-	if (req->ns)
-		nvmet_put_namespace(req->ns);
 	req->ops->queue_response(req);
+	if (ns)
+		nvmet_put_namespace(ns);
 }
Why did the put change position? I'm not exactly clear what was used-after-free here..
On 8/14/22 04:45, Sagi Grimberg wrote:
On 8/13/22 00:03, Bart Van Assche wrote:
@@ -745,9 +747,9 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 
 	trace_nvmet_req_complete(req);
 
-	if (req->ns)
-		nvmet_put_namespace(req->ns);
 	req->ops->queue_response(req);
+	if (ns)
+		nvmet_put_namespace(ns);
Why did the put change position? I'm not exactly clear what was used-after-free here..
Hi Sagi,
Is my understanding correct that the NVMe target namespace owns the block device `req` is associated with and hence that the namespace reference count must only be dropped after dereferencing the `req` pointer has finished?
This is what I found in the NVMe target code:
- nvmet_put_namespace() decreases ns->ref.
- Dropping the last ns->ref causes nvmet_destroy_namespace() to be called. That function completes ns->disable_done.
- nvmet_ns_disable() waits on that completion and calls nvmet_ns_dev_disable().
- For a block device, nvmet_ns_dev_disable() calls blkdev_put().
- The last blkdev_put() call calls disk_release().
- disk_release() calls blk_put_queue().
- The last blk_put_queue() call calls blk_release_queue().
- blk_release_queue() frees struct request_queue.
- blk_mq_complete_request_remote() dereferences req->q.
Thanks,
Bart.
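To make the chain above concrete, here is a minimal userspace sketch of the control flow it describes: the last reference put runs a release step, the release step signals a completion, and the disable path that has been waiting on that completion then frees the backing resource. All names are invented for this sketch and it is only a loose analogue, not the kernel code.

/*
 * Userspace analogue of the teardown chain listed above (all names
 * invented; models last put -> release -> completion -> free only).
 * Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

struct ns {
	int refcount;			/* protected by ref_lock */
	pthread_mutex_t ref_lock;
	struct completion disable_done;
	char *backing_dev;		/* stands in for the backend bdev */
};

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Dropping the last reference runs the "destroy" step, which only
 * signals the completion; the actual freeing happens in the waiter. */
static void ns_put(struct ns *ns)
{
	int release;

	pthread_mutex_lock(&ns->ref_lock);
	release = (--ns->refcount == 0);
	pthread_mutex_unlock(&ns->ref_lock);
	if (release)
		complete(&ns->disable_done);
}

/* Disable path: drop the initial reference, wait until all other
 * references are gone, then release the backing device. */
static void *ns_disable(void *arg)
{
	struct ns *ns = arg;

	ns_put(ns);
	wait_for_completion(&ns->disable_done);
	free(ns->backing_dev);
	ns->backing_dev = NULL;
	printf("backing device released\n");
	return NULL;
}

int main(void)
{
	struct ns ns = {
		.refcount = 2,		/* initial ref + one in-flight request */
		.ref_lock = PTHREAD_MUTEX_INITIALIZER,
		.disable_done = { PTHREAD_MUTEX_INITIALIZER,
				  PTHREAD_COND_INITIALIZER, 0 },
		.backing_dev = malloc(16),
	};
	pthread_t disable;

	pthread_create(&disable, NULL, ns_disable, &ns);
	ns_put(&ns);		/* the request's put may be the last one */
	pthread_join(disable, NULL);
	return 0;
}

In the list above, the role of the release step is played by nvmet_destroy_namespace() completing ns->disable_done, and the waiter is nvmet_ns_disable(), which then calls nvmet_ns_dev_disable() and blkdev_put().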
Is my understanding correct that the NVMe target namespace owns the block device `req` is associated with and hence that the namespace reference count must only be dropped after dereferencing the `req` pointer has finished?
[...]
The analysis is correct, specifically for loop transport, but it is not the same request_queue...
blk_mq_complete_request_remote references the initiator block device created from the namespace, but nvmet_ns_dev_disable puts the backend blockdev, which is a separate blockdev...
Maybe I'm misunderstanding your point?
And specifically, your stack trace references nvmet_execute_io_connect, which is not addressed to a namespace... So I think that the request_queue is actually the ctrl->connect_q, which is used for io_connect commands.
It is unclear to me how this one ended up with a use-after-free... Maybe we have a stray io_connect command lingering after we tear down the controller?