nvmet_req *req, u16 status) trace_nvmet_req_complete(req); - if (req->ns) - nvmet_put_namespace(req->ns); req->ops->queue_response(req); + if (ns) + nvmet_put_namespace(ns);
Why did the put change position? I'm not exactly clear what was used-after-free here..
Hi Sagi,
Is my understanding correct that the NVMe target namespace owns the block device `req` is associated with and hence that the namespace reference count must only be dropped after dereferencing the `req` pointer has finished?
This is what I found in the NVMe target code:
- nvmet_put_namespace() decreases ns->ref.
- Dropping the last ns->ref causes nvmet_destroy_namespace() to be
called. That function completes ns->disable_done.
- nvmet_ns_disable() waits on that completion and calls
nvmet_ns_dev_disable().
- For a block device, nvmet_ns_dev_disable() calls blkdev_put().
- The last blkdev_put() call calls disk_release().
- disk_release() calls blk_put_queue().
- The last blk_put_queue() call calls blk_release_queue().
- blk_release_queue() frees struct request_queue.
- blk_mq_complete_request_remote() dereferences req->q.
The analysis is correct, specifically for loop transport, but it is not the same request_queue...
blk_mq_complete_request_remote references the initiator block device created from the namespace, but nvmet_ns_dev_disable puts the backend blockdev, which is a separate blockdev...
Maybe I'm misunderstanding your point?
And specifically your stack trace references nvmet_execute_io_connect which is not addressed to a namespace... This I think that the request_queue is actually the ctrl->connect_q, which is used for io_connect commands.
It is unclear to me how this one ended up with a use-after-free... Maybe we have a stray io_connect command lingering after we teardown the controller?