On 9/3/24 2:16 AM, Muchun Song wrote:
Supposing first scenario with a virtio_blk driver.
CPU0 CPU1
blk_mq_try_issue_directly() __blk_mq_issue_directly() q->mq_ops->queue_rq() virtio_queue_rq() blk_mq_stop_hw_queue() virtblk_done() blk_mq_request_bypass_insert() blk_mq_start_stopped_hw_queues() /* Add IO request to dispatch list */ 1) store blk_mq_start_stopped_hw_queue() clear_bit(BLK_MQ_S_STOPPED) 3) store blk_mq_run_hw_queue() blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) if (!blk_mq_hctx_has_pending()) 4) load return return blk_mq_sched_dispatch_requests() blk_mq_sched_dispatch_requests() if (blk_mq_hctx_stopped()) 2) load if (blk_mq_hctx_stopped()) return return __blk_mq_sched_dispatch_requests() __blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0 CPU1
blk_mq_requeue_work() /* Add IO request to dispatch list */ 1) store virtblk_done() blk_mq_run_hw_queues()/blk_mq_delay_run_hw_queues() blk_mq_start_stopped_hw_queues() if (blk_mq_hctx_stopped()) 2) load blk_mq_start_stopped_hw_queue() continue clear_bit(BLK_MQ_S_STOPPED) 3) store blk_mq_run_hw_queue()/blk_mq_delay_run_hw_queue() blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) 4) load return blk_mq_sched_dispatch_requests()
Both scenarios are similar, the full memory barrier should be inserted between
- and 2), as well as between 3) and 4) to make sure that either CPU0 sees
BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list. Otherwise, either CPU will not rerun the hardware queue causing starvation of the request.
The easy way to fix it is to add the essential full memory barrier into helper of blk_mq_hctx_stopped(). In order to not affect the fast path (hardware queue is not stopped most of the time), we only insert the barrier into the slow path. Actually, only slow path needs to care about missing of dispatching the request to the low-level device driver.
Again, this is way too wide, it's unreadable.
Patch looks fine, though.