On Mon, Mar 17, 2025 at 03:20:21PM +0800, Muchun Song wrote:
Supposing the following scenario.
CPU0 CPU1
blk_mq_insert_request() 1) store blk_mq_unquiesce_queue() blk_queue_flag_clear() 3) store blk_mq_run_hw_queues() blk_mq_run_hw_queue() if (!blk_mq_hctx_has_pending()) 4) load return blk_mq_run_hw_queue() if (blk_queue_quiesced()) 2) load return blk_mq_sched_dispatch_requests()
The full memory barrier should be inserted between 1) and 2), as well as between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is cleared or CPU1 sees dispatch list or setting of bitmap of software queue. Otherwise, either CPU will not rerun the hardware queue causing starvation.
So the first solution is to 1) add a pair of memory barrier to fix the problem, another solution is to 2) use hctx->queue->queue_lock to synchronize QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory barrier is not easy to be maintained.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue") Cc: stable@vger.kernel.org Cc: Muchun Song muchun.song@linux.dev Signed-off-by: Muchun Song songmuchun@bytedance.com Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com Signed-off-by: Jens Axboe axboe@kernel.dk (cherry picked from commit 6bda857bcbb86fb9d0e54fbef93a093d51172acc)
For obvious reasons we can not take a change for an older stable kernel tree, and NOT a newer one.
Please resubmit the backports for ALL relevant stable kernel branches. You do not want to upgrade and have a regression.
thanks,
greg k-h