The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6bda857bcbb86fb9d0e54fbef93a093d51172acc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2024120342-monsoon-wildcat-d0a1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bda857bcbb86fb9d0e54fbef93a093d51172acc Mon Sep 17 00:00:00 2001
From: Muchun Song <muchun.song@linux.dev>
Date: Mon, 14 Oct 2024 17:29:33 +0800
Subject: [PATCH] block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
Supposing the following scenario.
CPU0                                          CPU1

blk_mq_insert_request()      1) store
                                              blk_mq_unquiesce_queue()
                                              blk_queue_flag_clear()             3) store
                                              blk_mq_run_hw_queues()
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
blk_mq_sched_dispatch_requests()
A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED has been cleared or CPU1 sees the dispatch list or the set bit in the software queue bitmap. Otherwise, neither CPU reruns the hardware queue, causing starvation.
There are two possible fixes: 1) add a pair of memory barriers, or 2) use hctx->queue->queue_lock to synchronize access to QUEUE_FLAG_QUIESCED. We chose 2), since memory barriers are not easy to maintain.
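For reference, the interleaving above is the classic store-buffering pattern that fix 1) would have closed with a pair of full barriers. The stand-alone userspace C program below only illustrates that pattern: the names pending and quiesced are stand-ins for the software queue bitmap / dispatch list and QUEUE_FLAG_QUIESCED, and none of it is kernel code.

/* sb.c - store-buffering litmus test; illustrative stand-ins, not kernel code.
 * Build: gcc -O2 -pthread sb.c -o sb
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending;   /* "a request was inserted"      (steps 1 and 4) */
static atomic_int quiesced;  /* "QUEUE_FLAG_QUIESCED is set"  (steps 2 and 3) */
static int seen_quiesced, seen_pending;

static void *cpu0(void *arg)
{
        atomic_store_explicit(&pending, 1, memory_order_relaxed);              /* 1) store */
        /* no full barrier here ... */
        seen_quiesced = atomic_load_explicit(&quiesced, memory_order_relaxed); /* 2) load */
        return NULL;
}

static void *cpu1(void *arg)
{
        atomic_store_explicit(&quiesced, 0, memory_order_relaxed);             /* 3) store */
        /* ... and none here either */
        seen_pending = atomic_load_explicit(&pending, memory_order_relaxed);   /* 4) load */
        return NULL;
}

int main(void)
{
        for (int i = 0; i < 100000; i++) {
                pthread_t a, b;

                atomic_store(&pending, 0);
                atomic_store(&quiesced, 1);     /* queue starts quiesced */
                pthread_create(&a, NULL, cpu0, NULL);
                pthread_create(&b, NULL, cpu1, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                /* CPU0 still saw "quiesced" and CPU1 saw nothing pending:
                 * both sides bail out and nobody reruns the hw queue. */
                if (seen_quiesced == 1 && seen_pending == 0) {
                        printf("missed rerun observed at iteration %d\n", i);
                        return 0;
                }
        }
        printf("outcome not observed this run (it is still architecturally allowed)\n");
        return 0;
}

Store-to-load reordering is permitted even on x86 (TSO), so the "missed rerun" outcome is architecturally possible without the barriers, although a short run may not reproduce it.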
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5deb9dffca0a..bb4ee2380dce 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2227,6 +2227,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
+{
+        bool need_run;
+
+        /*
+         * When queue is quiesced, we may be switching io scheduler, or
+         * updating nr_hw_queues, or other things, and we can't run queue
+         * any more, even blk_mq_hctx_has_pending() can't be called safely.
+         *
+         * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+         * quiesced.
+         */
+        __blk_mq_run_dispatch_ops(hctx->queue, false,
+                need_run = !blk_queue_quiesced(hctx->queue) &&
+                blk_mq_hctx_has_pending(hctx));
+        return need_run;
+}
+
 /**
  * blk_mq_run_hw_queue - Start to run a hardware queue.
  * @hctx: Pointer to the hardware queue to run.
  * @async: If we want to run the queue asynchronously.
@@ -2247,20 +2265,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 
         might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
 
-        /*
-         * When queue is quiesced, we may be switching io scheduler, or
-         * updating nr_hw_queues, or other things, and we can't run queue
-         * any more, even __blk_mq_hctx_has_pending() can't be called safely.
-         *
-         * And queue will be rerun in blk_mq_unquiesce_queue() if it is
-         * quiesced.
-         */
-        __blk_mq_run_dispatch_ops(hctx->queue, false,
-                need_run = !blk_queue_quiesced(hctx->queue) &&
-                blk_mq_hctx_has_pending(hctx));
+        need_run = blk_mq_hw_queue_need_run(hctx);
+        if (!need_run) {
+                unsigned long flags;
 
-        if (!need_run)
-                return;
+                /*
+                 * Synchronize with blk_mq_unquiesce_queue(), because we check
+                 * if hw queue is quiesced locklessly above, we need the use
+                 * ->queue_lock to make sure we see the up-to-date status to
+                 * not miss rerunning the hw queue.
+                 */
+                spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+                need_run = blk_mq_hw_queue_need_run(hctx);
+                spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+
+                if (!need_run)
+                        return;
+        }
 
         if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
                 blk_mq_delay_run_hw_queue(hctx, 0);
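The ->queue_lock recheck closes the race because the unquiesce side clears QUEUE_FLAG_QUIESCED under the same lock before deciding to rerun the queues. A simplified sketch of blk_mq_unquiesce_queue(), not the exact mainline body (nested-quiesce accounting and error handling omitted):

void blk_mq_unquiesce_queue(struct request_queue *q)
{
        unsigned long flags;
        bool run_queue = false;

        spin_lock_irqsave(&q->queue_lock, flags);
        /* The flag is cleared, and the rerun decision made, under the lock. */
        if (blk_queue_quiesced(q)) {
                blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
                run_queue = true;
        }
        spin_unlock_irqrestore(&q->queue_lock, flags);

        /* dispatch requests which are inserted during quiescing */
        if (run_queue)
                blk_mq_run_hw_queues(q, true);
}

So if blk_mq_run_hw_queue() takes ->queue_lock and still sees the flag set, the unquiesce side has not yet reached its blk_mq_run_hw_queues() pass and will pick the inserted request up; if the flag is already clear, the insert side reruns the queue itself. Either way the request is not stranded.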
Supposing the following scenario.
CPU0                                          CPU1

blk_mq_insert_request()      1) store
                                              blk_mq_unquiesce_queue()
                                              blk_queue_flag_clear()             3) store
                                              blk_mq_run_hw_queues()
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
blk_mq_sched_dispatch_requests()
A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED has been cleared or CPU1 sees the dispatch list or the set bit in the software queue bitmap. Otherwise, neither CPU reruns the hardware queue, causing starvation.
There are two possible fixes: 1) add a pair of memory barriers, or 2) use hctx->queue->queue_lock to synchronize access to QUEUE_FLAG_QUIESCED. We chose 2), since memory barriers are not easy to maintain.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a15c665a77100..3db8cc6b51fb1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1610,16 +1610,7 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
-/**
- * blk_mq_run_hw_queue - Start to run a hardware queue.
- * @hctx: Pointer to the hardware queue to run.
- * @async: If we want to run the queue asynchronously.
- *
- * Check if the request queue is not in a quiesced state and if there are
- * pending requests to be sent. If this is true, run the queue to send requests
- * to hardware.
- */
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
 {
         int srcu_idx;
         bool need_run;
@@ -1637,6 +1628,37 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
                 blk_mq_hctx_has_pending(hctx);
         hctx_unlock(hctx, srcu_idx);
 
+        return need_run;
+}
+
+/**
+ * blk_mq_run_hw_queue - Start to run a hardware queue.
+ * @hctx: Pointer to the hardware queue to run.
+ * @async: If we want to run the queue asynchronously.
+ *
+ * Check if the request queue is not in a quiesced state and if there are
+ * pending requests to be sent. If this is true, run the queue to send requests
+ * to hardware.
+ */
+void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+{
+        bool need_run;
+
+        need_run = blk_mq_hw_queue_need_run(hctx);
+        if (!need_run) {
+                unsigned long flags;
+
+                /*
+                 * Synchronize with blk_mq_unquiesce_queue(), because we check
+                 * if hw queue is quiesced locklessly above, we need the use
+                 * ->queue_lock to make sure we see the up-to-date status to
+                 * not miss rerunning the hw queue.
+                 */
+                spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+                need_run = blk_mq_hw_queue_need_run(hctx);
+                spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+        }
+
         if (need_run)
                 __blk_mq_delay_run_hw_queue(hctx, async, 0);
 }
On Mon, Mar 17, 2025 at 11:29:34AM +0800, Muchun Song wrote:
Supposing the following scenario.
CPU0                                          CPU1

blk_mq_insert_request()      1) store
                                              blk_mq_unquiesce_queue()
                                              blk_queue_flag_clear()             3) store
                                              blk_mq_run_hw_queues()
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
blk_mq_sched_dispatch_requests()
A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED has been cleared or CPU1 sees the dispatch list or the set bit in the software queue bitmap. Otherwise, neither CPU reruns the hardware queue, causing starvation.
There are two possible fixes: 1) add a pair of memory barriers, or 2) use hctx->queue->queue_lock to synchronize access to QUEUE_FLAG_QUIESCED. We chose 2), since memory barriers are not easy to maintain.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
 block/blk-mq.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ⚠️ Found matching upstream commit but patch is missing proper reference to it
Found matching upstream commit: 6bda857bcbb86fb9d0e54fbef93a093d51172acc
Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (different SHA1: 2094bd1b5225)
6.6.y  | Present (different SHA1: 679b1874eba7)
6.1.y  | Not found
Note: The patch differs from the upstream commit:
---
1:  6bda857bcbb86 < -:  ------------- block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
-:  ------------- > 1:  211b2b8efdf06 block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
---
Results of testing on various branches:
| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-5.15.y       | Success     | Success    |
Supposing the following scenario.
CPU0                                          CPU1

blk_mq_insert_request()      1) store
                                              blk_mq_unquiesce_queue()
                                              blk_queue_flag_clear()             3) store
                                              blk_mq_run_hw_queues()
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
blk_mq_sched_dispatch_requests()
A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED has been cleared or CPU1 sees the dispatch list or the set bit in the software queue bitmap. Otherwise, neither CPU reruns the hardware queue, causing starvation.
There are two possible fixes: 1) add a pair of memory barriers, or 2) use hctx->queue->queue_lock to synchronize access to QUEUE_FLAG_QUIESCED. We chose 2), since memory barriers are not easy to maintain.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6bda857bcbb86fb9d0e54fbef93a093d51172acc)
---
 block/blk-mq.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a15c665a77100..3db8cc6b51fb1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1610,16 +1610,7 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
-/**
- * blk_mq_run_hw_queue - Start to run a hardware queue.
- * @hctx: Pointer to the hardware queue to run.
- * @async: If we want to run the queue asynchronously.
- *
- * Check if the request queue is not in a quiesced state and if there are
- * pending requests to be sent. If this is true, run the queue to send requests
- * to hardware.
- */
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
 {
         int srcu_idx;
         bool need_run;
@@ -1637,6 +1628,37 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
                 blk_mq_hctx_has_pending(hctx);
         hctx_unlock(hctx, srcu_idx);
 
+        return need_run;
+}
+
+/**
+ * blk_mq_run_hw_queue - Start to run a hardware queue.
+ * @hctx: Pointer to the hardware queue to run.
+ * @async: If we want to run the queue asynchronously.
+ *
+ * Check if the request queue is not in a quiesced state and if there are
+ * pending requests to be sent. If this is true, run the queue to send requests
+ * to hardware.
+ */
+void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+{
+        bool need_run;
+
+        need_run = blk_mq_hw_queue_need_run(hctx);
+        if (!need_run) {
+                unsigned long flags;
+
+                /*
+                 * Synchronize with blk_mq_unquiesce_queue(), because we check
+                 * if hw queue is quiesced locklessly above, we need the use
+                 * ->queue_lock to make sure we see the up-to-date status to
+                 * not miss rerunning the hw queue.
+                 */
+                spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+                need_run = blk_mq_hw_queue_need_run(hctx);
+                spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+        }
+
         if (need_run)
                 __blk_mq_delay_run_hw_queue(hctx, async, 0);
 }
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ⚠️ Found matching upstream commit but patch is missing proper reference to it
Found matching upstream commit: 6bda857bcbb86fb9d0e54fbef93a093d51172acc
Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (different SHA1: 2094bd1b5225)
6.6.y  | Present (different SHA1: 679b1874eba7)
6.1.y  | Not found
Note: The patch differs from the upstream commit:
---
1:  6bda857bcbb86 < -:  ------------- block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
-:  ------------- > 1:  e5851a818c868 block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
---
Results of testing on various branches:
| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-5.15.y       | Success     | Success    |