Hi Bart,
On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter() may
> sleep, calling synchronize_rcu() from __blk_mq_update_nr_hw_queues() is
> not sufficient to wait until blk_mq_queue_tag_busy_iter() has finished.
> Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: stable@vger.kernel.org
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>  	/*
>  	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -	 * while the queue is frozen. So we can use q_usage_counter to avoid
> -	 * racing with it. __blk_mq_update_nr_hw_queues() uses
> -	 * synchronize_rcu() to ensure this function left the critical section
> -	 * below.
> +	 * while the queue is frozen. Hold q_usage_counter to serialize
> +	 * __blk_mq_update_nr_hw_queues() against this function.
>  	 */
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
> -
> +	if (atomic_read(&q->mq_freeze_depth))
> +		goto out;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		struct blk_mq_tags *tags = hctx->tags;
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  		if (tags->nr_reserved_tags)
>  			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>  		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>  	}
> +out:
>  	blk_queue_exit(q);
If the callback function goes to sleep, we would not be able to reach
blk_queue_exit() to decrement q->q_usage_counter.

On the other side, in __blk_mq_update_nr_hw_queues(), once
blk_mq_freeze_queue() has returned, __PERCPU_REF_DEAD is set and
q->q_usage_counter must have reached 0; otherwise
blk_mq_freeze_queue_wait() would not have moved forward.
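To make the ordering explicit, below is roughly how I read the freeze
path in the current source (a simplified sketch, with the
blk_freeze_queue() indirection trimmed):

void blk_mq_freeze_queue(struct request_queue *q)
{
	/*
	 * blk_freeze_queue_start(): atomic_inc_return(&q->mq_freeze_depth)
	 * and, on the 0 -> 1 transition, percpu_ref_kill(&q->q_usage_counter).
	 * Once the ref is killed, __PERCPU_REF_DEAD is set and every new
	 * percpu_ref_tryget() fails.
	 */
	blk_freeze_queue_start(q);

	/*
	 * Blocks until q_usage_counter drops to zero, i.e. until every
	 * earlier successful percpu_ref_tryget() has been balanced by
	 * blk_queue_exit().
	 */
	blk_mq_freeze_queue_wait(q);
}

So a blk_mq_queue_tag_busy_iter() that already holds q_usage_counter
only delays blk_mq_freeze_queue_wait(); the nr_hw_queues update cannot
start underneath it.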
I think we would not even be able to invoke the callback (which might
sleep) once blk_mq_freeze_queue() has completed, because
percpu_ref_tryget() would already fail?

Why do we need the extra check of q->mq_freeze_depth? Is it an
optimization?
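The only interleaving I can construct where the new check changes
anything is the following (my own sketch, assuming the tryget races
ahead of percpu_ref_kill()):

/*
 * CPU0: blk_mq_queue_tag_busy_iter()    CPU1: __blk_mq_update_nr_hw_queues()
 *
 * percpu_ref_tryget() -> succeeds
 *                                       atomic_inc(&q->mq_freeze_depth)
 *                                       percpu_ref_kill(&q->q_usage_counter)
 *                                       blk_mq_freeze_queue_wait() blocks
 * atomic_read(&q->mq_freeze_depth) > 0
 *         -> goto out, fn() skipped
 * blk_queue_exit()
 *                                       wait completes, hctxs updated
 *
 * Without the check, CPU0 would run fn() (which may sleep) and CPU1
 * would simply wait longer; the iteration would still finish before
 * the update proceeds.
 */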
Please let me know if my understanding is incorrect.
Thank you very much!
>  }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();
> +
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
Dongli Zhang