Hi Bart,
On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter() may
> sleep, calling synchronize_rcu() from __blk_mq_update_nr_hw_queues() is
> not sufficient to wait until blk_mq_queue_tag_busy_iter() has finished.
> Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: stable@vger.kernel.org
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>  	/*
>  	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -	 * while the queue is frozen. So we can use q_usage_counter to avoid
> -	 * racing with it. __blk_mq_update_nr_hw_queues() uses
> -	 * synchronize_rcu() to ensure this function left the critical section
> -	 * below.
> +	 * while the queue is frozen. Hold q_usage_counter to serialize
> +	 * __blk_mq_update_nr_hw_queues() against this function.
>  	 */
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
> -
> +	if (atomic_read(&q->mq_freeze_depth))
> +		goto out;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		struct blk_mq_tags *tags = hctx->tags;
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  		if (tags->nr_reserved_tags)
>  			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>  		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>  	}
> +out:
>  	blk_queue_exit(q);
If the callback function goes to sleep, we would not be able to reach
blk_queue_exit() to decrement q->q_usage_counter.

On the other side, in __blk_mq_update_nr_hw_queues(), once
blk_mq_freeze_queue() has returned, __PERCPU_REF_DEAD is set and
q->q_usage_counter must have reached 0; otherwise
blk_mq_freeze_queue_wait() would not have moved forward.
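To make the ordering explicit, below is roughly how I read the freeze
path in the current source (a simplified sketch, with the
blk_freeze_queue() indirection trimmed):

void blk_mq_freeze_queue(struct request_queue *q)
{
	/*
	 * blk_freeze_queue_start(): atomic_inc_return(&q->mq_freeze_depth)
	 * and, on the 0 -> 1 transition, percpu_ref_kill(&q->q_usage_counter).
	 * Once the ref is killed, __PERCPU_REF_DEAD is set and every new
	 * percpu_ref_tryget() fails.
	 */
	blk_freeze_queue_start(q);

	/*
	 * Blocks until q_usage_counter drops to zero, i.e. until every
	 * earlier successful percpu_ref_tryget() has been balanced by
	 * blk_queue_exit().
	 */
	blk_mq_freeze_queue_wait(q);
}

So a blk_mq_queue_tag_busy_iter() that already holds q_usage_counter
only delays blk_mq_freeze_queue_wait(); the nr_hw_queues update cannot
start underneath it.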
I think we would not even be able to invoke the callback (which might
sleep) once blk_mq_freeze_queue() has completed, because
percpu_ref_tryget() would already fail?

Why do we need the extra check of q->mq_freeze_depth? Is it an
optimization?
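The only interleaving I can construct where the new check changes
anything is the following (my own sketch, assuming the tryget races
ahead of percpu_ref_kill()):

/*
 * CPU0: blk_mq_queue_tag_busy_iter()    CPU1: __blk_mq_update_nr_hw_queues()
 *
 * percpu_ref_tryget() -> succeeds
 *                                       atomic_inc(&q->mq_freeze_depth)
 *                                       percpu_ref_kill(&q->q_usage_counter)
 *                                       blk_mq_freeze_queue_wait() blocks
 * atomic_read(&q->mq_freeze_depth) > 0
 *         -> goto out, fn() skipped
 * blk_queue_exit()
 *                                       wait completes, hctxs updated
 *
 * Without the check, CPU0 would run fn() (which may sleep) and CPU1
 * would simply wait longer; the iteration would still finish before
 * the update proceeds.
 */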
Please let me know if my understanding is incorrect.
Thank you very much!
>  }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();
> +
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
Dongli Zhang