Re: [PATCH] blk-mq: Avoid that a request queue stalls when restarting a shared hctx

24 Jul 2018

      On 07/23/2018 11:50 PM, Bart Van Assche wrote:
...
The patch below fixes queue stalling when shared hctx marked for restart
(BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero.  The
root cause is that hctxs are shared between queues, but 'shared_hctx_restart'
The blk_mq_hw_ctx structure is also per request_queue
Please refer to blk_mq_init_allocated_queue -> blk_mq_realloc_hw_ctxs
...
belongs to the particular queue, which in fact may not need to be restarted,
thus we return from blk_mq_sched_restart() and leave shared hctx of another
queue never restarted.
The fix is to make shared_hctx_restart counter belong not to the queue, but
to tags, thereby counter will reflect real number of shared hctx needed to
be restarted.
During tests 1 hctx (set->nr_hw_queues) was used and all stalled requests
were noticed in dd->fifo_list of mq-deadline scheduler.
Seeming possible sequence of events:

Request A of queue A is inserted into dd->fifo_list of the scheduler.

Request B of queue A bypasses scheduler and goes directly to
hctx->dispatch.

Request C of queue B is inserted.

blk_mq_sched_dispatch_requests() is invoked, since hctx->dispatch is not
empty (request B is in the list) hctx is only marked for for next restart
and request A is left in a list (see comment "So it's best to leave them
there for as long as we can. Mark the hw queue as needing a restart in
that case." in blk-mq-sched.c)

Eventually request B is completed/freed and blk_mq_sched_restart() is
called, but by chance hctx from queue B is chosen for restart and request C
gets a chance to be dispatched.

Request C is just inserted into queue B. If there is no mark restart there,
it will not be chosen.
blk_mq_sched_restart_hctx
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
    	return false;
If blk_mq_sched_restart choose queue B, one of its hctx must have SCHED_RESTART flag,
and q->shared_hctx_restart must not be zero.
...

Eventually request C is completed/freed and blk_mq_sched_restart() is
called, but shared_hctx_restart for queue B is zero and we return without
attempt to restart hctx from queue A, thus request A is stuck forever.

But stalling queue is not the only one problem with blk_mq_sched_restart().
My tests show that those loops thru all queues and hctxs can be very costly,
even with shared_hctx_restart counter, which aims to fix performance issue.
Currently, SCHED_RESTART is always set when there is reqs on hctx->dispatch list in
blk_mq_sched_dispatch_requests. And no driver tag case is the main reason for hctx->dispatch
is not empty when there is io scheduler.
Therefore, most of time, blk_mq_sched_restart will be invoked further for no driver tag case.
For non-share-tag case, it will wakeup the hctx.
But for shared-tag case, it is unnecessary, because the sbitmap_queue wakeup hook will work
and hctx_may_queue will avoid starvation of other ones.
Therefore, the costly loop through the queues and hctxs is unnecessary most of time.
Thanks
Jianchao

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] blk-mq: Avoid that a request queue stalls when restarting a shared hctx