Any request queue data structure may change while a queue is frozen. Hence make sure that blk_mq_run_hw_queues() does not access any hw queue while a request queue is frozen.
After blk_cleanup_queue() has marked a queue as dead it is no longer safe to access the hardware queue data structures. This patch avoids that blk_mq_run_hw_queues() crashes when called during or after blk_cleanup_queue() has freed the hardware queues. This patch is a variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues "). This patch is similar in nature to commit c246e80d8673 ("block: Avoid that request_fn is invoked on a dead queue"; v3.8). An example of a crash that is fixed by this patch:
BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30 Call Trace: [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90 [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120 [<ffffffff81098cb4>] process_one_work+0x154/0x410 [<ffffffff81099896>] worker_thread+0x116/0x4a0 [<ffffffff8109edb9>] kthread+0xc9/0xe0 [<ffffffff81619b05>] ret_from_fork+0x55/0x80
Cc: Christoph Hellwig hch@infradead.org Cc: Hannes Reinecke hare@suse.com Cc: James Smart james.smart@broadcom.com Cc: Ming Lei ming.lei@redhat.com Cc: Jianchao Wang jianchao.w.wang@oracle.com Cc: Dongli Zhang dongli.zhang@oracle.com Cc: stable@vger.kernel.org Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17. Reported-by: James Smart james.smart@broadcom.com Signed-off-by: Bart Van Assche bvanassche@acm.org --- block/blk-mq.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ff3d7b49969..652d0c6d5945 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async) struct blk_mq_hw_ctx *hctx; int i;
+ /* + * Do not run any hardware queues if the queue is frozen or if a + * concurrent blk_cleanup_queue() call is removing any data + * structures used by this function. + */ + if (!percpu_ref_tryget(&q->q_usage_counter)) + return; queue_for_each_hw_ctx(q, hctx, i) { if (blk_mq_hctx_stopped(hctx)) continue;
blk_mq_run_hw_queue(hctx, async); } + percpu_ref_put(&q->q_usage_counter); } EXPORT_SYMBOL(blk_mq_run_hw_queues);
On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
Any request queue data structure may change while a queue is frozen. Hence make sure that blk_mq_run_hw_queues() does not access any hw queue while a request queue is frozen.
After blk_cleanup_queue() has marked a queue as dead it is no longer safe to access the hardware queue data structures. This patch avoids that blk_mq_run_hw_queues() crashes when called during or after blk_cleanup_queue() has freed the hardware queues. This patch is a variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues "). This patch is similar in nature to commit c246e80d8673 ("block: Avoid that request_fn is invoked on a dead queue"; v3.8). An example of a crash that is fixed by this patch:
BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30 Call Trace: [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90 [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120 [<ffffffff81098cb4>] process_one_work+0x154/0x410 [<ffffffff81099896>] worker_thread+0x116/0x4a0 [<ffffffff8109edb9>] kthread+0xc9/0xe0 [<ffffffff81619b05>] ret_from_fork+0x55/0x80
Cc: Christoph Hellwig hch@infradead.org Cc: Hannes Reinecke hare@suse.com Cc: James Smart james.smart@broadcom.com Cc: Ming Lei ming.lei@redhat.com Cc: Jianchao Wang jianchao.w.wang@oracle.com Cc: Dongli Zhang dongli.zhang@oracle.com Cc: stable@vger.kernel.org Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17. Reported-by: James Smart james.smart@broadcom.com Signed-off-by: Bart Van Assche bvanassche@acm.org
block/blk-mq.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ff3d7b49969..652d0c6d5945 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async) struct blk_mq_hw_ctx *hctx; int i;
- /*
* Do not run any hardware queues if the queue is frozen or if a
* concurrent blk_cleanup_queue() call is removing any data
* structures used by this function.
*/
- if (!percpu_ref_tryget(&q->q_usage_counter))
queue_for_each_hw_ctx(q, hctx, i) { if (blk_mq_hctx_stopped(hctx)) continue;return;
blk_mq_run_hw_queue(hctx, async); }
- percpu_ref_put(&q->q_usage_counter);
} EXPORT_SYMBOL(blk_mq_run_hw_queues);
I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put() in the fast path if we simply release all hctx resource in hctx's release handler by the following patch:
https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.c...
Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in scsi_end_request().
Thanks, Ming
On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ff3d7b49969..652d0c6d5945 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async) struct blk_mq_hw_ctx *hctx; int i;
- /*
* Do not run any hardware queues if the queue is frozen or if a
* concurrent blk_cleanup_queue() call is removing any data
* structures used by this function.
*/
- if (!percpu_ref_tryget(&q->q_usage_counter))
queue_for_each_hw_ctx(q, hctx, i) { if (blk_mq_hctx_stopped(hctx)) continue;return;
blk_mq_run_hw_queue(hctx, async); }
- percpu_ref_put(&q->q_usage_counter);
} EXPORT_SYMBOL(blk_mq_run_hw_queues);
I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put() in the fast path if we simply release all hctx resource in hctx's release handler by the following patch:
https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.c...
Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in scsi_end_request().
The above approach has the advantages of being easy to review and to maintain.
Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler" makes the block layer more complicated because it introduces a new state for hardware queues: block driver cleanup has happened (set->ops->exit_hctx(...)) but the hardware queues are still in use by the block layer core.
Let's see what other reviewers think.
Bart.
On Tue, Apr 02, 2019 at 08:44:10AM -0700, Bart Van Assche wrote:
On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ff3d7b49969..652d0c6d5945 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async) struct blk_mq_hw_ctx *hctx; int i;
- /*
* Do not run any hardware queues if the queue is frozen or if a
* concurrent blk_cleanup_queue() call is removing any data
* structures used by this function.
*/
- if (!percpu_ref_tryget(&q->q_usage_counter))
queue_for_each_hw_ctx(q, hctx, i) { if (blk_mq_hctx_stopped(hctx)) continue;return;
blk_mq_run_hw_queue(hctx, async); }
- percpu_ref_put(&q->q_usage_counter);
} EXPORT_SYMBOL(blk_mq_run_hw_queues);
I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put() in the fast path if we simply release all hctx resource in hctx's release handler by the following patch:
https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.c...
Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in scsi_end_request().
The above approach has the advantages of being easy to review and to maintain.
Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler" makes the block layer more complicated because it introduces a new state for hardware queues: block driver cleanup has happened (set->ops->exit_hctx(...)) but
We are done with driver after blk_freeze_queue() and blk_sync_queue(), then call .exit_hctx() to say good bye with driver, I don't see it causes any issue.
the hardware queues are still in use by the block layer core.
Block layer has the correct in-memory state to work well, and no driver activity is involved too.
Thanks, Ming
linux-stable-mirror@lists.linaro.org