On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
-	blk_mq_start_request(req);
	if (unlikely(nsock->pending && nsock->pending != req)) {
		blk_mq_requeue_request(req, true);
		ret = 0;
(replying to an e-mail from seven months ago)
Hello Josef,
Are you aware that the nbd driver is one of the very few block drivers that calls blk_mq_requeue_request() after a request has been started? I think that can lead the block layer core into undesired behavior, e.g. the timeout handler firing concurrently with a request being reinserted. Can you or a colleague have a look at this? I would like to add the following code to the block layer core, and I think that the nbd driver would trigger this warning:
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
+	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
	__blk_mq_requeue_request(rq);
Thanks,
Bart.
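(For context, the fragment above could read roughly as follows once completed. This is only a hedged sketch: the point at which old_state would be captured, via the blk-mq internal helper blk_mq_rq_state(), is an assumption about the rest of the patch, not part of the quoted change.)

/*
 * Hedged sketch, not the actual patch: assumes old_state is captured via
 * the blk-mq internal helper blk_mq_rq_state() before the requeue
 * machinery changes the request state.
 */
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
	enum mq_rq_state old_state = blk_mq_rq_state(rq);

	/*
	 * A started request that is still in flight can also be seen by the
	 * timeout handler, so warn if it is requeued in any state other
	 * than MQ_RQ_COMPLETE.
	 */
	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);

	__blk_mq_requeue_request(rq);
	/* ... rest of the function unchanged ... */
}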
On Thu, May 17, 2018 at 06:21:40PM +0000, Bart Van Assche wrote:
On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
-	blk_mq_start_request(req);
	if (unlikely(nsock->pending && nsock->pending != req)) {
		blk_mq_requeue_request(req, true);
		ret = 0;
(replying to an e-mail from seven months ago)
Hello Josef,
Are you aware that the nbd driver is one of the very few block drivers that calls blk_mq_requeue_request() after a request has been started? I think that can lead the block layer core into undesired behavior, e.g. the timeout handler firing concurrently with a request being reinserted. Can you or a colleague have a look at this? I would like to add the following code to the block layer core, and I think that the nbd driver would trigger this warning:
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
+	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
	__blk_mq_requeue_request(rq);
Yup, I can tell you why: on 4.11, where I originally did this work, __blk_mq_requeue_request() did this:
static void __blk_mq_requeue_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	trace_block_rq_requeue(q, rq);
	rq_qos_requeue(q, &rq->issue_stat);
	blk_mq_sched_requeue_request(rq);

	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
		if (q->dma_drain_size && blk_rq_bytes(rq))
			rq->nr_phys_segments--;
	}
}
So it was clearing the started flag when it did the requeue. If that's not what I'm supposed to be doing anymore, then I can send a patch to fix it. What is supposed to be done if I have already called blk_mq_start_request()? I can avoid starting the request until after that chunk of code, but there's a part further down that needs the request started before we reach it, so I'll have to do whatever the special thing is now there. Thanks,
Josef
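(For reference, the nbd submission path being discussed looks roughly like the sketch below. This is a condensed, hedged approximation rather than an exact excerpt from drivers/block/nbd.c: the request is started and may then be requeued from inside .queue_rq() when another, partially sent request still owns the socket.)

/* Condensed sketch of the nbd pattern, not an exact driver excerpt. */
static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
{
	struct request *req = blk_mq_rq_from_pdu(cmd);
	struct nbd_device *nbd = cmd->nbd;
	struct nbd_sock *nsock = nbd->config->socks[index];
	int ret;

	mutex_lock(&nsock->tx_lock);

	blk_mq_start_request(req);
	if (unlikely(nsock->pending && nsock->pending != req)) {
		/* Another request owns this socket; put ours back. */
		blk_mq_requeue_request(req, true);
		ret = 0;
		goto out;
	}

	ret = nbd_send_cmd(nbd, cmd, index);	/* may leave nsock->pending set */
out:
	mutex_unlock(&nsock->tx_lock);
	return ret;
}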
On Thu, 2018-05-17 at 14:41 -0400, Josef Bacik wrote:
Yup, I can tell you why: on 4.11, where I originally did this work, __blk_mq_requeue_request() did this:
static void __blk_mq_requeue_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	trace_block_rq_requeue(q, rq);
	rq_qos_requeue(q, &rq->issue_stat);
	blk_mq_sched_requeue_request(rq);

	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
		if (q->dma_drain_size && blk_rq_bytes(rq))
			rq->nr_phys_segments--;
	}
}
So it was clearing the started flag when it did the requeue. If that's not what I'm supposed to be doing anymore, then I can send a patch to fix it. What is supposed to be done if I have already called blk_mq_start_request()? I can avoid starting the request until after that chunk of code, but there's a part further down that needs the request started before we reach it, so I'll have to do whatever the special thing is now there. Thanks,
Hello Josef,
After a closer look, I think that calling blk_mq_start_request() before blk_mq_requeue_request() is fine after all. The reason is that the v4.16 block layer core defers timeout processing until after .queue_rq() has returned, so the timeout code shouldn't see requests for which both blk_mq_start_request() and blk_mq_requeue_request() are called from inside .queue_rq(). I will make sure this behavior is preserved.
Bart.
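(To illustrate that ordering argument, here is a small user-space toy model. It is not blk-mq code, just an analogy: the dispatch path holds a read lock around .queue_rq(), and the timeout check takes the write lock first, so the transient in-flight state of a request that is requeued inside .queue_rq() can never be observed by the timeout path.)

/*
 * Toy model only, not blk-mq code: shows why a timeout check that waits
 * for the dispatch section to finish cannot observe the transient
 * "started" state of a request that is requeued from inside .queue_rq().
 * Build with: cc -pthread toy.c
 */
#include <pthread.h>
#include <stdio.h>

enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_REQUEUED };

struct toy_request {
	enum toy_rq_state state;
};

/* Stand-in for the dispatch-side synchronization around .queue_rq(). */
static pthread_rwlock_t dispatch_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Analogue of .queue_rq(): start the request, then requeue it. */
static void toy_queue_rq(struct toy_request *rq)
{
	pthread_rwlock_rdlock(&dispatch_lock);
	rq->state = TOY_RQ_IN_FLIGHT;	/* blk_mq_start_request()   */
	rq->state = TOY_RQ_REQUEUED;	/* blk_mq_requeue_request() */
	pthread_rwlock_unlock(&dispatch_lock);
}

/* Analogue of the timeout worker: wait out any dispatch in progress. */
static void toy_timeout_check(struct toy_request *rq)
{
	pthread_rwlock_wrlock(&dispatch_lock);
	if (rq->state == TOY_RQ_IN_FLIGHT)
		printf("timeout handler would fire\n");
	else
		printf("nothing in flight, timeout handler skipped\n");
	pthread_rwlock_unlock(&dispatch_lock);
}

int main(void)
{
	struct toy_request rq = { .state = TOY_RQ_IDLE };

	toy_queue_rq(&rq);
	toy_timeout_check(&rq);	/* prints "nothing in flight, ..." */
	return 0;
}

(Running the model prints the "nothing in flight" branch, because the timeout check can only inspect the request after .queue_rq() has released the lock, by which time the request is already requeued.)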