6.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov <asml.silence@gmail.com>
[ Upstream commit fde04c7e2775feb0746301e0ef86a04d3598c3fe ]
io_queue_deferred() does not tolerate spurious calls, i.e. calls made before the requests it is waiting on have completed. Consider an inflight drain-marked request and another request that came after it and got queued into the drain list. If io_queue_deferred() is called before the first request completes, it'll check the second request with req_need_defer(), find that it has no drain flag set, and queue it for execution.
To make io_queue_deferred() work, it should at least check sequences for the first request, and we also need to check whether there is another drain request further down the list creating another bubble.
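For reference, the pre-patch req_need_defer() helper looks approximately like this in the 6.15 tree (quoted here for review context; the drain-flush loop below switches to the unconditional io_drain_defer_seq() check):

static bool req_need_defer(struct io_kiocb *req, u32 seq)
{
	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
		struct io_ring_ctx *ctx = req->ctx;

		return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
	}

	return false;
}

A request without REQ_F_IO_DRAIN is never sequence-checked here, which is how the second request can jump the drain barrier.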
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/972bde11b7d4ef25b3f5e3fd34f80e4d2aa345b8.174678871...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 io_uring/io_uring.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index edda31a15c6e6..9266d4f2016ad 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -537,18 +537,30 @@ void io_req_queue_iowq(struct io_kiocb *req)
 	io_req_task_work_add(req);
 }
 
+static bool io_drain_defer_seq(struct io_kiocb *req, u32 seq)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
+}
+
 static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx)
 {
+	bool drain_seen = false, first = true;
+
 	spin_lock(&ctx->completion_lock);
 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
 
-		if (req_need_defer(de->req, de->seq))
+		drain_seen |= de->req->flags & REQ_F_IO_DRAIN;
+		if ((drain_seen || first) && io_drain_defer_seq(de->req, de->seq))
 			break;
+
 		list_del_init(&de->list);
 		io_req_task_queue(de->req);
 		kfree(de);
+		first = false;
 	}
 	spin_unlock(&ctx->completion_lock);
 }
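To see the behavioural difference, here is a self-contained userspace model of the old and new flush loops (toy code for illustration only; the struct and function names are invented, not kernel API):

#include <stdbool.h>
#include <stdio.h>

struct toy_req {
	bool drain;	/* stands in for REQ_F_IO_DRAIN */
	bool seq_ok;	/* stands in for the seq/cq_extra/cached_cq_tail check */
};

/* pre-patch: only drain-flagged entries are sequence-checked */
static int flush_old(const struct toy_req *list, int n)
{
	int queued = 0;

	for (int i = 0; i < n; i++) {
		if (list[i].drain && !list[i].seq_ok)
			break;
		queued++;
	}
	return queued;
}

/* post-patch: the first entry is always checked, and once a drain
 * entry has been seen every later entry is checked as well */
static int flush_new(const struct toy_req *list, int n)
{
	bool drain_seen = false, first = true;
	int queued = 0;

	for (int i = 0; i < n; i++) {
		drain_seen |= list[i].drain;
		if ((drain_seen || first) && !list[i].seq_ok)
			break;
		queued++;
		first = false;
	}
	return queued;
}

int main(void)
{
	/* scenario from the commit message: a drain request is inflight
	 * and a plain request sits first in the defer list with its
	 * sequence not yet satisfied */
	struct toy_req head[1] = { { .drain = false, .seq_ok = false } };

	/* "another bubble": a satisfied drain entry followed by a plain
	 * request that must still wait for the drain to complete */
	struct toy_req bubble[2] = { { .drain = true,  .seq_ok = true },
				     { .drain = false, .seq_ok = false } };

	printf("head:   old=%d new=%d\n",
	       flush_old(head, 1), flush_new(head, 1));	/* old=1 (bug), new=0 */
	printf("bubble: old=%d new=%d\n",
	       flush_old(bubble, 2), flush_new(bubble, 2));	/* old=2 (bug), new=1 */
	return 0;
}

In both cases the old loop queues a request that should still be waiting behind a drain barrier, while the fixed loop stops at the bubble.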