From: Pavel Begunkov asml.silence@gmail.com
We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries.
TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails.
The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done().
The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://github.com/axboe/liburing/issues/643 Reported-by: Beld Zhang beldzhang@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk --- io_uring/rw.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c index 1babd77da79c..1e18a44adcf5 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -206,6 +206,20 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res) return false; }
+static inline unsigned io_fixup_rw_res(struct io_kiocb *req, unsigned res) +{ + struct io_async_rw *io = req->async_data; + + /* add previously done IO, if any */ + if (req_has_async_data(req) && io->bytes_done > 0) { + if (res < 0) + res = io->bytes_done; + else + res += io->bytes_done; + } + return res; +} + static void io_complete_rw(struct kiocb *kiocb, long res) { struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb); @@ -213,7 +227,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
if (__io_complete_rw_common(req, res)) return; - io_req_set_res(req, res, 0); + io_req_set_res(req, io_fixup_rw_res(req, res), 0); req->io_task_work.func = io_req_task_complete; io_req_task_work_add(req); } @@ -240,22 +254,14 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res) static int kiocb_done(struct io_kiocb *req, ssize_t ret, unsigned int issue_flags) { - struct io_async_rw *io = req->async_data; struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); - - /* add previously done IO, if any */ - if (req_has_async_data(req) && io->bytes_done > 0) { - if (ret < 0) - ret = io->bytes_done; - else - ret += io->bytes_done; - } + unsigned final_ret = io_fixup_rw_res(req, ret);
if (req->flags & REQ_F_CUR_POS) req->file->f_pos = rw->kiocb.ki_pos; if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) { if (!__io_complete_rw_common(req, ret)) { - io_req_set_res(req, req->cqe.res, + io_req_set_res(req, final_ret, io_put_kbuf(req, issue_flags)); return IOU_OK; } @@ -268,7 +274,7 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret, if (io_resubmit_prep(req)) io_req_task_queue_reissue(req); else - io_req_task_queue_fail(req, ret); + io_req_task_queue_fail(req, final_ret); } return IOU_ISSUE_SKIP_COMPLETE; }
On 9/19/22 2:17 PM, Avadhut Naik wrote:
From: Pavel Begunkov asml.silence@gmail.com
We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries.
TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails.
The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done().
The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O.
I'm confused by this email, why is it being sent? And what are the 2-3/3 patches?
And while this one should certainly go to stable, also note that:
commit 62bb0647b14646fa6c9aa25ecdf67ad18f13523c Author: Pavel Begunkov asml.silence@gmail.com Date: Tue Sep 13 13:21:23 2022 +0100
io_uring/rw: fix error'ed retry return values
exists in Linus's tree and should go in alongside the parent as it fixes the parameter type.
Hi,
Please ignore this email. It has been sent by mistake. Wont happen again.
Sincerest apologies for the spam.
Thanks and Regards, Avadhut Naik
On 9/19/2022 4:11 PM, Jens Axboe wrote:
On 9/19/22 2:17 PM, Avadhut Naik wrote:
From: Pavel Begunkov asml.silence@gmail.com
We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries.
TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails.
The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done().
The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O.
I'm confused by this email, why is it being sent? And what are the 2-3/3 patches?
And while this one should certainly go to stable, also note that:
commit 62bb0647b14646fa6c9aa25ecdf67ad18f13523c Author: Pavel Begunkov asml.silence@gmail.com Date: Tue Sep 13 13:21:23 2022 +0100
io_uring/rw: fix error'ed retry return values
exists in Linus's tree and should go in alongside the parent as it fixes the parameter type.
linux-stable-mirror@lists.linaro.org