On 4/4/24 23:17, Oliver Crumrine wrote:
In his patch to enable zerocopy networking for io_uring, Pavel Begunkov specifically disabled REQ_F_CQE_SKIP, as (at least from my understanding) the userspace program wouldn't receive the IORING_CQE_F_MORE flag in the result value.
No. IORING_CQE_F_MORE means there will be another CQE from this request, so a single CQE without IORING_CQE_F_MORE is trivially fine.
The problem is the semantics, because by suppressing the first CQE you're loosing the result value. You might rely on WAITALL as other sends and "fail" (in terms of io_uring) the request in case of a partial send posting 2 CQEs, but that's not a great way and it's getting userspace complicated pretty easily.
In short, it was left out for later because there is a better way to implement it, but it should be done carefully
To fix this, instead of keeping track of how many CQEs have been received, and subtracting notifs from that, programs can keep track of
That's a benchmark way of doing it, more realistically it'd be more like
event_loop() { cqe = wait_cqe(); struct req *r = (struct req *)cqe->user_data; r->callback(r, cqe); }
send_zc_callback(req, cqe) { if (cqe->flags & F_MORE) { // don't free the req // we should wait for another CQE ... } }
how many SQEs they have issued, and if a CQE is returned with an error, they can simply subtract from how many notifs they expect to receive.
The design specifically untangles those two notions, i.e. there can be a notification even when the main CQE fails (ret<0). It's safer this way, even though AFAIK relying on errors would be fine with current users (TCP/UDP).
Signed-off-by: Oliver Crumrine ozlinuxc@gmail.com
io_uring/net.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c index 1e7665ff6ef7..822f49809b68 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1044,9 +1044,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) return -EINVAL;
- /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
- if (req->flags & REQ_F_CQE_SKIP)
return -EINVAL;
notif = zc->notif = io_alloc_notif(ctx); if (!notif) @@ -1342,7 +1339,8 @@ void io_sendrecv_fail(struct io_kiocb *req) req->cqe.res = sr->done_io; if ((req->flags & REQ_F_NEED_CLEANUP) &&
(req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
(req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC) &&
req->cqe.flags |= IORING_CQE_F_MORE; }!(req->flags & REQ_F_CQE_SKIP))