Pavel Begunkov wrote:
On 4/4/24 23:17, Oliver Crumrine wrote:
In his patch to enable zerocopy networking for io_uring, Pavel Begunkov specifically disabled REQ_F_CQE_SKIP, as (at least from my understanding) the userspace program wouldn't receive the IORING_CQE_F_MORE flag in the result value.
No. IORING_CQE_F_MORE means there will be another CQE from this request, so a single CQE without IORING_CQE_F_MORE is trivially fine.
The problem is the semantics, because by suppressing the first CQE you're loosing the result value. You might rely on WAITALL
That's already happening with io_send.
as other sends and "fail" (in terms of io_uring) the request in case of a partial send posting 2 CQEs, but that's not a great way and it's getting userspace complicated pretty easily.
In short, it was left out for later because there is a better way to implement it, but it should be done carefully
Maybe we could put the return values in the notifs? That would be a discrepancy between io_send and io_send_zc, though.
To fix this, instead of keeping track of how many CQEs have been received, and subtracting notifs from that, programs can keep track of
That's a benchmark way of doing it, more realistically it'd be more like
event_loop() { cqe = wait_cqe(); struct req *r = (struct req *)cqe->user_data; r->callback(r, cqe);
send_zc_callback(req, cqe) { if (cqe->flags & F_MORE) { // don't free the req // we should wait for another CQE ... } }
how many SQEs they have issued, and if a CQE is returned with an error, they can simply subtract from how many notifs they expect to receive.
The design specifically untangles those two notions, i.e. there can be a notification even when the main CQE fails (ret<0). It's safer this way, even though AFAIK relying on errors would be fine with current users (TCP/UDP).
Signed-off-by: Oliver Crumrine ozlinuxc@gmail.com
io_uring/net.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c index 1e7665ff6ef7..822f49809b68 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1044,9 +1044,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) return -EINVAL;
/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
if (req->flags & REQ_F_CQE_SKIP)
return -EINVAL;
notif = zc->notif = io_alloc_notif(ctx); if (!notif)
@@ -1342,7 +1339,8 @@ void io_sendrecv_fail(struct io_kiocb *req) req->cqe.res = sr->done_io;
if ((req->flags & REQ_F_NEED_CLEANUP) &&
(req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
(req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC) &&
req->cqe.flags |= IORING_CQE_F_MORE; }!(req->flags & REQ_F_CQE_SKIP))
-- Pavel Begunkov