On Wed, Mar 19, 2025 at 10:50:18PM -0700, Christoph Hellwig wrote:
On Wed, Mar 19, 2025 at 08:32:19AM -0700, Joe Damato wrote:
See the docs on MSG_ZEROCOPY [1], but in short, when a user app calls sendmsg and passes MSG_ZEROCOPY, a completion notification is added to the socket's error queue. The user app can poll for these notifications to find out when the TX has completed and the buffer it passed to the kernel can be overwritten.
Yikes. That's not just an ugly interface, but something entirely specific to sockets and incompatible with all other asynchronous I/O interfaces.
I don't really know, but I would assume it was introduced, as Jens said, as a workaround long before other completion mechanisms existed.
And why aren't you simply plugging this into io_uring and generating a CQE so that it works like all other asynchronous operations?
I linked to the io_uring work that Pavel did in the cover letter. Please take a look.
Please write down what matters in the cover letter, including all the important tradeoffs.
OK, I will enhance the cover letter for the next submission. I had originally thought I'd submit something officially, but I think I'll probably submit another RFC with some of the changes I've made based on the discussion with Jens.
Namely: dropping sendfile2 completely and plumbing the bits through for splice. I'll wait a bit to hear what Jens thinks about the SO_ZEROCOPY thing (basically: if a network socket has that option set, maybe the existing sendfile can generate error queue completions without needing a separate system call?).
I agree overall that sendfile2 or sendmsg2 or whatever else could likely be built differently now that better interfaces and mechanisms exist in the kernel, but I still think there's room to improve existing system calls so they can be used safely.