On 6/10/24 23:15, Jason Gunthorpe wrote:
On Mon, Jun 10, 2024 at 08:20:08PM +0100, Pavel Begunkov wrote:
On 6/10/24 16:16, David Ahern wrote:
There is no reason you shouldn't be able to use your fast io_uring completion and lifecycle flow with DMABUF backed memory. Those are not wildly different things and there is good reason they should work together.
Let's not mix up devmem TCP and dmabuf specifically. As I see it, your question concerned the latter: "... DMABUF memory registered through Mina's mechanism". io_uring's zcrx can trivially get dmabuf support in the future; as mentioned, it's mostly the setup side. The ABI, the buffer workflow and some other details are a separate issue, and I don't see how further integration beyond what we're already sharing is beneficial. On the contrary, it'll complicate things.
Again, I am talking about composability here. Duplicating the DMABUF stuff into io_uring is not composable; it is just duplicating things.
Ok, then consider registering, say, a dmabuf via devmem TCP and then using it in io_uring. Let's say we make the devmem TCP API able to register a dmabuf without using it, from where io_uring can take ownership of it and use it in its flow. I strongly believe the same memory region/dmabuf should never be used by both at the same time, and hence the lifetime of any such memory should be exclusively bound to io_uring.
That leaves the user API: to add memory you need to create a netlink socket and pass everything through it, which is an extra step, and then let io_uring know that it can use the memory, not forgetting to eject it from netlink. That's not a good API as far as io_uring is concerned.
I don't think slight duplication of registration is a problem when the upside is a much cleaner API. Internals, however, can be easily shared. We can even say that the net stack should provide helpers like init_page_pool_from_dmabuf_fd() and not allow poking into related bits aside from it (initialising net_iov / etc.).
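
To make the "register without using, then hand over" idea above concrete, here is a minimal userspace sketch in C. It is only a model of the ownership rule being argued for (one owner at a time, no handover while the region is in use). None of these names exist in the kernel: struct mem_region, region_register(), region_start_use() and region_take_ownership() are all hypothetical and invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical owners of a registered region; these are not kernel types. */
enum region_owner { OWNER_NONE, OWNER_DEVMEM, OWNER_IO_URING };

struct mem_region {
    int dmabuf_fd;            /* fd the region was registered from */
    enum region_owner owner;  /* exactly one owner at any time */
    int in_use;               /* nonzero once the owner started using it */
};

/* Register a dmabuf without using it: owned by devmem, but idle. */
static int region_register(struct mem_region *r, int dmabuf_fd)
{
    assert(r != NULL);
    if (dmabuf_fd < 0)
        return -EBADF;
    r->dmabuf_fd = dmabuf_fd;
    r->owner = OWNER_DEVMEM;
    r->in_use = 0;
    return 0;
}

/* Only the current owner may start using the region. */
static int region_start_use(struct mem_region *r, enum region_owner who)
{
    assert(r != NULL);
    if (r->owner != who)
        return -EPERM;
    r->in_use = 1;
    return 0;
}

/* Hand an idle, devmem-registered region over to io_uring. */
static int region_take_ownership(struct mem_region *r)
{
    assert(r != NULL);
    if (r->owner != OWNER_DEVMEM)
        return -EINVAL;   /* nothing to take, or already taken */
    if (r->in_use)
        return -EBUSY;    /* never shared by both at the same time */
    r->owner = OWNER_IO_URING;
    return 0;
}
```

In this model, once region_take_ownership() succeeds the devmem side can no longer start using the region, which is the exclusivity argued for above. It says nothing about how the real netlink or io_uring registration paths would implement this.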
It does not match the view that there should be two distinct layers here, one that provides the pages and one that manages the lifecycle. As HCH pushes for, pages either come from the allocator and get to use the struct folio, or they come from a dmabuf and they don't. That is it, the only two choices.
The io_uring stuff is trying to confuse the source of the pages with the lifecycle, which is surely convenient, but is why Christoph is opposing it.
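
The two-layer view described above can be sketched as a small userspace C model. This is an illustration only, not kernel code: struct page_provider, struct lifecycle, provider_get(), buf_has_folio(), lifecycle_complete() and run_once() are hypothetical names. The one point it tries to show is that the lifecycle layer never looks at where a buffer came from, so either source composes with any lifecycle.

```c
#include <assert.h>
#include <stddef.h>

/* Layer 1: where the memory comes from. Only two choices. */
enum page_source { SRC_ALLOCATOR, SRC_DMABUF };

struct buf {
    enum page_source src;
    int refs;
};

struct page_provider {
    enum page_source src;
};

/* Hand out one buffer holding a single reference. */
static struct buf provider_get(const struct page_provider *p)
{
    struct buf b = { p->src, 1 };
    return b;
}

/* Only allocator-backed memory gets to use the folio; dmabuf does not. */
static int buf_has_folio(const struct buf *b)
{
    return b->src == SRC_ALLOCATOR;
}

/* Layer 2: who manages the buffer's lifetime. It never reads b->src. */
struct lifecycle {
    const char *name;
    int completed;   /* number of buffers completed so far */
};

static void lifecycle_complete(struct lifecycle *lc, struct buf *b)
{
    assert(b->refs > 0);
    b->refs--;
    lc->completed++;
}

/* Compose any provider with any lifecycle; returns the remaining refs. */
static int run_once(const struct page_provider *p, struct lifecycle *lc)
{
    struct buf b = provider_get(p);
    lifecycle_complete(lc, &b);
    return b.refs;
}
```

With this split, adding a new lifecycle (for example an io_uring-style completion flow) would not require a second copy of the dmabuf provider, which is the composability point being made.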