On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
On 5/7/24 17:23, Christoph Hellwig wrote:
On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them enough rope, and they should not have that rope when the only sensible options are page/folio based kernel memory (including large/huge folios) and dmabuf.
I believe there is at least one deep confusion here, considering you previously mentioned Keith's pre-mapping patches. The "hooks" are not really about the format in which you pass memory; that's arguably the least interesting part for page pool, which would more or less circulate whatever it is given. It's more about how to have better control over buffer lifetime and how to implement a buffer pool that passes data to users and empty buffers back.
Isn't that more or less exactly what dmabuf is? Why do you need another almost dma-buf thing for another project?
That's the exact point I've been making since the last round of the series. We don't need to reinvent dmabuf poorly in every subsystem, but instead fix the odd parts in it and make it suitable for everyone.
Someone would need to elaborate on how dma-buf resembles that addition to the page pool infra.
I think I understand what Jason is requesting here, and I'll take a shot at elaborating. AFAICT what he's saying is technically feasible and addresses the nack while giving you the uapi you want. It just requires a bit (a lot?) of work on your end unfortunately.
CONFIG_UDMABUF takes in a memfd, converts it to a dmabuf, and returns it to userspace. See udmabuf_create().
I think what Jason is saying here is that you can write code similar to udmabuf_create() that takes in an io_uring memory region and converts it to a dmabuf inside the kernel.
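For reference, here is a rough sketch of what the existing udmabuf path looks like from userspace: create a sealed memfd, hand it to /dev/udmabuf, and get a dmabuf fd back. The helper name udmabuf_from_memfd() is made up for illustration. The ioctl number is computed with the generic _IOW encoding (as on x86 and arm64), which is an assumption that does not hold on every architecture, and it needs a kernel with CONFIG_UDMABUF plus access to /dev/udmabuf.

```python
import fcntl
import mmap
import os
import struct

# struct udmabuf_create { __u32 memfd; __u32 flags; __u64 offset; __u64 size; }
UDMABUF_CREATE_FMT = "=IIQQ"

# _IOW('u', 0x42, struct udmabuf_create), assuming the generic ioctl encoding
# (direction in bits 30-31, size in bits 16-29). Not valid on every arch.
UDMABUF_CREATE = (
    (1 << 30) | (struct.calcsize(UDMABUF_CREATE_FMT) << 16) | (ord("u") << 8) | 0x42
)
UDMABUF_FLAGS_CLOEXEC = 0x01


def udmabuf_from_memfd(size):
    """Hypothetical helper: return (memfd, dmabuf_fd) for `size` bytes."""
    # udmabuf wants a page-aligned offset and size; check before any syscall.
    if size <= 0 or size % mmap.PAGESIZE:
        raise ValueError("size must be a positive multiple of the page size")

    memfd = os.memfd_create("udmabuf-sketch", os.MFD_ALLOW_SEALING)
    try:
        os.ftruncate(memfd, size)
        # udmabuf refuses memfds that are not sealed against shrinking.
        fcntl.fcntl(memfd, fcntl.F_ADD_SEALS, fcntl.F_SEAL_SHRINK)
        # A mutable buffer makes fcntl.ioctl() return the ioctl's return
        # value, which here is the new dmabuf fd.
        req = bytearray(
            struct.pack(UDMABUF_CREATE_FMT, memfd, UDMABUF_FLAGS_CLOEXEC, 0, size)
        )
        dev = os.open("/dev/udmabuf", os.O_RDWR)
        try:
            dmabuf_fd = fcntl.ioctl(dev, UDMABUF_CREATE, req)
        finally:
            os.close(dev)
    except BaseException:
        os.close(memfd)
        raise
    return memfd, dmabuf_fd
```

Calling udmabuf_from_memfd(mmap.PAGESIZE) raises OSError where the device is missing or not accessible. The in-kernel equivalent being suggested would do the same conversion, but starting from the io_uring memory region instead of a memfd.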
I haven't looked at your series too closely yet (sorry!), but I assume you currently have a netlink API that binds an io_uring memory region to the NIC rx-queue page_pool, right? That netlink API would need to be changed to:
1. Take in the io_uring memory.
2. Convert it to a dmabuf like udmabuf_create() does.
3. Bind the resulting dmabuf to the rx-queue page_pool.
There would be more changes needed vis-a-vis the clean up path and lifetime management, but I think this is the general idea.
This would give you the uapi you want, while the page_pool never sees non-dmabuf memory (which addresses the nack, I think).