On 2023-11-07 15:03, Mina Almasry wrote:
On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@kernel.org> wrote:
On 11/7/23 3:10 PM, Mina Almasry wrote:
On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@kernel.org> wrote:
On 11/5/23 7:44 PM, Mina Almasry wrote:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eeeda849115c..1c351c138a5b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
 };

 #ifdef CONFIG_DMA_SHARED_BUFFER
+struct page_pool_iov *
+netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
+void netdev_free_devmem(struct page_pool_iov *ppiov);
netdev_{alloc,free}_dmabuf?
Can do.
I say that because a dmabuf can be host memory; at least, I am not aware of any restriction that a dmabuf must be device memory.
In my limited experience, dma-buf is generally device memory, and that's really its use case. CONFIG_UDMABUF is a driver that mocks dma-buf with a memfd, which I think is used for testing. But I can do the rename; it's clearer anyway, I think.
config UDMABUF
	bool "userspace dmabuf misc driver"
	default n
	depends on DMA_SHARED_BUFFER
	depends on MEMFD_CREATE || COMPILE_TEST
	help
	  A driver to let userspace turn memfd regions into dma-bufs.
	  Qemu can use this to create host dmabufs for guest framebuffers.
Qemu is just a userspace process; it is in no way a special one.
Treating host memory as a dmabuf should radically simplify the io_uring extension of this set.
I actually agree, and I was going to make that comment on David Wei's series once I had the time.
David, your io_uring RX zerocopy proposal actually works with devmem TCP. If you're inclined to go that way instead, what you'd do is roughly (I think):
- Allocate a memfd.
- Use CONFIG_UDMABUF to create a dma-buf out of that memfd.
- Bind the dma-buf to the NIC using the netlink API in this RFC.
- Your io_uring extensions and io_uring uapi should then work almost as-is on top of this series, I think.
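The first two steps can be sketched in C roughly as follows. This is a minimal sketch assuming a kernel built with CONFIG_UDMABUF (which provides /dev/udmabuf) and the UAPI header <linux/udmabuf.h>; the helper names make_sealed_memfd() and memfd_to_dmabuf() are hypothetical, and the NIC binding step is left out because it depends on this RFC's netlink API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Step 1 (hypothetical helper): allocate a memfd of `size` bytes and add
 * F_SEAL_SHRINK. udmabuf only accepts memfds carrying that seal (and
 * rejects ones sealed with F_SEAL_WRITE), hence MFD_ALLOW_SEALING.
 * Returns the memfd, or -1 on failure.
 */
static int make_sealed_memfd(size_t size)
{
	int memfd = memfd_create("rx-buffer", MFD_ALLOW_SEALING);

	if (memfd < 0)
		return -1;
	if (ftruncate(memfd, (off_t)size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(memfd);
		return -1;
	}
	return memfd;
}

/*
 * Step 2 (hypothetical helper): ask the udmabuf driver to export
 * [0, size) of the memfd as a dma-buf. Returns the new dma-buf fd, or -1
 * on failure (e.g. /dev/udmabuf is missing because CONFIG_UDMABUF is off).
 */
static int memfd_to_dmabuf(int memfd, size_t size)
{
	struct udmabuf_create create = {
		.memfd  = (uint32_t)memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,
	};
	int devfd, dmabuf_fd;

	/* udmabuf rejects offsets and sizes that are not page-aligned. */
	assert(size % (size_t)sysconf(_SC_PAGESIZE) == 0);

	devfd = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
	if (devfd < 0)
		return -1;
	dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);
	return dmabuf_fd;
}

/*
 * Step 3 (binding the dma-buf fd to a NIC RX queue) goes through the
 * netlink API proposed in this RFC and is not sketched here.
 */
```

Past step 2, the NIC side only ever sees a dma-buf fd, which is why swapping UDMABUF for an exporter backed by device memory should leave the later steps unchanged.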
If you do this, incoming packets should land in your memfd, which may or may not work for you. In the future, if you feel inclined to use device memory, the approach I'm describing here would be more extensible to it, because you'd already be using dma-bufs for your user memory; you'd just replace one kind of dma-buf (UDMABUF) with another.
How would TCP devmem change if we no longer assume that a dmabuf is device memory? Pavel will know more on the perf side, but I wouldn't want to put any if/else on the hot path if we can avoid it. I could be wrong, but right now, in my mind, using different memory providers solves this neatly, and the driver/networking stack doesn't need to care.
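A minimal userspace sketch of the memory-provider idea, using entirely hypothetical names (struct mem_provider_ops, struct buf_pool, pool_alloc()); it is not the page_pool API from either patch set. The provider is picked once at setup, so the allocation path is a single indirect call with no if/else on the memory type, and a per-provider readable flag records the one property that differs for device memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical provider interface: each memory type supplies its own
 * alloc/free callbacks plus whether the CPU may read its buffers. */
struct mem_provider_ops {
	void *(*alloc_buf)(void *priv, size_t len);
	void (*free_buf)(void *priv, void *buf);
	bool readable;	/* false for device memory the CPU cannot touch */
};

/* Hypothetical pool: the provider is chosen once, when the pool is set up. */
struct buf_pool {
	const struct mem_provider_ops *ops;
	void *priv;
};

/* Hot path: one indirect call, no branch on the memory type. */
static void *pool_alloc(struct buf_pool *pool, size_t len)
{
	return pool->ops->alloc_buf(pool->priv, len);
}

static void pool_free(struct buf_pool *pool, void *buf)
{
	pool->ops->free_buf(pool->priv, buf);
}

/* Provider 1: ordinary host memory from malloc(). */
static void *host_alloc(void *priv, size_t len)
{
	(void)priv;
	return malloc(len);
}

static void host_free(void *priv, void *buf)
{
	(void)priv;
	free(buf);
}

static const struct mem_provider_ops host_ops = {
	.alloc_buf = host_alloc,
	.free_buf  = host_free,
	.readable  = true,
};

/* Provider 2: stand-in for a bound dma-buf region, carved up with a
 * simple bump allocator; marked unreadable to mimic device memory. */
struct region {
	char *base;
	size_t size;
	size_t used;
};

static void *region_alloc(void *priv, size_t len)
{
	struct region *r = priv;
	void *p;

	if (len > r->size - r->used)
		return NULL;	/* region exhausted */
	p = r->base + r->used;
	r->used += len;
	return p;
}

static void region_free(void *priv, void *buf)
{
	/* Bump allocator: individual buffers are never returned. */
	(void)priv;
	(void)buf;
}

static const struct mem_provider_ops region_ops = {
	.alloc_buf = region_alloc,
	.free_buf  = region_free,
	.readable  = false,
};
```

Note that an indirect call is not free in the kernel (retpoline mitigations make it costlier), so whether it beats a well-predicted branch is itself a perf question.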
Mina, I believe you said at NetDev conf that you already had a udmabuf implementation for testing. I would like to look at it (you can send it privately) to understand how TCP devmem would handle both user memory and device memory.
That the io_uring set needs to dive into page_pools is just wrong: it complicates the design and code and pushes io_uring into a realm it does not need to be involved in.
Most (all?) of this patch set can work with any memory; only device memory is unreadable.