On Tue, Jul 18, 2023 at 10:36:52AM -0700, Mina Almasry wrote:
> That is specific to this proposal, and will likely be very different in future ones. I thought the dma-buf pages approach was extensible and the uapi belonged somewhere in dma-buf. Clearly not. The next proposal, I think, will program the rxq via some net uapi and will take the dma-buf as input. Probably some netlink api (not sure if ethtool family or otherwise). I'm working out details of this non-paged networking first.
In practice you want the application to start up, get itself some 3/5-tuples and then request the kernel to set up the flow steering and provision the NIC queues.
This is the right moment for the application to provide the backing for the RX queue memory via a DMABUF handle.
Ideally this would all be accessible to unprivileged applications as well, so I think you'd want some kind of system call that sets all this up and takes in an FD for the 3/5-tuple socket (to prove ownership over the steering) and the DMABUF FD.
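As a rough illustration only, here is a minimal userspace-side sketch of the argument block such a system call might take. Every name in it (`struct rx_bind_args`, `rx_bind_args_check`, the field names) is hypothetical and does not exist in any kernel uAPI. It only shows the shape of the idea: the ownership proof is an FD that must really be a socket, and the backing memory is just another open FD.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/socket.h> /* socket(), for callers creating the steering socket */
#include <sys/stat.h>

/* HYPOTHETICAL argument block for a proposed "bind RX queue memory"
 * system call; none of these names exist in the kernel. */
struct rx_bind_args {
	int32_t sock_fd;     /* socket owning the 3/5-tuple: proves steering ownership */
	int32_t mem_fd;      /* DMABUF FD (or memfd) backing the RX queue buffers */
	uint32_t num_queues; /* number of NIC RX queues to provision */
	uint32_t flags;      /* reserved, must be zero */
};

/* Userspace mirror of the basic checks the kernel would have to make:
 * sock_fd is an open socket and mem_fd is an open descriptor.
 * Returns 0 on success or a negative errno value. */
static int rx_bind_args_check(const struct rx_bind_args *args)
{
	struct stat st;

	if (args->flags != 0 || args->num_queues == 0)
		return -EINVAL;
	if (fstat(args->sock_fd, &st) < 0)
		return -EBADF;
	if (!S_ISSOCK(st.st_mode))
		return -ENOTSOCK;
	if (fcntl(args->mem_fd, F_GETFD) < 0)
		return -EBADF;
	return 0;
}
```

The kernel side would additionally have to confirm that the socket actually matches the requested steering rule; an FD type check alone proves only that the caller holds some socket.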
The queues and steering should exist only as long as the application is still running (whatever that means). Otherwise you have a big mess to clean up whenever anything crashes.
netlink feels like a weird API choice for that; in particular, it would be really wrong to somehow bind the lifecycle of a netlink object to a process.
Further, if you are going to all the trouble of doing this, it seems to me you should make it work with any kind of memory, including CPU memory. That would give a consistent approach to zero-copy TCP RX. So also allow a memfd or similar to be passed in as the backing storage.
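To make the memfd suggestion concrete, a minimal sketch of how an application could create CPU-memory backing with the real `memfd_create(2)` and `ftruncate(2)` calls. The helper `rx_memfd_create` and its buffer-count/buffer-size parameters are hypothetical, and no existing kernel interface accepts such an FD as RX queue backing; this only shows that the userspace side is trivial.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h> /* memfd_create(), MFD_CLOEXEC (glibc >= 2.27) */
#include <unistd.h>   /* ftruncate(), close() */

/* HYPOTHETICAL helper: create an anonymous memfd large enough for
 * nbufs RX buffers of buf_size bytes each. Returns the FD, or -1 on
 * invalid sizes, size overflow, or syscall failure. */
static int rx_memfd_create(size_t nbufs, size_t buf_size)
{
	size_t len;
	int fd;

	/* Reject empty requests and nbufs * buf_size overflow. */
	if (nbufs == 0 || buf_size == 0 || nbufs > SIZE_MAX / buf_size)
		return -1;
	len = nbufs * buf_size;

	fd = memfd_create("rx-backing", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	/* Give the file its size; pages are allocated lazily on first touch. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The resulting FD could then be passed in the same slot as a DMABUF FD, which is the point of the suggestion: one bind interface, with the kernel deciding how to map whichever kind of memory it is given.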
Jason