On 15/05/2024 13:01, Nikolay Aleksandrov wrote:
On 11/05/2024 02:21, Mina Almasry wrote:
Add a netdev_dmabuf_binding struct which represents the dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to rx queues on the netdevice. On the binding, the dma_buf_attach & dma_buf_map_attachment will occur. The entries in the sg_table from mapping will be inserted into a genpool to make it ready for allocation.
The chunks in the genpool are owned by a dmabuf_chunk_owner struct which holds the dma-buf offset of the base of the chunk and the dma_addr of the chunk. Both are needed to use allocations that come from this chunk.
We create a new type that represents an allocation from the genpool: net_iov. We setup the net_iov allocation size in the genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally allocated by the page pool and given to the drivers.
The user can unbind the dmabuf from the netdevice by closing the netlink socket that established the binding. We do this so that the binding is automatically unbound even if the userspace process crashes.
The binding and unbinding leaves an indicator in struct netdev_rx_queue that the given queue is bound, but the binding doesn't take effect until the driver actually reconfigures its queues, and re-initializes its page pool.
The netdev_dmabuf_binding struct is refcounted, and releases its resources only when all the refs are released.
Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: Kaiyuan Zhang kaiyuanz@google.com Signed-off-by: Mina Almasry almasrymina@google.com
v9: https://lore.kernel.org/all/20240403002053.2376017-5-almasrymina@google.com/
- Removed net_devmem_restart_rx_queues and put it in its own patch (David).
v8:
- move dmabuf_devmem_ops usage to later patch to avoid patch-by-patch build error.
v7:
- Use IS_ERR() instead of IS_ERR_OR_NULL() for the dma_buf_get() return value.
- Changes netdev_* naming in devmem.c to net_devmem_* (Yunsheng).
- DMA_BIDIRECTIONAL -> DMA_FROM_DEVICE (Yunsheng).
- Added a comment around recovering of the old rx queue in net_devmem_restart_rx_queue(), and added freeing of old_mem if the restart of the old queue fails. (Yunsheng).
- Use kernel-family sock-priv (Jakub).
- Put pp_memory_provider_params in netdev_rx_queue instead of the dma-buf specific binding (Pavel & David).
- Move queue management ops to queue_mgmt_ops instead of netdev_ops (Jakub).
- Remove excess whitespaces (Jakub).
- Use genlmsg_iput (Jakub).
v6:
- Validate rx queue index
- Refactor new functions into devmem.c (Pavel)
v5:
- Renamed page_pool_iov to net_iov, and moved that support to devmem.h or netmem.h.
v1:
- Introduce devmem.h instead of bloating netdevice.h (Jakub)
- ENOTSUPP -> EOPNOTSUPP (checkpatch.pl I think)
- Remove unneeded rcu protection for binding->list (rtnl protected)
- Removed extraneous err_binding_put: label.
- Removed dma_addr += len (Paolo).
- Don't override err on netdev_bind_dmabuf_to_queue failure.
- Rename devmem -> dmabuf (David).
- Add id to dmabuf binding (David/Stan).
- Fix missing xa_destroy bound_rq_list.
- Use queue api to reset bound RX queues (Jakub).
- Update netlink API for rx-queue type (tx/re) (Jakub).
RFC v3:
- Support multi rx-queue binding
Documentation/netlink/specs/netdev.yaml | 4 + include/net/devmem.h | 111 +++++++++++ include/net/netdev_rx_queue.h | 2 + include/net/netmem.h | 10 + include/net/page_pool/types.h | 5 + net/core/Makefile | 2 +- net/core/dev.c | 3 + net/core/devmem.c | 254 ++++++++++++++++++++++++ net/core/netdev-genl-gen.c | 4 + net/core/netdev-genl-gen.h | 4 + net/core/netdev-genl.c | 105 +++++++++- 11 files changed, 501 insertions(+), 3 deletions(-) create mode 100644 include/net/devmem.h create mode 100644 net/core/devmem.c
[snip]
+/* Protected by rtnl_lock() */ +static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
+void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) +{
- struct netdev_rx_queue *rxq;
- unsigned long xa_idx;
- unsigned int rxq_idx;
- if (!binding)
return;
- if (binding->list.next)
list_del(&binding->list);
minor nit: In theory list.next can still be != null if it's poisoned (e.g. after del). You can use the list api here (!list_empty(&binding->list) -> list_del_init(&binding->list)) if you initialize it in net_devmem_bind_dmabuf(), then you'll also get nice list debugging.
On second thought nevermind this, sorry for the noise.
- xa_for_each(&binding->bound_rxq_list, xa_idx, rxq) {
if (rxq->mp_params.mp_priv == binding) {
/* We hold the rtnl_lock while binding/unbinding
* dma-buf, so we can't race with another thread that
* is also modifying this value. However, the page_pool
* may read this config while it's creating its
* rx-queues. WRITE_ONCE() here to match the
* READ_ONCE() in the page_pool.
*/
WRITE_ONCE(rxq->mp_params.mp_ops, NULL);
WRITE_ONCE(rxq->mp_params.mp_priv, NULL);
rxq_idx = get_netdev_rx_queue_index(rxq);
netdev_rx_queue_restart(binding->dev, rxq_idx);
}
- }
- xa_erase(&net_devmem_dmabuf_bindings, binding->id);
- net_devmem_dmabuf_binding_put(binding);
+}
[snip]
Cheers, Nik