On Wed, Nov 19, 2025 at 07:37:07PM -0800, Bobby Eshleman wrote:
This series reduces the CPU cost of RX token management by adding an attribute to NETDEV_CMD_BIND_RX. The attribute configures sockets using the binding to avoid the xarray allocator and instead use a per-binding niov array and a uref field in niov.
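To illustrate the idea, here is a minimal userspace sketch of a per-binding array with a per-niov uref counter, where the token is just the array index. It is not the code from the series: every name below (sketch_niov, sketch_binding, sketch_token_get, sketch_token_put) is hypothetical, and C11 atomics stand in for the kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a net_iov: only the user refcount is modeled. */
struct sketch_niov {
	atomic_uint uref;	/* references currently held by userspace */
};

/* Hypothetical stand-in for a dmabuf binding owning a flat niov array. */
struct sketch_binding {
	struct sketch_niov *niovs;
	size_t num_niovs;
};

static int sketch_binding_init(struct sketch_binding *b, size_t n)
{
	b->niovs = calloc(n, sizeof(*b->niovs));
	if (!b->niovs)
		return -1;
	for (size_t i = 0; i < n; i++)
		atomic_init(&b->niovs[i].uref, 0);
	b->num_niovs = n;
	return 0;
}

static void sketch_binding_destroy(struct sketch_binding *b)
{
	free(b->niovs);
	b->niovs = NULL;
	b->num_niovs = 0;
}

/*
 * RX side: hand niov idx to userspace. The token is the array index, so
 * no allocator or lookup structure is involved, only one atomic add.
 * The caller must pass idx < b->num_niovs.
 */
static uint32_t sketch_token_get(struct sketch_binding *b, size_t idx)
{
	atomic_fetch_add(&b->niovs[idx].uref, 1);
	return (uint32_t)idx;
}

/*
 * Release side: drop one user reference for the token. Returns 0 on
 * success, -1 if the token is out of range or has no outstanding
 * reference (so the counter never underflows).
 */
static int sketch_token_put(struct sketch_binding *b, uint32_t token)
{
	unsigned int old;

	if (token >= b->num_niovs)
		return -1;

	old = atomic_load(&b->niovs[token].uref);
	do {
		if (old == 0)
			return -1;
	} while (!atomic_compare_exchange_weak(&b->niovs[token].uref,
					       &old, old - 1));
	return 0;
}
```

In this sketch both the get and the put are O(1) index operations with a single atomic update, which is the property the array approach is after when compared with an allocating lookup structure.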
The improvement is ~13% CPU utilization per RX user thread. Using kperf, the following results were observed:
Before: Average RX worker idle %: 13.13, flows 4, test runs 11
After:  Average RX worker idle %: 26.32, flows 4, test runs 11
Two other approaches were tested, but with no improvement. Namely, 1) using a hashmap for tokens and 2) keeping an xarray of atomic counters but using RCU so that the hotpath could be mostly lockless. Neither of these approaches proved better than the simple array in terms of CPU.
The attribute NETDEV_A_DMABUF_AUTORELEASE is added to toggle the optimization. It is an optional attribute and defaults to 0 (i.e., optimization on).
[...]
Changes in v7:
- use netlink instead of sockopt (Stan)
- restrict the system to only one mode; dmabuf bindings cannot co-exist with different modes (Stan)
- use static branching to enforce single system-wide mode (Stan)
- Link to v6: https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-up...
Mina, I was wondering if you had any feedback on this approach?
Best,
Bobby