This series reduces the CPU cost of RX token management by adding an attribute to NETDEV_CMD_BIND_RX that configures sockets using the binding to avoid the xarray allocator and instead use a per-binding niov array and a uref field in niov.
The improvement is ~13% CPU utilization per RX user thread.
Using kperf, the following results were observed:
Before:
	Average RX worker idle %: 13.13, flows 4, test runs 11
After:
	Average RX worker idle %: 26.32, flows 4, test runs 11
Two other approaches were tested, but with no improvement. Namely, 1) using a hashmap for tokens and 2) keeping an xarray of atomic counters but using RCU so that the hotpath could be mostly lockless. Neither of these approaches proved better than the simple array in terms of CPU.
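For illustration only, the array-based scheme can be modeled in user-space C roughly as below. This is a sketch under stated assumptions, not the code in the patches: the model_* names are hypothetical, the token is assumed to be a direct index into the per-binding array, and the only parts taken from this letter are the per-binding array, the per-niov uref counter, the rejection of invalid user input without a WARN, and the cleanup of leaked references at unbind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; names are illustrative, not the kernel's. */
struct model_niov {
	atomic_uint uref;	/* references currently held by user space */
};

struct model_binding {
	struct model_niov *vec;	/* one slot per buffer, indexed by token */
	size_t nr_niovs;
};

/* Allocate the per-binding array with every uref starting at 0. */
static int model_binding_init(struct model_binding *b, size_t n)
{
	b->vec = calloc(n, sizeof(*b->vec));
	if (!b->vec)
		return -1;
	for (size_t i = 0; i < n; i++)
		atomic_init(&b->vec[i].uref, 0);
	b->nr_niovs = n;
	return 0;
}

/* RX path: hand buffer `token` to user space, taking one user reference. */
static int model_token_get(struct model_binding *b, size_t token)
{
	if (token >= b->nr_niovs)
		return -1;
	atomic_fetch_add(&b->vec[token].uref, 1);
	return 0;
}

/* Release path: drop one reference, rejecting tokens user space does not hold. */
static int model_token_put(struct model_binding *b, size_t token)
{
	unsigned int old;

	if (token >= b->nr_niovs)
		return -1;
	old = atomic_load(&b->vec[token].uref);
	do {
		if (old == 0)
			return -1;	/* invalid user input: nothing to release */
	} while (!atomic_compare_exchange_weak(&b->vec[token].uref, &old, old - 1));
	return 0;
}

/* Unbind: reclaim references leaked by user space and report how many. */
static unsigned int model_binding_unbind(struct model_binding *b)
{
	unsigned int leaked = 0;

	for (size_t i = 0; i < b->nr_niovs; i++)
		leaked += atomic_exchange(&b->vec[i].uref, 0);
	free(b->vec);
	b->vec = NULL;
	b->nr_niovs = 0;
	return leaked;
}
```

In this model both the get and the put are a bounds check plus one atomic operation on a preallocated slot, with no allocation or tree walk per token, which is the kind of cost the array approach is meant to avoid.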
The attribute NETDEV_A_DMABUF_AUTORELEASE is added to toggle the optimization. It is an optional attribute and defaults to 0 (i.e., optimization on).
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Kuniyuki Iwashima <kuniyu@google.com>
To: Willem de Bruijn <willemb@google.com>
To: Neal Cardwell <ncardwell@google.com>
To: David Ahern <dsahern@kernel.org>
To: Mina Almasry <almasrymina@google.com>
To: Arnd Bergmann <arnd@arndb.de>
To: Jonathan Corbet <corbet@lwn.net>
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: Shuah Khan <shuah@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Changes in v7:
- use netlink instead of sockopt (Stan)
- restrict system to only one mode, dmabuf bindings can not co-exist
  with different modes (Stan)
- use static branching to enforce single system-wide mode (Stan)
- Link to v6: https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-up...

Changes in v6:
- renamed 'net: devmem: use niov array for token management' to refer to
  optionality of new config
- added documentation and tests
- make autorelease flag per-socket sockopt instead of binding
  field / sysctl
- many per-patch changes (see Changes sections per-patch)
- Link to v5: https://lore.kernel.org/r/20251023-scratch-bobbyeshleman-devmem-tcp-token-up...

Changes in v5:
- add sysctl to opt-out of performance benefit, back to old token release
- Link to v4: https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token-...

Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-up...

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
  footprint
- fallback to cleaning up references in dmabuf unbind if socket leaked
  tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-up...

Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
  (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-up...
---
Bobby Eshleman (5):
      net: devmem: rename tx_vec to vec in dmabuf binding
      net: devmem: refactor sock_devmem_dontneed for autorelease split
      net: devmem: implement autorelease token management
      net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
      selftests: drv-net: devmem: add autorelease tests

 Documentation/netlink/specs/netdev.yaml           |  12 +++
 Documentation/networking/devmem.rst               |  70 +++++++++++++
 include/net/netmem.h                              |   1 +
 include/net/sock.h                                |   7 +-
 include/uapi/linux/netdev.h                       |   1 +
 net/core/devmem.c                                 | 121 ++++++++++++++++++----
 net/core/devmem.h                                 |  13 ++-
 net/core/netdev-genl-gen.c                        |   5 +-
 net/core/netdev-genl.c                            |  13 ++-
 net/core/sock.c                                   | 103 ++++++++++++++----
 net/ipv4/tcp.c                                    |  78 +++++++++++---
 net/ipv4/tcp_ipv4.c                               |  13 ++-
 net/ipv4/tcp_minisocks.c                          |   3 +-
 tools/include/uapi/linux/netdev.h                 |   1 +
 tools/testing/selftests/drivers/net/hw/devmem.py  |  22 +++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c |  19 ++--
 16 files changed, 401 insertions(+), 81 deletions(-)
---
base-commit: 4c52142904b33b41c3ff7ee58670b4e3b3bf1120
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503
Best regards,