New subject: [PATCH net-next V10 1/6] devlink: Extend devlink rate API with traffic classes bandwidth management

20 May 2025


      Hi,
This patch series by Carolina is V10 of the feature that adds rate
management support on traffic classes in devlink and mlx5, see full
description by Carolina below [0].
V10:
- Added netdevsim selftest for tc-bw ops.
- Dropped header: field as it’s unnecessary for local constants in
  devlink.yaml.
Regards,
Tariq
[0]
This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
Additionally the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.
Discussions on traffic class shaping in net-shapers began in V5 [1],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.
Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [2], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.
Building on that discussion, Cosmin has now shared a concrete
implementation plan on the netdev mailing list [3]. The plan, developed
in collaboration with Paolo and Simon, outlines how net-shapers can be
extended to support the same use cases currently covered by
devlink-rate, with the eventual goal of aligning both and simplifying
the shaping infrastructure in the kernel.
This work was presented at Netdev 0x19 in Zagreb [4].
There we presented how TC scheduling is enforced in mlx5 hardware,
which led to discussions on the mailing list.
A summary of how things work:
Classification means labeling a packet with a traffic class based on the
packet's DSCP or VLAN PCP field, then treating packets with different
traffic classes differently during transmit processing.
In a virtualized setup, VFs are untrusted and do not control
classification or shaping.  Classification is done by the hardware using
a prio-to-TC mapping set by the hypervisor. VFs only select which send
queue to use and are expected to respect the classification logic by
sending each traffic class on its dedicated queue. As stated in the
net-shapers plan [3], each transmit queue should carry only a single
traffic class. Mixing classes in a single queue can lead to HOL
blocking.
In the mlx5 implementation, if the queue used does not match the
classified traffic class, the hardware moves the queue to the correct TC
scheduler. This movement is not a reclassification; it’s a necessary
enforcement step to ensure traffic class isolation is maintained.
Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
  bandwidth management
Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects
Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS
Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
  tc-bw
[1]
https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[2]
https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.came...
[3]
https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.came...
[4]
https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-w...
Carolina Jubran (6):
  devlink: Extend devlink rate API with traffic classes bandwidth
    management
  selftest: netdevsim: Add devlink rate tc-bw test
  net/mlx5: Add no-op implementation for setting tc-bw on rate objects
  net/mlx5: Add support for setting tc-bw on nodes
  net/mlx5: Add traffic class scheduling support for vport QoS
  net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
Documentation/netlink/specs/devlink.yaml      |   35 +-
 .../networking/devlink/devlink-port.rst       |    7 +
 .../net/ethernet/mellanox/mlx5/core/devlink.c |    2 +
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1007 ++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |    8 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   14 +-
 drivers/net/netdevsim/dev.c                   |   43 +
 drivers/net/netdevsim/netdevsim.h             |    1 +
 include/net/devlink.h                         |    6 +
 include/uapi/linux/devlink.h                  |    9 +
 net/devlink/netlink_gen.c                     |   15 +-
 net/devlink/netlink_gen.h                     |    1 +
 net/devlink/rate.c                            |  127 +++
 .../drivers/net/netdevsim/devlink.sh          |   51 +
 14 files changed, 1289 insertions(+), 37 deletions(-)
base-commit: f685204c57e87d2a88b159c7525426d70ee745c9
-- 
2.31.1

[PATCH net-next V10 0/6] Support rate management on traffic classes in devlink and mlx5