The rss_ctx test has gotten pretty flaky after I increased the queue count in NIPA 2->3. Not 100% clear why. We get a lot of failures in the rss_ctx.test_hitless_key_update case.
Looking closer it appears that the failures are mostly due to startup costs. I measured the following timing for ethtool -X: - python cmd(shell=True) : 150-250msec - python cmd(shell=False) : 50- 70msec - timed in bash : 45- 55msec - YNL Netlink call : 2- 4msec - .set_rxfh callback : 1- 2msec
The target in the test was set to 200msec. We were mostly measuring ethtool startup cost it seems. Switch to YNL since it's 100x faster.
Lower the pass criteria to 150msec, no real science behind this number but we removed some overhead, drivers which previously passed 200msec should easily pass 150msec now.
Separately we should probably follow up on defaulting to shell=False, when script doesn't explicitly ask for True, because the overhead is rather significant.
Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary array rather than array of ints.
Signed-off-by: Jakub Kicinski kuba@kernel.org --- v2: - increase the threshold to safer 150msec - mention change away from _rss_key_rand() v1: https://lore.kernel.org/20250829220712.327920-1-kuba@kernel.org --- tools/testing/selftests/drivers/net/hw/rss_ctx.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py index 9838b8457e5a..a5562a9f729f 100755 --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py @@ -335,19 +335,20 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure data = get_rss(cfg) key_len = len(data['rss-hash-key'])
- key = _rss_key_rand(key_len) + ethnl = EthtoolFamily() + key = random.randbytes(key_len)
tgen = GenerateTraffic(cfg) try: errors0, carrier0 = get_drop_err_sum(cfg) t0 = datetime.datetime.now() - ethtool(f"-X {cfg.ifname} hkey " + _rss_key_str(key)) + ethnl.rss_set({"header": {"dev-index": cfg.ifindex}, "hkey": key}) t1 = datetime.datetime.now() errors1, carrier1 = get_drop_err_sum(cfg) finally: tgen.wait_pkts_and_stop(5000)
- ksft_lt((t1 - t0).total_seconds(), 0.2) + ksft_lt((t1 - t0).total_seconds(), 0.15) ksft_eq(errors1 - errors1, 0) ksft_eq(carrier1 - carrier0, 0)
rss_ctx.test_rss_key_indir implicitly expects at least 5 queues, as it checks that the traffic on first 2 queues is lower than the remaining queues when we use all queues. Special case fewer queues.
Signed-off-by: Jakub Kicinski kuba@kernel.org --- tools/testing/selftests/drivers/net/hw/rss_ctx.py | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py index a5562a9f729f..ed7e405682f0 100755 --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py @@ -178,8 +178,13 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure cnts = _get_rx_cnts(cfg) GenerateTraffic(cfg).wait_pkts_and_stop(20000) cnts = _get_rx_cnts(cfg, prev=cnts) - # First two queues get less traffic than all the rest - ksft_lt(sum(cnts[:2]), sum(cnts[2:]), "traffic distributed: " + str(cnts)) + if qcnt > 4: + # First two queues get less traffic than all the rest + ksft_lt(sum(cnts[:2]), sum(cnts[2:]), + "traffic distributed: " + str(cnts)) + else: + # When queue count is low make sure third queue got significant pkts + ksft_ge(cnts[2], 3500, "traffic distributed: " + str(cnts))
def test_rss_queue_reconfigure(cfg, main_ctx=True):
On Mon, Sep 01, 2025 at 10:31:39AM -0700, Jakub Kicinski wrote:
rss_ctx.test_rss_key_indir implicitly expects at least 5 queues, as it checks that the traffic on first 2 queues is lower than the remaining queues when we use all queues. Special case fewer queues.
Signed-off-by: Jakub Kicinski kuba@kernel.org
Reviewed-by: Simon Horman horms@kernel.org
On Mon, Sep 01, 2025 at 10:31:38AM -0700, Jakub Kicinski wrote:
The rss_ctx test has gotten pretty flaky after I increased the queue count in NIPA 2->3. Not 100% clear why. We get a lot of failures in the rss_ctx.test_hitless_key_update case.
Looking closer it appears that the failures are mostly due to startup costs. I measured the following timing for ethtool -X:
- python cmd(shell=True) : 150-250msec
- python cmd(shell=False) : 50- 70msec
- timed in bash : 45- 55msec
- YNL Netlink call : 2- 4msec
- .set_rxfh callback : 1- 2msec
The target in the test was set to 200msec. We were mostly measuring ethtool startup cost it seems. Switch to YNL since it's 100x faster.
Lower the pass criteria to 150msec, no real science behind this number but we removed some overhead, drivers which previously passed 200msec should easily pass 150msec now.
Separately we should probably follow up on defaulting to shell=False, when script doesn't explicitly ask for True, because the overhead is rather significant.
Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary array rather than array of ints.
Signed-off-by: Jakub Kicinski kuba@kernel.org
v2:
- increase the threshold to safer 150msec
- mention change away from _rss_key_rand()
v1: https://lore.kernel.org/20250829220712.327920-1-kuba@kernel.org
Reviewed-by: Simon Horman horms@kernel.org
Hello:
This series was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Mon, 1 Sep 2025 10:31:38 -0700 you wrote:
The rss_ctx test has gotten pretty flaky after I increased the queue count in NIPA 2->3. Not 100% clear why. We get a lot of failures in the rss_ctx.test_hitless_key_update case.
Looking closer it appears that the failures are mostly due to startup costs. I measured the following timing for ethtool -X:
- python cmd(shell=True) : 150-250msec
- python cmd(shell=False) : 50- 70msec
- timed in bash : 45- 55msec
- YNL Netlink call : 2- 4msec
- .set_rxfh callback : 1- 2msec
[...]
Here is the summary with links: - [net-next,v2,1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig https://git.kernel.org/netdev/net-next/c/4022f92a2e4e - [net-next,v2,2/2] selftests: drv-net: rss_ctx: make the test pass with few queues https://git.kernel.org/netdev/net-next/c/e2cf2d5baa09
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org