Create a netconsole test that puts a lot of pressure on netconsole list manipulation. Do it by creating and deleting dynamic targets while messages are being sent. Also bring the interface down and create parallel targets while messages are being sent.
The code launches three background jobs on distinct schedules:
- Toggle netcons target every 30 iterations
- Create and delete random_target every 50 iterations
- Toggle iface every 70 iterations
This creates multiple sources of concurrency that interact with netconsole state, which is a good way to simulate stress and exercise the netpoll and netconsole locks.
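To make the three schedules concrete, here is a small bash sketch. It is a hypothetical helper, not part of the patch, that replays the modulo checks from the main loop and counts how often each background job would start. One detail worth noting, as we read the loop in the patch: the inner message loop calls `wait` after every message, so jobs started in iteration i are waited for right after the first message of iteration i+1. Two jobs therefore only run concurrently with each other when they are launched in the same iteration, which happens at common multiples of their periods.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the patch: replays the modulo checks of
# the netcons_torture.sh main loop and counts how many times each
# background job would be launched, plus how often two jobs start in the
# same iteration (the only case where they overlap each other, given the
# per-message `wait` in the inner loop).
set -euo pipefail

count_launches() {
	local iterations=$1
	local i t r f
	local n_t=0 n_r=0 n_f=0 n_tr=0 n_tf=0 n_rf=0

	for (( i = 1; i <= iterations; i++ )); do
		t=0 r=0 f=0
		if (( i % 30 == 0 )); then t=1; fi  # toggle_netcons_target
		if (( i % 50 == 0 )); then r=1; fi  # create_and_delete_random_target
		if (( i % 70 == 0 )); then f=1; fi  # toggle_iface
		n_t=$(( n_t + t ))
		n_r=$(( n_r + r ))
		n_f=$(( n_f + f ))
		# Pairs of jobs launched in the same iteration
		n_tr=$(( n_tr + (t & r) ))
		n_tf=$(( n_tf + (t & f) ))
		n_rf=$(( n_rf + (r & f) ))
	done

	echo "toggle_netcons_target=${n_t}"
	echo "create_and_delete_random_target=${n_r}"
	echo "toggle_iface=${n_f}"
	echo "same_iteration: target+random=${n_tr} target+iface=${n_tf} random+iface=${n_rf}"
}

# The patch defaults ITERATIONS to 1000.
count_launches "${1:-1000}"
```

With the default of 1000 iterations this reports 33, 20 and 14 launches, and 6, 4 and 2 iterations in which two jobs start together. All three never coincide, since the first common multiple of 30, 50 and 70 is 1050.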
This test already found an issue, as reported in [1].
Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debia... [1]
Signed-off-by: Breno Leitao leitao@debian.org
---
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/netcons_torture.sh       | 133 +++++++++++++++++++++
 2 files changed, 134 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 984ece05f7f92..2b253b1ff4f38 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS := \
 	netcons_fragmented_msg.sh \
 	netcons_overflow.sh \
 	netcons_sysdata.sh \
+	netcons_torture.sh \
 	netpoll_basic.py \
 	ping.py \
 	queues.py \
diff --git a/tools/testing/selftests/drivers/net/netcons_torture.sh b/tools/testing/selftests/drivers/net/netcons_torture.sh
new file mode 100755
index 0000000000000..d41884c83cab3
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_torture.sh
@@ -0,0 +1,133 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Repeatedly sends kernel messages, toggles netconsole targets on and off,
+# creates and deletes targets in parallel, and toggles the source interface to
+# simulate stress conditions.
+#
+# This test aims to verify the robustness of netconsole under dynamic
+# configurations and concurrent operations.
+#
+# The major goal is to run this test with LOCKDEP, Kmemleak and KASAN to make
+# sure no issues are reported.
+#
+# Author: Breno Leitao leitao@debian.org
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+# Number of times the main loop runs
+ITERATIONS=${1:-1000}
+
+# Only test extended format
+FORMAT="extended"
+# And ipv6 only
+IP_VERSION="ipv6"
+
+# Create, enable and delete some targets.
+create_and_delete_random_target() {
+	COUNT=1
+	RND_PREFIX=$(mktemp -u netcons_rnd_XXXX_)
+
+	if [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}${COUNT}" ] || \
+	   [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}0" ]; then
+		echo "Function didn't finish yet, skipping it." >&2
+		return
+	fi
+
+	# enable COUNT targets
+	for i in $(seq 0 ${COUNT})
+	do
+		RND_TARGET="${RND_PREFIX}"${i}
+		RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+
+		# Basic population so the target can come up
+		mkdir "${RND_TARGET_PATH}"
+		echo "${DSTIP}" > "${RND_TARGET_PATH}"/remote_ip
+		echo "${SRCIP}" > "${RND_TARGET_PATH}"/local_ip
+		echo "${DSTMAC}" > "${RND_TARGET_PATH}"/remote_mac
+		echo "${SRCIF}" > "${RND_TARGET_PATH}"/dev_name
+
+		echo 1 > "${RND_TARGET_PATH}"/enabled
+	done
+
+	echo "netconsole selftest: ${COUNT} additional target was created" > /dev/kmsg
+	# disable them all
+	for i in $(seq 0 ${COUNT})
+	do
+		RND_TARGET="${RND_PREFIX}"${i}
+		RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+		echo 0 > "${RND_TARGET_PATH}"/enabled
+		rmdir "${RND_TARGET_PATH}"
+	done
+}
+
+# Disable and enable the target mid-air, while messages
+# are being transmitted.
+toggle_netcons_target() {
+	for i in $(seq 2)
+	do
+		if [ ! -d "${NETCONS_PATH}" ]
+		then
+			break
+		fi
+		echo 0 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+		# Try to enable a bit harder, given it might fail to enable
+		# Write to `enabled` might fail depending on the lock, which is
+		# highly contentious here
+		for _ in $(seq 5)
+		do
+			echo 1 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+		done
+	done
+}
+
+toggle_iface(){
+	ip link set "${SRCIF}" down
+	ip link set "${SRCIF}" up
+}
+
+# Start here
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network "${IP_VERSION}"
+# Create a dynamic target for netconsole
+create_dynamic_target "${FORMAT}"
+
+for i in $(seq "$ITERATIONS")
+do
+	for _ in $(seq 10)
+	do
+		echo "${MSG}: ${TARGET} ${i}" > /dev/kmsg
+		wait
+	done
+
+	if (( i % 30 == 0 )); then
+		toggle_netcons_target &
+	fi
+
+	if (( i % 50 == 0 )); then
+		# create some targets, enable them, send msg and disable
+		# all in a parallel thread
+		create_and_delete_random_target &
+	fi
+
+	if (( i % 70 == 0 )); then
+		toggle_iface &
+	fi
+done
+wait
+
+exit "${ksft_pass}"
---
base-commit: 2fd4161d0d2547650d9559d57fc67b4e0a26a9e3
change-id: 20250902-netconsole_torture-8fc23f0aca99

Best regards,
--
Breno Leitao leitao@debian.org
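toggle_netcons_target() in the patch writes `1` to `enabled` five times and discards every result, including after one of the writes has already succeeded. If the test ever needs to know whether the target actually came back, a bounded retry that stops at the first successful write is one option. The bash sketch below is hypothetical: the helper name `retry_write` is made up here and is not part of the posted patch or of lib_netcons.sh.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to the "for _ in $(seq 5)" loop in
# toggle_netcons_target(); not part of the patch.
set -euo pipefail

# retry_write VALUE FILE TRIES
# Writes VALUE into FILE up to TRIES times and returns 0 at the first write
# that succeeds, or 1 if every attempt fails. The { ...; } 2> /dev/null
# group silences both open() failures and write errors, so contention on
# the configfs attribute does not spam the test output.
retry_write() {
	local value=$1 file=$2 tries=$3
	local attempt

	for (( attempt = 1; attempt <= tries; attempt++ )); do
		if { echo "${value}" > "${file}"; } 2> /dev/null; then
			return 0
		fi
	done
	return 1
}
```

In the torture loop this could be used as `retry_write 1 "${NETCONS_PATH}"/enabled 5 || echo "target not re-enabled" >&2`, which keeps the "try a bit harder" behavior while making a persistent failure visible instead of silently ignored.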
On Tue, 02 Sep 2025 09:33:33 -0700 Breno Leitao wrote:
> Create a netconsole test that puts a lot of pressure on netconsole list manipulation. Do it by creating and deleting dynamic targets while messages are being sent. Also bring the interface down and create parallel targets while messages are being sent.
> The code launches three background jobs on distinct schedules:
> - Toggle netcons target every 30 iterations
> - Create and delete random_target every 50 iterations
> - Toggle iface every 70 iterations
> This creates multiple sources of concurrency that interact with netconsole state, which is a good way to simulate stress and exercise the netpoll and netconsole locks.
Oh, when you said "selftest will be posted later" in the fix, I thought you meant days, not hours later :) It's better if the fix and test are in one series. It's better for backports, and it avoids situations like last night, when the fix was already dropped from pw but this test was still running (and crashing the kernel).
Regarding the test, I think it makes sense. Though, is there a way we can reuse more of the existing code? Do you write all these scripts by hand or get AI to write them?
I was hoping you'd add more tests relating to bonding, to confirm bonding still works. And as I mentioned, I think bonding is still a bit buggy if we "propagate" multiple nps and then remove them out of order.
On Wed, Sep 03, 2025 at 05:37:46PM -0700, Jakub Kicinski wrote:
>> This creates multiple sources of concurrency that interact with netconsole state, which is a good way to simulate stress and exercise the netpoll and netconsole locks.
> Oh, when you said "selftest will be posted later" in the fix, I thought you meant days, not hours later :) It's better if the fix and test are in one series.
Oh, I remember reading somewhere that new tests usually go to `net-next`, and the fix goes to `net`.
That is why I split them. In fact, the tests showed up earlier, when I was moving the target_list in netconsole to be RCU-safe.
> Regarding the test, I think it makes sense. Though, is there a way we can reuse more of the existing code?
Maybe. The only similar part is the inner core (4 lines) of create_and_delete_random_target(), which I could reuse if I rework the other tests and the library.
Basically, I would create a new function that does the following on a target path passed as an argument (instead of NETCONS_PATH):
	echo "${DSTIP}" > "${NETCONS_PATH}"/remote_ip
	echo "${SRCIP}" > "${NETCONS_PATH}"/local_ip
	echo "${DSTMAC}" > "${NETCONS_PATH}"/remote_mac
	echo "${SRCIF}" > "${NETCONS_PATH}"/dev_name
Then reuse it in create_dynamic_target() and create_and_delete_random_target().
Initially I didn't think it was worth it, but I can spend more time on it and see how it looks.
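A minimal sketch of the helper described above, assuming it would live in lib/sh/lib_netcons.sh next to create_dynamic_target() and rely on the same DSTIP, SRCIP, DSTMAC and SRCIF globals. The name `populate_netcons_target` is hypothetical; the four writes are the ones quoted above, only parameterized on the target directory.

```bash
#!/usr/bin/env bash
# Hypothetical shared helper (name and placement are assumptions): fill in
# the basic attributes of the netconsole target whose configfs directory is
# passed as $1, instead of hard-coding NETCONS_PATH.
set -euo pipefail

populate_netcons_target() {
	local target_path=$1

	echo "${DSTIP}" > "${target_path}"/remote_ip
	echo "${SRCIP}" > "${target_path}"/local_ip
	echo "${DSTMAC}" > "${target_path}"/remote_mac
	echo "${SRCIF}" > "${target_path}"/dev_name
}
```

create_dynamic_target() could then call `populate_netcons_target "${NETCONS_PATH}"` where it currently writes these four attributes, and create_and_delete_random_target() would call `populate_netcons_target "${RND_TARGET_PATH}"` right after its `mkdir`, before writing `1` to `enabled`.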
> Do you write all these scripts by hand or get AI to write them?
100% handcrafted test, very unfortunately. I created the selftest to exercise some RCU changes I am going to propose for netconsole, and when I ran it, I hit the memleak issue.
> I was hoping you'd add more tests relating to bonding.
I am happy to do so, given I want to get closer to netpoll. I will not be able to do it right now due to my lack of bonding knowledge, but I will add it to my TODO list.
> To confirm bonding still works. And as I mentioned, I think bonding is still a bit buggy if we "propagate" multiple nps and then remove them out of order.
Tell me more about this possible code path. I can focus this initial selftest on exercising it.
Thanks for the review. I will send a v2 with more code reuse, and then we can check which version is better.
--breno
linux-kselftest-mirror@lists.linaro.org