Hi,
I of course took the opportunity at my first time to make a mistake: two patches were missing in v1.. please note that patch #9 and #10 are newly added.
Previous version:
v1: https://lore.kernel.org/rcu/20230315235454.2993-1-boqun.feng@gmail.com/
Changes since v1:
* Add two missing patches. * Fix checkpatch warnings.
You will also be able to find the series at:
https://github.com/fbq/linux rcu/rcutorture.2023.03.20a
top commit is:
6bc6e6b27524 List of changes:
Bhaskar Chowdhury (1): tools: rcu: Add usage function and check for argument
Paul E. McKenney (7): rcutorture: Add test_nmis module parameter rcutorture: Set CONFIG_BOOTPARAM_HOTPLUG_CPU0 to offline CPU 0 rcutorture: Make scenario TREE04 enable lazy call_rcu() torture: Permit kvm-again.sh --duration to default to previous run torture: Enable clocksource watchdog with "tsc=watchdog" rcuscale: Move shutdown from wait_event() to wait_event_idle() refscale: Move shutdown from wait_event() to wait_event_idle()
Yue Hu (1): rcutorture: Eliminate variable n_rcu_torture_boost_rterror
Zqiang (1): rcutorture: Create nocb kthreads only when testing rcu in CONFIG_RCU_NOCB_CPU=y kernels
kernel/rcu/rcuscale.c | 7 ++- kernel/rcu/rcutorture.c | 49 +++++++++++++++---- kernel/rcu/refscale.c | 2 +- tools/rcu/extract-stall.sh | 26 +++++++--- .../selftests/rcutorture/bin/kvm-again.sh | 2 +- .../selftests/rcutorture/bin/torture.sh | 6 +-- .../selftests/rcutorture/configs/rcu/TREE01 | 1 + .../selftests/rcutorture/configs/rcu/TREE04 | 1 + 8 files changed, 69 insertions(+), 25 deletions(-) mode change 100644 => 100755 tools/rcu/extract-stall.sh
From: "Paul E. McKenney" paulmck@kernel.org
This commit adds a test_nmis module parameter to generate the specified number of NMI stack backtraces 15 seconds apart. This module parameter can be used to test NMI delivery and accompanying diagnostics. Note that this parameter is ignored when rcutorture is a module rather than built into the kernel. This could be changed with the addition of an EXPORT_SYMBOL_GPL().
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- kernel/rcu/rcutorture.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 8e6c023212cb..480bba142e3a 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -119,6 +119,7 @@ torture_param(int, stutter, 5, "Number of seconds to run/halt test"); torture_param(int, test_boost, 1, "Test RCU prio boost: 0=no, 1=maybe, 2=yes."); torture_param(int, test_boost_duration, 4, "Duration of each boost test, seconds."); torture_param(int, test_boost_interval, 7, "Interval between boost tests, seconds."); +torture_param(int, test_nmis, 0, "End-test NMI tests, 0 to disable."); torture_param(bool, test_no_idle_hz, true, "Test support for tickless idle CPUs"); torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
@@ -2358,7 +2359,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) "n_barrier_cbs=%d " "onoff_interval=%d onoff_holdoff=%d " "read_exit_delay=%d read_exit_burst=%d " - "nocbs_nthreads=%d nocbs_toggle=%d\n", + "nocbs_nthreads=%d nocbs_toggle=%d " + "test_nmis=%d\n", torture_type, tag, nrealreaders, nfakewriters, stat_interval, verbose, test_no_idle_hz, shuffle_interval, stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, @@ -2369,7 +2371,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) n_barrier_cbs, onoff_interval, onoff_holdoff, read_exit_delay, read_exit_burst, - nocbs_nthreads, nocbs_toggle); + nocbs_nthreads, nocbs_toggle, + test_nmis); }
static int rcutorture_booster_cleanup(unsigned int cpu) @@ -3273,6 +3276,29 @@ static void rcu_torture_read_exit_cleanup(void) torture_stop_kthread(rcutorture_read_exit, read_exit_task); }
+static void rcutorture_test_nmis(int n) +{ +#if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) + int cpu; + int dumpcpu; + int i; + + for (i = 0; i < n; i++) { + preempt_disable(); + cpu = smp_processor_id(); + dumpcpu = cpu + 1; + if (dumpcpu >= nr_cpu_ids) + dumpcpu = 0; + pr_alert("%s: CPU %d invoking dump_cpu_task(%d)\n", __func__, cpu, dumpcpu); + dump_cpu_task(dumpcpu); + preempt_enable(); + schedule_timeout_uninterruptible(15 * HZ); + } +#else // #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) + WARN_ONCE(n, "Non-zero rcutorture.test_nmis=%d permitted only when rcutorture is built in.\n", test_nmis); +#endif // #else // #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) +} + static enum cpuhp_state rcutor_hp;
static void @@ -3297,6 +3323,8 @@ rcu_torture_cleanup(void) return; }
+ rcutorture_test_nmis(test_nmis); + if (cur_ops->gp_kthread_dbg) cur_ops->gp_kthread_dbg(); rcu_torture_read_exit_cleanup();
From: "Paul E. McKenney" paulmck@kernel.org
There is now a BOOTPARAM_HOTPLUG_CPU0 Kconfig option that allows CPU 0 to be offlined on x86 systems. This commit therefore sets this option in the TREE01 rcutorture scenario in order to regularly test this capability.
Reported-by: "Zhang, Qiang1" qiang1.zhang@intel.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- tools/testing/selftests/rcutorture/configs/rcu/TREE01 | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01 index 8ae41d5f81a3..04831ef1f9b5 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01 @@ -15,3 +15,4 @@ CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_RCU_BOOST=n CONFIG_DEBUG_OBJECTS_RCU_HEAD=n CONFIG_RCU_EXPERT=y +CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
From: "Paul E. McKenney" paulmck@kernel.org
This commit enables the RCU_LAZY Kconfig option in scenario TREE04 in order to provide some ongoing testing of this configuration.
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- tools/testing/selftests/rcutorture/configs/rcu/TREE04 | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04 index ae395981b5e5..dc4985064b3a 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04 @@ -15,3 +15,4 @@ CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_DEBUG_OBJECTS_RCU_HEAD=n CONFIG_RCU_EXPERT=y CONFIG_RCU_EQS_DEBUG=y +CONFIG_RCU_LAZY=y
From: Bhaskar Chowdhury unixbhaskar@gmail.com
This commit converts extract-stall.sh script's header comment to a usage() function, and adds an argument check. While in the area, make this script be executable.
[ paulmck: Strength argument check, remove extraneous comment. ]
Signed-off-by: Bhaskar Chowdhury unixbhaskar@gmail.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- tools/rcu/extract-stall.sh | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) mode change 100644 => 100755 tools/rcu/extract-stall.sh
diff --git a/tools/rcu/extract-stall.sh b/tools/rcu/extract-stall.sh old mode 100644 new mode 100755 index e565697c9f90..08a39ad44320 --- a/tools/rcu/extract-stall.sh +++ b/tools/rcu/extract-stall.sh @@ -1,11 +1,25 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0+ -# -# Extract any RCU CPU stall warnings present in specified file. -# Filter out clocksource lines. Note that preceding-lines excludes the -# initial line of the stall warning but trailing-lines includes it. -# -# Usage: extract-stall.sh dmesg-file [ preceding-lines [ trailing-lines ] ] + +usage() { + echo Extract any RCU CPU stall warnings present in specified file. + echo Filter out clocksource lines. Note that preceding-lines excludes the + echo initial line of the stall warning but trailing-lines includes it. + echo + echo Usage: $(basename $0) dmesg-file [ preceding-lines [ trailing-lines ] ] + echo + echo Error: $1 +} + +# Terminate the script, if the argument is missing + +if test -f "$1" && test -r "$1" +then + : +else + usage "Console log file "$1" missing or unreadable." + exit 1 +fi
echo $1 preceding_lines="${2-3}"
From: "Paul E. McKenney" paulmck@kernel.org
Currently, invoking kvm-again.sh without a --duration argument results in a bash error message. This commit therefore adds quotes around the $dur argument to kvm-transform.sh to allow a default duration to be taken from the earlier run.
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- tools/testing/selftests/rcutorture/bin/kvm-again.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-again.sh b/tools/testing/selftests/rcutorture/bin/kvm-again.sh index 8a968fbda02c..88ca4e368489 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-again.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-again.sh @@ -193,7 +193,7 @@ do qemu_cmd_dir="`dirname "$i"`" kernel_dir="`echo $qemu_cmd_dir | sed -e 's/.[0-9]+$//'`" jitter_dir="`dirname "$kernel_dir"`" - kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" $dur "$bootargs" < $T/qemu-cmd > $i + kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" "$dur" "$bootargs" < $T/qemu-cmd > $i if test -n "$arg_remote" then echo "# TORTURE_KCONFIG_GDB_ARG=''" >> $i
From: Yue Hu huyue2@coolpad.com
After commit 8b700983de82 ("sched: Remove sched_set_*() return value"), this variable is not used anymore. So eliminate it entirely.
Signed-off-by: Yue Hu huyue2@coolpad.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- kernel/rcu/rcutorture.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 480bba142e3a..c0b2fd687bbb 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -180,7 +180,6 @@ static atomic_t n_rcu_torture_mbchk_tries; static atomic_t n_rcu_torture_error; static long n_rcu_torture_barrier_error; static long n_rcu_torture_boost_ktrerror; -static long n_rcu_torture_boost_rterror; static long n_rcu_torture_boost_failure; static long n_rcu_torture_boosts; static atomic_long_t n_rcu_torture_timers; @@ -2195,12 +2194,11 @@ rcu_torture_stats_print(void) atomic_read(&n_rcu_torture_alloc), atomic_read(&n_rcu_torture_alloc_fail), atomic_read(&n_rcu_torture_free)); - pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld rtbre: %ld ", + pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld ", atomic_read(&n_rcu_torture_mberror), atomic_read(&n_rcu_torture_mbchk_fail), atomic_read(&n_rcu_torture_mbchk_tries), n_rcu_torture_barrier_error, - n_rcu_torture_boost_ktrerror, - n_rcu_torture_boost_rterror); + n_rcu_torture_boost_ktrerror); pr_cont("rtbf: %ld rtb: %ld nt: %ld ", n_rcu_torture_boost_failure, n_rcu_torture_boosts, @@ -2218,15 +2216,13 @@ rcu_torture_stats_print(void) if (atomic_read(&n_rcu_torture_mberror) || atomic_read(&n_rcu_torture_mbchk_fail) || n_rcu_torture_barrier_error || n_rcu_torture_boost_ktrerror || - n_rcu_torture_boost_rterror || n_rcu_torture_boost_failure || - i > 1) { + n_rcu_torture_boost_failure || i > 1) { pr_cont("%s", "!!! "); atomic_inc(&n_rcu_torture_error); WARN_ON_ONCE(atomic_read(&n_rcu_torture_mberror)); WARN_ON_ONCE(atomic_read(&n_rcu_torture_mbchk_fail)); WARN_ON_ONCE(n_rcu_torture_barrier_error); // rcu_barrier() WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread - WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio WARN_ON_ONCE(n_rcu_torture_boost_failure); // boost failed (TIMER_SOFTIRQ RT prio?) WARN_ON_ONCE(i > 1); // Too-short grace period } @@ -3568,7 +3564,6 @@ rcu_torture_init(void) atomic_set(&n_rcu_torture_error, 0); n_rcu_torture_barrier_error = 0; n_rcu_torture_boost_ktrerror = 0; - n_rcu_torture_boost_rterror = 0; n_rcu_torture_boost_failure = 0; n_rcu_torture_boosts = 0; for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++)
From: "Paul E. McKenney" paulmck@kernel.org
This commit tests the "tsc=watchdog" kernel boot parameter when running the clocksourcewd torture tests.
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- tools/testing/selftests/rcutorture/bin/torture.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh index 130d0de4c3bb..5a2ae2264403 100755 --- a/tools/testing/selftests/rcutorture/bin/torture.sh +++ b/tools/testing/selftests/rcutorture/bin/torture.sh @@ -497,16 +497,16 @@ fi
if test "$do_clocksourcewd" = "yes" then - torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000" + torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog" torture_set "clocksourcewd-1" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
- torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1" + torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1 tsc=watchdog" torture_set "clocksourcewd-2" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
# In case our work is already done... if test "$do_rcutorture" != "yes" then - torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000" + torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog" torture_set "clocksourcewd-3" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --trust-make fi fi
From: Zqiang qiang1.zhang@intel.com
Given a non-zero rcutorture.nocbs_nthreads module parameter, the specified number of nocb kthreads will be created, regardless of whether or not the RCU implementation under test is capable of offloading callbacks. Please note that even vanilla RCU is incapable of offloading in kernels built with CONFIG_RCU_NOCB_CPU=n. And when the RCU implementation is incapable of offloading callbacks, there is no point in creating those kthreads.
This commit therefore checks the cur_ops.torture_type module parameter and CONFIG_RCU_NOCB_CPU Kconfig option in order to avoid creating unnecessary nocb tasks.
Signed-off-by: Zqiang qiang1.zhang@intel.com Reviewed-by: Joel Fernandes (Google) joel@joelfernandes.org Signed-off-by: Paul E. McKenney paulmck@kernel.org [ boqun: Fix checkpatch warning ] Signed-off-by: Boqun Feng boqun.feng@gmail.com --- kernel/rcu/rcutorture.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index c0b2fd687bbb..e046d2c6fe10 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -3525,6 +3525,12 @@ rcu_torture_init(void) pr_alert("rcu-torture: ->fqs NULL and non-zero fqs_duration, fqs disabled.\n"); fqs_duration = 0; } + if (nocbs_nthreads != 0 && (cur_ops != &rcu_ops || + !IS_ENABLED(CONFIG_RCU_NOCB_CPU))) { + pr_alert("rcu-torture types: %s and CONFIG_RCU_NOCB_CPU=%d, nocb toggle disabled.\n", + cur_ops->name, IS_ENABLED(CONFIG_RCU_NOCB_CPU)); + nocbs_nthreads = 0; + } if (cur_ops->init) cur_ops->init();
From: "Paul E. McKenney" paulmck@kernel.org
The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions use wait_event() to wait for the rcuscale test to complete. However, each updater thread in such a test waits for at least 100 grace periods. If each grace period takes more than 1.2 seconds, which is long, but not insanely so, this can trigger the hung-task timeout.
This commit therefore replaces those wait_event() calls with calls to wait_event_idle(), which do not trigger the hung-task timeout.
Reported-by: kernel test robot yujie.liu@intel.com Reported-by: Liam Howlett liam.howlett@oracle.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Tested-by: Yujie Liu yujie.liu@intel.com Signed-off-by: Boqun Feng boqun.feng@gmail.com --- kernel/rcu/rcuscale.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c index 91fb5905a008..4120f94030c3 100644 --- a/kernel/rcu/rcuscale.c +++ b/kernel/rcu/rcuscale.c @@ -631,8 +631,7 @@ static int compute_real(int n) static int rcu_scale_shutdown(void *arg) { - wait_event(shutdown_wq, - atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); + wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); smp_mb(); /* Wake before output. */ rcu_scale_cleanup(); kernel_power_off(); @@ -771,8 +770,8 @@ kfree_scale_cleanup(void) static int kfree_scale_shutdown(void *arg) { - wait_event(shutdown_wq, - atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); + wait_event_idle(shutdown_wq, + atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
smp_mb(); /* Wake before output. */
From: "Paul E. McKenney" paulmck@kernel.org
The ref_scale_shutdown() kthread/function uses wait_event() to wait for the refscale test to complete. However, although the read-side tests are normally extremely fast, there is no law against specifying a very large value for the refscale.loops module parameter or against having a slow read-side primitive. Either way, this might well trigger the hung-task timeout.
This commit therefore replaces those wait_event() calls with calls to wait_event_idle(), which do not trigger the hung-task timeout.
Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Boqun Feng boqun.feng@gmail.com --- kernel/rcu/refscale.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c index afa3e1a2f690..1970ce5f22d4 100644 --- a/kernel/rcu/refscale.c +++ b/kernel/rcu/refscale.c @@ -1031,7 +1031,7 @@ ref_scale_cleanup(void) static int ref_scale_shutdown(void *arg) { - wait_event(shutdown_wq, shutdown_start); + wait_event_idle(shutdown_wq, shutdown_start);
smp_mb(); // Wake before output. ref_scale_cleanup();
linux-kselftest-mirror@lists.linaro.org