v2: https://lkml.org/lkml/2020/7/17/369 Changelog v2-->v3 Based on comments from Gautham R. Shenoy adding the following in the selftest, 1. Grepping modules to determine if already loaded 2. Wrapper to enable/disable states 3. Preventing any operation/test on offlined CPUs ---
The patch series introduces a mechanism to measure wakeup latency for IPI and timer based interrupts The motivation behind this series is to find significant deviations behind advertised latency and resisdency values
To achieve this, we introduce a kernel module and expose its control knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within /sys/kernel/debug/latency_test/ for, 1. IPI test: ipi_cpu_dest # Destination CPU for the IPI ipi_cpu_src # Origin of the IPI ipi_latency_ns # Measured latency time in ns 2. Timeout test: timeout_cpu_src # CPU on which the timer to be queued timeout_expected_ns # Timer duration timeout_diff_ns # Difference of actual duration vs expected timer To include the module, check option and include as module kernel hacking -> Cpuidle latency selftests
The selftest inserts the module, disables all the idle states and enables them one by one testing the following: 1. Keeping source CPU constant, iterates through all the CPUS measuring IPI latency for baseline (CPU is busy with "cat /dev/random > /dev/null" workload) and the when the CPU is allowed to be at rest 2. Iterating through all the CPUs, sending expected timer durations to be equivalent to the residency of the the deepest idle state enabled and extracting the difference in time between the time of wakeup and the expected timer duration
Usage ----- Can be used in conjuction to the rest of the selftests. Default Output location in: tools/testing/cpuidle/cpuidle.log
To run this test specifically: $ make -C tools/testing/selftests TARGETS="cpuidle" run_tests
There are a few optinal arguments too that the script can take [-h <help>] [-m <location of the module>] [-o <location of the output>]
Sample output snippet --------------------- --IPI Latency Test--- --Baseline IPI Latency measurement: CPU Busy-- SRC_CPU DEST_CPU IPI_Latency(ns) ... 0 8 1996 0 9 2125 0 10 1264 0 11 1788 0 12 2045 Baseline Average IPI latency(ns): 1843 ---Enabling state: 5--- SRC_CPU DEST_CPU IPI_Latency(ns) 0 8 621719 0 9 624752 0 10 622218 0 11 623968 0 12 621303 Expected IPI latency(ns): 100000 Observed Average IPI latency(ns): 622792
--Timeout Latency Test-- --Baseline Timeout Latency measurement: CPU Busy-- Wakeup_src Baseline_delay(ns) ... 8 2249 9 2226 10 2211 11 2183 12 2263 Baseline Average timeout diff(ns): 2226 ---Enabling state: 5--- 8 10749 9 10911 10 10912 11 12100 12 73276 Expected timeout(ns): 10000200 Observed Average timeout diff(ns): 23589
Pratik Rajesh Sampat (2): cpuidle: Trace IPI based and timer based wakeup latency from idle states selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 + drivers/cpuidle/test-cpuidle_latency.c | 150 ++++++++++ lib/Kconfig.debug | 10 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/cpuidle/Makefile | 6 + tools/testing/selftests/cpuidle/cpuidle.sh | 310 +++++++++++++++++++++ tools/testing/selftests/cpuidle/settings | 1 + 7 files changed, 479 insertions(+) create mode 100644 drivers/cpuidle/test-cpuidle_latency.c create mode 100644 tools/testing/selftests/cpuidle/Makefile create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh create mode 100644 tools/testing/selftests/cpuidle/settings
Fire directed smp_call_function_single IPIs from a specified source CPU to the specified target CPU to reduce the noise we have to wade through in the trace log. The module is based on the idea written by Srivatsa Bhat and maintained by Vaidyanathan Srinivasan internally.
Queue HR timer and measure jitter. Wakeup latency measurement for idle states using hrtimer. Echo a value in ns to timer_test_function and watch trace. A HRtimer will be queued and when it fires the expected wakeup vs actual wakeup is computes and delay printed in ns.
Implemented as a module which utilizes debugfs so that it can be integrated with selftests.
To include the module, check option and include as module kernel hacking -> Cpuidle latency selftests
[srivatsa.bhat@linux.vnet.ibm.com: Initial implementation in cpidle/sysfs]
[svaidy@linux.vnet.ibm.com: wakeup latency measurements using hrtimer and fix some of the time calculation]
[ego@linux.vnet.ibm.com: Fix some whitespace and tab errors and increase the resolution of IPI wakeup]
Signed-off-by: Pratik Rajesh Sampat psampat@linux.ibm.com Reviewed-by: Gautham R. Shenoy ego@linux.vnet.ibm.com --- drivers/cpuidle/Makefile | 1 + drivers/cpuidle/test-cpuidle_latency.c | 150 +++++++++++++++++++++++++ lib/Kconfig.debug | 10 ++ 3 files changed, 161 insertions(+) create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile index f07800cbb43f..2ae05968078c 100644 --- a/drivers/cpuidle/Makefile +++ b/drivers/cpuidle/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o +obj-$(CONFIG_IDLE_LATENCY_SELFTEST) += test-cpuidle_latency.o
################################################################################## # ARM SoC drivers diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c new file mode 100644 index 000000000000..61574665e972 --- /dev/null +++ b/drivers/cpuidle/test-cpuidle_latency.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Module-based API test facility for cpuidle latency using IPIs and timers + */ + +#include <linux/debugfs.h> +#include <linux/kernel.h> +#include <linux/module.h> + +/* IPI based wakeup latencies */ +struct latency { + unsigned int src_cpu; + unsigned int dest_cpu; + ktime_t time_start; + ktime_t time_end; + u64 latency_ns; +} ipi_wakeup; + +static void measure_latency(void *info) +{ + struct latency *v; + ktime_t time_diff; + + v = (struct latency *)info; + v->time_end = ktime_get(); + time_diff = ktime_sub(v->time_end, v->time_start); + v->latency_ns = ktime_to_ns(time_diff); +} + +void run_smp_call_function_test(unsigned int cpu) +{ + ipi_wakeup.src_cpu = smp_processor_id(); + ipi_wakeup.dest_cpu = cpu; + ipi_wakeup.time_start = ktime_get(); + smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1); +} + +/* Timer based wakeup latencies */ +struct timer_data { + unsigned int src_cpu; + u64 timeout; + ktime_t time_start; + ktime_t time_end; + struct hrtimer timer; + u64 timeout_diff_ns; +} timer_wakeup; + +static enum hrtimer_restart timer_called(struct hrtimer *hrtimer) +{ + struct timer_data *w; + ktime_t time_diff; + + w = container_of(hrtimer, struct timer_data, timer); + w->time_end = ktime_get(); + + time_diff = ktime_sub(w->time_end, w->time_start); + time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout)); + w->timeout_diff_ns = ktime_to_ns(time_diff); + return HRTIMER_NORESTART; +} + +static void run_timer_test(unsigned int ns) +{ + hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC, + HRTIMER_MODE_REL); + timer_wakeup.timer.function = timer_called; + timer_wakeup.time_start = ktime_get(); + timer_wakeup.src_cpu = smp_processor_id(); + timer_wakeup.timeout = ns; + + hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns), + HRTIMER_MODE_REL_PINNED); +} + +static struct dentry *dir; + +static int cpu_read_op(void *data, u64 *value) +{ + *value = ipi_wakeup.dest_cpu; + return 0; +} + +static int cpu_write_op(void *data, u64 value) +{ + run_smp_call_function_test(value); + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n"); + +static int timeout_read_op(void *data, u64 *value) +{ + *value = timer_wakeup.timeout; + return 0; +} + +static int timeout_write_op(void *data, u64 value) +{ + run_timer_test(value); + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n"); + +static int __init latency_init(void) +{ + struct dentry *temp; + + dir = debugfs_create_dir("latency_test", 0); + if (!dir) { + pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n"); + return -1; + } + temp = debugfs_create_file("ipi_cpu_dest", + 0666, + dir, + NULL, + &ipi_ops); + if (!temp) { + pr_alert("latency_test: failed to create /sys/kernel/debug/ipi_cpu_dest\n"); + return -1; + } + debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns); + debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu); + + temp = debugfs_create_file("timeout_expected_ns", + 0666, + dir, + NULL, + &timeout_ops); + if (!temp) { + pr_alert("latency_test: failed to create /sys/kernel/debug/timeout_expected_ns\n"); + return -1; + } + debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns); + debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu); + pr_info("Latency Test module loaded\n"); + return 0; +} + +static void __exit latency_cleanup(void) +{ + pr_info("Cleaning up Latency Test module.\n"); + debugfs_remove_recursive(dir); +} + +module_init(latency_init); +module_exit(latency_cleanup); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("IBM Corporation"); +MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers"); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d74ac0fd6b2d..e2283790245a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1375,6 +1375,16 @@ config DEBUG_KOBJECT If you say Y here, some extra kobject debugging messages will be sent to the syslog.
+config IDLE_LATENCY_SELFTEST + tristate "Cpuidle latency selftests" + depends on CPU_IDLE + help + This option provides a kernel module that runs tests using the IPI and + timers to measure latency. + + Say M if you want these self tests to build as a module. + Say N if you are unsure. + config DEBUG_KOBJECT_RELEASE bool "kobject release debugging" depends on DEBUG_OBJECTS_TIMERS
On Tue, Jul 21, 2020 at 2:43 PM Pratik Rajesh Sampat psampat@linux.ibm.com wrote:
Fire directed smp_call_function_single IPIs from a specified source CPU to the specified target CPU to reduce the noise we have to wade through in the trace log.
And what's the purpose of it?
The module is based on the idea written by Srivatsa Bhat and maintained by Vaidyanathan Srinivasan internally.
Queue HR timer and measure jitter. Wakeup latency measurement for idle states using hrtimer. Echo a value in ns to timer_test_function and watch trace. A HRtimer will be queued and when it fires the expected wakeup vs actual wakeup is computes and delay printed in ns.
Implemented as a module which utilizes debugfs so that it can be integrated with selftests.
To include the module, check option and include as module kernel hacking -> Cpuidle latency selftests
[srivatsa.bhat@linux.vnet.ibm.com: Initial implementation in cpidle/sysfs]
[svaidy@linux.vnet.ibm.com: wakeup latency measurements using hrtimer and fix some of the time calculation]
[ego@linux.vnet.ibm.com: Fix some whitespace and tab errors and increase the resolution of IPI wakeup]
Signed-off-by: Pratik Rajesh Sampat psampat@linux.ibm.com Reviewed-by: Gautham R. Shenoy ego@linux.vnet.ibm.com
drivers/cpuidle/Makefile | 1 + drivers/cpuidle/test-cpuidle_latency.c | 150 +++++++++++++++++++++++++ lib/Kconfig.debug | 10 ++ 3 files changed, 161 insertions(+) create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile index f07800cbb43f..2ae05968078c 100644 --- a/drivers/cpuidle/Makefile +++ b/drivers/cpuidle/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o +obj-$(CONFIG_IDLE_LATENCY_SELFTEST) += test-cpuidle_latency.o
################################################################################## # ARM SoC drivers diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c new file mode 100644 index 000000000000..61574665e972 --- /dev/null +++ b/drivers/cpuidle/test-cpuidle_latency.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*
- Module-based API test facility for cpuidle latency using IPIs and timers
I'd like to see a more detailed description of what it does and how it works here.
- */
+#include <linux/debugfs.h> +#include <linux/kernel.h> +#include <linux/module.h>
+/* IPI based wakeup latencies */ +struct latency {
unsigned int src_cpu;
unsigned int dest_cpu;
ktime_t time_start;
ktime_t time_end;
u64 latency_ns;
+} ipi_wakeup;
+static void measure_latency(void *info) +{
struct latency *v;
ktime_t time_diff;
v = (struct latency *)info;
v->time_end = ktime_get();
time_diff = ktime_sub(v->time_end, v->time_start);
v->latency_ns = ktime_to_ns(time_diff);
+}
+void run_smp_call_function_test(unsigned int cpu) +{
ipi_wakeup.src_cpu = smp_processor_id();
ipi_wakeup.dest_cpu = cpu;
ipi_wakeup.time_start = ktime_get();
smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+/* Timer based wakeup latencies */ +struct timer_data {
unsigned int src_cpu;
u64 timeout;
ktime_t time_start;
ktime_t time_end;
struct hrtimer timer;
u64 timeout_diff_ns;
+} timer_wakeup;
+static enum hrtimer_restart timer_called(struct hrtimer *hrtimer) +{
struct timer_data *w;
ktime_t time_diff;
w = container_of(hrtimer, struct timer_data, timer);
w->time_end = ktime_get();
time_diff = ktime_sub(w->time_end, w->time_start);
time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
w->timeout_diff_ns = ktime_to_ns(time_diff);
return HRTIMER_NORESTART;
+}
+static void run_timer_test(unsigned int ns) +{
hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL);
timer_wakeup.timer.function = timer_called;
timer_wakeup.time_start = ktime_get();
timer_wakeup.src_cpu = smp_processor_id();
timer_wakeup.timeout = ns;
hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
HRTIMER_MODE_REL_PINNED);
+}
+static struct dentry *dir;
+static int cpu_read_op(void *data, u64 *value) +{
*value = ipi_wakeup.dest_cpu;
return 0;
+}
+static int cpu_write_op(void *data, u64 value) +{
run_smp_call_function_test(value);
return 0;
+} +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
+static int timeout_read_op(void *data, u64 *value) +{
*value = timer_wakeup.timeout;
return 0;
+}
+static int timeout_write_op(void *data, u64 value) +{
run_timer_test(value);
return 0;
+} +DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n");
+static int __init latency_init(void) +{
struct dentry *temp;
dir = debugfs_create_dir("latency_test", 0);
if (!dir) {
pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
return -1;
}
temp = debugfs_create_file("ipi_cpu_dest",
0666,
dir,
NULL,
&ipi_ops);
if (!temp) {
pr_alert("latency_test: failed to create /sys/kernel/debug/ipi_cpu_dest\n");
return -1;
}
debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns);
debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu);
temp = debugfs_create_file("timeout_expected_ns",
0666,
dir,
NULL,
&timeout_ops);
if (!temp) {
pr_alert("latency_test: failed to create /sys/kernel/debug/timeout_expected_ns\n");
return -1;
}
debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns);
debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu);
pr_info("Latency Test module loaded\n");
return 0;
+}
+static void __exit latency_cleanup(void) +{
pr_info("Cleaning up Latency Test module.\n");
debugfs_remove_recursive(dir);
+}
+module_init(latency_init); +module_exit(latency_cleanup);
+MODULE_LICENSE("GPL"); +MODULE_AUTHOR("IBM Corporation"); +MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers"); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d74ac0fd6b2d..e2283790245a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1375,6 +1375,16 @@ config DEBUG_KOBJECT If you say Y here, some extra kobject debugging messages will be sent to the syslog.
+config IDLE_LATENCY_SELFTEST
tristate "Cpuidle latency selftests"
depends on CPU_IDLE
help
This option provides a kernel module that runs tests using the IPI and
timers to measure latency.
What latency does it measure?
Say M if you want these self tests to build as a module.
Say N if you are unsure.
config DEBUG_KOBJECT_RELEASE bool "kobject release debugging" depends on DEBUG_OBJECTS_TIMERS -- 2.25.4
Hello Rafael,
On 27/07/20 7:12 pm, Rafael J. Wysocki wrote:
On Tue, Jul 21, 2020 at 2:43 PM Pratik Rajesh Sampat psampat@linux.ibm.com wrote:
Fire directed smp_call_function_single IPIs from a specified source CPU to the specified target CPU to reduce the noise we have to wade through in the trace log.
And what's the purpose of it?
The idea for this comes from that fact that estimating wake-up latencies and residencies for stop states is not an easy task.
The purpose is essentially to determine wakeup latencies, that are caused by either, an IPI or a timer and compare with the advertised wakeup latencies for each stop state.
This might help in determining the accuracy of our advertised values and/or if they need any re-calibration.
The module is based on the idea written by Srivatsa Bhat and maintained by Vaidyanathan Srinivasan internally.
Queue HR timer and measure jitter. Wakeup latency measurement for idle states using hrtimer. Echo a value in ns to timer_test_function and watch trace. A HRtimer will be queued and when it fires the expected wakeup vs actual wakeup is computes and delay printed in ns.
Implemented as a module which utilizes debugfs so that it can be integrated with selftests.
To include the module, check option and include as module kernel hacking -> Cpuidle latency selftests
[srivatsa.bhat@linux.vnet.ibm.com: Initial implementation in cpidle/sysfs]
[svaidy@linux.vnet.ibm.com: wakeup latency measurements using hrtimer and fix some of the time calculation]
[ego@linux.vnet.ibm.com: Fix some whitespace and tab errors and increase the resolution of IPI wakeup]
Signed-off-by: Pratik Rajesh Sampat psampat@linux.ibm.com Reviewed-by: Gautham R. Shenoy ego@linux.vnet.ibm.com
drivers/cpuidle/Makefile | 1 + drivers/cpuidle/test-cpuidle_latency.c | 150 +++++++++++++++++++++++++ lib/Kconfig.debug | 10 ++ 3 files changed, 161 insertions(+) create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile index f07800cbb43f..2ae05968078c 100644 --- a/drivers/cpuidle/Makefile +++ b/drivers/cpuidle/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o +obj-$(CONFIG_IDLE_LATENCY_SELFTEST) += test-cpuidle_latency.o
################################################################################## # ARM SoC drivers diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c new file mode 100644 index 000000000000..61574665e972 --- /dev/null +++ b/drivers/cpuidle/test-cpuidle_latency.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/*
- Module-based API test facility for cpuidle latency using IPIs and timers
I'd like to see a more detailed description of what it does and how it works here.
Right, I'll add that. Based on comments from Daniel I have also been working on a user-space only variant of this test as that does seem like a better way to go.
The only downside is that the latency will be higher, but as we are taking baseline measurements the diff of that from our observed reading should still remain the same. Just that the test will take longer to run. I'm yet to accurately confirm this.
I would appreciate your thoughts on that.
- */
+#include <linux/debugfs.h> +#include <linux/kernel.h> +#include <linux/module.h>
+/* IPI based wakeup latencies */ +struct latency {
unsigned int src_cpu;
unsigned int dest_cpu;
ktime_t time_start;
ktime_t time_end;
u64 latency_ns;
+} ipi_wakeup;
+static void measure_latency(void *info) +{
struct latency *v;
ktime_t time_diff;
v = (struct latency *)info;
v->time_end = ktime_get();
time_diff = ktime_sub(v->time_end, v->time_start);
v->latency_ns = ktime_to_ns(time_diff);
+}
+void run_smp_call_function_test(unsigned int cpu) +{
ipi_wakeup.src_cpu = smp_processor_id();
ipi_wakeup.dest_cpu = cpu;
ipi_wakeup.time_start = ktime_get();
smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+/* Timer based wakeup latencies */ +struct timer_data {
unsigned int src_cpu;
u64 timeout;
ktime_t time_start;
ktime_t time_end;
struct hrtimer timer;
u64 timeout_diff_ns;
+} timer_wakeup;
+static enum hrtimer_restart timer_called(struct hrtimer *hrtimer) +{
struct timer_data *w;
ktime_t time_diff;
w = container_of(hrtimer, struct timer_data, timer);
w->time_end = ktime_get();
time_diff = ktime_sub(w->time_end, w->time_start);
time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
w->timeout_diff_ns = ktime_to_ns(time_diff);
return HRTIMER_NORESTART;
+}
+static void run_timer_test(unsigned int ns) +{
hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL);
timer_wakeup.timer.function = timer_called;
timer_wakeup.time_start = ktime_get();
timer_wakeup.src_cpu = smp_processor_id();
timer_wakeup.timeout = ns;
hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
HRTIMER_MODE_REL_PINNED);
+}
+static struct dentry *dir;
+static int cpu_read_op(void *data, u64 *value) +{
*value = ipi_wakeup.dest_cpu;
return 0;
+}
+static int cpu_write_op(void *data, u64 value) +{
run_smp_call_function_test(value);
return 0;
+} +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
+static int timeout_read_op(void *data, u64 *value) +{
*value = timer_wakeup.timeout;
return 0;
+}
+static int timeout_write_op(void *data, u64 value) +{
run_timer_test(value);
return 0;
+} +DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n");
+static int __init latency_init(void) +{
struct dentry *temp;
dir = debugfs_create_dir("latency_test", 0);
if (!dir) {
pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
return -1;
}
temp = debugfs_create_file("ipi_cpu_dest",
0666,
dir,
NULL,
&ipi_ops);
if (!temp) {
pr_alert("latency_test: failed to create /sys/kernel/debug/ipi_cpu_dest\n");
return -1;
}
debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns);
debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu);
temp = debugfs_create_file("timeout_expected_ns",
0666,
dir,
NULL,
&timeout_ops);
if (!temp) {
pr_alert("latency_test: failed to create /sys/kernel/debug/timeout_expected_ns\n");
return -1;
}
debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns);
debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu);
pr_info("Latency Test module loaded\n");
return 0;
+}
+static void __exit latency_cleanup(void) +{
pr_info("Cleaning up Latency Test module.\n");
debugfs_remove_recursive(dir);
+}
+module_init(latency_init); +module_exit(latency_cleanup);
+MODULE_LICENSE("GPL"); +MODULE_AUTHOR("IBM Corporation"); +MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers"); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d74ac0fd6b2d..e2283790245a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1375,6 +1375,16 @@ config DEBUG_KOBJECT If you say Y here, some extra kobject debugging messages will be sent to the syslog.
+config IDLE_LATENCY_SELFTEST
tristate "Cpuidle latency selftests"
depends on CPU_IDLE
help
This option provides a kernel module that runs tests using the IPI and
timers to measure latency.
What latency does it measure?
It measures latencies incurred on wakeup after an IPI and a timer interrupt.
Say M if you want these self tests to build as a module.
Say N if you are unsure.
- config DEBUG_KOBJECT_RELEASE bool "kobject release debugging" depends on DEBUG_OBJECTS_TIMERS
-- 2.25.4
Thanks, Pratik
This patch adds support to trace IPI based and timer based wakeup latency from idle states
Latches onto the test-cpuidle_latency kernel module using the debugfs interface to send IPIs or schedule a timer based event, which in-turn populates the debugfs with the latency measurements.
Currently for the IPI and timer tests; first disable all idle states and then test for latency measurements incrementally enabling each state
Signed-off-by: Pratik Rajesh Sampat psampat@linux.ibm.com --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/cpuidle/Makefile | 6 + tools/testing/selftests/cpuidle/cpuidle.sh | 310 +++++++++++++++++++++ tools/testing/selftests/cpuidle/settings | 1 + 4 files changed, 318 insertions(+) create mode 100644 tools/testing/selftests/cpuidle/Makefile create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh create mode 100644 tools/testing/selftests/cpuidle/settings
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 1195bd85af38..ab6cf51f3518 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -7,6 +7,7 @@ TARGETS += capabilities TARGETS += cgroup TARGETS += clone3 TARGETS += cpufreq +TARGETS += cpuidle TARGETS += cpu-hotplug TARGETS += drivers/dma-buf TARGETS += efivarfs diff --git a/tools/testing/selftests/cpuidle/Makefile b/tools/testing/selftests/cpuidle/Makefile new file mode 100644 index 000000000000..72fd5d2e974d --- /dev/null +++ b/tools/testing/selftests/cpuidle/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +all: + +TEST_PROGS := cpuidle.sh + +include ../lib.mk diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh new file mode 100755 index 000000000000..19cc24ccd4af --- /dev/null +++ b/tools/testing/selftests/cpuidle/cpuidle.sh @@ -0,0 +1,310 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +LOG=cpuidle.log +MODULE=/lib/modules/$(uname -r)/kernel/drivers/cpuidle/test-cpuidle_latency.ko + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +helpme() +{ + printf "Usage: $0 [-h] [-todg args] + [-h <help>] + [-m <location of the module>] + [-o <location of the output>] + \n" + exit 2 +} + +parse_arguments() +{ + while getopts ht:m:o: arg + do + case $arg in + h) # --help + helpme + ;; + m) # --mod-file + MODULE=$OPTARG + ;; + o) # output log files + LOG=$OPTARG + ;; + ?) + helpme + ;; + esac + done +} + +ins_mod() +{ + if [ ! -f "$MODULE" ]; then + printf "$MODULE module does not exist. Exitting\n" + exit $ksft_skip + fi + # Check if the module is already loaded + grep "test_cpuidle_latency" /proc/modules + if [ $? == 0 ]; then + printf "Module: $MODULE already loaded\n\n" + return 0 + fi + printf "Inserting $MODULE module\n\n" + insmod $MODULE + if [ $? != 0 ]; then + printf "Insmod $MODULE failed\n" + exit $ksft_skip + fi +} + +compute_average() +{ + arr=("$@") + sum=0 + size=${#arr[@]} + if [ $size == 0 ]; then + avg=0 + return 1 + fi + for i in "${arr[@]}" + do + sum=$((sum + i)) + done + avg=$((sum/size)) +} + +# Disable all stop states +disable_idle() +{ + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + for ((state=0; state<NUM_STATES; state++)) + do + echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable + done + done +} + +# Perform operation on each CPU for the given state +# $1 - Operation: enable (0) / disable (1) +# $2 - State to enable +op_state() +{ + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable + done +} + +cpuidle_enable_state() +{ + state=$1 + op_state 0 $state +} + +cpuidle_disable_state() +{ + state=$1 + op_state 1 $state +} + +cpu_is_online() +{ + cpu=$1 + status=$(cat /sys/devices/system/cpu/cpu$cpu/online) + echo $status +} + +# Extract latency in microseconds and convert to nanoseconds +extract_latency() +{ + for ((state=0; state<NUM_STATES; state++)) + do + latency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/latency) * 1000)) + latency_arr+=($latency) + done +} + +# Run the IPI test +# $1 run for baseline - busy cpu or regular environment +# $2 destination cpu +ipi_test_once() +{ + dest_cpu=$2 + if [ "$1" = "baseline" ]; then + # Keep the CPU busy + taskset -c $dest_cpu cat /dev/random > /dev/null & + task_pid=$! + # Wait for the workload to achieve 100% CPU usage + sleep 1 + fi + taskset 0x1 echo $dest_cpu > /sys/kernel/debug/latency_test/ipi_cpu_dest + ipi_latency=$(cat /sys/kernel/debug/latency_test/ipi_latency_ns) + src_cpu=$(cat /sys/kernel/debug/latency_test/ipi_cpu_src) + if [ "$1" = "baseline" ]; then + kill $task_pid + wait $task_pid 2>/dev/null + fi +} + +# Incrementally Enable idle states one by one and compute the latency +run_ipi_tests() +{ + extract_latency + disable_idle + declare -a avg_arr + echo -e "--IPI Latency Test---" >> $LOG + + echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + ipi_test_once "baseline" $cpu + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG + avg_arr+=($ipi_latency) + done + compute_average "${avg_arr[@]}" + echo -e "Baseline Average IPI latency(ns): $avg" >> $LOG + + for ((state=0; state<NUM_STATES; state++)) + do + unset avg_arr + echo -e "---Enabling state: $state---" >> $LOG + cpuidle_enable_state $state + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + # Running IPI test and logging results + sleep 1 + ipi_test_once "test" $cpu + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG + avg_arr+=($ipi_latency) + done + compute_average "${avg_arr[@]}" + echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG + echo -e "Observed Average IPI latency(ns): $avg" >> $LOG + cpuidle_disable_state $state + done +} + +# Extract the residency in microseconds and convert to nanoseconds. +# Add 100 ns so that the timer stays for a little longer than the residency +extract_residency() +{ + for ((state=0; state<NUM_STATES; state++)) + do + residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200)) + residency_arr+=($residency) + done +} + +# Run the Timeout test +# $1 run for baseline - busy cpu or regular environment +# $2 destination cpu +# $3 timeout +timeout_test_once() +{ + dest_cpu=$2 + if [ "$1" = "baseline" ]; then + # Keep the CPU busy + taskset -c $dest_cpu cat /dev/random > /dev/null & + task_pid=$! + # Wait for the workload to achieve 100% CPU usage + sleep 1 + fi + taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns + # Wait for the result to populate + sleep 0.1 + timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns) + src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src) + if [ "$1" = "baseline" ]; then + kill $task_pid + wait $task_pid 2>/dev/null + fi +} + +run_timeout_tests() +{ + extract_residency + disable_idle + declare -a avg_arr + echo -e "\n--Timeout Latency Test--" >> $LOG + + echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + timeout_test_once "baseline" $cpu ${residency_arr[0]} + printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG + avg_arr+=($timeout_diff) + done + compute_average "${avg_arr[@]}" + echo -e "Baseline Average timeout diff(ns): $avg" >> $LOG + + for ((state=0; state<NUM_STATES; state++)) + do + unset avg_arr + echo -e "---Enabling state: $state---" >> $LOG + cpuidle_enable_state $state + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG + for ((cpu=0; cpu<NUM_CPUS; cpu++)) + do + local cpu_status=$(cpu_is_online $cpu) + if [ $cpu_status == 0 ]; then + continue + fi + timeout_test_once "test" $cpu ${residency_arr[$state]} + printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG + avg_arr+=($timeout_diff) + done + compute_average "${avg_arr[@]}" + echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG + echo -e "Observed Average timeout diff(ns): $avg" >> $LOG + cpuidle_disable_state $state + done +} + +declare -a residency_arr +declare -a latency_arr + +# Parse arguments +parse_arguments $@ + +rm -f $LOG +touch $LOG +NUM_CPUS=$(nproc --all) +NUM_STATES=$(ls -1 /sys/devices/system/cpu/cpu0/cpuidle/ | wc -l) + +# Insert the module +ins_mod $MODULE + +printf "Started IPI latency tests\n" +run_ipi_tests + +printf "Started Timer latency tests\n" +run_timeout_tests + +printf "Removing $MODULE module\n" +printf "Output logged at: $LOG\n" +rmmod $MODULE diff --git a/tools/testing/selftests/cpuidle/settings b/tools/testing/selftests/cpuidle/settings new file mode 100644 index 000000000000..e7b9417537fb --- /dev/null +++ b/tools/testing/selftests/cpuidle/settings @@ -0,0 +1 @@ +timeout=0
On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
v2: https://lkml.org/lkml/2020/7/17/369 Changelog v2-->v3 Based on comments from Gautham R. Shenoy adding the following in the selftest,
- Grepping modules to determine if already loaded
- Wrapper to enable/disable states
- Preventing any operation/test on offlined CPUs
The patch series introduces a mechanism to measure wakeup latency for IPI and timer based interrupts The motivation behind this series is to find significant deviations behind advertised latency and resisdency values
Why do you want to measure for the timer and the IPI ? Whatever the source of the wakeup, the exit latency remains the same, no ?
Is all this kernel-ish code really needed ?
What about using a highres periodic timer and make it expires every eg. 50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the operation for each idle states.
And in order to make it as much accurate as possible, set the program affinity on a CPU and isolate this one by preventing other processes to be scheduled on and migrate the interrupts on the other CPUs.
That will be all userspace code, no?
Hello Daniel,
On 21/07/20 8:27 pm, Daniel Lezcano wrote:
On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
v2: https://lkml.org/lkml/2020/7/17/369 Changelog v2-->v3 Based on comments from Gautham R. Shenoy adding the following in the selftest,
- Grepping modules to determine if already loaded
- Wrapper to enable/disable states
- Preventing any operation/test on offlined CPUs
The patch series introduces a mechanism to measure wakeup latency for IPI and timer based interrupts The motivation behind this series is to find significant deviations behind advertised latency and resisdency values
Why do you want to measure for the timer and the IPI ? Whatever the source of the wakeup, the exit latency remains the same, no ?
Is all this kernel-ish code really needed ?
What about using a highres periodic timer and make it expires every eg. 50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the operation for each idle states.
And in order to make it as much accurate as possible, set the program affinity on a CPU and isolate this one by preventing other processes to be scheduled on and migrate the interrupts on the other CPUs.
That will be all userspace code, no?
The kernel module may not needed now that you mention it. IPI latencies could be measured using pipes and threads using pthread_attr_setaffinity_np to control the experiment, as you suggested. This should internally fire a smp_call_function_single.
The original idea was to essentially measure it as closely as possible in the kernel without involving the kernel-->userspace overhead. However, the user-space approach may not be too much of a problem as we are collecting a baseline and the delta of the latency is what we would be concerned about anyways!
With respect to measuring both timers and IPI latencies: In principle yes, the exit latency should remain the same but if there is a deviation in reality we may want to measure it.
I'll implement this experiment in the userspace and get back with the numbers to confirm.
Thanks for your comments! Best, Pratik
linux-kselftest-mirror@lists.linaro.org