Add a new kselftest to detect and report slowdowns in key boot events. The test uses ftrace to track timings for specific boot events and compares these timestamps against reference values provided in YAML format.
The test includes the following files:
- `bootconfig` file: configures ftrace and lists reference key boot events. - `config` fragment: enables boot time tracing and attaches the bootconfig file to the kernel image. - `kprobe_timestamps_to_yaml.py` script: parses the current trace file to extract event names and timestamps and writes them to a YAML file. The script is intended to be run once to generate initial reference values; the generated file is not meant to be stored in the kernel sources but should be provided as input to the test itself. YAML format was chosen to allow easy integration with per-platform data used in other tests, such as the discoverable devices probe test in tools/testing/selftests/devices. Another option is to use JSON, as the file is not intended for manual editing and JSON is already supported by the Python standard library. - `test_boot_time.py` script: parses the current trace file and compares timestamps against the values in the YAML file provided as input. Reports a failure if any timestamp differs from the reference value by more than the specified delta. - `trace_utils.py` file: utility functions to mount debugfs and parse the trace file to extract relevant information.
The bootconfig file provided is an initial draft with some reference kprobe events to showcase how the test works. I would appreciate feedback from those interested in running this test on which boot events should be added. Different key events might be relevant depending on the platform and its boot time requirements. This file should serve as a common ground and be populated with critical events and functions common to different platforms.
Feedback on the overall approach of this test and suggestions for additional boot events to trace would be greatly appreciated.
Example output with a deliberately small delta of 0.01 to demonstrate failures:
TAP version 13 1..4 ok 1 populate_rootfs_begin # 'run_init_process_begin' differs by 0.033990 seconds. not ok 2 run_init_process_begin # 'run_init_process_end' differs by 0.033796 seconds. not ok 3 run_init_process_end ok 4 unpack_to_rootfs_begin # Totals: pass:2 fail:2 xfail:0 xpass:0 skip:0 error:0
This patch depends on "kselftest: Move ksft helper module to common directory": https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1@... which was picked through the usb tree and is queued for 6.11-rc1.
Best,
Laura
Laura Nao (1): kselftests: Add test to detect boot event slowdowns
tools/testing/selftests/Makefile | 1 + tools/testing/selftests/boot-time/Makefile | 17 ++++ tools/testing/selftests/boot-time/bootconfig | 8 ++ tools/testing/selftests/boot-time/config | 4 + .../boot-time/kprobe_timestamps_to_yaml.py | 55 +++++++++++ .../selftests/boot-time/test_boot_time.py | 94 +++++++++++++++++++ .../selftests/boot-time/trace_utils.py | 63 +++++++++++++ 7 files changed, 242 insertions(+) create mode 100644 tools/testing/selftests/boot-time/Makefile create mode 100644 tools/testing/selftests/boot-time/bootconfig create mode 100644 tools/testing/selftests/boot-time/config create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py create mode 100644 tools/testing/selftests/boot-time/trace_utils.py
-- 2.30.2
Introduce a new kselftest to identify slowdowns in key boot events. The test uses ftrace to track timings for specific boot events. The kprobe_timestamps_to_yaml.py script can be run once to generate a YAML file with the initial reference timestamps for these events. The test_boot_time.py script can then be run on subsequent boots to compare current timings against the reference values and check for significant slowdowns over time. The test ships with a bootconfig file for ftrace setup and a config fragment for the necessary kernel configurations.
Signed-off-by: Laura Nao laura.nao@collabora.com --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/boot-time/Makefile | 17 ++++ tools/testing/selftests/boot-time/bootconfig | 8 ++ tools/testing/selftests/boot-time/config | 4 + .../boot-time/kprobe_timestamps_to_yaml.py | 55 +++++++++++ .../selftests/boot-time/test_boot_time.py | 94 +++++++++++++++++++ .../selftests/boot-time/trace_utils.py | 63 +++++++++++++ 7 files changed, 242 insertions(+) create mode 100644 tools/testing/selftests/boot-time/Makefile create mode 100644 tools/testing/selftests/boot-time/bootconfig create mode 100644 tools/testing/selftests/boot-time/config create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py create mode 100644 tools/testing/selftests/boot-time/trace_utils.py
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index bc8fe9e8f7f2..1e675779322f 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -2,6 +2,7 @@ TARGETS += alsa TARGETS += amd-pstate TARGETS += arm64 +TARGETS += boot-time TARGETS += bpf TARGETS += breakpoints TARGETS += cachestat diff --git a/tools/testing/selftests/boot-time/Makefile b/tools/testing/selftests/boot-time/Makefile new file mode 100644 index 000000000000..43ccd0a3b62e --- /dev/null +++ b/tools/testing/selftests/boot-time/Makefile @@ -0,0 +1,17 @@ +PY3 = $(shell which python3 2>/dev/null) + +ifneq ($(PY3),) + +TEST_PROGS := test_boot_time.py +TEST_FILES := trace_utils.py kprobe_timestamps_to_yaml.py + +include ../lib.mk + +else + +all: no_py3_warning + +no_py3_warning: + @echo "Missing python3. This test will be skipped." + +endif \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/bootconfig b/tools/testing/selftests/boot-time/bootconfig new file mode 100644 index 000000000000..2883f03e0766 --- /dev/null +++ b/tools/testing/selftests/boot-time/bootconfig @@ -0,0 +1,8 @@ +ftrace { + event.kprobes { + populate_rootfs_begin.probes = "populate_rootfs" + unpack_to_rootfs_begin.probes = "unpack_to_rootfs" + run_init_process_begin.probes = "run_init_process" + run_init_process_end.probes = "run_init_process%return" + } +} \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/config b/tools/testing/selftests/boot-time/config new file mode 100644 index 000000000000..957b40b45bc0 --- /dev/null +++ b/tools/testing/selftests/boot-time/config @@ -0,0 +1,4 @@ +CONFIG_TRACING=y +CONFIG_BOOTTIME_TRACING=y +CONFIG_BOOT_CONFIG_EMBED=y +CONFIG_BOOT_CONFIG_EMBED_FILE="tools/testing/selftests/boot-time/bootconfig" \ No newline at end of file diff --git a/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py b/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py new file mode 100755 index 000000000000..43e742e0759d --- /dev/null +++ b/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py @@ -0,0 +1,55 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2024 Collabora Ltd +# +# +# This script reads the /sys/kernel/debug/tracing/trace file, extracts kprobe +# event names and timestamps, and saves them to a YAML file. +# +# Example of the YAML content generated: +# +# - event: populate_rootfs_begin +# timestamp: 0.525161 +# - event: unpack_to_rootfs_begin +# timestamp: 0.525169 +# +# Usage: +# ./kprobe_timestamps_to_yaml.py [output_file] +# + +import sys +import argparse +import yaml + +from trace_utils import mount_debugfs, parse_kprobe_event_trace + + +def write_trace_timestamps_to_yaml(output_file_path): + """ + Parse /sys/kernel/debug/tracing/trace file to extract kprobe event + names and timestamps, and write them to a YAML file. + @output_file_path: the path for the output YAML file, defaults to + 'kprobe-timestamps.yaml'. + """ + trace_entries = parse_kprobe_event_trace("/sys/kernel/debug/tracing/trace") + filtered_entries = {entry['event']: entry['timestamp'] for entry in trace_entries} + + with open(output_file_path, 'w', encoding="utf-8") as yaml_file: + yaml.dump(filtered_entries, yaml_file) + + print(f"Generated {output_file_path}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Extract kprobe event names and timestamps from the \ + /sys/kernel/debug/tracing/trace file and save them to a YAML file.") + parser.add_argument('output_file_path', nargs='?', default='kprobe-timestamps.yaml', + help="path for the output YAML file") + args = parser.parse_args() + + if not mount_debugfs("/sys/kernel/debug"): + sys.exit(1) + + write_trace_timestamps_to_yaml(args.output_file_path) diff --git a/tools/testing/selftests/boot-time/test_boot_time.py b/tools/testing/selftests/boot-time/test_boot_time.py new file mode 100755 index 000000000000..8a8de28c485c --- /dev/null +++ b/tools/testing/selftests/boot-time/test_boot_time.py @@ -0,0 +1,94 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2024 Collabora Ltd +# +# This script reads the /sys/kernel/debug/tracing/trace file, extracts kprobe +# event names and timestamps, and compares them against reference timestamps +# provided in an input YAML file to identify significant slowdowns in +# the reference kprobe events. +# +# Usage: +# ./test_boot_time.py <kprobe_timestamps_file> <delta(seconds)> + +import os +import sys +import argparse +import yaml +from trace_utils import mount_debugfs, parse_kprobe_event_trace + +this_dir = os.path.dirname(os.path.realpath(__file__)) +sys.path.append(os.path.join(this_dir, "../kselftest/")) + +import ksft + +def load_kprobe_timestamps_from_yaml(file_path): + """ + Load YAML file containing kprobe event timestamps. Return a dictionary + with event names as keys and the corresponding timestamps as values. + For example: + {'populate_rootfs_begin': 0.525161, 'unpack_to_rootfs_begin': 0.525169} + @file_path: path to a YAML file containing the kprobe event timestamps. + """ + try: + with open(file_path, 'r', encoding="utf-8") as file: + return yaml.safe_load(file) + except FileNotFoundError: + ksft.print_msg(f"Error: File {file_path} not found.") + ksft.exit_fail() + except yaml.YAMLError as e: + ksft.print_msg(f"Error: Failed to parse YAML file {file_path}. Error: {e}") + ksft.exit_fail() + + +def evaluate_kprobe_event_delays(ref_entries, delta): + """ + Read the /sys/kernel/debug/tracing/trace file to extract kprobe event + names and timestamps, then compare these timestamps with the reference + values provided. Report a failure for each timestamp that deviates from + the reference by more than the specified delta. + @ref_entries: dictionary containing event names and their timestamps + @delta: maximum allowed difference in seconds between the current and + reference timestamps + """ + ksft.set_plan(len(ref_entries)) + + trace_entries = {entry['event']: entry['timestamp'] + for entry in parse_kprobe_event_trace("/sys/kernel/debug/tracing/trace")} + + for ref_event, ref_timestamp in ref_entries.items(): + if ref_event in trace_entries: + timestamp = trace_entries[ref_event] + if timestamp >= ref_timestamp and (timestamp - ref_timestamp) >= delta: + ksft.print_msg(f"'{ref_event}' differs by " + f"{(timestamp - ref_timestamp):.6f} seconds.") + ksft.test_result_fail(ref_event) + else: + ksft.test_result_pass(ref_event) + else: + ksft.print_msg(f"Reference event '{ref_event}' not found in trace.") + ksft.test_result_skip(ref_event) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Read the /sys/kernel/debug/tracing/trace file, extract kprobe event names \ + and timestamps, and compares them against reference timestamps provided in an input YAML \ + file to report significant slowdowns in the reference kprobe events.") + parser.add_argument('kprobe_timestamps_file', help="path to YAML file containing \ + kprobe timestamps") + parser.add_argument('delta', type=float, help="maximum allowed difference in seconds \ + (float) between the current and reference timestamps") + args = parser.parse_args() + + # Ensure debugfs is mounted + if not mount_debugfs("/sys/kernel/debug"): + ksft.exit_fail() + + ksft.print_header() + + ref_entries = load_kprobe_timestamps_from_yaml(args.kprobe_timestamps_file) + + evaluate_kprobe_event_delays(ref_entries, args.delta) + + ksft.finished() diff --git a/tools/testing/selftests/boot-time/trace_utils.py b/tools/testing/selftests/boot-time/trace_utils.py new file mode 100644 index 000000000000..c40cef2bf584 --- /dev/null +++ b/tools/testing/selftests/boot-time/trace_utils.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2024 Collabora Ltd +# +# Utility functions for parsing trace files and mounting debugfs. +# + +import re +import subprocess + + +def mount_debugfs(path): + """ + Mount debugfs at the specified path if it is not already mounted. + @path: path where debugfs should be mounted + """ + # Check if debugfs is already mounted + with open('/proc/mounts', 'r', encoding="utf-8") as mounts: + for line in mounts: + if 'debugfs' in line and path in line: + print(f"debugfs is already mounted at {path}") + return True + + # Mount debugfs + try: + subprocess.run(['mount', '-t', 'debugfs', 'none', path], check=True) + return True + except subprocess.CalledProcessError as e: + print(f"Failed to mount debugfs: {e.stderr}") + return False + + +def parse_kprobe_event_trace(trace_file_path): + """ + Parse a trace file containing kprobe events and return a list of + dictionaries, each representing a trace entry. + + Example of trace entry: + {'task_pid': 'sh-1', 'timestamp': 0.256527, 'event': \ + 'populate_rootfs_begin', 'function': 'populate_rootfs+0x4/0x50'} + + @trace_file_path: path to the trace file + """ + trace_pattern = r'^\s*(\S+)\s+[.*]\s+\S+\s+(\d+.\d+):\s+(\S+):\s+((.*))$' + + trace_entries = [] + + # Read trace file and parse entries + with open(trace_file_path, 'r', encoding="utf-8") as trace_file: + for line in trace_file: + match = re.match(trace_pattern, line) + if match: + trace_entry = { + 'task_pid': match.group(1), + 'timestamp': float(match.group(2)), + 'event': match.group(3), + 'function': match.group(4) + } + + trace_entries.append(trace_entry) + + return trace_entries
On 7/25/24 05:06, Laura Nao wrote:
Introduce a new kselftest to identify slowdowns in key boot events. The test uses ftrace to track timings for specific boot events. The kprobe_timestamps_to_yaml.py script can be run once to generate a YAML file with the initial reference timestamps for these events. The test_boot_time.py script can then be run on subsequent boots to compare current timings against the reference values and check for significant slowdowns over time. The test ships with a bootconfig file for ftrace setup and a config fragment for the necessary kernel configurations.
Signed-off-by: Laura Nao laura.nao@collabora.com
I am repeating the same comments I made on the cover letter here as well.
What are the dependencies if any for this new test to work? Please do remember that tests in default run needs to have minimal dependencies so they can run on systems that have minimal support.
As mentioned earlier take a look at the tools/power/pm-graph bootgraph.py and sleepgraph.py to see if you can leverage them - bootgraph detects slowdowns during boot.
We don't want to add duplicate scripts if the other one serves the needs. Those can be moved to selftests if it make sense.
I will review this once we figure out if bootgraph serves the needs and I understand the dependencies for this test to work.
thanks, -- Shuah
Hi Shuah,
On 7/25/24 17:50, Shuah Khan wrote:
On 7/25/24 05:06, Laura Nao wrote:
Introduce a new kselftest to identify slowdowns in key boot events. The test uses ftrace to track timings for specific boot events. The kprobe_timestamps_to_yaml.py script can be run once to generate a YAML file with the initial reference timestamps for these events. The test_boot_time.py script can then be run on subsequent boots to compare current timings against the reference values and check for significant slowdowns over time. The test ships with a bootconfig file for ftrace setup and a config fragment for the necessary kernel configurations.
Signed-off-by: Laura Nao laura.nao@collabora.com
I am repeating the same comments I made on the cover letter here as well.
What are the dependencies if any for this new test to work? Please do remember that tests in default run needs to have minimal dependencies so they can run on systems that have minimal support.
In order to run this test the kernel needs to be compiled with the provided config fragment, which enables tracing and embeds the provided bootconfig file in the kernel. Additionally, a YAML file with reference timestamps must be supplied as input to the test.
As mentioned earlier take a look at the tools/power/pm-graph bootgraph.py and sleepgraph.py to see if you can leverage them - bootgraph detects slowdowns during boot.
We don't want to add duplicate scripts if the other one serves the needs. Those can be moved to selftests if it make sense.
I will review this once we figure out if bootgraph serves the needs and I understand the dependencies for this test to work.
Thanks for the pointers!
It looks like sleepgraph.py is more focused on analyzing suspend/resume timings, while bootgraph.py measures boot time using the kernel log and ftrace. The latter might indeed come in handy. As far as I can see, the script doesn't support automatic detection of boot slowdowns, and the output is in HTML format, which is meant for human analysis. However, I can look into adding support for a more machine-readable output format too. The test proposed in this patch could then use bootgraph.py to generate the reference file and measure current boot timings.
I'll look into this and report back.
Thanks,
Laura
On 7/31/24 11:25, Laura Nao wrote:
It looks like sleepgraph.py is more focused on analyzing suspend/resume timings, while bootgraph.py measures boot time using the kernel log and ftrace. The latter might indeed come in handy. As far as I can see, the script doesn't support automatic detection of boot slowdowns, and the output is in HTML format, which is meant for human analysis. However, I can look into adding support for a more machine-readable output format too. The test proposed in this patch could then use bootgraph.py to generate the reference file and measure current boot timings.
I'll look into this and report back.
After examining the bootgraph.py script, it seems feasible to add support for generating the output in a machine-readable format (e.g., JSON) for automated analysis. Todd, I've CC'd you on this discussion in case you have feedback on possibly using bootgraph.py in an automated test to detect slowdowns.
Some points to consider:
- The bootgraph.py script supports ftrace through the -fstat and -ftrace options, and it parses the kernel log to get initcall timings. To use this in an automated test, we need a way to provide the necessary command line options. One approach is to include these options in a bootconfig file embedded in the kernel image (as per proposal in this RFC). Shuah, do you think this is acceptable? I haven't seen other tests doing this, so I'm unsure if this is a proper way to handle required command line options in a selftest.
- The bootgraph.py script tracks timings for all init calls, which might be excessive and generate too much output when integrated in an automated test. We might need to limit the test output to report only significant slowdowns to make it manageable.
- I'd like to get some feedback on which key boot process events are more relevant to track; depending on this, we could use the bootgraph.py script to monitor initcalls and possibly other events tracked via ftrace. The script currently uses the function_graph tracer, and its parser is designed for this tracer's output. If we need to track other events (e.g., kprobe events), the parser might need some adjustments.
I'll be discussing this at LPC in September (https://lpc.events/event/18/contributions/1700/) and look forward to exploring more details and alternative approaches for an automated boot time test.
Best,
Laura
On 7/25/24 05:06, Laura Nao wrote:
Add a new kselftest to detect and report slowdowns in key boot events. The test uses ftrace to track timings for specific boot events and compares these timestamps against reference values provided in YAML format.
The test includes the following files:
- `bootconfig` file: configures ftrace and lists reference key boot
events.
- `config` fragment: enables boot time tracing and attaches the bootconfig file to the kernel image.
- `kprobe_timestamps_to_yaml.py` script: parses the current trace file to extract event names and timestamps and writes them to a YAML file. The script is intended to be run once to generate initial reference values; the generated file is not meant to be stored in the kernel sources but should be provided as input to the test itself. YAML format was chosen to allow easy integration with per-platform data used in other tests, such as the discoverable devices probe test in tools/testing/selftests/devices. Another option is to use JSON, as the file is not intended for manual editing and JSON is already supported by the Python standard library.
- `test_boot_time.py` script: parses the current trace file and compares timestamps against the values in the YAML file provided as input. Reports a failure if any timestamp differs from the reference value by more than the specified delta.
- `trace_utils.py` file: utility functions to mount debugfs and parse the trace file to extract relevant information.
The bootconfig file provided is an initial draft with some reference kprobe events to showcase how the test works. I would appreciate feedback from those interested in running this test on which boot events should be added. Different key events might be relevant depending on the platform and its boot time requirements. This file should serve as a common ground and be populated with critical events and functions common to different platforms.
Feedback on the overall approach of this test and suggestions for additional boot events to trace would be greatly appreciated.
You don't need a cover letter for a single patch btw.
Please take a look at the tools/power/pm-graph bootgraph.py and sleepgraph.py to see if you can leverage them - bootgraph detects slowdowns during boot.
Example output with a deliberately small delta of 0.01 to demonstrate failures:
TAP version 13 1..4 ok 1 populate_rootfs_begin # 'run_init_process_begin' differs by 0.033990 seconds. not ok 2 run_init_process_begin # 'run_init_process_end' differs by 0.033796 seconds. not ok 3 run_init_process_end ok 4 unpack_to_rootfs_begin # Totals: pass:2 fail:2 xfail:0 xpass:0 skip:0 error:0
This patch depends on "kselftest: Move ksft helper module to common directory": https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1@... which was picked through the usb tree and is queued for 6.11-rc1.
This tool does need some improvements based on the thread that includes skips.
Best,
Laura
Laura Nao (1): kselftests: Add test to detect boot event slowdowns
As mentioned earlier take a look at the tools/power/pm-graph bootgraph.py and sleepgraph.py to see if you can leverage them - bootgraph detects slowdowns during boot.
We don't want to add duplicate scripts if the other one serves the needs. Those can be moved to selftests if it make sense.
What are the dependencies if any for this new test to work? Please do remember that tests in default run needs to have minimal dependencies so they can run on systems that have minimal support.
tools/testing/selftests/Makefile | 1 + tools/testing/selftests/boot-time/Makefile | 17 ++++ tools/testing/selftests/boot-time/bootconfig | 8 ++ tools/testing/selftests/boot-time/config | 4 + .../boot-time/kprobe_timestamps_to_yaml.py | 55 +++++++++++ .../selftests/boot-time/test_boot_time.py | 94 +++++++++++++++++++ .../selftests/boot-time/trace_utils.py | 63 +++++++++++++ 7 files changed, 242 insertions(+) create mode 100644 tools/testing/selftests/boot-time/Makefile create mode 100644 tools/testing/selftests/boot-time/bootconfig create mode 100644 tools/testing/selftests/boot-time/config create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py create mode 100644 tools/testing/selftests/boot-time/trace_utils.py
-- 2.30.2
thanks, -- Shuah
linux-kselftest-mirror@lists.linaro.org