Introduce a new test to identify regressions causing devices to go missing on the system.
For each bus and class on the system the test checks the number of devices present against a reference file, which needs to have been generated by the program at a previous point on a known-good kernel, and if there are missing devices they are reported.
Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com --- Hi,
For details about the test, please see the README.rst included in the patch.
This v2 contains changes addressing feedback received on the RFCv1 series, during the session at plumbers [1] and a few other things I noticed along the way.
[1] https://www.youtube.com/live/kcr8NXEbzcg?si=QWBvJAOjj7tg264o&t=11283
For the open questions I posted in v1, the v2 changelog below should make it clear what was decided. A few clarifications are needed though: * I've decided to leave driver probe out of this test to keep it simple and avoid potential false-positives * The reference file now includes the full kernel config as part of its metadata (Example at [2]). This is clunky but seems worth it for the purposes of reproducibility, and potentially (in the future) choosing the reference that best matches the running system
[2] https://github.com/kernelci/platform-test-parameters/pull/3/files
Let me know your thoughts.
Thanks, Nícolas --- Changes in v2: - Switched reference format from YAML to JSON - Introduced metadata to reference file, it includes: kernel version, kernel configuration and platform identifier - Added -u flag to allow updating reference file in-place if it is a superset - Added -f flag to allow specifying filename of the reference - Added a few device properties (., device, firmware_node, driver) - Un-ignored devlink device class - Refactored code to improve legibility - Added README.rst with documentation - Renamed test from exist.py to test_dev_exist.py - Link to v1: https://lore.kernel.org/r/20240724-kselftest-dev-exist-v1-1-9bc21aa761b5@col... --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/devices/exist/Makefile | 3 + tools/testing/selftests/devices/exist/README.rst | 146 +++++++++ .../selftests/devices/exist/test_dev_exist.py | 357 +++++++++++++++++++++ 4 files changed, 507 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index b38199965f99..eacf4b062f01 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -15,6 +15,7 @@ TARGETS += cpufreq TARGETS += cpu-hotplug TARGETS += damon TARGETS += devices/error_logs +TARGETS += devices/exist TARGETS += devices/probe TARGETS += dmabuf-heaps TARGETS += drivers/dma-buf diff --git a/tools/testing/selftests/devices/exist/Makefile b/tools/testing/selftests/devices/exist/Makefile new file mode 100644 index 000000000000..df85f661aa99 --- /dev/null +++ b/tools/testing/selftests/devices/exist/Makefile @@ -0,0 +1,3 @@ +TEST_PROGS := test_dev_exist.py + +include ../../lib.mk diff --git a/tools/testing/selftests/devices/exist/README.rst b/tools/testing/selftests/devices/exist/README.rst new file mode 100644 index 000000000000..1599204e355d --- /dev/null +++ b/tools/testing/selftests/devices/exist/README.rst @@ -0,0 +1,146 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright (c) 2024 Collabora Ltd + +========================== +Device existence kselftest +========================== + +This test verifies whether all devices still exist on the system when compared +to a reference run, allowing detection of regressions that cause devices to go +missing. + +TL;DR +===== + +Run ``./test_dev_exist.py -g``, then run ``./test_dev_exist.py``. + +Usage +===== + +The test program can be found as ``test_dev_exist.py`` in this directory. Run it +with the ``--help`` argument to get information for all available arguments. +Detailed usage follows below. + +Reference generation +-------------------- + +Before running the test, it is necessary to generate a reference. To do that, +run it with the ``--generate-reference`` argument. This will generate a JSON +file encoding all the devices available, per subsystem (class or bus), in the +running system, as well as metadata about the system (kernel version, +configuration and system identifiers). + +By default, the file will be saved in the current directory and named based on +the system identifier, but that can be changed through the use of the +``--reference-dir`` and ``--reference-file`` flags. + +Running the test +---------------- + +To run the test, simply execute it **without** the ``--generate-reference`` +argument. By default, once again, the test will look for the reference file in +the current directory and named as the system identifier, but that can be +changed through the ``--reference-dir`` and ``--reference-file`` flags. + +Reading the results +------------------- + +The test outputs in the KTAP format, with one result per subsystem. For each +failure the output shows the devices that were expected by the reference file, +the devices that were found in the running system, and a best-effort guess for +the devices that are missing in the system. For each device, its main properties +are printed out to help in identifying it. + +As an example, below is the snippet printed when one of the three devices in the +media bus went missing:: + + # Missing devices for subsystem 'media': 1 (Expected 3, found 2) + # ================= + # Devices expected: + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8.3/3-8.3.2/3-8.3.2:1.0/media2 + # uevent: + # MAJOR=237 + # MINOR=2 + # DEVNAME=media2 + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0/media0 + # uevent: + # MAJOR=237 + # MINOR=0 + # DEVNAME=media0 + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.2/media1 + # uevent: + # MAJOR=237 + # MINOR=1 + # DEVNAME=media1 + # + # ----------------- + # Devices found: + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0/media0 + # uevent: + # MAJOR=237 + # MINOR=0 + # DEVNAME=media0 + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.2/media1 + # uevent: + # MAJOR=237 + # MINOR=1 + # DEVNAME=media1 + # + # ----------------- + # Devices missing (best guess): + # + # .: + # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8.3/3-8.3.2/3-8.3.2:1.0/media2 + # uevent: + # MAJOR=237 + # MINOR=2 + # DEVNAME=media2 + # + # ================= + not ok 67 bus.media + +Updating the reference +---------------------- + +As time goes on, new devices might be introduced in the system. To replace a +reference file with a more up-to-date one containing more devices, pass both +``--generate-reference`` and ``--update-reference`` arguments. The program will +refuse to replace the reference if the new one doesn't contain all the devices +in the old reference, as that is usually not desirable. + +Caveats +======= + +The test relies solely on the count of devices per subsystem to detect missing +devices. [#f1]_ That means that it is possible for the test to fail to detect a +missing device. + +For example, if the running system contains one extra device and one missing +device on the same subsystem compared to the reference, no test will fail since +the count is the same. To minimize the risk of this happening, it is recommended +to keep the reference file as up-to-date as possible. + +.. [#f1] The reason for this is that there aren't any device properties that are + used for every device and that are guaranteed to uniquely identify them and be + stable across kernel releases, so any attempt to match devices based on their + properties would lead to false-positives. + +Pre-existing reference files +============================ + +Due to the per-platform nature of the reference files, it is not viable to keep +them in-tree. + +To facilitate running the test, especially by CI systems, a collection of +pre-existing reference files is kept at +https://github.com/kernelci/platform-test-parameters. diff --git a/tools/testing/selftests/devices/exist/test_dev_exist.py b/tools/testing/selftests/devices/exist/test_dev_exist.py new file mode 100755 index 000000000000..58bff5ea99e7 --- /dev/null +++ b/tools/testing/selftests/devices/exist/test_dev_exist.py @@ -0,0 +1,357 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024 Collabora Ltd + +import os +import sys +import argparse +import gzip +import json + +# Allow ksft module to be imported from different directory +this_dir = os.path.dirname(os.path.realpath(__file__)) +sys.path.append(os.path.join(this_dir, "../../kselftest/")) + +import ksft + + +def generate_ref_metadata(): + metadata = {} + + config_file = "/proc/config.gz" + if os.path.isfile(config_file): + with gzip.open(config_file, "r") as f: + config = str(f.read()) + metadata["config"] = config + + metadata["version"] = os.uname().release + + metadata["platform_ids"] = get_possible_ref_filenames() + + return metadata + + +def generate_dev_data(): + data = {} + + device_subsys_types = [ + { + "type": "class", + "base_dir": "/sys/class", + "add_path": "", + "ignored": [], + }, + { + "type": "bus", + "base_dir": "/sys/bus", + "add_path": "devices", + "ignored": [], + }, + ] + + properties = sorted( + [ + ".", + "uevent", + "name", + "device", + "firmware_node", + "driver", + "device/uevent", + "firmware_node/uevent", + ] + ) + + for dev_subsys_type in device_subsys_types: + subsystems = {} + for subsys_name in sorted(os.listdir(dev_subsys_type["base_dir"])): + if subsys_name in dev_subsys_type["ignored"]: + continue + + devs_path = os.path.join( + dev_subsys_type["base_dir"], subsys_name, dev_subsys_type["add_path"] + ) + # Filter out non-symlinks as they're not devices + dev_dirs = [dev for dev in os.scandir(devs_path) if dev.is_symlink()] + devs_data = [] + for dev_dir in dev_dirs: + dev_path = os.path.join(devs_path, dev_dir) + dev_data = {"info": {}} + for prop in properties: + prop_path = os.path.join(dev_path, prop) + if os.path.isfile(prop_path): + with open(prop_path) as f: + dev_data["info"][prop] = f.read() + elif os.path.isdir(prop_path): + dev_data["info"][prop] = os.path.realpath(prop_path) + devs_data.append(dev_data) + if len(dev_dirs): + subsystems[subsys_name] = { + "count": len(dev_dirs), + "devices": devs_data, + } + data[dev_subsys_type["type"]] = subsystems + + return data + + +def generate_reference(): + return {"metadata": generate_ref_metadata(), "data": generate_dev_data()} + + +def commented(s): + return s.replace("\n", "\n# ") + + +def indented(s, n): + return " " * n + s.replace("\n", "\n" + " " * n) + + +def stripped(s): + return s.strip("\n") + + +def devices_difference(dev1, dev2): + difference = 0 + + for prop in dev1["info"].keys(): + for l1, l2 in zip( + dev1["info"].get(prop, "").split("\n"), + dev2["info"].get(prop, "").split("\n"), + ): + if l1 != l2: + difference += 1 + return difference + + +def guess_missing_devices(cur_subsys_data, ref_subsys_data): + # Detect what devices on the current system are the most similar to devices + # on the reference one by one until the leftovers are the most dissimilar + # devices and therefore most likely the missing ones. + found_count = cur_subsys_data["count"] + expected_count = ref_subsys_data["count"] + missing_count = found_count - expected_count + + diffs = [] + for cur_d in cur_subsys_data["devices"]: + for ref_d in ref_subsys_data["devices"]: + diffs.append((devices_difference(cur_d, ref_d), cur_d, ref_d)) + + diffs.sort(key=lambda x: x[0]) + + assigned_ref_devs = [] + assigned_cur_devs = [] + for diff in diffs: + if len(assigned_ref_devs) >= expected_count - missing_count: + break + if diff[1] in assigned_cur_devs or diff[2] in assigned_ref_devs: + continue + assigned_cur_devs.append(diff[1]) + assigned_ref_devs.append(diff[2]) + + missing_devices = [] + for d in ref_subsys_data["devices"]: + if d not in assigned_ref_devs: + missing_devices.append(d) + + return missing_devices + + +def dump_devices_info(cur_subsys_data, ref_subsys_data): + def dump_device_info(dev): + for name, val in dev["info"].items(): + ksft.print_msg(indented(name + ":", 2)) + val = stripped(val) + if val: + ksft.print_msg(commented(indented(val, 4))) + ksft.print_msg("") + + ksft.print_msg("=================") + ksft.print_msg("Devices expected:") + ksft.print_msg("") + for d in ref_subsys_data["devices"]: + dump_device_info(d) + ksft.print_msg("-----------------") + ksft.print_msg("Devices found:") + ksft.print_msg("") + for d in cur_subsys_data["devices"]: + dump_device_info(d) + ksft.print_msg("-----------------") + ksft.print_msg("Devices missing (best guess):") + ksft.print_msg("") + missing_devices = guess_missing_devices(cur_subsys_data, ref_subsys_data) + for d in missing_devices: + dump_device_info(d) + ksft.print_msg("=================") + + +def load_reference(ref_filename): + with open(ref_filename) as f: + ref = json.load(f) + return ref + + +def run_test(ref_filename): + ksft.print_msg(f"Using reference file: '{ref_filename}'") + + ref_data = load_reference(ref_filename)["data"] + + num_tests = 0 + for subsys_type in ref_data.values(): + num_tests += len(subsys_type) + ksft.set_plan(num_tests) + + cur_data = generate_dev_data() + + reference_outdated = False + + for subsys_type_name, ref_subsys_type_data in ref_data.items(): + for subsys_name, ref_subsys_data in ref_subsys_type_data.items(): + test_name = f"{subsys_type_name}.{subsys_name}" + if not ( + cur_data.get(subsys_type_name) + and cur_data.get(subsys_type_name).get(subsys_name) + ): + ksft.print_msg(f"Device subsystem '{subsys_name}' missing") + ksft.test_result_fail(test_name) + continue + cur_subsys_data = cur_data[subsys_type_name][subsys_name] + + found_count = cur_subsys_data["count"] + expected_count = ref_subsys_data["count"] + if found_count < expected_count: + ksft.print_msg( + f"Missing devices for subsystem '{subsys_name}': {expected_count - found_count} (Expected {expected_count}, found {found_count})" + ) + dump_devices_info(cur_subsys_data, ref_subsys_data) + ksft.test_result_fail(test_name) + else: + ksft.test_result_pass(test_name) + if found_count > expected_count: + reference_outdated = True + + if len(cur_data[subsys_type_name]) > len(ref_subsys_type_data): + reference_outdated = True + + if reference_outdated: + ksft.print_msg( + "Warning: The current system contains more devices and/or subsystems than the reference. Updating the reference is recommended." + ) + + +def ref_is_superset(new_ref_data, old_ref_data): + for subsys_type in old_ref_data: + for subsys in old_ref_data[subsys_type]: + if subsys not in new_ref_data[subsys_type]: + return False + if ( + new_ref_data[subsys_type][subsys]["count"] + < old_ref_data[subsys_type][subsys]["count"] + ): + return False + return True + + +def get_possible_ref_filenames(): + filenames = [] + + dt_board_compatible_file = "/proc/device-tree/compatible" + if os.path.exists(dt_board_compatible_file): + with open(dt_board_compatible_file) as f: + for line in f: + compatibles = [compat for compat in line.split("\0") if compat] + filenames.extend(compatibles) + else: + dmi_id_dir = "/sys/devices/virtual/dmi/id" + vendor_dmi_file = os.path.join(dmi_id_dir, "sys_vendor") + product_dmi_file = os.path.join(dmi_id_dir, "product_name") + + with open(vendor_dmi_file) as f: + vendor = f.read().replace("\n", "") + with open(product_dmi_file) as f: + product = f.read().replace("\n", "") + + filenames = [vendor + "," + product] + + return filenames + + +def get_ref_filename(ref_dir, should_exist=True): + chosen_ref_filename = "" + full_ref_paths = [ + os.path.join(ref_dir, f + ".json") for f in get_possible_ref_filenames() + ] + if not should_exist: + return full_ref_paths[0] + + for path in full_ref_paths: + if os.path.exists(path): + chosen_ref_filename = path + break + + if not chosen_ref_filename: + tried_paths = ",".join(["'" + p + "'" for p in full_ref_paths]) + ksft.print_msg(f"No matching reference file found (tried {tried_paths})") + ksft.exit_fail() + + return chosen_ref_filename + + +parser = argparse.ArgumentParser() +parser.add_argument( + "--reference-dir", + "-d", + default=".", + help="Directory containing the reference files", +) +parser.add_argument( + "--reference-file", "-f", help="File name of the reference to read from or write to" +) +parser.add_argument( + "--generate-reference", + "-g", + action="store_true", + help="Generate a reference file with the devices on the running system", +) +parser.add_argument( + "--update-reference", + "-u", + action="store_true", + help="Allow overwriting the reference in-place if the existing reference is a subset of the new one", +) +args = parser.parse_args() + +if args.reference_file: + ref_filename = os.path.join(args.reference_dir, args.reference_file) + if not os.path.exists(ref_filename) and not args.generate_reference: + ksft.print_msg(f"Reference file not found: '{ref_filename}'") + ksft.exit_fail() +else: + ref_filename = get_ref_filename(args.reference_dir, not args.generate_reference) + +if args.generate_reference: + if os.path.exists(ref_filename) and not args.update_reference: + print( + f"Reference file '{ref_filename}' already exists; won't overwrite; aborting" + ) + sys.exit(1) + + gen_ref = generate_reference() + if args.update_reference and os.path.exists(ref_filename): + loaded_ref = load_reference(ref_filename) + if not ref_is_superset(gen_ref["data"], loaded_ref["data"]): + print( + f"New reference is not a superset of the existing one; skipping update for '{ref_filename}'" + ) + sys.exit(1) + + with open(ref_filename, "w") as f: + json.dump(gen_ref, f, indent=4) + print(f"Reference generated to file '{ref_filename}'") + sys.exit(0) + +ksft.print_header() + +run_test(ref_filename) + +ksft.finished()
--- base-commit: 40e0c9d414f57d450e3ad03c12765e797fc3fede change-id: 20240724-kselftest-dev-exist-bb1bcf884654
Best regards,