Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
Longer story.
The current problem is that CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce the failure. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has long been communicated that LTP, xfstests and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is, for a given subsystem (defined in MAINTAINERS), to define a set of tests that should be run for any contribution to that subsystem. The hope is that the collective CI results can be triaged collectively (because they are related) and that the numerous flakes can even be waived collectively (same reason), improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
---
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none
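(For illustration only, a hypothetical filled-in entry in this format might look like the following; the subsystem, maintainer, list and command values are placeholders/assumptions rather than an agreed example:)

LIVE PATCHING:
  maintainer:
    - name: Jane Doe                  # placeholder
      email: jane.doe@example.com     # placeholder
  list: live-patching@vger.kernel.org
  version: mainline
  dependency:
    - make
    - gcc
  test:
    - path: tools/testing/selftests/livepatch
      cmd: make -C tools/testing/selftests TARGETS=livepatch run_tests
      param:
  hardware: none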
-----Original Message-----
From: automated-testing@lists.yoctoproject.org <automated-testing@lists.yoctoproject.org> On Behalf Of Don Zickus

Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
Don,
I'm interested in this initiative. Is discussion going to be on a kernel mailing list, or on this e-mail, or somewhere else?
See a few comments below.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run.
Just saying "LTP" is not granular enough. LTP has hundreds of individual test programs, and it would be useful to specify the individual tests from LTP that should be run per sub-system.
I was particularly intrigued by the presentation at Plumbers about test coverage. It would be nice to have data (or easily replicable methods) for determining the code coverage of a test or set of tests, to indicate what parts of the kernel are being missed and help drive new test development.
However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Sounds like a good place to start. Do we have some candidate sub-systems in mind? Has anyone volunteered to lead the way?
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
Is this something new in MAINTAINERS, or is it a separate file?
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none
Looks OK so far - it'd be nice to have a few concrete examples. -- Tim
Hi!
Just saying "LTP" is not granular enough. LTP has hundreds of individual test programs, and it would be useful to specify the individual tests from LTP that should be run per sub-system.
A few thousand tests, to be more precise, and the content also tends to change between releases, be it test additions or removals, and I do not think this level of change is something that makes sense to track in such a database.
It may be better to have a more generic description of LTP subsets, there are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP test runner map that to actual test cases. The hard task here is to figure out which groups would be useful and keep the set reasonably small.
I can move this forward in LTP reasonably quickly if we get a small list of useful groups from kernel developers.
Hi Cyril,
On Wed, Oct 16, 2024 at 9:11 AM Cyril Hrubis chrubis@suse.cz wrote:
Hi!
Just saying "LTP" is not granular enough. LTP has hundreds of individual test programs, and it would be useful to specify the individual tests from LTP that should be run per sub-system.
A few thousand tests, to be more precise, and the content also tends to change between releases, be it test additions or removals, and I do not think this level of change is something that makes sense to track in such a database.
It may be better to have a more generic description of LTP subsets, there are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP test runner map that to actual test cases. The hard task here is to figure out which groups would be useful and keep the set reasonably small.
I can move this forward in LTP reasonably quickly if we get a small list of useful groups from kernel developers.
Thanks! The thought was: if we want to encourage contributors to run these tests before submitting, does running the whole LTP test suite make sense, or, as you said, would a targeted set be much better?
Cheers, Don
-- Cyril Hrubis chrubis@suse.cz
Hi!
A few thousand tests, to be more precise, and the content also tends to change between releases, be it test additions or removals, and I do not think this level of change is something that makes sense to track in such a database.
It may be better to have a more generic description of LTP subsets, there are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP test runner map that to actual test cases. The hard task here is to figure out which groups would be useful and keep the set reasonably small.
I can move this forward in LTP reasonably quickly if we get a small list of useful groups from kernel developers.
Thanks! The thought was: if we want to encourage contributors to run these tests before submitting, does running the whole LTP test suite make sense, or, as you said, would a targeted set be much better?
The best answer is "it depends". The whole LTP run can take hours on slower hardware and may not even test the code you wanted to test; e.g. if you made changes to compat code, you have to build LTP with -m32 to actually exercise the 32bit emulation layer. If you changed the kernel core it may make sense to run the whole of LTP; on the other hand, changes isolated to certain subsystems, e.g. SysV IPC, Timers, Cgroups, etc., could be tested fairly quickly with a subset of LTP. So I think that we need some kind of mapping or heuristics so that we can map certain use cases to subsets of tests.
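(As a sketch of how such a subset might be captured in the proposed yaml, assuming the grouping is expressed through runltp command files; the group name, version tag and command file below are illustrative assumptions, not an agreed LTP interface:)

SYSV IPC:
  maintainer:
    - name: LTP maintainers           # placeholder
      email: ltp@lists.linux.it
  list: ltp@lists.linux.it
  version: 20240930                   # assumed LTP release tag
  dependency:
    - make
    - gcc
  test:
    - path: ltp                       # assumed install/checkout directory
      cmd: ./runltp
      param: -f ipc                   # assumed runtest command file for SysV IPC
  hardware: none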
Hi Tim,
On Tue, Oct 15, 2024 at 12:01 PM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message-----
From: automated-testing@lists.yoctoproject.org <automated-testing@lists.yoctoproject.org> On Behalf Of Don Zickus

Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
Don,
I'm interested in this initiative. Is discussion going to be on a kernel mailing list, or on this e-mail, or somewhere else?
I was going to keep it on this mailing list. Open to adding other lists or moving it.
See a few comments below.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run.
Just saying "LTP" is not granular enough. LTP has hundreds of individual test programs, and it would be useful to specify the individual tests from LTP that should be run per sub-system.
Agreed. Just reiterating what Greg has told me.
I was particularly intrigued by the presentation at Plumbers about test coverage. It would be nice to have data (or easily replicable methods) for determining the code coverage of a test or set of tests, to indicate what parts of the kernel are being missed and help drive new test development.
It would be nice. I see that as orthogonal to this effort for now. But I think this might be a good step towards that idea.
However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Sounds like a good place to start. Do we have some candidate sub-systems in mind? Has anyone volunteered to lead the way?
At our meeting, someone suggested Kunit as it was easy to understand for starters and then add a few other volunteer systems in. I know we have a few maintainers who can probably help us get started. I think arm and media were ones thrown about at our meeting.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
Is this something new in MAINTAINERS, or is it a separate file?
For now a separate file. It isn't clear where this could go long term. The thought was to gather data to see what is necessary first. Long term it will probably stay a separate file. *shrugs*
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none
Looks OK so far - it'd be nice to have a few concrete examples.
Fair enough. Let me try and work on some.
Cheers, Don
-- Tim
On 10/14/24 15:32, Donald Zickus wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none
Don,
thanks for initiating this! I have a few questions/suggestions:
I think the root element in a section (`KUNIT TEST` in your example) is expected to be a container of multiple test definitions ( so there will be one for LTP, KSelfTest, etc) -- can you confirm?
Assuming the above is correct and `test` is a container of multiple test definitions, can we add more properties to each:

* name -- a unique name/id for each test
* description -- short description of the test
* arch -- applicable platform architectures
* runtime -- this is subjective as it can be different for different systems, but maybe we can have some generic names, like 'SHORT', 'MEDIUM', 'LONG', etc., and each system may scale the timeout locally?
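(A sketch of how those properties might sit on a single test entry, reusing the template fields above; the values are illustrative assumptions:)

test:
  - name: kunit-core                  # suggested unique id
    description: KUnit core unit tests
    arch:
      - x86_64
      - arm64
    runtime: SHORT                    # SHORT/MEDIUM/LONG, scaled locally
    path: tools/testing/kunit
    cmd: ./tools/testing/kunit/kunit.py
    param: run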
I see you have a `Subsystems` entry in comments section, but not in the example. Do you expect it to be part of this file, or will there be a file per each subsystem?
Can we define what we mean by a `test`? For me this is a group of one or more individual testcases that can be initiated with a single command-line, and is expected to run in a 'reasonable' time. Any other thoughts?
Thanks! Minas
On Thu, Oct 17, 2024 at 8:32 AM Minas Hambardzumyan minas@ti.com wrote:
On 10/14/24 15:32, Donald Zickus wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none
Don,
thanks for initiating this! I have a few questions/suggestions:
I think the root element in a section (`KUNIT TEST` in your example) is expected to be a container of multiple test definitions ( so there will be one for LTP, KSelfTest, etc) -- can you confirm?
Actually I may have misled you. 'KUNIT TEST' was an example I picked out of the MAINTAINERS file as a maintained subsystem that folks contribute code to. Well, it was the example folks suggested I use at Plumbers (from what I recall). Inside the subsystem container is a 'test' section that is the container of tests needed for the subsystem.
Assuming above is correct and `test` is a container of multiple test definitions, can we add more properties to each:
- name -- would be a unique name id for each test
- description -- short description of the test.
- arch -- applicable platform architectures
- runtime -- This is subjective as it can be different for different
systems, but maybe we can have some generic names, like 'SHORT', 'MEDIUM', 'LONG', etc., and each system may scale the timeout locally?
Based on what I said above, does that change your thoughts a bit? In my head the tests are already out there and defined; I am not sure we can require them to be unique. And the description can be found at the URL. As I envisioned some tests being run across multiple subsystems, minimizing duplication may be useful. Happy to be swayed in a different direction.
I like the idea of a 'timeout'. That has been useful for our tests internally. I can add that to the fields.
I see you have a `Subsystems` entry in comments section, but not in the example. Do you expect it to be part of this file, or will there be a file per each subsystem?
Hopefully my above comments clarifies your confusion? The subsystem is 'KUNIT TEST' in this example.
Can we define what we mean by a `test`? For me this is a group of one or more individual testcases that can be initiated with a single command-line, and is expected to run in a 'reasonable' time. Any other thoughts?
Yes. I was thinking a test(s) is something the subsystem maintainer expects all contributors (humans) or testers (human or CI bots) to that subsystem to run on posted patches. The test is expected to be command line driven (copy-n-paste is probably preferable) and it can consist of multiple test command lines or a larger testsuite. Also happy to be swayed differently.
Interested in your feedback to my comments.
Cheers, Don
Thanks! Minas
Hi Don,
Thanks for putting this together: the discussion at Plumbers was very useful.
On Tue, 15 Oct 2024 at 04:33, Donald Zickus dzickus@redhat.com wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
I think that there are two (maybe three) separate problems here:
1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
2. How do we describe those tests in such a way that running them can be automated?
3. (Exactly what constitutes a 'test'? A single 'test', a whole suite of tests, a test framework/tool? What about the environment: is, e.g., KUnit on UML different from KUnit on qemu-x86_64 different from KUnit on qemu-arm64?)
My gut feeling here is that (1) is technically quite easy: worst-case we just make every MAINTAINERS entry link to a document describing what tests should be run. Actually getting people to write these documents and then run the tests, though, is very difficult.
(2) is the area where I think this will be most useful. We have some arbitrary (probably .yaml) file which describes a series of tests to run in enough detail that we can automate it. My ideal outcome here would be to have a 'kunit.yaml' file which I can pass to a tool (either locally or automatically on some CI system) which will run all of the checks I'd run on an incoming patch. This would include everything from checkpatch, to test builds, to running KUnit tests and other test scripts. Ideally, it'd even run these across a bunch of different environments (architectures, emulators, hardware, etc) to catch issues which only show up on big-endian or 32-bit machines.
If this means I can publish that yaml file somewhere, and not only give contributors a way to check that those tests pass on their own machine before sending a patch out, but also have CI systems automatically run them (so the results are ready waiting before I manually review the patch), that'd be ideal.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
I think we should split this up into several files, partly to avoid merge conflicts, partly to make it easy to maintain custom collections of tests separately.
For example, fs.yaml could contain entries for both xfstests and fs KUnit and selftests.
It's also probably going to be necessary to have separate sets of tests for different use-cases. For example, there might be a smaller, quicker set of tests to run on every patch, and a much longer, more expensive set which only runs every other day. So I don't think there'll even be a 1:1 mapping between 'test collections' (files) and subsystems. But an automated way of running "this collection of tests" would be very useful, particularly if it's more user-friendly than just writing a shell script (e.g., having nicely formatted output, being able to run things in parallel or remotely, etc).
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)
KUNIT TEST:
For KUnit, it'll be interesting to draw the distinction between KUnit overall and individual KUnit suites. I'd lean towards having a separate entry for each subsystem's KUnit tests (including one for KUnit's own tests)
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
How important is it to have these in the case where they're already in the MAINTAINERS file? I can see it being important for tests which live elsewhere, though eventually, I'd still prefer the subsystem maintainer to take some responsibility for the tests run for their subsystems.
version:
This field is probably unnecessary for test frameworks which live in the kernel tree.
  dependency:
    - dep1
    - dep2
If we want to automate this in any way, we're going to need to work out a way of specifying these. Either we'd have to pick a distro's package names, or have our own mapping.
(A part of me really likes the idea of having a small list of "known" dependencies: python, docker, etc, and trying to limit tests to using those dependencies. Though there are plenty of useful tests with more complicated dependencies, so that probably won't fly forever.)
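(One possible shape for keeping dependency names generic and letting each runner map them to distro packages; the package names and the commented mapping are illustrative assumptions, not part of the proposed template:)

dependency:
  - python3
  - numa-devel                        # generic name, mapped per distro by the runner
# hypothetical per-distro mapping, maintained outside the subsystem entries:
#   numa-devel:
#     fedora: numactl-devel
#     debian: libnuma-dev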
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
Is 'path' here supposed to be the path to the test binary, the working directory, etc? Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.
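(A sketch of that split using the field names suggested above; whether 'param' stays or becomes 'args' is an open question, and the values are illustrative:)

test:
  - working_directory: .              # kernel checkout root
    cmd: tools/testing/kunit/kunit.py
    args: run --arch=um
    env:
      MAKEFLAGS: -j8                  # assumed example environment variable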
hardware: none
For KUnit, I'd imagine having a kunit.yaml, with something like this, including the KUnit tests in the 'kunit' and 'example' suites, and the 'kunit_tool_test.py' test script:
---
KUnit:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit.py
      param: run kunit
    - path: .
      cmd: tools/testing/kunit.py
      param: run example
  hardware: none

KUnit Tool:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit_tool_test.py
      param:
  hardware: none
---
Obviously there's still some redundancy there, and I've not actually tried implementing something that could run it. It also lacks any information about the environment. In practice, I have about 20 different kunit.py invocations which run the tests with different configs and on different architectures. Though that might make sense to keep in a separate file to only run if the simpler tests pass. And equally, it'd be nice to have a 'common.yaml' file with basic patch and build tests which apply to almost everything (checkpatch, make defconfig, maybe even make allmodconfig, etc).
Cheers, -- David
Hello,
---- On Fri, 18 Oct 2024 04:21:58 -0300 David Gow wrote ---
Hi Don,
Thanks for putting this together: the discussion at Plumbers was very useful.
On Tue, 15 Oct 2024 at 04:33, Donald Zickus dzickus@redhat.com wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
I think that there are two (maybe three) separate problems here:
1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
2. How do we describe those tests in such a way that running them can be automated?
3. (Exactly what constitutes a 'test'? A single 'test', a whole suite of tests, a test framework/tool? What about the environment: is, e.g., KUnit on UML different from KUnit on qemu-x86_64 different from KUnit on qemu-arm64?)

My gut feeling here is that (1) is technically quite easy: worst-case we just make every MAINTAINERS entry link to a document describing what tests should be run. Actually getting people to write these documents and then run the tests, though, is very difficult.

(2) is the area where I think this will be most useful. We have some arbitrary (probably .yaml) file which describes a series of tests to run in enough detail that we can automate it. My ideal outcome here would be to have a 'kunit.yaml' file which I can pass to a tool (either locally or automatically on some CI system) which will run all of the checks I'd run on an incoming patch. This would include everything from checkpatch, to test builds, to running KUnit tests and other test scripts. Ideally, it'd even run these across a bunch of different environments (architectures, emulators, hardware, etc) to catch issues which only show up on big-endian or 32-bit machines.

If this means I can publish that yaml file somewhere, and not only give contributors a way to check that those tests pass on their own machine before sending a patch out, but also have CI systems automatically run them (so the results are ready waiting before I manually review the patch), that'd be ideal.
This thought makes sense to me. It will be very interesting for CI systems to be able to figure out which tests to run for a set of folder/file changes.
However, I also feel that a key part of the work is actually convincing people to write (and maintain!) these specs. Only through CI automation may we be able to show the value of this task, prompting maintainers to keep their files updated; otherwise we are going to create a sea of specs that will just become outdated pretty quickly.
In the new KernelCI maestro, we started with only a handful of tests, so we could actually look at the results, find regressions and report them. Maybe we could start in the same way with a few tests, e.g. kselftest-dt and kselftest-acpi. It should be relatively simple to make something that decides on testing driver probing based on which files are being changed.
There needs to be a sort of cultural shift in how we track tests first. Just documenting our current tests may not take us far, but starting small with a comprehensive process, from test spec to CI automation to clear ways of delivering results, is the game changer.
Then there are other perspectives that cross this. For example, many of the LTP and kselftests will just fail, but there is no accumulated knowledge of what the result of each test means. So understanding what is expected to pass/fail on each platform is a sort of dependency of this extensive documentation effort we have set ourselves.
Best,
- Gus
Hi!
Then there are other perspectives that cross this. For example, many of the LTP and kselftests will just fail, but there is no accumulated knowledge of what the result of each test means. So understanding what is expected to pass/fail on each platform is a sort of dependency of this extensive documentation effort we have set ourselves.
We are spending quite a lot of time making sure LTP tests do not fail unless there is a reason to. If you see LTP tests failing and you think that they shouldn't, just report it on the LTP mailing list and we will fix that.
On Fri, Oct 18, 2024 at 03:21:58PM +0800, David Gow wrote:
It's also probably going to be necessary to have separate sets of tests for different use-cases. For example, there might be a smaller, quicker set of tests to run on every patch, and a much longer, more expensive set which only runs every other day. So I don't think there'll even be a 1:1 mapping between 'test collections' (files) and subsystems. But an automated way of running "this collection of tests" would be very useful, particularly if it's more user-friendly than just writing a shell script (e.g., having nicely formatted output, being able to run things in parallel or remotely, etc).
This is definitely the case for me; I have an escalating set of tests that I run per patch, per branch and for things like sending pull requests.
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
How important is it to have these in the case where they're already in the MAINTAINERS file? I can see it being important for tests which live elsewhere, though eventually, I'd still prefer the subsystem maintainer to take some responsibility for the tests run for their subsystems.
It does seem useful to list the maintainers for tests in addition to the maintainers for the code, and like you say some of the tests are out of tree.
On Fri, Oct 18, 2024 at 3:22 AM David Gow davidgow@google.com wrote:
Hi Don,
Thanks for putting this together: the discussion at Plumbers was very useful.
On Tue, 15 Oct 2024 at 04:33, Donald Zickus dzickus@redhat.com wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
I think that there are two (maybe three) separate problems here:
- What tests do we want to run (for a given patch/subsystem/environment/etc)?
My thinking is this is the maintainer's choice. What would they like to see run on a patch to verify its correctness? I would like to think most maintainers already have scripts they run before committing patches to their -next branch. All I am trying to do is expose what is already being done, I believe.
- How do we describe those tests in such a way that running them can
be automated?
This is the tricky part. But I am going to assume that if most maintainers run tests before committing patches to their -next branch, then there is a good chance those tests are scripted and command line driven (this is the kernel community, right :-) ). So if we could expose those scripts and make them copy-and-pastable, then contributors or testers (including CI bots) could just copy and run them. Some maintainers have more complex environments, and separating command line driven tests from the environment scripts might be tricky.
Does that sound reasonable?
- (Exactly what constitutes a 'test'? A single 'test', a whole suite
of tests, a test framework/tool? What about the environment: is, e.g., KUnit on UML different from KUnit on qemu-x86_64 different from KUnit on qemu-arm64?)
My gut feeling here is that (1) is technically quite easy: worst-case we just make every MAINTAINERS entry link to a document describing what tests should be run. Actually getting people to write these documents and then run the tests, though, is very difficult.
Well if I look at kunit or kselftest, really all you are doing as a subsystem maintainer is asking contributors or testers to run a 'make' command right? Everything else is already documented I think.
(2) is the area where I think this will be most useful. We have some arbitrary (probably .yaml) file which describes a series of tests to run in enough detail that we can automate it. My ideal outcome here would be to have a 'kunit.yaml' file which I can pass to a tool (either locally or automatically on some CI system) which will run all of the checks I'd run on an incoming patch. This would include everything from checkpatch, to test builds, to running KUnit tests and other test scripts. Ideally, it'd even run these across a bunch of different environments (architectures, emulators, hardware, etc) to catch issues which only show up on big-endian or 32-bit machines.
If this means I can publish that yaml file somewhere, and not only give contributors a way to check that those tests pass on their own machine before sending a patch out, but also have CI systems automatically run them (so the results are ready waiting before I manually review the patch), that'd be ideal.
Yes, that is exactly the goal of this exercise. :-) But instead of a kunit.yaml file, it is more of a test.yaml file with hundreds of subsystems inside it (and probably a corresponding get_tests.pl script) [think of how the MAINTAINERS file operates; this is a sister file].
Inside the 'KUNIT' section would be a container of tests that would be expected to run (like you listed). Each test has its own command line and params.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
I think we should split this up into several files, partly to avoid merge conflicts, partly to make it easy to maintain custom collections of tests separately.
For example, fs.yaml could contain entries for both xfstests and fs KUnit and selftests.
I am not opposed to the idea. But I am a fan of the user experience. So while an fs.yaml might sound good, is it obvious to a contributor or tester, given a patch, that fs.yaml is the correct yaml file to parse when running tests? How do you map a patch to a yaml file? I was trying to use subsystems like MAINTAINERS (and get_maintainers.pl) as my mapping. Open to better suggestions.
It's also probably going to be necessary to have separate sets of tests for different use-cases. For example, there might be a smaller, quicker set of tests to run on every patch, and a much longer, more expensive set which only runs every other day. So I don't think there'll even be a 1:1 mapping between 'test collections' (files) and subsystems. But an automated way of running "this collection of tests" would be very useful, particularly if it's more user-friendly than just writing a shell script (e.g., having nicely formatted output, being able to run things in parallel or remotely, etc).
I don't disagree. I am trying to start small to get things going and some momentum. I proposed a container of tests section. I would like to think adding another field in each individual test area like (short, medium, long OR mandatory, performance, nice-to-have) would be easy to add to the yaml file overall and attempt to accomplish what you are suggesting. Thoughts?
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)
KUNIT TEST:
For KUnit, it'll be interesting to draw the distinction between KUnit overall and individual KUnit suites. I'd lean towards having a separate entry for each subsystem's KUnit tests (including one for KUnit's own tests)
KUNIT may not have been the best 'common' test example due to its complexities across other subsystems. :-/
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
How important is it to have these in the case where they're already in the MAINTAINERS file? I can see it being important for tests which live elsewhere, though eventually, I'd still prefer the subsystem maintainer to take some responsibility for the tests run for their subsystems.
I wasn't sure if all subsystem maintainers actually want to maintain the tests too or just point someone else at it. I look at LTP as an example here. But I could be wrong.
version:
This field is probably unnecessary for test frameworks which live in the kernel tree.
Possibly. It was brought up at Plumbers, so I included it for completeness.
  dependency:
    - dep1
    - dep2
If we want to automate this in any way, we're going to need to work out a way of specifying these. Either we'd have to pick a distro's package names, or have our own mapping.
Agreed. I might lean on what 'perf' outputs. They do dependency detection and output suggested missing packages. Their auto detection of already included deps is rather complicated though.
(A part of me really likes the idea of having a small list of "known" dependencies: python, docker, etc, and trying to limit tests to using those dependencies. Though there are plenty of useful tests with more complicated dependencies, so that probably won't fly forever.)
Hehe. For Fedora/RHEL at least, python has hundreds of smaller library packages. That is tricky. And further some tests like to compile, which means a bunch of -devel packages. Each distro has different names for their -devel packages. :-/ But a side goal of this effort is to define some community standards. Perhaps we can influence things here to clean up this problem??
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
Is 'path' here supposed to be the path to the test binary, the working directory, etc? Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.
The thought was the command to copy-n-paste to run the test after installing it. I am thinking most tests might be a git-clone or exploded tarball, leaving the path to be from the install point. So maybe working_directory is more descriptive.
hardware: none
For KUnit, I'd imagine having a kunit.yaml, with something like this, including the KUnit tests in the 'kunit' and 'example' suites, and the 'kunit_tool_test.py' test script:
KUnit:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit.py
      param: run kunit
    - path: .
      cmd: tools/testing/kunit.py
      param: run example
  hardware: none

KUnit Tool:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit_tool_test.py
      param:
  hardware: none
Obviously there's still some redundancy there, and I've not actually tried implementing something that could run it. It also lacks any information about the environment. In practice, I have about 20 different kunit.py invocations which run the tests with different configs and on different architectures. Though that might make sense to keep in a separate file to only run if the simpler tests pass. And equally, it'd be nice to have a 'common.yaml' file with basic patch and build tests which apply to almost everything (checkpatch, make defconfig, maybe even make allmodconfig, etc).
Nice, thanks for the more detailed example.
Cheers, Don
Cheers, -- David
On Sat, 19 Oct 2024 at 04:17, Donald Zickus dzickus@redhat.com wrote:
On Fri, Oct 18, 2024 at 3:22 AM David Gow davidgow@google.com wrote:
Hi Don,
Thanks for putting this together: the discussion at Plumbers was very useful.
On Tue, 15 Oct 2024 at 04:33, Donald Zickus dzickus@redhat.com wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
I think that there are two (maybe three) separate problems here:
- What tests do we want to run (for a given patch/subsystem/environment/etc)?
My thinking is this is maintainer's choice. What would they like to see run on a patch to verify its correctness? I would like to think most maintainers already have scripts they run before commiting patches to their -next branch. All I am trying to do is expose what is already being done I believe.
Agreed.
- How do we describe those tests in such a way that running them can
be automated?
This is the tricky part. But I am going to assume that if most maintainers run tests before committing patches to their -next branch, then there is a good chance those tests are scripted and command line driven (this is the kernel community, right :-) ). So if we could expose those scripts and make the copy-and-pastable such that contributors or testers (including CI bots) can just copy and run them. Some maintainers have more complex environments and separating command line driven tests from the environment scripts might be tricky.
Does that sound reasonable?
Yeah: that's basically what I'd want.
- (Exactly what constitutes a 'test'? A single 'test', a whole suite
of tests, a test framework/tool? What about the environment: is, e.g., KUnit on UML different from KUnit on qemu-x86_64 different from KUnit on qemu-arm64?)
My gut feeling here is that (1) is technically quite easy: worst-case we just make every MAINTAINERS entry link to a document describing what tests should be run. Actually getting people to write these documents and then run the tests, though, is very difficult.
Well if I look at kunit or kselftest, really all you are doing as a subsystem maintainer is asking contributors or testers to run a 'make' command right? Everything else is already documented I think.
(2) is the area where I think this will be most useful. We have some arbitrary (probably .yaml) file which describes a series of tests to run in enough detail that we can automate it. My ideal outcome here would be to have a 'kunit.yaml' file which I can pass to a tool (either locally or automatically on some CI system) which will run all of the checks I'd run on an incoming patch. This would include everything from checkpatch, to test builds, to running KUnit tests and other test scripts. Ideally, it'd even run these across a bunch of different environments (architectures, emulators, hardware, etc) to catch issues which only show up on big-endian or 32-bit machines.
If this means I can publish that yaml file somewhere, and not only give contributors a way to check that those tests pass on their own machine before sending a patch out, but also have CI systems automatically run them (so the results are ready waiting before I manually review the patch), that'd be ideal.
Yes, that is exactly the goal of this exercise. :-) But instead of a kunit.yaml file, it is more of a test.yaml file with hundreds of subsystems inside it (and probably a corresponding get_tests.pl script) [think of how the MAINTAINERS file operates; this is a sister file].
Inside the 'KUNIT' section would be a container of tests that would be expected to run (like you listed). Each test has its own command line and params.
Yeah. My hope is we can have a "run_tests" tool which parses that file/files and runs everything.
So whether that ends up being:
  run_tests --subsystem "KUNIT" --subsystem "MM"
or
  run_test --file "kunit.yaml" --file "mm.yaml"
or even
  run_test --patch "my_mm_and_kunit_change.patch"
A CI system can just run it against the changed files in patches, a user who wants to double check something specific can override it to force the tests for a subsystem which may be indirectly affected. And if you're working on some new tests, or some private internal ones, you can keep your own yaml file and pass that along too.
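(A rough sketch of what the core of such a run_tests tool could look like, assuming a single tests.yaml keyed by subsystem as proposed earlier in the thread; the file name, CLI flags and schema details are assumptions, and error handling is omitted:)

#!/usr/bin/env python3
# Sketch: run the tests registered for the given subsystems in tests.yaml.
# Assumes the yaml layout proposed in this thread: subsystem -> test list,
# where each test has an optional path (working directory), cmd and param.
import argparse
import subprocess
import sys

import yaml  # PyYAML


def run_subsystem(catalog, subsystem):
    entry = catalog.get(subsystem)
    if not entry:
        print(f"no tests registered for {subsystem}", file=sys.stderr)
        return False
    ok = True
    for test in entry.get("test", []):
        cmd = test.get("cmd")
        if not cmd:
            continue
        param = test.get("param") or ""
        cwd = test.get("path") or "."
        print(f"[{subsystem}] running: {cmd} {param} (cwd={cwd})")
        result = subprocess.run(f"{cmd} {param}".strip(), shell=True, cwd=cwd)
        ok = ok and result.returncode == 0
    return ok


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", default="tests.yaml")
    parser.add_argument("--subsystem", action="append", required=True)
    args = parser.parse_args()
    with open(args.file) as f:
        catalog = yaml.safe_load(f)
    results = [run_subsystem(catalog, s) for s in args.subsystem]
    return 0 if all(results) else 1


if __name__ == "__main__":
    sys.exit(main())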
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
I think we should split this up into several files, partly to avoid merge conflicts, partly to make it easy to maintain custom collections of tests separately.
For example, fs.yaml could contain entries for both xfstests and fs KUnit and selftests.
I am not opposed to the idea. But I am a fan of the user experience. So while an fs.yaml might sound good, is it obvious to a contributor or tester that given a patch, do they know if fs.yaml is the correct yaml file to parse when running tests? How do you map a patch to a yaml file? I was trying to use subsystems like MAINTAINERS (and get_maintainers.pl) as my mapping. Open to better suggestions.
One option would be to have multiple files, which still have the MAINTAINERS subsystems listed within, and worst-case a tool just parses all of the files in that directory until it finds a matching one. Maybe a bit slower than having everything in the one file, but it sidesteps merge conflicts well.
But ideally, I'd like (as mentioned below) to have a tool which I can use to run tests locally, being able to run, e.g., ./run_tests --all -f fs.yaml. If I want to specify the tests to run manually, personally I think a filename would be a bit nicer than having to pass, e.g., subsystem names.
It's also probably going to be necessary to have separate sets of tests for different use-cases. For example, there might be a smaller, quicker set of tests to run on every patch, and a much longer, more expensive set which only runs every other day. So I don't think there'll even be a 1:1 mapping between 'test collections' (files) and subsystems. But an automated way of running "this collection of tests" would be very useful, particularly if it's more user-friendly than just writing a shell script (e.g., having nicely formatted output, being able to run things in parallel or remotely, etc).
I don't disagree. I am trying to start small to get things going and some momentum. I proposed a container of tests section. I would like to think adding another field in each individual test area like (short, medium, long OR mandatory, performance, nice-to-have) would be easy to add to the yaml file overall and attempt to accomplish what you are suggesting. Thoughts?
I think that'd be a great idea. Maybe a "stage" field could work, too, where later tests only run if the previous ones pass. For example:
Stage 0: checkpatch, does it build
Stage 1: KUnit tests, unit tests, single architecture
Stage 2: Full boot tests, selftests, etc
Stage 3: The above tests on other architectures, allyesconfig, randconfig, etc.
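(A sketch of how a 'stage' field might attach to individual test entries, following the staging idea above; the commands and stage numbers are illustrative, not part of the current template:)

test:
  - cmd: scripts/checkpatch.pl
    param: --git HEAD-1
    stage: 0                          # cheap checks, run on every patch
  - cmd: tools/testing/kunit/kunit.py
    param: run
    stage: 1                          # unit tests, single architecture
  - cmd: tools/testing/kunit/kunit.py
    param: run --arch=arm64
    stage: 3                          # other architectures, only if earlier stages pass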
Regardless, it'd be useful to be able to name individual tests and/or configurations and manually trigger them and/or filter on them.
_Maybe_ it makes sense to split up the "what tests to run" and "how are they run" bits. The obvious split here would be to have the test catalogue just handle the former, and the "how they're run" bit entirely live in shell scripts. But if we're going to support running tests in parallel and nicely displaying results, maybe there'll be a need to have something more data driven than a shell script.
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
#   maintainer:  test maintainer - name <email>
#   list:        mailing list for discussion
#   version:     stable version of the test
#   dependency:  necessary distro package for testing
#   test:
#     path:      internal git path or url to fetch from
#     cmd:       command to run; ability to run locally
#     param:     additional param necessary to run test
#   hardware:    hardware necessary for validation
#
# Subsystems (alphabetical)
KUNIT TEST:
For KUnit, it'll be interesting to draw the distinction between KUnit overall and individual KUnit suites. I'd lean towards having a separate entry for each subsystem's KUnit tests (including one for KUnit's own tests)
KUNIT may not have been the best 'common' test example due to its complexities across other subsystems. :-/
Yeah: I think KUnit tests are a good example of the sorts of tests which would be relatively easy to integrate, but KUnit as a subsystem can be a bit confusing as an example because no-one's sure if we're talking about KUnit-the-subsystem or KUnit-the-tests.
maintainer:
  - name: name1
    email: email1
  - name: name2
    email: email2
list:
How important is it to have these in the case where they're already in the MAINTAINERS file? I can see it being important for tests which live elsewhere, though eventually, I'd still prefer the subsystem maintainer to take some responsibility for the tests run for their subsystems.
I wasn't sure if all subsystem maintainers actually want to maintain the tests too or just point someone else at it. I look at LTP as an example here. But I could be wrong.
Fair enough. Maybe we just make this optional, and if empty we "default" to the subsystem maintainer.
version:
This field is probably unnecessary for test frameworks which live in the kernel tree.
Possibly. It was brought up at Plumbers, so I included it for completeness.
Yeah. Again, good to have, but make it optional.
dependency:
  - dep1
  - dep2
If we want to automate this in any way, we're going to need to work out a way of specifying these. Either we'd have to pick a distro's package names, or have our own mapping.
Agreed. I might lean on what 'perf' outputs. They do dependency detection and output suggested missing packages. Their auto detection of already included deps is rather complicated though.
Sounds good.
(A part of me really likes the idea of having a small list of "known" dependencies: python, docker, etc, and trying to limit tests to using those dependencies. Though there are plenty of useful tests with more complicated dependencies, so that probably won't fly forever.)
Hehe. For Fedora/RHEL at least, python has hundreds of smaller library packages. That is tricky. And further some tests like to compile, which means a bunch of -devel packages. Each distro has different names for their -devel packages. :-/ But a side goal of this effort is to define some community standards. Perhaps we can influence things here to clean up this problem??
That'd be nice. :-)
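If we did end up with our own mapping, a hedged sketch of what per-distro dependency entries could look like (the field layout and package names below are purely illustrative):

  dependency:
    - name: python3
      fedora: python3
      debian: python3
    - name: libmnl-devel
      fedora: libmnl-devel
      debian: libmnl-dev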
test:
  - path: tools/testing/kunit
    cmd:
    param:
  - path:
    cmd:
    param:
Is 'path' here supposed to be the path to the test binary, the working directory, etc? Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.
The thought was the command to copy-n-paste to run the test after installing it. I am thinking most tests might be a git-clone or exploded tarball, leaving the path to be from the install point. So maybe working_directory is more descriptive.
Sounds good. In the KUnit case, the tooling currently expects the working directory to be the root of the kernel checkout, and the command to be "./tools/testing/kunit/kunit.py"...
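So a KUnit entry using those proposed field names might look something like this (the field names are still up for discussion, and the env block is just a placeholder):

  test:
    - working_directory: .      # root of the kernel checkout
      cmd: ./tools/testing/kunit/kunit.py
      args: run --kunitconfig lib/kunit
      env:                      # empty for now; PATH or locale tweaks could go here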
hardware: none
For KUnit, I'd imagine having a kunit.yaml, with something like this, including the KUnit tests in the 'kunit' and 'example' suites, and the 'kunit_tool_test.py' test script:
KUnit:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit.py
      param: run kunit
    - path: .
      cmd: tools/testing/kunit.py
      param: run example
  hardware: none

KUnit Tool:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit_tool_test.py
      param:
  hardware: none
Obviously there's still some redundancy there, and I've not actually tried implementing something that could run it. It also lacks any information about the environment. In practice, I have about 20 different kunit.py invocations which run the tests with different configs and on different architectures. Though that might make sense to keep in a separate file to only run if the simpler tests pass. And equally, it'd be nice to have a 'common.yaml' file with basic patch and build tests which apply to almost everything (checkpatch, make defconfig, maybe even make allmodconfig, etc).
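A 'common.yaml' along those lines might be as small as the following (the checkpatch and build entries are only a guess at the shape, reusing the path/cmd/param fields from the template):

  Common:
    dependency:
      - perl
    test:
      - path: .
        cmd: ./scripts/checkpatch.pl
        param: --git HEAD~1..HEAD
      - path: .
        cmd: make
        param: defconfig
      - path: .
        cmd: make
        param: allmodconfig
    hardware: none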
Nice, thanks for the more detailed example.
Thanks, -- David
Hi,
Thanks for the feedback. I created a more realistic test.yaml file to start (we can split it when more tests are added) and a parser. I was going to add patch support as input to mimic get_maintainers.pl output, but that might take some time. For now, you have to manually select a subsystem. I will try to find space on kernelci.org to grow this work but you can find a git tree here[0].
From the README.md:
"""
An attempt to map kernel subsystems to kernel tests that should be run on patches or code by humans and CI systems.
Examples:
Find test info for a subsystem
./get_tests.py -s 'KUNIT TEST' --info
  Subsystem: KUNIT TEST
  Maintainer: David Gow davidgow@google.com
  Mailing List: None
  Version: None
  Dependency: ['python3-mypy']
  Test:
    smoke:
      Url: None
      Working Directory: None
      Cmd: ./tools/testing/kunit/kunit.py
      Env: None
      Param: run --kunitconfig lib/kunit
  Hardware: arm64, x86_64
Find copy-n-pastable tests for a subsystem
./get_tests.py -s 'KUNIT TEST'
./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit
"""
Is this aligning with what people were expecting?
Cheers, Don
[0] - https://github.com/dzickusrh/test-catalog/tree/main
On Sat, Oct 19, 2024 at 2:36 AM David Gow davidgow@google.com wrote:
On Sat, 19 Oct 2024 at 04:17, Donald Zickus dzickus@redhat.com wrote:
On Fri, Oct 18, 2024 at 3:22 AM David Gow davidgow@google.com wrote:
Hi Don,
Thanks for putting this together: the discussion at Plumbers was very useful.
On Tue, 15 Oct 2024 at 04:33, Donald Zickus dzickus@redhat.com wrote:
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how to expose what tests subsystem maintainers would like to run for every patch submitted or when CI runs tests. We agreed on a mock up of a yaml template to start gathering info. The yaml file could be temporarily stored on kernelci.org until a more permanent home could be found. Attached is a template to start the conversation.
I think that there are two (maybe three) separate problems here:
- What tests do we want to run (for a given patch/subsystem/environment/etc)?
My thinking is this is the maintainer's choice. What would they like to see run on a patch to verify its correctness? I would like to think most maintainers already have scripts they run before committing patches to their -next branch. All I am trying to do is expose what is already being done, I believe.
Agreed.
- How do we describe those tests in such a way that running them can be automated?
This is the tricky part. But I am going to assume that if most maintainers run tests before committing patches to their -next branch, then there is a good chance those tests are scripted and command-line driven (this is the kernel community, right? :-) ). So if we could expose those scripts and make them copy-and-pastable, contributors or testers (including CI bots) could just copy and run them. Some maintainers have more complex environments, and separating command-line-driven tests from the environment scripts might be tricky.
Does that sound reasonable?
Yeah: that's basically what I'd want.
- (Exactly what constitutes a 'test'? A single 'test', a whole suite of tests, a test framework/tool? What about the environment: is, e.g., KUnit on UML different from KUnit on qemu-x86_64 different from KUnit on qemu-arm64?)
My gut feeling here is that (1) is technically quite easy: worst-case we just make every MAINTAINERS entry link to a document describing what tests should be run. Actually getting people to write these documents and then run the tests, though, is very difficult.
Well, if I look at kunit or kselftest, really all you are doing as a subsystem maintainer is asking contributors or testers to run a 'make' command, right? Everything else is already documented, I think.
(2) is the area where I think this will be most useful. We have some arbitrary (probably .yaml) file which describes a series of tests to run in enough detail that we can automate it. My ideal outcome here would be to have a 'kunit.yaml' file which I can pass to a tool (either locally or automatically on some CI system) which will run all of the checks I'd run on an incoming patch. This would include everything from checkpatch, to test builds, to running KUnit tests and other test scripts. Ideally, it'd even run these across a bunch of different environments (architectures, emulators, hardware, etc) to catch issues which only show up on big-endian or 32-bit machines.
If this means I can publish that yaml file somewhere, and not only give contributors a way to check that those tests pass on their own machine before sending a patch out, but also have CI systems automatically run them (so the results are ready waiting before I manually review the patch), that'd be ideal.
Yes, that is exactly the goal of this exercise. :-) But instead of a kunit.yaml file, it is more of a test.yaml file with hundreds of subsystems inside it (and probably a corresponding get_tests.pl script). Think of how the MAINTAINERS file operates; this would be a sister file.
Inside the 'KUNIT' section would be a container of tests that would be expected to run (like you listed). Each test has its own command line and params.
Yeah. My hope is we can have a "run_tests" tool which parses that file/files and runs everything.
So whether that ends up being:
  run_tests --subsystem "KUNIT" --subsystem "MM"
or
  run_test --file "kunit.yaml" --file "mm.yaml"
or even
  run_test --patch "my_mm_and_kunit_change.patch"
A CI system can just run it against the changed files in patches, a user who wants to double check something specific can override it to force the tests for a subsystem which may be indirectly affected. And if you're working on some new tests, or some private internal ones, you can keep your own yaml file and pass that along too.
Longer story.
The current problem is CI systems are not unanimous about what tests they run on submitted patches or git branches. This makes it difficult to figure out why a test failed or how to reproduce. Further, it isn't always clear what tests a normal contributor should run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or kselftests should be the tests to run. However, not all maintainers use those tests for their subsystems. I am hoping to either capture those tests or find ways to convince them to add their tests to the preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a set of tests that should be run for any contributions to that subsystem. The hope is the collective CI results can be triaged collectively (because they are related) and even have the numerous flakes waived collectively (same reason) improving the ability to find and debug new test failures. Because the tests and process are known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going (even if it is not optimized yet) and aim for about a dozen or so subsystems. At that point we should have enough feedback to promote this more seriously and talk optimizations.
Feedback encouraged.
Cheers, Don
# List of tests by subsystem
I think we should split this up into several files, partly to avoid merge conflicts, partly to make it easy to maintain custom collections of tests separately.
For example, fs.yaml could contain entries for both xfstests and fs KUnit and selftests.
On Thu, 7 Nov 2024 at 01:01, Donald Zickus dzickus@redhat.com wrote:
Awesome! I've been playing around a bit with this, and I think it's an excellent start.
There are definitely some more features I'd want in an ideal world (e.g., configuration matrices, etc), but this works well enough.
I've been playing around with a branch which adds the ability to actually run these tests, based on the 'run_checks.py' script we use for KUnit: https://github.com/sulix/test-catalog/tree/runtest-wip
In particular, this adds a '-r' option which runs the tests for the subsystem in parallel. This largely matches what I was doing manually — for instance, the KUnit section in test.yaml now has three different tests, and running it gives me this result:

  ../test-catalog/get_tests.py -r -s 'KUNIT TEST'
  Waiting on 3 checks (kunit-tool-test, uml, x86_64)...
  kunit-tool-test: PASSED
  x86_64: PASSED
  uml: PASSED
(Obviously, in the real world, I'd have more checks, including other architectures, checkpatch, etc, but this works as a proof-of-concept for me.)
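For reference, the three-check KUNIT TEST section behind that output presumably looks roughly like this (a sketch reconstructed from the check names above; the exact commands and params are assumptions):

  KUNIT TEST:
    test:
      kunit-tool-test:
        cmd: ./tools/testing/kunit/kunit_tool_test.py
      uml:
        cmd: ./tools/testing/kunit/kunit.py
        param: run --kunitconfig lib/kunit
      x86_64:
        cmd: ./tools/testing/kunit/kunit.py
        param: run --kunitconfig lib/kunit --arch x86_64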
I think the most interesting questions will be:
- How do we make this work with more complicated dependencies (containers, special hardware, etc)?
- How do we integrate it with CI systems — can we pull the subsystem name for a patch from MAINTAINERS and look it up here?
- What about things like checkpatch, or general defconfig build tests which aren't subsystem-specific?
- How can we support more complicated configurations or groups of configurations?
- Do we add support for specific tools and/or parsing/combining output?
But I'm content to keep playing around with this a bit more for now.
Thanks, -- David
Hi David,
On Wed, Nov 20, 2024 at 3:16 AM David Gow davidgow@google.com wrote:
Awesome! I've been playing around a bit with this, and I think it's an excellent start.
There are definitely some more features I'd want in an ideal world (e.g., configuration matrices, etc), but this works well enough.
Yeah, I was trying to nail down the usability angle first before expanding with bells and whistles. I would like to think the yaml file is flexible enough to handle those features though??
I've been playing around with a branch which adds the ability to actually run these tests, based on the 'run_checks.py' script we use for KUnit: https://github.com/sulix/test-catalog/tree/runtest-wip
Thanks!
In particular, this adds a '-r' option which runs the tests for the subsystem in parallel. This largely matches what I was doing manually — for instance, the KUnit section in test.yaml now has three different tests, and running it gives me this result:

  ../test-catalog/get_tests.py -r -s 'KUNIT TEST'
  Waiting on 3 checks (kunit-tool-test, uml, x86_64)...
  kunit-tool-test: PASSED
  x86_64: PASSED
  uml: PASSED
Interesting. Originally I was thinking this would be done serially. I didn't think tests were safe enough to run in parallel. I am definitely open to this. My python isn't the best, but I think your PR looks reasonable.
(Obviously, in the real world, I'd have more checks, including other architectures, checkpatch, etc, but this works as a proof-of-concept for me.)
I think the most interesting questions will be:
- How do we make this work with more complicated dependencies (containers, special hardware, etc)?
I was imagining a 'hw-requires' type line to handle the hardware requirements, as that seemed natural for a lot of the driver work: run a quick check before running the test to see if the required hw is present and bail if it isn't. The containers piece is a little trickier and ties into the test environment, I think. The script would have to create an environment, inject the tests into it, and run them. I would imagine some of this would have to be static, as the setup is complicated. For example, a 'container' label would execute custom code to set up a test environment inside a container. Open to ideas here.
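A sketch of how that could look on a test entry (the 'hw-requires' and 'environment' fields, and the script name, are hypothetical):

  test:
    - path: .
      cmd: ./run_my_driver_test.sh    # hypothetical test script
      hw-requires: some-pci-device    # runner probes for this first and bails if absent
      environment: container          # static label; runner executes custom container setup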
- How do we integrate it with CI systems — can we pull the subsystem name for a patch from MAINTAINERS and look it up here?
There are two thoughts. The first is yes: as a developer you probably want to run something like 'get_maintainers.sh <patch> | get_tests.py -s -' to figure out what variety of tests you should run before posting. A CI system could probably do something similar.
There is also another thought: you already know the subsystem you want to test. For example, a patch is usually written for a particular subsystem but happens to touch code from other subsystems. You primarily want to run it against the specified subsystem. I know Red Hat's CKI will run against a known subsystem git tree and would fall into this category. While it does leave a gap in other-subsystem testing, sometimes as a human you already know that running those extra tests is mostly a no-op because the patch doesn't really change anything there.
- What about things like checkpatch, or general defconfig build tests which aren't subsystem-specific?
My initial thought is that this is another category of testing. A lot of CI tests are workload tests with predefined configs, whereas a generic testing CI system (think 0-day) would focus on those kinds of general checks. So I would lean away from those checks in this approach, or we could add a 'general' category too. I do know checkpatch rules vary from maintainer to maintainer.
- How can we support more complicated configurations or groups of configurations?
Examples?
- Do we add support for specific tools and/or parsing/combining output?
Examples? I wasn't thinking of parsing test output, just providing what to run as a good first step. My initial thought was to help nudge tests towards KTAP output.
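For reference, a minimal KTAP result is just a version line, a plan, and per-test result lines, something like:

  KTAP version 1
  1..2
  ok 1 test_foo
  not ok 2 test_bar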
But I'm content to keep playing around with this a bit more for now.
Thank you! Please do!
Cheers, Don