Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document included inline below.
====== start of ktap-doc-rfc.txt ======
Selftest preferred output format
--------------------------------
The linux kernel selftest system uses TAP (Test Anything Protocol) output to make testing results consumable by automated systems. A number of Continuous Integration (CI) systems test the kernel every day. It is useful for these systems that output from selftest programs be consistent and machine-parsable.
At the same time, it is useful for test results to be human-readable as well.
The kernel test result format is based on a variation of TAP. TAP is a simple text-based format that is documented on the TAP home page (http://testanything.org/). There is an official TAP13 specification here: http://testanything.org/tap-version-13-specification.html
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
- one or more test result lines (also called test lines)
- a possible "Bail out!" line
optional elements:
- diagnostic data
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
===============
Output version line
-------------------
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13".
This is always the first line of test output.
Test plan line
--------------
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
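For example, a test that discovers its test cases at runtime might emit the plan line last. This is a hypothetical illustration, not output from an actual selftest:

  TAP version 13
  ok 1 first discovered test case
  ok 2 second discovered test case
  1..2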
Test result lines
-----------------
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':') except as part of a fully qualified test case name (see below).
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reason for the skip. If the message is short and fits on a single line, it may be placed after the '# SKIP' directive on the same line as the test result. Otherwise, it should be placed on diagnostic lines that precede the test result line (see diagnostic data, next).
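For example, a skipped test case reported in the current kselftest style, with a short reason placed after the '# SKIP' directive, might look like this (the test name and reason here are illustrative only):

  not ok 3 selftests: cpufreq: special-tests.sh # SKIP test requires root privileges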
Diagnostic data
---------------
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'
Material in comments between the identification line and the test result line is diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
---------
If a line in the test output starts with 'Bail out!', the test was aborted for some reason. This indicates that the test is unable to proceed, and that no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
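For example (the reason text here is illustrative only):

  Bail out! failed to mount debugfs, unable to continue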
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests (see the example below)
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
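For example, a hypothetical test with three test cases, one of which is skipped, still declares a plan of 1..3:

  TAP version 13
  1..3
  ok 1 first test case
  not ok 2 second test case # SKIP required module not available
  ok 3 third test case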
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
  - reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
- TODO directive
KTAP Extensions beyond TAP13:
- nesting
  - via indentation
  - indentation makes it easier for humans to read
- test identifier
  - multiple parts, separated by ':'
- summary lines
  - can be skipped by CI systems that do their own calculations
Other notes:
- automatic assignment of result status based on exit code (see the sketch below)
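As a sketch of that convention (this is my reading of the current kselftest runner; the dedicated skip code is KSFT_SKIP, i.e. 4):

  exit code 0               -> "ok"
  exit code 4 (KSFT_SKIP)   -> reported as a skip
  any other exit code       -> "not ok"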
Tips:
- do NOT describe the result in the test line
  - the test case description should be the same whether the test succeeds or fails
  - use diagnostic lines to describe or explain results, if this is desirable (see the example below)
- test numbers are considered harmful
  - test harnesses should use the test description as the identifier
  - test numbers change when testcases are added or removed
    - which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
  - reason for failure
  - reason for skip
  - diagnostic data should always precede the result line
    - problem: harness may emit result before test can do assessment to determine reason for result
    - this is what the kernel uses
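For example, a diagnostic line explaining a failure, emitted just before the result line (the test name and message are hypothetical):

  # error: could not open /dev/watchdog: Permission denied
  not ok 1 selftests: watchdog: watchdog-test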
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
====== end of ktap-doc-rfc.txt ======

OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
- is it too long or too short?
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
  See https://testanything.org/tap-version-13-specification.html
Regards, -- Tim
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
Thanks for doing this! This is something we've wanted to have for a while!
On the KUnit side of things, we've not (intentionally) deviated too much from TAP/kselftest. It's certainly our intention to hew as closely as possible to what kselftest is doing: I don't think there are any real conflicts conceptually (at least at the moment), but we're almost certainly handling a few details differently.
One other thing worth noting is that KUnit has a parser for our TAP results: './tools/testing/kunit/kunit.py parse' will do some basic parsing and print out results, a summary, etc.
A few other comments below:
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document included inline below.
====== start of ktap-doc-rfc.txt ======
Selftest preferred output format
The linux kernel selftest system uses TAP (Test Anything Protocol) output to make testing results consumable by automated systems. A number of Continuous Integration (CI) systems test the kernel every day. It is useful for these systems that output from selftest programs be consistent and machine-parsable.
At the same time, it is useful for test results to be human-readable as well.
The kernel test result format is based on a variation of TAP. TAP is a simple text-based format that is documented on the TAP home page (http://testanything.org/). There is an official TAP13 specification here: http://testanything.org/tap-version-13-specification.html
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
- one or more test result lines (also called test lines)
- a possible "Bail out!" line
optional elements:
- diagnostic data
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
KUnit is currently including the test plan line only for subtests, as the current version doesn't actually know how many test suites will run in advance. This is something there's work underway to fix, though.
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':') except as part of a fully qualified test case name (see below).
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
We're in the process of supporting test skipping in KUnit at the moment[1], and haven't totally formalised what the syntax here should be. The only output issues thus far have been on the "ok"/"not ok" point (my in-progress patch is using 'ok', the previous RFC could output either). At the moment, the reason a test is skipped has to be on the same line as the result for the tools to pick it up (and the KUnit API always requests such a 'status comment', even if it ends up as the empty string).
We'll probably follow whatever kselftest does here, though, but will be able to do more with skip reasons on the result line.
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'
Material in comments between the identification line and the test result line are diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
  - reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
We're not using this in KUnit, either.
- TODO directive
Ditto: the upcoming SKIP support leaves room for this to easily be added, though.
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
We're using this a lot in KUnit, as all tests are split into suites. The syntax is basically a full nested TAP document, indented with four spaces. (There are a couple of tests which output some non-indented lines to our log, though.)
I've included some example output at the end of this email of what we're doing currently.
- test identifier
- multiple parts, separated by ':'
- summary lines
- can be skipped by CI systems that do their own calculations
We're not outputting any summary lines for the tests as a whole, but the success of a test suite is determined from the success of nested tests.
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always precede the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
That's what we're planning to do with KUnit as well: clearly I didn't read the TAP13 spec as thoroughly as I'd intended, as I'd naively assumed that this was TAP13 spec compliant. Oops. I'm very much in favour of this change.
====== end of ktap-doc-rfc.txt ======
OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
This is definitely a good thing for us: thanks a lot!
- is it too long or too short?
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
As above, we'd love to at least try to have kunit and kselftest using the same format.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
I have a very mild preference for 'ok': but it doesn't really matter much one way or the other. Our tooling throws the result away if it sees a SKIP.
Regards, -- Tim
Example KUnit output (including the in-progress "skip test" support):

TAP version 14
    # Subtest: kunit-try-catch-test
    1..2
    ok 1 - kunit_test_try_catch_successful_try_no_catch
    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
ok 1 - kunit-try-catch-test
    # Subtest: example
    1..2
    # example_simple_test: initializing
    ok 1 - example_simple_test
    # example_skip_test: initializing
    ok 2 - example_skip_test # SKIP this test should be skipped
ok 2 - example
[1]: https://lore.kernel.org/linux-kselftest/20200513042956.109987-1-davidgow@goo...
-----Original Message----- From: David Gow davidgow@google.com
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
Thanks for doing this! This is something we've wanted to have for a while!
On the KUnit side of things, we've not (intentionally) deviated too much from TAP/kselftest. It's certainly our intention to hew as closely as possible to what kselftest is doing: I don't think there are any real conflicts conceptually (at least at the moment), but we're almost certainly handling a few details differently.
One other thing worth noting is that KUnit has a parser for our TAP results: './tools/testing/kunit/kunit.py parse' will do some basic parsing and print out results, a summary, etc.
A few other comments below:
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document included inline below.
====== start of ktap-doc-rfc.txt ======
Selftest preferred output format
The linux kernel selftest system uses TAP (Test Anything Protocol) output to make testing results consumable by automated systems. A number of Continuous Integration (CI) systems test the kernel every day. It is useful for these systems that output from selftest programs be consistent and machine-parsable.
At the same time, it is useful for test results to be human-readable as well.
The kernel test result format is based on a variation of TAP. TAP is a simple text-based format that is documented on the TAP home page (http://testanything.org/). There is an official TAP13 specification here: http://testanything.org/tap-version-13-specification.html
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
- one or more test result lines (also called test lines)
- a possible "Bail out!" line
optional elements:
- diagnostic data
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
I don't know if this will break any existing results parsers or not. I hesitate to use "TAP version 14", as TAP appears to be a dormant initiative at the moment, and there's no guarantee that the kernel's changes will get adopted into an official spec.
If we are a strict super-set of TAP, then I suppose we could just start using TAP version 14, and unilaterally declare that our changes make a new spec. But since we don't control the web site this feels like a hostile takeover.
I'm most comfortable with calling our thing KTAP, and just referencing TAP as inspiration. I don't have a strong opinion on KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the same version line (if we can get them to use the same conventions).
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
KUnit is currently including the test plan line only for subtests, as the current version doesn't actually know how many test suites will run in advance. This is something there's work underway to fix, though.
Sounds good. You can just put the line at the bottom if it's obnoxious to calculate ahead of time.
Does this mean that KUnit treats each sub-test as an individual test case of the "super-test"?
In results summaries for a super-test, are all sub-test cases counted, or just the list of sub-tests?
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':') except as part of a fully qualified test case name (see below).
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
We're in the process of supporting test skipping in KUnit at the moment[1], and haven't totally formalised what the syntax here should be. The only output issues thus far have been on the "ok"/"not ok" point (my in-progress patch is using 'ok', the previous RFC could output either).
I'll comment on this in my reply to Kees' email.
At the moment, the reason a test is skipped has to be on the same line as the result for the tools to pick it up (and the KUnit API always requests such a 'status comment', even if it ends up as the empty string).
OK - I think this is a good convention.
We'll probably follow whatever kselftest does here, though, but will be able to do more with skip reasons on the result line.
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'
Material in comments between the identification line and the test result line are diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
  - reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
We're not using this in KUnit, either.
- TODO directive
Ditto: the upcoming SKIP support leaves room for this to easily be added, though.
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
We're using this a lot in KUnit, as all tests are split into suites. The syntax is basically a full nested TAP document, indented with four spaces. (There are a couple of tests which output some non-indented lines to our log, though.)
I've included some example output at the end of this email of what we're doing currently.
- test identifier
- multiple parts, separated by ':'
- summary lines
- can be skipped by CI systems that do their own calculations
We're not outputting any summary lines for the tests as a whole, but the success of a test suite is determined from the success of nested tests.
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always precede the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
That's what we're planning to do with KUnit as well: clearly I didn't read the TAP13 spec as thoroughly as I'd intended, as I'd naively assumed that this was TAP13 spec compliant. Oops. I'm very much in favour of this change.
OK - thanks for the feedback. It's my preference also.
====== end of ktap-doc-rfc.txt ======
OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
This is definitely a good thing for us: thanks a lot!
- is it too long or too short?
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
As above, we'd love to at least try to have kunit and kselftest using the same format.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
I have a very mild preference for 'ok': but it doesn't really matter much one way or the other. Our tooling throws the result away if it sees a SKIP.
Ok - thanks for the feedback. -- Tim
Regards, -- Tim
Example KUnit output (including the in-progress "skip test" support):

TAP version 14
    # Subtest: kunit-try-catch-test
    1..2
    ok 1 - kunit_test_try_catch_successful_try_no_catch
    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
ok 1 - kunit-try-catch-test
    # Subtest: example
    1..2
    # example_simple_test: initializing
    ok 1 - example_simple_test
    # example_skip_test: initializing
    ok 2 - example_skip_test # SKIP this test should be skipped
ok 2 - example
It's nice to have this example. Thanks. -- Tim
On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: David Gow davidgow@google.com
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
[...]
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
I don't know if this will break any existing results parsers or not. I hesitate to use "TAP version 14", as TAP appears to be a dormant initiative at the moment, and there's no guarantee that the kernel's changes will get adopted into an official spec.
We were using "TAP version 14" since the "extensions" we are using were all proposed among the TAP people to go into the next version of TAP. Based on discussions among them they seem to like the subtest stuff:
https://github.com/TestAnything/testanything.github.io/pull/36
Anyway, I can still appreciate that they might change their minds.
If we are a strict super-set of TAP, then I suppose we could just start using TAP version 14, and unilaterally declare that our changes make a new spec. But since we don't control the web site this feels like a hostile takeover.
I just thought it was okay because it was already in their proposed TAP14 spec, but yeah, if we just decide amongst ourselves to use it, we should probably do something else.
I'm most comfortable with calling our thing KTAP, and just referencing TAP as inspiration. I don't have a strong opinion on
I am cool with that.
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the same version line (if we can get them to use the same conventions).
Yeah, I agree: it would be better if there was just one version of (K)TAP in the kernel.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
KUnit is currently including the test plan line only for subtests, as the current version doesn't actually know how many test suites will run in advance. This is something there's work underway to fix, though.
Sounds good. You can just put the line at the bottom if it's obnoxious to calculate ahead of time.
I thought that is not in the TAP spec?
I kind of like printing out ahead of time how many tests we expect to run, so if we crash we know how many tests weren't run.
In any case, until we get the change in that David is referencing, we cannot print out the test plan for the "super test" before or after because KUnit doesn't know when it is "done". So moving it to the bottom doesn't really help us.
Does this mean that KUnit treats each sub-test as an individual test case of the "super-test"?
Yes.
At the top level, we have all test suites. Each subtest in TAP is a test suite in KUnit. Each case in each subtest in TAP is a test case in KUnit.
In results summaries for a super-test, are all sub-test cases counted, or just the list of sub-tests?
Just the sub-tests. Each subtest is responsible for counting its own cases:
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
Cheers
-----Original Message----- From: Brendan Higgins brendanhiggins@google.com
On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: David Gow davidgow@google.com
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
[...]
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
I don't know if this will break any existing results parsers or not. I hesitate to use "TAP version 14", as TAP appears to be a dormant initiative at the moment, and there's no guarantee that the kernel's changes will get adopted into an official spec.
We were using "TAP version 14" since the "extensions" we are using were all proposed among the TAP people to go into the next version of TAP. Based on discussions among them they seem to like the subtest stuff:
https://github.com/TestAnything/testanything.github.io/pull/36
Anyway, I can still appreciate that they might change their minds.
I don't know if there's anyone left around to change their mind.
I have to agree with isaacs (who proposed TAP14 5 years ago) that the TAP specification is in a weird state.
https://github.com/TestAnything/testanything.github.io/pull/36#issuecomment-...
There have been commits to the GitHub repository by a few different people recently (3 commits in the last 9 months). But there's no body to approve or disapprove of a new spec.
If we are a strict super-set of TAP, then I suppose we could just start using TAP version 14, and unilaterally declare that our changes make a new spec. But since we don't control the web site this feels like a hostile takeover.
I just thought it was okay because it was already in their proposed TAP14 spec, but yeah, if we just decide amongst ourselves to use it, we should probably do something else.
I'm most comfortable with calling our thing KTAP, and just referencing TAP as inspiration. I don't have a strong opinion on
I am cool with that.
I hate forks, but if we do go with declaring our own fork as KTAP, then we should probably pull in the TAP14 spec as a starting point. Since it has no home other than in a pull request, it seems a bit tentative. We may need to put the spec in the kernel source or something.
I think we're definitely not doing anything with the yaml blocks at the moment (or maybe ever), so there's one big point of departure.
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the same version line (if we can get them to use the same conventions).
Yeah, I agree: it would be better if there was just one version of (K)TAP in the kernel.
One thing that needs to be rationalized between KUnit and selftest is the syntax for subtests. KUnit follows the TAP14 spec, and starts subtests with indented "# Subtest: name of the child test" and selftests just indents the output from the child test, so it starts with indented "TAP version 13". One issue I have with the TAP14/KUnit approach is that the output from the child is different depending on whether the test is called in the context of another test or not.
In KUnit, is it the parent test or the child test that prints out the "# Subtest: ..." line? -- Tim
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
KUnit is currently including the test plan line only for subtests, as the current version doesn't actually know how many test suites will run in advance. This is something there's work underway to fix, though.
Sounds good. You can just put the line at the bottom if it's obnoxious to calculate ahead of time.
I thought that is not in the TAP spec?
I kind of like printing out ahead of time how many tests we expect to run, so if we crash we know how many tests weren't run.
In any case, until we get the change in that David is referencing, we cannot print out the test plan for the "super test" before or after because KUnit doesn't know when it is "done". So moving it to the bottom doesn't really help us.
Does this mean that KUnit treats each sub-test as an individual test case of the "super-test"?
Yes.
At the top level, we have all test suites. Each subtest in TAP is a test suite in KUnit. Each case in each subtest in TAP is a test case in KUnit.
In results summaries for a super-test, are all sub-test cases counted, or just the list of sub-tests?
Just the sub-tests. Each subtest is responsible for counting its own cases:
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
Cheers
On Tue, Jun 16, 2020 at 08:37:18PM +0000, Bird, Tim wrote:
One thing that needs to be rationalized between KUnit and selftest is the syntax for subtests. KUnit follows the TAP14 spec, and starts subtests with indented "# Subtest: name of the child test" and selftests just indents the output from the child test, so it starts with indented "TAP version 13". One issue I have with the TAP14/KUnit approach is that the output from the child is different depending on whether the test is called in the context of another test or not.
Right -- I'd *really* like the subtests to be "separable", since the primary issue is actually that a subtest may not know it is a subtest, and passing that knowledge in may be difficult/disruptive.
On Tue, Jun 16, 2020 at 1:37 PM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: Brendan Higgins brendanhiggins@google.com
On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: David Gow davidgow@google.com
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
[...]
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
I don't know if this will break any existing results parsers or not. I hesitate to use "TAP version 14", as TAP appears to be a dormant initiative at the moment, and there's no guarantee that the kernel's changes will get adopted into an official spec.
We were using "TAP version 14" since the "extensions" we are using were all proposed among the TAP people to go into the next version of TAP. Based on discussions among them they seem to like the subtest stuff:
https://github.com/TestAnything/testanything.github.io/pull/36
Anyway, I can still appreciate that they might change their minds.
I don't know if there's anyone left around to change their mind.
I have to agree with isaacs (who proposed TAP14 5 years ago) that the TAP specification is in a weird state.
https://github.com/TestAnything/testanything.github.io/pull/36#issuecomment-...
There have been commits to the GitHub repository by a few different people recently (3 commits in the last 9 months). But there's no body to approve or disapprove of a new spec.
If we are a strict super-set of TAP, then I suppose we could just start using TAP version 14, and unilaterally declare that our changes make a new spec. But since we don't control the web site this feels like a hostile takeover.
I just thought it was okay because it was already in their proposed TAP14 spec, but yeah, if we just decide amongst ourselves to use it, we should probably do something else.
I'm most comfortable with calling our thing KTAP, and just referencing TAP as inspiration. I don't have a strong opinion on
I am cool with that.
I hate forks, but if we do go with declaring our own fork as KTAP, then we should probably pull in the TAP14 spec as a starting point. Since it has no home other than in a pull request, it seems a bit tentative. We may need to put the spec in the kernel source or something.
Yeah, I didn't really want to fork either, but I don't see any other way around it given the current state of TAP. Seems like if we want an updated maintained spec that doesn't live in a pull request, then it is up to us to give it a new home.
I think we're definitely not doing anything with the yaml blocks at the moment (or maybe ever), so there's one big point of departure.
Yeah, seeing that everyone seems amenable to the expectation blocks that KUnit uses in our version of the spec, I don't see any other reason to keep YAML, and I agree it is probably the least desirable thing that they have in the spec. So yeah, I am on board with omitting YAML from KTAP.
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the same version line (if we can get them to use the same conventions).
Yeah, I agree: it would be better if there was just one version of (K)TAP in the kernel.
One thing that needs to be rationalized between KUnit and selftest is the syntax for subtests. KUnit follows the TAP14 spec, and starts subtests with indented "# Subtest: name of the child test" and selftests just indents the output from the child test, so it starts with indented "TAP version 13". One issue I have with the TAP14/KUnit approach is that the output from the child is different depending on whether the test is called in the context of another test or not.
I can see (in later emails), why you would say that. On the flip side, I think that indenting and starting off with "# Subtest: name of the child test" makes it a lot more readable for a human.
In KUnit, is it the parent test or the child test that prints out the "# Subtest: ..." line?
Short answer: The child test.
Long answer: The answer is complicated. From KUnit's perspective the "parent test" is always just the collection of all test suites. Right now there is no loop that runs each test suite; each one is a late_initcall. However, that is something that we are working on fixing (I really need to get the next version of that patchset out).
Cheers
On 2020-06-16 15:03, Brendan Higgins wrote:
On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: David Gow davidgow@google.com
On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim Tim.Bird@sony.com wrote:
[...]
KUnit is currently outputting "TAP version 14", as we were hoping some of our changes would get into the TAP14 spec. (Any comments, Brendan?) Maybe this should end up saying "KTAP version 1" or something?
I don't know if this will break any existing results parsers or not. I hesitate to use "TAP version 14", as TAP appears to be a dormant initiative at the moment, and there's no guarantee that the kernel's changes will get adopted into an official spec.
We were using "TAP version 14" since the "extensions" we are using were all proposed among the TAP people to go into the next version of TAP. Based on discussions among them they seem to like the subtest stuff:
https://github.com/TestAnything/testanything.github.io/pull/36
Anyway, I can still appreciate that they might change their minds.
If we are a strict super-set of TAP, then I suppose we could just start using TAP version 14, and unilaterally declare that our changes make a new spec. But since we don't control the web site this feels like a hostile takeover.
I just thought it was okay because it was already in their proposed TAP14 spec, but yeah, if we just decide amongst ourselves to use it, we should probably do something else.
I'm most comfortable with calling our thing KTAP, and just referencing TAP as inspiration. I don't have a strong opinion on
I am cool with that.
I like a KTAP specification, based on the proposed TAP 14, but with a good edit, and with the extensions and changes that are coming out of this conversation.
It would be great if we could just have a finished TAP 14, but my understanding is that efforts to move that forward are not making any progress. And if we fork to make KTAP, maybe that will be the incentive for TAP to move forward, and maybe KTAP could be merged back into TAP.
-Frank
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the same version line (if we can get them to use the same conventions).
Yeah, I agree: it would be better if there was just one version of (K)TAP in the kernel.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
KUnit is currently including the test plan line only for subtests, as the current version doesn't actually know how many test suites will run in advance. This is something there's work underway to fix, though.
Sounds good. You can just put the line at the bottom if it's obnoxious to calculate ahead of time.
I thought that is not in the TAP spec?
I kind of like printing out ahead of time how many tests we expect to run, so if we crash we know how many tests weren't run.
In any case, until we get the change in that David is referencing, we cannot print out the test plan for the "super test" before or after because KUnit doesn't know when it is "done". So moving it to the bottom doesn't really help us.
Does this mean that KUnit treats each sub-test as an individual test case of the "super-test"?
Yes.
At the top level, we have all test suites. Each subtest in TAP is a test suite in KUnit. Each case in each subtest in TAP is a test case in KUnit.
In results summaries for a super-test, are all sub-test cases counted, or just the list of sub-tests?
Just the sub-tests. Each subtest is responsible for counting its own cases:
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
Cheers
On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
On Sat, Jun 13, 2020 at 6:36 AM Kees Cook keescook@chromium.org wrote:
Regarding output:
[ 36.611358] TAP version 14
[ 36.611953] # Subtest: overflow
[ 36.611954] 1..3
...
[ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] ok 1 - overflow_calculation_test
...
[ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] ok 3 - overflow_allocation_test
[ 36.807765] ok 1 - overflow
A few things here....
Tim Bird has just sent out an RFC for a "KTAP" specification, which we'll hope to support in KUnit:
Ah-ha! Thanks for the heads-up. :)
https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...
*thread split/merge*
This is coming from: https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X5... But I'm attempting a thread jump... ;)
That's probably where we'll end up trying to hash out exactly what this format should be. Fortunately, I don't think any of these will require any per-test work, just changes to the KUnit implementation.
Yup, good.
- On the outer test report, there is no "plan" line (I was expecting "1..1"). Technically it's optional, but it seems like the information is available. :)
There's work underway to support this, but it's hit a few minor snags: https://lkml.org/lkml/2020/2/27/2155
Okay, cool. It's not critical, I don't think.
- The subtest should have its own "TAP version 14" line, and it should be using the diagnostic line prefix for the top-level test (this is what kselftest is doing).
Alas, TAP itself hasn't standardised subtests. Personally, I think it doesn't fundamentally matter which way we do this (I actually prefer the way we're doing it currently slightly), but converging with what kselftest does would be ideal.
I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually already handles subtests. I strongly feel that line-start formatting is the correct way to deal with this, with each subtest having its own self-contained KTAP. This allows for several important features:
- the subtest, run on its own, needs no knowledge about its execution environment: it simply emits its standard KTAP output.
- subtest output can be externally parsed separably, without any knowledge or parsing of the enclosing test.
For example, with my recent series[1], "make -C seccomp run_tests" produces:
TAP version 13
1..2
# selftests: seccomp: seccomp_bpf
# TAP version 13
# 1..77
# # Starting 77 tests from 6 test cases.
# # RUN global.mode_strict_support ...
# # OK global.mode_strict_support
# ok 1 global.mode_strict_support
...
# ok 77 TSYNC.two_siblings_not_under_filter
# # FAILED: 73 / 77 tests passed.
# # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
not ok 1 selftests: seccomp: seccomp_bpf # exit=1
# selftests: seccomp: seccomp_benchmark
#
not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT
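A rough Python sketch of that separable-parsing idea (the helper name is hypothetical, not from any existing tool): stripping one level of the "# " prefix from the parent's lines recovers the subtest's own self-contained KTAP stream, which the same parser can consume again.

# Rough illustration of the "separately parsable" point above: removing one
# level of the "# " prefix from the parent's lines recovers the subtest's own
# self-contained KTAP stream. The helper name is made up for illustration.
def strip_subtest_prefix(parent_lines, prefix="# "):
    for line in parent_lines:
        if line.startswith(prefix):
            yield line[len(prefix):]

# e.g. "# TAP version 13" -> "TAP version 13"
#      "# 1..77"          -> "1..77"
#      "# ok 1 global.mode_strict_support" -> "ok 1 global.mode_strict_support"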
- There is no way to distinguish top-level TAP output from kernel log lines. I think we should stick with the existing marker, which is "# ", so that kernel output has no way to be interpreted as TAP details -- unless it intentionally starts adding "#"s. ;)
At the moment, we're doing this in KUnit tool by stripping anything before "TAP version 14" (e.g., the timestamp), and then only including lines which parse correctly (are a test plan, result, or a diagnostic line beginning with '#'). This has worked pretty well thus far, and we do have the ability to get results from debugfs instead of the kernel log, which won't have the same problems.
It's worth considering, though, particularly since our parser should handle this anyway without any changes.
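A rough Python sketch of that filtering approach (illustrative only, not the actual kunit.py code): drop everything before the "TAP version" line, strip the dmesg timestamp prefix, and keep only lines that look like KTAP.

import re

# Illustrative filter in the spirit of the approach described above:
# version line, plan line, result lines, and '#' diagnostic lines pass
# through; unprefixed kernel log lines are dropped.
TIMESTAMP = re.compile(r'^\[[^\]]*\]\s*')
KTAP_LINE = re.compile(r'^(TAP version \d+|\d+\.\.\d+|(not )?ok \d+|#)')

def extract_ktap(console_lines):
    started = False
    for raw in console_lines:
        line = TIMESTAMP.sub('', raw.rstrip('\n'))
        if not started:
            if line.startswith('TAP version'):
                started = True
                yield line
            continue
        if KTAP_LINE.match(line):
            yield line

# Fed the dmesg snippet quoted earlier, this keeps the plan, "ok N - ..." and
# "# ..." lines, and drops the unprefixed vmalloc failure message.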
For the KTAP parsing, I actually think it's very important to include kernel log lines in the test output (as diagnostic lines), since they are "unexpected" (they fail to have the correct indentation) and may provide additional context for test failures when reading log files. (As in the "vmalloc: allocation failure" line in the quoted section above, to be included as a diagnostic line for test #3.)
- There is no summary line (to help humans). For example, the kselftest API produces a final pass/fail report.
Currently, we're relying on the kunit.py script to produce this, but it shouldn't be impossible to add to the kernel, particularly once the "KUnit Executor" changes mentioned above land.
Cool. Yeah, it's not required, but I think there are two use cases: humans running a single test (where a summary is valuable, especially for long tests that scroll off the screen), and automation (where it can ignore the summary, as it will produce its own in a regularized fashion).
Taken together, I was expecting the output to be:
[ 36.611358] # TAP version 14
[ 36.611953] # 1..1
[ 36.611958] # # TAP version 14
[ 36.611954] # # 1..3
...
[ 36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] # # ok 1 - overflow_calculation_test
...
[ 36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] # # ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] # # ok 3 - overflow_allocation_test
[ 36.807763] # # # overflow: PASS
[ 36.807765] # ok 1 - overflow
[ 36.807767] # # kunit: PASS
But I assume there are threads on this that I've not read... :)
These discussions all seem to be coming to a head now, so this is probably just the kick we need to prioritise this a bit more. The KTAP thread hasn't covered all of these (particularly the subtest stuff) yet, so I confess I hadn't realised the extent of the divergence between KUnit and kselftest here.
Cool. Yeah, I'd like to keep things as close as possible. In looking at this again, I wonder if perhaps it would be better to change the "indent" from "# " to " ", which might make things more readable for both dmesg and terminal output:
[ 36.611358] TAP version 14
[ 36.611953] 1..1
[ 36.611958] TAP version 14
[ 36.611954] 1..3
...
[ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] ok 1 - overflow_calculation_test
...
[ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] ok 3 - overflow_allocation_test
[ 36.807763] # overflow: PASS
[ 36.807765] ok 1 - overflow
[ 36.807767] # kunit: PASS
As it happens, subtests are a bit rare in kselftests right now, but for kunit, the "common" output (IIUC) will fundamentally be a top-level test running all the subtests, so we should get it right. :)
-Kees
[1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selfte...
-- Kees Cook
-----Original Message-----
From: Kees Cook
Sent: Sunday, June 14, 2020 12:18 PM
To: David Gow davidgow@google.com
Cc: Vitor Massaru Iha vitor@massaru.org; KUnit Development kunit-dev@googlegroups.com; Shuah Khan skhan@linuxfoundation.org; open list:KERNEL SELFTEST FRAMEWORK linux-kselftest@vger.kernel.org; Linux Kernel Mailing List linux-kernel@vger.kernel.org; Brendan Higgins brendanhiggins@google.com; linux-kernel-mentees@lists.linuxfoundation.org; linux@rasmusvillemoes.dk
Subject: Re: RFC - kernel selftest result documentation (KTAP)
On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
On Sat, Jun 13, 2020 at 6:36 AM Kees Cook keescook@chromium.org wrote:
Regarding output:
[ 36.611358] TAP version 14
[ 36.611953] # Subtest: overflow
[ 36.611954] 1..3
...
[ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] ok 1 - overflow_calculation_test
...
[ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] ok 3 - overflow_allocation_test
[ 36.807765] ok 1 - overflow
A few things here....
Tim Bird has just sent out an RFC for a "KTAP" specification, which we'll hope to support in KUnit:
Ah-ha! Thanks for the heads-up. :)
https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@CY4PR13MB1175.namprd13.prod.outlook.com/T/#u
*thread split/merge*
This is coming from: https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X5... But I'm attempting a thread jump... ;)
That's probably where we'll end up trying to hash out exactly what this format should be. Fortunately, I don't think any of these will require any per-test work, just changes to the KUnit implementation.
Yup, good.
- On the outer test report, there is no "plan" line (I was expecting "1..1"). Technically it's optional, but it seems like the information is available. :)
There's work underway to support this, but it's hit a few minor snags: https://lkml.org/lkml/2020/2/27/2155
Okay, cool. It's not critical, I don't think.
- The subtest should have its own "TAP version 14" line, and it should be using the diagnostic line prefix for the top-level test (this is what kselftest is doing).
Alas, TAP itself hasn't standardised subtests. Personally, I think it doesn't fundamentally matter which way we do this (I actually prefer the way we're doing it currently slightly), but converging with what kselftest does would be ideal.
I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually already handles subtests. I strongly feel that line-start formatting is the correct way to deal with this, with each subtest having its own self-contained KTAP. This allows for several important features:
- the subtest, run on its own, needs no knowledge about its execution environment: it simply emits its standard KTAP output.
- subtest output can be externally parsed separably, without any knowledge or parsing of the enclosing test.
I agree with both of these, but will save additional comments for the thread on my KTAP doc RFC.
For example, with my recent series[1], "make -C seccomp run_tests" produces:
TAP version 13
1..2
# selftests: seccomp: seccomp_bpf
# TAP version 13
# 1..77
# # Starting 77 tests from 6 test cases.
# # RUN global.mode_strict_support ...
# # OK global.mode_strict_support
# ok 1 global.mode_strict_support
...
# ok 77 TSYNC.two_siblings_not_under_filter
# # FAILED: 73 / 77 tests passed.
# # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
not ok 1 selftests: seccomp: seccomp_bpf # exit=1
# selftests: seccomp: seccomp_benchmark
#
not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT
- There is no way to distinguish top-level TAP output from kernel log lines. I think we should stick with the existing marker, which is "# ", so that kernel output has no way to be interpreted as TAP details -- unless it intentionally starts adding "#"s. ;)
At the moment, we're doing this in KUnit tool by stripping anything before "TAP version 14" (e.g., the timestamp), and then only including lines which parse correctly (are a test plan, result, or a diagnostic line beginning with '#'). This has worked pretty well thus far, and we do have the ability to get results from debugfs instead of the kernel log, which won't have the same problems.
It's worth considering, though, particularly since our parser should handle this anyway without any changes.
For the KTAP parsing, I actually think it's very important to include kernel log lines in the test output (as diagnostic lines), since they are "unexpected" (they fail to have the correct indentation) and may provide additional context for test failures when reading log files. (As in the "vmalloc: allocation failure" line in the quoted section above, to be included as a diagnostic line for test #3.)
- There is no summary line (to help humans). For example, the kselftest API produces a final pass/fail report.
Currently, we're relying on the kunit.py script to produce this, but it shouldn't be impossible to add to the kernel, particularly once the "KUnit Executor" changes mentioned above land.
Cool. Yeah, it's not required, but I think there are two use cases: humans running a single test (where a summary is valuable, especially for long tests that scroll off the screen), and automation (where it can ignore the summary, as it will produce its own in a regularized fashion).
Taken together, I was expecting the output to be:
[ 36.611358] # TAP version 14
[ 36.611953] # 1..1
[ 36.611958] # # TAP version 14
[ 36.611954] # # 1..3
...
[ 36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] # # ok 1 - overflow_calculation_test
...
[ 36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] # # ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] # # ok 3 - overflow_allocation_test
[ 36.807763] # # # overflow: PASS
[ 36.807765] # ok 1 - overflow
[ 36.807767] # # kunit: PASS
But I assume there are threads on this that I've not read... :)
These discussions all seem to be coming to a head now, so this is probably just the kick we need to prioritise this a bit more. The KTAP thread hasn't covered all of these (particularly the subtest stuff) yet, so I confess I hadn't realised the extent of the divergence between KUnit and kselftest here.
Cool. Yeah, I'd like to keep things as close as possible. In looking at this again, I wonder if perhaps it would be better to change the "indent" from "# " to " ", which might make things more readable for both dmesg and terminal output:
[ 36.611358] TAP version 14
[ 36.611953] 1..1
[ 36.611958] TAP version 14
[ 36.611954] 1..3
...
[ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
[ 36.624020] ok 1 - overflow_calculation_test
...
[ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
[ 36.731840] ok 2 - overflow_shift_test
...
[ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
[ 36.807763] ok 3 - overflow_allocation_test
[ 36.807763] # overflow: PASS
[ 36.807765] ok 1 - overflow
[ 36.807767] # kunit: PASS
As it happens, subtests are a bit rare in kselftests right now, but for kunit, the "common" output (IIUC) will fundamentally be a top-level test running all the subtests, so we should get it right. :)
Personally, as a human I find the space prefix a bit easier to read. However, I think that in "normal" kernel log output it is unusual for a line to be prefixed with a hash (#), so this might be easier both to distinguish visually and to parse automatically. So I'm torn. I don't have a strong opinion on space vs. hash prefix for indicating sub-test. I think the KUnit convention of matching whatever was the prefix of the "TAP version 14" line is clever, but it would be nice to just pick a prefix and be done with it.
IMHO. -- Tim
-Kees
[1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selfte...
-- Kees Cook
On Mon, Jun 15, 2020 at 05:45:28PM +0000, Bird, Tim wrote:
Personally, as a human I find the space prefix a bit easier to read. However, I think that in "normal" kernel log output it is unusual for a line to be prefixed with a hash (#), so this might be easier both to distinguish visually and to parse automatically. So I'm torn. I don't have a strong opinion on space vs. hash prefix for indicating sub-test. I think the KUnit convention of matching whatever was the prefix of the "TAP version 14" line is clever, but it would be nice to just pick a prefix and be done with it.
Are there plans in kernelci for doing any parsing of subtests? (As in, what do we break if we move from "# " to " "?)
I'm really thinking " " makes sense now. :)
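A rough Python sketch of the "match whatever prefixed the TAP version line" convention mentioned above (not taken from kunit.py or any existing parser): detect the prefix once, then expect it on the nested subtest's lines.

import re

# Whatever precedes "TAP version N" on the subtest's first line ("# ",
# spaces, or nothing) is taken as the prefix for that nesting level.
VERSION_RE = re.compile(r'^(?P<prefix>.*?)TAP version \d+$')

def detect_prefix(line):
    m = VERSION_RE.match(line)
    return m.group('prefix') if m else None

# detect_prefix('# TAP version 14')    -> '# '
# detect_prefix('    TAP version 14')  -> '    '
# A nested parser would then strip that prefix from the subtest's lines,
# whichever prefix ("# " or spaces) ends up being chosen.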
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
- one or more test result lines (also called test result lines)
- a possible "Bail out!" line
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
optional elements:
- diagnostic data
nit: diagnostic lines (not data)
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed in the output by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
Nit: for examples, I think this should show more than one test. (Preferably, it should show all the possible cases: ok, not ok, SKIP, etc.)
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
"... must be ..." ?
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
This should be:
<result> <number> <description> [# [<directive> ][<diagnostic data>]]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':')
Must also avoid "#".
except as part of a fully qualified test case name (see below).
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
This is what TAP14 changed, I think (i.e. directive follows description now).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
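A sketch of how the result-line format above, including the '# SKIP' directive, could be parsed, assuming descriptions avoid '#'; the regex and the example line are illustrative, not from any existing tool.

import re

# Follows the format discussed in this thread:
#   <result> <number> <description> [# [<directive> ][<diagnostic data>]]
RESULT_RE = re.compile(
    r'^(?P<result>ok|not ok) (?P<number>\d+) (?P<description>[^#]*?)'
    r'(?: # (?P<directive>SKIP|TODO)(?: (?P<note>.*))?)?$'
)

def parse_result(line):
    m = RESULT_RE.match(line)
    return m.groupdict() if m else None

# parse_result('not ok 4 selftests: example: some_test.sh # SKIP no device')
# -> {'result': 'not ok', 'number': '4',
#     'description': 'selftests: example: some_test.sh',
#     'directive': 'SKIP', 'note': 'no device'}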
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'.
Material in comments between the identification line and the test result line is diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Yes, totally.
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
  - reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
Agreed: the yaml extension is not sensible for our use.
- TODO directive
Agreed: SKIP should cover everything TODO does.
KTAP Extensions beyond TAP13:
- nesting
(I would call this 'subtests')
- via indentation
- indentation makes it easier for humans to read
And allows for separable parsing of subtests.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
- summary lines
- can be skipped by CI systems that do their own calculations
Right -- I think per-test summary line should be included for the humans reading a single test (which may scroll off the screen).
Other notes:
- automatic assignment of result status based on exit code
This is, I think, a matter for the kselftest running infrastructure, not the KTAP output?
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
Right.
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
Right.
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always precede the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Right.
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#" diagnostic lines and the subtest handling.
====== end of ktap-doc-rfc.txt ======
OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
Yes.
- is it too long or too short?
Should be slightly longer: more examples.
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
Yes: Documentation/*.rst. Not sure on name yet, but where do kselftest docs live? :)
- is this document accurate? I think KUNIT does a few things differently than this description.
Let's fix it. :)
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yes please.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
Kees,
Thanks for the great feedback. See comments inline below.
-----Original Message----- From: Kees Cook keescook@chromium.org
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
Noted. In re-reading my doc, I've conflated my sections. The first section is "single-line", and the next section is "optional". ??? I'll fix that.
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
- one or more test result lines (also called test result lines)
- a possible "Bail out!" line
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
optional elements:
- diagnostic data
nit: diagnostic lines (not data)
OK.
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed in the output by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
Nit: for examples, I think this should show more than one test. (Preferably, it should show all the possible cases: ok, not ok, SKIP, etc.)
Agree. I will expand this.
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
"... must be ..." ?
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
This should be:
<result> <number> <description> [# [<directive> ][<diagnostic data>]]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':')
Must also avoid "#".
ok.
except as part of a fully qualified test case name (see below).
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
This is what TAP14 changed, I think (i.e. directive follows description now).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'.
Material in comments between the identification line and the test result line is diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Yes, totally.
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
  - reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
Agreed: the yaml extension is not sensible for our use.
- TODO directive
Agreed: SKIP should cover everything TODO does.
KTAP Extensions beyond TAP13:
- nesting
(I would call this 'subtests')
Sounds good. Will do.
- via indentation
- indentation makes it easier for humans to read
And allows for separable parsing of subtests.
Agree. I'll try to work that into the doc.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
Yes. KernelCI and Fuego have the notion of a testcase namespace hierarchy. As the number of tests expands, it is helpful to have the namespace for a sub-test be limited, just as a filesystem hierarchy provides scope for the names of objects (directories and files) that it contains.
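A tiny Python illustration of that ':'-separated identifier hierarchy, using the 'selftests: cpufreq: main.sh' name from the example earlier (the helper name is made up):

# Split a fully qualified test identifier into its namespace components.
def split_test_id(identifier):
    return [part.strip() for part in identifier.split(':')]

# split_test_id('selftests: cpufreq: main.sh')
# -> ['selftests', 'cpufreq', 'main.sh']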
- summary lines
- can be skipped by CI systems that do their own calculations
Right -- I think per-test summary line should be included for the humans reading a single test (which may scroll off the screen).
Other notes:
- automatic assignment of result status based on exit code
This is, I think, a matter for the kselftest running infrastructure, not the KTAP output?
Agreed. This doesn't have anything to do with the API between the tests and the results processor. I'll take it out.
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
Right.
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
Right.
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always precede the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Right.
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#" diagnostic lines and the subtest handling.
====== end of ktap-doc-rfc.txt ======
OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
Yes.
Great. I'll make this a priority to work on.
- is it too long or too short?
Should be slightly longer: more examples.
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
Yes: Documentation/*.rst. Not sure on name yet, but where do kselftest docs live? :)
Documentation/dev-tools/kselftest.rst
I'll put this at: Documentation/dev-tools/test-results-format.rst
- is this document accurate? I think KUNIT does a few things differently than this description.
Let's fix it. :)
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yes please.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows us to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
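An illustrative Python sketch of how a results interpreter could fold the directives into one outcome, including the TODO inversion described above; the outcome names (PASS/FAIL/SKIP/XFAIL/XPASS) are illustrative, not from any spec.

# Map a result line's ok/not-ok status plus its directive to one outcome.
def outcome(result_ok, directive=None):
    if directive == 'SKIP':
        return 'SKIP'
    if directive == 'TODO':
        # "not ok ... # TODO" is the expected result, i.e. XFAIL.
        return 'XPASS' if result_ok else 'XFAIL'
    return 'PASS' if result_ok else 'FAIL'

# outcome(False, 'TODO') -> 'XFAIL'  (expected failure, not counted as bad)
# outcome(True, 'TODO')  -> 'XPASS'  (unexpected pass)
# outcome(False, 'SKIP') -> 'SKIP'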
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
-----Original Message----- From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
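For instance, a small Python sketch of that optional-dash handling (the same idea as '[-]?', written here as a non-capturing group outside the number and description groups):

import re

# The dash, if present, is matched but not captured, so dashed and
# dash-less result lines yield the same groups.
DASHED_RESULT = re.compile(r'^(ok|not ok) (\d+) (?:- )?(.*)$')

# DASHED_RESULT.match('ok 1 - overflow').groups()
#   -> ('ok', '1', 'overflow')
# DASHED_RESULT.match('ok 1 selftests: cpufreq: main.sh').groups()
#   -> ('ok', '1', 'selftests: cpufreq: main.sh')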
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
None of the examples of skips in the TAP13 spec have a test description before the '# SKIP' directive. But maybe I read too much into the examples. There is a format example, and a list of items in a result line that both have the test description before the directive. So maybe I read this wrong.
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows us to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
The TAP13 spec is not explicit about the result for TODO (and only provides one example), but the text *does* say a TODO can represent a bug to be fixed. This makes it the equivalent of XFAIL. I hadn't noticed this before. Thanks.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
In my experience, / is used in a lot of test descriptions (when quoting file paths) (not in kselftest, but in lots of other tests). Both Fuego and KernelCI use colons, and that's what kselftest already uses, so it seems like a good choice.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim Tim.Bird@sony.com wrote:
Apologies for taking so long to get to this. I have been busy with some stuff internally at Google.
-----Original Message----- From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
Not so. TAP14 is the proposed next version of TAP13:
https://github.com/TestAnything/testanything.github.io/pull/36 https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
Based on the discussion, it seems like most of the things we wanted from TAP14 would probably make it in if TAP ever accepts another pull request.
Our only real extension is how we print out a violated exception/assertion.
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
I don't think we care, because we don't use it.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
None of the examples of skips in the TAP13 spec have a test description before the '# SKIP' directive. But maybe I read too much into the examples. There is a format example, and a list of items in a result line that both have the test description before the directive. So maybe I read this wrong.
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows us to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
The TAP13 spec is not explicit about the result for TODO (and only provides one example), but the text *does* say a TODO can represent a bug to be fixed. This makes it the equivalent of XFAIL. I hadn't noticed this before. Thanks.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
In my experience, / is used in a lot of test descriptions (when quoting file paths) (not in kselftest, but in lots of other tests). Both Fuego and KernelCI use colons, and that's what kselftest already uses, so it seems like a good choice.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
-----Original Message----- From: Brendan Higgins
On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim Tim.Bird@sony.com wrote:
Apologies for taking so long to get to this. I have been busy with some stuff internally at Google.
-----Original Message----- From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
Not so. TAP14 is the proposed next version of TAP13:
https://github.com/TestAnything/testanything.github.io/pull/36 https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
Thanks! I thought there was a draft spec somewhere that hadn't been pulled or approved or whatever. I searched but couldn't find it. Thanks for the links.
Based on the discussion, it seems like most of the things we wanted from TAP14 would probably make it in if TAP ever accepts another pull request.
Our only real extension is how we print out a violated expectation/assertion.
I guess I need to examine the TAP14 spec and see what it's got in it. I didn't realize it had subtest nesting (it's been literally years since I read it, and I didn't realize that KUnit was based on it).
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
I don't think we care, because we don't use it.
OK. If we decide to support both, it would be good to document that.
Thanks, -- Tim
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
None of the examples of skips in the TAP13 spec have a test description before the '# SKIP' directive. But maybe I read too much into the examples. There is a format example, and a list of items in a result line, both of which have the test description before the directive. So maybe I read this wrong.
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason: the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test cannot continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows one to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
The TAP13 spec is not explicit about the result for TODO (and only provides one example), but the text *does* say a TODO can represent a bug to be fixed. This makes it the equivalent of XFAIL. I hadn't noticed this before. Thanks.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
In my experience, / is used in a lot of test descriptions (when quoting file paths; not in kselftest, but in lots of other tests). Both Fuego and KernelCI use colons, and that's what kselftest already uses, so it seems like a good choice.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On Tue, Jun 16, 2020 at 12:44:28PM -0700, Brendan Higgins wrote:
On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim Tim.Bird@sony.com wrote:
From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
Not so. TAP14 is the proposed next version of TAP13:
https://github.com/TestAnything/testanything.github.io/pull/36 https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
I was reading this (I haven't compared to the blob above):
https://github.com/TestAnything/Specification/blob/tap-14-specification/spec...
Based on the discussion, it seems like most of the things we wanted from TAP14 would probably make it in if TAP ever accepts another pull request.
Were our leading diagnostic lines part of their latest spec? I thought we were pretty far off in left field for that particular bit.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
I don't think we care, because we don't use it.
Yeah, I'm in the same place. I don't care -- I would just like a determination. (The "implied" nature of it in TAP14 bothers me.)
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
I proposed fixing that recently[1]. seccomp uses XFAIL for "I have detected you lack the config to test this, so I can't say it's working or not, because it only looks like a failure without the config."
[1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/
On 2020-06-16 18:58, Kees Cook wrote:
On Tue, Jun 16, 2020 at 12:44:28PM -0700, Brendan Higgins wrote:
On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim Tim.Bird@sony.com wrote:
From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
Not so. TAP14 is the proposed next version of TAP13:
https://github.com/TestAnything/testanything.github.io/pull/36 https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-s...
I was reading this (I haven't compared to the blob above):
https://github.com/TestAnything/Specification/blob/tap-14-specification/spec...
Based on the discussion, it seems like most of the things we wanted from TAP14 would probably make it in if TAP ever accepts another pull request.
Were our leading diagnostic lines part of their latest spec? I thought we were pretty far off in left field for that particular bit.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
I don't think we care, because we don't use it.
Yeah, I'm in the same place. I don't care -- I would just like a determination. (The "implied" nature of it in TAP14 bothers me.)
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
I proposed fixing that recently[1]. seccomp uses XFAIL for "I have detected you lack the config to test this, so I can't say it's working or not, because it only looks like a failure without the config."
Based on that description, the case sounds like it should be a skip.
Or if the entire test depends on the missing config then Bail out might be appropriate.
[1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/
On Fri, Jun 19, 2020 at 01:47:29PM -0500, Frank Rowand wrote:
On 2020-06-16 18:58, Kees Cook wrote:
I proposed fixing that recently[1]. seccomp uses XFAIL for "I have detected you lack the config to test this, so I can't say it's working or not, because it only looks like a failure without the config."
Based on that description, the case sounds like it should be a skip.
hrm hrm. Yeah. Thinking more about this, I agree. I think it came about this way because the kselftest_harness.h API (not TAP output) is different from the kselftest.h API (TAP output), and so the tests were written with what was available in kselftest_harness.h, which has no concept of "SKIP".
The series linked was mostly built to bring kselftest_harness.h into the TAP output universe, so the XFAIL -> SKIP mapping needs to be included as well.
Or if the entire test depends on the missing config then Bail out might be appropriate.
[1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/
I will rework this series.
On 19/06/20 20:47, Frank Rowand wrote:
Or if the entire test depends on the missing config then Bail out might be appropriate.
No, in that case you want
1..0 # SKIP: unsupported configuration
The spec is not clear if "Bail out!" is an error condition or just a warning that only part of the test was run, but prove(1) and Automake both treat it as the former, for example.
For example, an ENOSPC error creating a temporary file could be turned into a bail-out, while an ENOSYS would be a skip.
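(A made-up illustration of that split:

    Bail out! ENOSPC while creating temporary work files
    ok 5 check_fancy_syscall # SKIP syscall not implemented (ENOSYS)

The first aborts the whole run because the harness itself is in trouble; the second just skips one hypothetical test case.)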
Paolo
On 2020-06-19 17:58, Paolo Bonzini wrote:
On 19/06/20 20:47, Frank Rowand wrote:
Or if the entire test depends on the missing config then Bail out might be appropriate.
No, in that case you want
1..0 # SKIP: unsupported configuration
The spec is not clear if "Bail out!" is an error condition or just a warning that only part of the test was run, but prove(1) and Automake both treat it as the former, for example.
For example, an ENOSPC error creating a temporary file could be turned into a bail-out, while an ENOSYS would be a skip.
Paolo
I think that "Bail out!" needs to be more clearly defined by the spec, and we should come up with an intent of what it is intended to mean.
The TAP 13 spec gives a "Bail out!" example of "MySQL is not running." The spec gives this example after the general guidance of (among other things) "missing dependencies". A missing config _could_ be thought of as a missing dependency.
To me, the key differentiator for "Bail out!" is "that further tests are useless". In other words, the detected problem means that every subsequent part of the tests will fail for the same reason.
If _some_ subsequent parts of the tests will NOT fail for the same reason, then SKIP becomes more reasonable. What percentage of subsequent parts of the tests is large enough to constitute "_some_" is probably a value judgement.
For the case that I was commenting on, with the limited amount of description provided, my preference is also SKIP, which is what I was suggesting.
On 2020-06-16 11:42, Bird, Tim wrote:
-----Original Message----- From: Paolo Bonzini pbonzini@redhat.com
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit conventions.
When "TAP14" is used in this discussion, let's please use the current proposed TAP14 spec which Brendan has provided a link to. If you want to describe current KUnit conventions, please say current KUnit conventions.
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
My personal preference is to have the dash. I think it's more human readable. I note that the TAP spec has examples of result lines both with and without the dash, so even the spec is ambiguous on this. I think not mandating it either way is probably best. For regex parsers, it's easy to ignore with '[-]?' outside the pattern groups that grab the number and description.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
None of the examples of skips in the TAP13 spec have a test description before the '# SKIP' directive. But maybe I read too much into the examples. There is a format example, and a list of items in a result line, both of which have the test description before the directive. So maybe I read this wrong.
Yes, I think you read too much into the examples. I think the TAP spec is very hard to read in its current form (v13 and proposed v14). If we create a KTAP spec, I can do editing to clean up some of the issues, or give review comments on what clarity issues I see and how to fix them.
I read the spec as saying that the description is optional, but if the description exists in a test line that contains "# TODO explanation" then the description will precede the "#".
(Yes, you are seeing "I volunteer" because the current spec is so frustrating to me.)
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason: the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test cannot continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows one to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
The TAP13 spec is not explicit about the result for TODO (and only provides one example), but the text *does* say a TODO can represent a bug to be fixed. This makes it the equivalent of XFAIL. I hadn't noticed this before. Thanks.
TAP 13 spec:
"Note that if the TODO has an explanation it must be separated from TODO by a space. These tests represent a feature to be implemented or a bug to be fixed and act as something of an executable “things to do” list. They are not expected to succeed. Should a todo test point begin succeeding, the harness should report it as a bonus. This indicates that whatever you were supposed to do has been done and you should promote this to a normal test point."
That seems very clear and explicit to me about what the intent and usage of TODO is.
- test identifier
- multiple parts, separated by ':'
I don't find "identifier" when I grep for it in KUnit related places. And it has been a while since I've read through the KUnit code. So I am a little lost about what a "test identifier" is. Is this part of the test line "Description"? Any pointers to what this is?
-Frank
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
In my experience, / is used in a lot of test descriptions (when quoting file paths; not in kselftest, but in lots of other tests). Both Fuego and KernelCI use colons, and that's what kselftest already uses, so it seems like a good choice.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On 2020-06-16 07:08, Paolo Bonzini wrote:
On 15/06/20 21:07, Bird, Tim wrote:
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
As an aside, where is TAP14?
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think making the plan mandatory is a good idea. "Late plans" work very well for cases where you cannot know in advance the number of tests (for example in filters that produce TAP from other output), and provide an additional safety net.
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
+1.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I think no mandatory dash is better (or even mandatory no-dash!). We can suggest removing it when formatting TAP output.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
How so? The description comes first, but there can be a description of the directive.
not ok 4 - Summarized correctly # TODO Not written yet
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason: the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test cannot continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
True. However, "Bail out!" allows one to distinguish issues with the harness (such as ENOSPC) from test aborts.
- TODO directive
Agreed: SKIP should cover everything TODO does.
XFAIL/XPASS are different from SKIP. I personally don't have a need for them, but kselftests includes XFAIL/XPASS exit codes and they aren't reflected into selftests/kselftest/runner.sh.
Likewise, kselftest.h has ksft_inc_xfail_cnt but not ksft_test_result_xfail/ksft_test_result_xpass.
It has ksft_exit_xfail() and ksft_exit_xpass(). But at 5.8-rc1 I do not see any callers of those two functions.
It's important to notice in the spec that the TODO directive inverts the direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by "not ok # TODO").
For a slightly more verbose explanation, the mindset behind pytest "xfail" and TAP "not ok # TODO explanation" is different, though they both provide essentially the same information.
pytest "xfail" (if I understand correctly) means the test is expected to fail, and it does indeed fail.
pytest "xpass" means the test is expected to fail, but it does not fail.
TAP "not ok # TODO explanation" means the test is expected to fail, and it does indeed fail.
TAP "ok # TODO explanation" means the test is expected to fail, but it does not fail.
It does not seem to be a good idea to have two different ways of reporting the same thing.
The TAP method seems cleaner to me. Whether or not the test passed is reported consistently, no matter what the expectation of pass/fail is. "# TODO ..." is a way for the code maintainer to convey information to the test community that the test is expected to fail due to a known bug. If the output of the test changes to "ok # TODO explanation", then the tester can report that as an issue to the code maintainer, and the code maintainer can then determine whether the bug got fixed (and thus the "# TODO explanation" should be removed from the test code).
Can we just remove ksft_exit_xfail() and ksft_exit_xpass(), since no one is using them, and there is an equivalent way to express the same information in TAP?
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
What about "/" instead?
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
No, SKIP should not be considered a success. It should also not be considered a failure. Please do not blur the lines between success, failure, and skipped.
-Frank
Paolo
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand frowand.list@gmail.com wrote:
On 2020-06-16 07:08, Paolo Bonzini wrote:
On 15/06/20 21:07, Bird, Tim wrote:
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
No, SKIP should not be considered a success. It should also not be considered a failure. Please do not blur the lines between success, failure, and skipped.
I agree that skipped tests should be their own thing, separate from success and failure, but the way they behave tends to be closer to a success than a failure.
I guess the important note here is that a suite of tests, some of which are SKIPped, can be listed as having passed, so long as none of them failed. So, the rule for "bubbling up" test results is that any failures cause the parent to fail, the parent is marked as skipped if _all_ subtests are skipped, and otherwise is marked as having succeeded. (Reversing the last part: having a suite be marked as skipped if _any_ of the subtests are skipped also makes sense, and has its advantages, but anecdotally seems less common in other systems.)
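A minimal sketch of that bubbling-up rule, assuming per-subtest results reduced to the strings 'pass', 'fail' and 'skip' (the names and representation are illustrative, not KUnit's actual code):

    def bubble_up(subtest_results):
        # Any failure fails the parent.
        if any(r == 'fail' for r in subtest_results):
            return 'fail'
        # The parent is skipped only if every subtest was skipped.
        if subtest_results and all(r == 'skip' for r in subtest_results):
            return 'skip'
        # Otherwise the parent passes.
        return 'pass'

    # bubble_up(['pass', 'skip', 'pass']) -> 'pass'
    # bubble_up(['skip', 'skip'])         -> 'skip'
    # bubble_up(['pass', 'fail', 'skip']) -> 'fail'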
The other really brave thing one could do to break from the TAP specification would be to add a "skipped" value alongside "ok" and "not ok", and get rid of the whole "SKIP" directive/comment stuff. Possibly not worth the departure from the spec, but it would sidestep part of the problem.
Cheers, -- David
On 2020-06-20 01:44, David Gow wrote:
On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand frowand.list@gmail.com wrote:
On 2020-06-16 07:08, Paolo Bonzini wrote:
On 15/06/20 21:07, Bird, Tim wrote:
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
No, SKIP should not be considered a success. It should also not be considered a failure. Please do not blur the lines between success, failure, and skipped.
I agree that skipped tests should be their own thing, separate from success and failure, but the way they behave tends to be closer to a success than a failure.
I guess the important note here is that a suite of tests, some of which are SKIPped, can be listed as having passed, so long as none of them failed. So, the rule for "bubbling up" test results is that any failures cause the parent to fail, the parent is marked as skipped if _all_ subtests are skipped, and otherwise is marked as having succeeded. (Reversing the last part: having a suite be marked as skipped if _any_ of the subtests are skipped also makes sense, and has its advantages, but anecdotally seems less common in other systems.)
That really caught my attention as something to be captured in the spec.
My initial response was that bubbling up results is the domain of the test analysis tools, not the test code.
If I were writing a test analysis tool, I would want the user to have the ability to configure the bubble up rules. Different use cases would desire different rules.
My second response was to start thinking about whether the tests themselves should have any sort of bubble up implemented. I think it is a very interesting question. My current mindset is that each test is independent, and there is not a concept of an umbrella test that is the union of a set of subtests. But maybe there is value to umbrella tests. If there is a concept of umbrella tests, then I think the spec should define how skip bubbles up.
The other really brave thing one could do to break from the TAP specification would be to add a "skipped" value alongside "ok" and "not ok", and get rid of the whole "SKIP" directive/comment stuff. Possibly not worth the departure from the spec, but it would sidestep part of the problem.
I like being brave in this case. Elevating SKIP to be a peer of "ok" and "not ok" provides a more clear model that SKIP is a first class citizen. It also removes the muddled thinking that the current model promotes.
Cheers, -- David
On Sat, Jun 20, 2020 at 11:03 PM Frank Rowand frowand.list@gmail.com wrote:
On 2020-06-20 01:44, David Gow wrote:
On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand frowand.list@gmail.com wrote:
On 2020-06-16 07:08, Paolo Bonzini wrote:
On 15/06/20 21:07, Bird, Tim wrote:
> Finally, > - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)? > See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL counts toward failure, though, so both should be "ok"? I guess we should change it to "ok".
See above for XFAIL.
I initially raised the issue with "SKIP" because I have a lot of tests that depend on hardware availability---for example, a test that does not run on some processor kinds (e.g. on AMD, or old Intel)---and for those SKIP should be considered a success.
No, SKIP should not be considered a success. It should also not be considered a failure. Please do not blur the lines between success, failure, and skipped.
I agree that skipped tests should be their own thing, separate from success and failure, but the way they behave tends to be closer to a success than a failure.
I guess the important note here is that a suite of tests, some of which are SKIPped, can be listed as having passed, so long as none of them failed. So, the rule for "bubbling up" test results is that any failures cause the parent to fail, the parent is marked as skipped if _all_ subtests are skipped, and otherwise is marked as having succeeded. (Reversing the last part: having a suite be marked as skipped if _any_ of the subtests are skipped also makes sense, and has its advantages, but anecdotally seems less common in other systems.)
That really caught my attention as something to be captured in the spec.
My initial response was that bubbling up results is the domain of the test analysis tools, not the test code.
KUnit is actually sitting in the middle. Results are bubbled up from individual tests to the test suites in-kernel (by the common KUnit code), as the suites are TAP tests (individual test cases being subtests), and so need to provide results. The kunit.py script then bubbles those results up (using the same rules) to print a summary.
If I were writing a test analysis tool, I would want the user to have the ability to configure the bubble up rules. Different use cases would desire different rules.
I tend to agree: it'd be nice if test analysis tools could implement different rules here. If we're using TAP subtests, though, the parent tests do need to return a result in the test code, so either that needs to be test-specific (if the parent test is not just a simple union of its subtests), or it could be ignored by an analysis tool which would follow its own rules. (In either case, it may make sense to be able to configure a test analysis tool to always fail or mark tests with failed or skipped subtests, even if its result is "ok", but not vice-versa -- a test which failed would stay failed, even if all its subtests passed.)
My second response was to start thinking about whether the tests themselves should have any sort of bubble up implemented. I think it is a very interesting question. My current mindset is that each test is independent, and there is not a concept of an umbrella test that is the union of a set of subtests. But maybe there is value to umbrella tests. If there is a concept of umbrella tests, then I think the spec should define how skip bubbles up.
KUnit suites are definitely that kind of "umbrella test" at the moment.
The other really brave thing one could do to break from the TAP specification would be to add a "skipped" value alongside "ok" and "not ok", and get rid of the whole "SKIP" directive/comment stuff. Possibly not worth the departure from the spec, but it would sidestep part of the problem.
I like being brave in this case. Elevating SKIP to be a peer of "ok" and "not ok" provides a more clear model that SKIP is a first class citizen. It also removes the muddled thinking that the current model promotes.
Cheers, -- David
On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
From: Kees Cook keescook@chromium.org
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
[...] With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think we should require one of:
- starting plan line
- ending plan line
- ending with something that indicates "I'm done, but I have no idea how many tests actually ran" (Maybe "1..?")
To me, the point of the plan line is to be able to say "this test did, in fact, finish". So even if some test can't even count how many tests it _ran_, it can at least say "I am now finished".
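(A rough sketch of that idea on the parser side, treating the plan line purely as an "I finished" marker; the helper name is made up:

    def run_looks_complete(output_lines):
        # A run counts as finished if a plan line ("1..N", leading or
        # trailing) was seen anywhere in the output.
        return any(line.startswith('1..') for line in output_lines)

A hypothetical "1..?" terminator would be accepted by the same check, since it also starts with "1..".)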
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I find the dash to be distracting -- it doesn't help me scan it, and it doesn't help a parser (which only needs to find "#").
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
I sent a bunch of clean-ups for kselftest.h recently[1], but it looks like we'll need more for adding "description" to skip (right now it only prints the SKIP reason).
[1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
Yes, Documentation/*.rst. Not sure on the name yet, but where do kselftest docs live? :)
Documentation/dev-tools/kselftest.rst
I'll put this at: Documentation/dev-tools/test-results-format.rst
Sounds good!
On 2020-06-16 18:52, Kees Cook wrote:
On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
From: Kees Cook keescook@chromium.org
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
[...] With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think we should require one of:
- starting plan line
- ending plan line
- ending with something that indicates "I'm done, but I have no idea how many tests actually ran" (Maybe "1..?")
I understand the desire to be able to say "I don't know how many tests actually ran", but the use of it should be discouraged, even if available.
-Frank
To me, the point of the plan line is to be able to say "this test did, in fact, finish". So even if some test can't even count how many tests it _ran_, it can at least say "I am now finished".
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I find the dash to be distracting -- it doesn't help me scan it, and it doesn't help a parser (which only needs to find "#").
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
I sent a bunch of clean-ups for kselftest.h recently[1], but it looks like we'll need more for adding "description" to skip (right now it only prints the SKIP reason).
[1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
Yes, Documentation/*.rst. Not sure on the name yet, but where do kselftest docs live? :)
Documentation/dev-tools/kselftest.rst
I'll put this at: Documentation/dev-tools/test-results-format.rst
Sounds good!
On Tue, Jun 16, 2020 at 4:52 PM Kees Cook keescook@chromium.org wrote:
On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
From: Kees Cook keescook@chromium.org
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
[...] With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
I think we should require one of:
- starting plan line
- ending plan line
- ending with something that indicates "I'm done, but I have no idea how many tests actually ran" (Maybe "1..?")
To me, the point of the plan line is to be able to say "this test did, in fact, finish". So even if some test can't even count how many tests it _ran_, it can at least say "I am now finished".
So the counting is actually not the hard part for us; it's figuring out when we have finished. Again, the change that I am working on (I REALLY need to get that out) should fix that, but until we get that upstream, KUnit doesn't actually know when it is done running tests.
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required. I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
I find the dash to be distracting -- it doesn't help me scan it, and it doesn't help a parser (which only needs to find "#").
Yeah, I also prefer spaces and/or "#". I am okay if spaces are optional only to aid human readability. And honestly I don't care about this point too much. Just offering my 2 cents.
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
I sent a bunch of clean-ups for kselftest.h recently[1], but it looks like we'll need more for adding "description" to skip (right now it only prints the SKIP reason).
[1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
Yes, Documentation/*.rst. Not sure on the name yet, but where do kselftest docs live? :)
Documentation/dev-tools/kselftest.rst
I'll put this at: Documentation/dev-tools/test-results-format.rst
Sounds good!
-- Kees Cook
On 2020-06-15 14:07, Bird, Tim wrote:
Kees,
Thanks for the great feedback. See comments inline below.
-----Original Message----- From: Kees Cook keescook@chromium.org
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
The kernel test result format consists of 5 major elements, 4 of which are line-based:
- the output version line
- the plan line
Note: making the plan line required differs from TAP13 and TAP14. I think it's the right choice, but we should be clear.
Noted. In re-reading my doc, I've conflated my sections. The first section is "single-line", and the next section is "optional". ??? I'll fix that.
With regards to making it optional or not, I don't have a strong preference. The extra info seems helpful in some circumstances. I don't know if it's too onerous to make it a requirement or not. I'd prefer if it was always there (either at the beginning or the end), but if there is some situation where it's quite difficult to calculate, then it would be best not to mandate it. I can't think of any impossible situations at the moment.
- one or more test result lines (also called test lines)
- a possible "Bail out!" line
"Bail out!" to be moved to "optional" elements, since it may not appear. And we should clarify TAP13 and TAP14's language to say it should only appear when the test is aborting without running later tests -- for this reason, I think the optional "description" following "Bail out!" should be made required. I.e. it must be: "Bail out! $reason"
I'll make sure this is listed as optional. I like adding a mandatory reason.
optional elements:
- diagnostic data
nit: diagnostic lines (not data)
OK.
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is shown below:
Some other lines may be placed by the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13
1..1
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
ok 1 selftests: cpufreq: main.sh
Nit: for examples, I think this should show more than one test. (Preferably, it should show all the possible cases: ok, not ok, SKIP, etc.)
Agree. I will expand this.
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of test cases. In the example above, "1..1" indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
"... must be ..." ?
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
This should be:
<result> <number> <description> [# [<directive> ][<diagnostic data>]]
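For instance, a line matching that shape could look like the following; the description and reason are made up:

    not ok 12 frobnicator parameter check # SKIP required module not loaded

Here "not ok" is the result, "12" the number, "frobnicator parameter check" the description, "SKIP" the directive, and "required module not loaded" the diagnostic data.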
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':')
Must also avoid "#".
ok.
except as part of a fully qualified test case name (see below).
TAP13/14 makes the description optional; are we making it required? (I think we should.) There seems to be a TAP13/14 "convention" of starting <description> with "- ", which I'm on the fence about. It does make parsing maybe a little easier.
I would like the description to be required.
Agree, description should be required.
-Frank
I don't have a strong opinion on the dash. I'm OK with either one (dash or no dash), but we should make kselftest and KUnit consistent.
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
This is what TAP14 changed, I think (i.e. directive follows description now).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, which has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'
Material in comments between the identification line and the test result line is diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line).
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason: the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test cannot continue.
I think the required syntax should be:
Bail out! <reason>
And to make it clear that this is optionally used to indicate an early abort. (Though with a leading plan line, a parser should be able to determine this on its own.)
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Yes, totally.
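For example (hypothetical test names), a test with three test cases, one of which cannot run, should still report a plan of 1..3 and a SKIP result for the case that did not run:

TAP version 13
1..3
ok 1 selftests: example: check_read
not ok 2 selftests: example: check_write # SKIP device is read-only
ok 3 selftests: example: check_close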
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
- reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
Agreed: the yaml extension is not sensible for our use.
- TODO directive
Agreed: SKIP should cover everything TODO does.
KTAP Extensions beyond TAP13:
- nesting
(I would call this 'subtests')
Sounds good. Will do.
- via indentation
- indentation makes it easier for humans to read (see the example after this list)
And allows for separable parsing of subtests.
Agree. I'll try to work that into the doc.
- test identifier
- multiple parts, separated by ':'
This is interesting... is the goal to be able to report test status over time by name?
Yes. KernelCI and Fuego have the notion of a testcase namespace hierarchy. As the number of tests expands, it is helpful to have the namespace for a sub-test be limited, just like a filesystem hierarchy provides scope for the names of objects (directories and files) that it contains.
- summary lines
- can be skipped by CI systems that do their own calculations
Right -- I think a per-test summary line should be included for the humans reading a single test (which may scroll off the screen).
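To illustrate the nesting/indentation extension mentioned above, here is a hypothetical sketch (suite and test case names are invented), following the indentation and '- ' convention that KUnit already uses for subtests:

TAP version 13
1..1
    # Subtest: example-suite
    1..2
    ok 1 - test_case_one
    ok 2 - test_case_two
ok 1 - example-suite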
Other notes:
- automatic assignment of result status based on exit code
This is, I think, a matter for the kselftest running infrastructure, not the KTAP output?
Agreed. This doesn't have anything to do with the API between the tests and the results processor. I'll take it out.
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
Right.
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
Right.
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always precede the result line (see the example after this list)
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Right.
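For example (hypothetical test name and message), a failure reason reported as diagnostic data preceding the result line might look like:

# selftests: example: check_write
# failed to open /dev/example: Permission denied
not ok 3 selftests: example: check_write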
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#" diagnostic lines and the subtest handling.
====== end of ktap-doc-rfc.txt ====== OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
Yes.
Great. I'll make this a priority to work on.
- is it too long or too short?
Should be slightly longer: more examples.
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
Yes, Documentation/*.rst. Not sure on the name yet, but where do kselftest docs live? :)
Documentation/dev-tools/kselftest.rst
I'll put this at: Documentation/dev-tools/test-results-format.rst
- is this document accurate? I think KUNIT does a few things differently than this description.
Let's fix it. :)
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yes please.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP relate? Neither SKIP nor XFAIL count toward failure, though, so both should be "ok"? I guess we should change it to "ok".
I have the same initial impression. In my mind, a skip is "not ok", since the test didn't run. However, a SKIP should be treated differently from either "ok" or "not ok" by the results interpreter, so I don't think it matters. Originally I was averse to changing the SKIP result to "ok" (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to change the parser in Fuego, and it would make the kernel results format match the TAP13 spec. I don't see a strong reason for us to be different from TAP13 here.
I raised this issue on our automated testing conference call last week (which includes people from the CKI, Fuego, KernelCI and LKFT projects), so people should be chiming in if their parser will have a problem with this change.
[1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
Thanks very much for the feedback. -- Tim
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document include inline below
====== start of ktap-doc-rfc.txt ======
[...]
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
We talked about this before, but I would like some way to get failed expectation/assertion information from the test in a consistent, machine-parsable way. Currently we do the following:
    # Subtest: example
    1..1
    # example_simple_test: initializing
    # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
    Expected 1 + 1 == 3, but
        1 + 1 == 2
        3 == 3
    not ok 1 - example_simple_test
not ok 5 - example
Technically not TAP compliant, but no one seems to mind. I am okay with keeping it the way it is, but if we don't want it in the KTAP spec, we will need some kind of recourse.
- reason: try to keep things line-based, since output from other things
may be interspersed with messages from the test itself
- TODO directive
Is this more of stating a fact or desire? We don't use TODO either, but it looks like it could be useful.
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
- test identifier
- multiple parts, separated by ':'
Can you elaborate on this more? I am not sure what you mean.
- summary lines
- can be skipped by CI systems that do their own calculations
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always preceding the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
====== end of ktap-doc-rfc.txt ====== OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
- is it too long or too short?
- if the document is desired, where should it be placed?
I like it. I don't think we can rely on the TAP people updating their stuff based on my interactions with them. So having a spec which is actually maintained would be nice.
Maybe in Documentation/dev-tools/ ?
I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yeah, I think it would be nice if all test frameworks/libraries for the kernel output tests in the same language.
Cheers
-----Original Message----- From: Brendan Higgins
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document include inline below
====== start of ktap-doc-rfc.txt ======
[...]
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
We talked about this before, but I would like some way to get failed expectation/assertion information in the test in a consistent machine parsible way. Currently we do the following:
    # Subtest: example
    1..1
    # example_simple_test: initializing
    # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
    Expected 1 + 1 == 3, but
        1 + 1 == 2
        3 == 3
    not ok 1 - example_simple_test
not ok 5 - example
Technically not TAP compliant, but no one seems to mind. I am okay with keeping it the way it is, but if we don't want it in the KTAP spec, we will need some kind of recourse.
So far, most of the CI systems don't parse out diagnostic data, so it doesn't really matter what the format is. If it's useful for humans, it's valuable as is. However, it would be nice if that could change. But without some formalization of the format of the diagnostic data, it's an intractable problem for CI systems to parse it. So it's really a chicken and egg problem. To solve it, we would have to determine what exactly needs to be provided on a consistent basis for diagnostic data across many tests. I think that it's too big a problem to handle right now. I'm not opposed to migrating to some structure with yaml in the future, but free form text output seems OK for now.
- reason: try to keep things line-based, since output from other things
may be interspersed with messages from the test itself
- TODO directive
Is this more of stating a fact or desire? We don't use TODO either, but it looks like it could be useful.
Just stating a fact. I didn't find TODO in either KUnit or selftest in November when I initially wrote this up. If TODO serves as a kind of XFAIL, it could be useful. I have nothing against it.
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
- test identifier
- multiple parts, separated by ':'
Can you elabroate on this more? I am not sure what you mean.
An individual test case can have a name that is scoped by a containing test or test suite. For example:
selftests: cpufreq: main.sh
This test identifier consists of the test system (selftests), the test area (cpufreq), and the test case name (main.sh). This one's a bit weird because the test case name is just the name of the program in that test area. The program itself doesn't output data in TAP format, and the harness uses its exit code to detect PASS/FAIL. If main.sh had multiple test cases, it might produce test identifiers like this:
selftests: cpufreq: main: check_change_affinity_mask
selftests: cpufreq: main: check_permissions_for_mask_operation
(Or it might just produce the last part of these strings, the testcase names, and the testcase id might be something generated by the harness or CI system.)
The value of having a single string to identify the testcase (like a uniform resource locator) is that it's easier to use the string to correlate results produced from different CI systems that are executing the same test.
- summary lines
- can be skipped by CI systems that do their own calculations
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always preceding the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
====== end of ktap-doc-rfc.txt ====== OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
- is it too long or too short?
- if the document is desired, where should it be placed?
I like it. I don't think we can rely on the TAP people updating their stuff based on my interactions with them. So having a spec which is actually maintained would be nice.
Maybe in Documentation/dev-tools/ ?
I'm leaning towards Documentation/dev-tools/test-results_format.rst
I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yeah, I think it would be nice if all test frameworks/libraries for the kernel output tests in the same language.
Agreed.
-- Tim
On Tue, Jun 16, 2020 at 09:16:01PM +0000, Bird, Tim wrote:
So far, most of the CI systems don't parse out diagnostic data, so it doesn't really matter what the format is. If it's useful for humans, it's valuable as is. However, it would be nice if that could change. But without some formalization of the format of the diagnostic data, it's an intractable problem for CI systems to parse it. So it's really a chicken and egg problem. To solve it, we would have to determine what exactly needs to be provided on a consistent basis for diagnostic data across many tests. I think that it's too big a problem to handle right now. I'm not opposed to migrating to some structure with yaml in the future, but free form text output seems OK for now.
For a CI system, if I see a test has failed, I expect to be able to click a link to get the log of that test, which includes the diagnostic lines. The other reason to have them there is to show progress during a manual run.
Yeah, I think it would be nice if all test frameworks/libraries for the kernel output tests in the same language.
Agreed.
$ git grep "TAP version"
exec/binfmt_script:print("TAP version 1.3")
kselftest.h: printf("TAP version 13\n");
kselftest/runner.sh: echo "TAP version 13"
resctrl/resctrl_tests.c: printf("TAP version 13\n");
size/get_size.c: print("TAP version 13\n");
Looks like there are 2 tests to convert to kselftest.h, and then we can just change the version to 14 in the header and the runner. ;)
-----Original Message----- From: Kees Cook
On Tue, Jun 16, 2020 at 09:16:01PM +0000, Bird, Tim wrote:
So far, most of the CI systems don't parse out diagnostic data, so it doesn't really matter what the format is. If it's useful for humans, it's valuable as is. However, it would be nice if that could change. But without some formalization of the format of the diagnostic data, it's an intractable problem for CI systems to parse it. So it's really a chicken and egg problem. To solve it, we would have to determine what exactly needs to be provided on a consistent basis for diagnostic data across many tests. I think that it's too big a problem to handle right now. I'm not opposed to migrating to some structure with yaml in the future, but free form text output seems OK for now.
For a CI system, if I see a test has failed, I expect to be able to click a link to get the log of that test, which includes the diagnostic lines. The other reason to have them there is to show progress during a manual run.
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many tests), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now. -- Tim
Yeah, I think it would be nice if all test frameworks/libraries for the kernel output tests in the same language.
Agreed.
$ git grep "TAP version"
exec/binfmt_script:print("TAP version 1.3")
kselftest.h: printf("TAP version 13\n");
kselftest/runner.sh: echo "TAP version 13"
resctrl/resctrl_tests.c: printf("TAP version 13\n");
size/get_size.c: print("TAP version 13\n");
Looks like there are 2 tests to convert to kselftest.h, and then we can just change the version to 14 in the header and the runner. ;)
-- Kees Cook
On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many test), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now.
Yeah. I think the main place for this is performance analysis, but I think that's a separate system entirely. TAP is really strictly yes/no, whereas performance analysis is a whole other thing. The only other thing I can think of is some kind of feature analysis, but that would be built out of the standard yes/no output. i.e. if I create a test that checks for specific security mitigation features (*cough*LKDTM*cough*), having a dashboard that shows features down one axis and architectures and/or kernel versions on other axes, then I get a pretty picture. But it's still being built out of the yes/no info.
*shrug*
I think diagnostic should be expressly non-machine-oriented.
On Wed, Jun 17, 2020 at 11:36 AM Kees Cook keescook@chromium.org wrote:
On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many test), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now.
Yeah. I think the main place for this is performance analysis, but I think that's a separate system entirely. TAP is really strictly yes/no, where as performance analysis a whole other thing. The only other thing I can think of is some kind of feature analysis, but that would be built out of the standard yes/no output. i.e. if I create a test that checks for specific security mitigation features (*cough*LKDTM*cough*), having a dashboard that shows features down one axis and architectures and/or kernel versions on other axes, then I get a pretty picture. But it's still being built out of the yes/no info.
*shrug*
I think diagnostic should be expressly non-machine-oriented.
So from the KUnit side, we sort-of have three kinds of diagnostic lines:
- Lines printed directly from tests (typically using kunit_info() or similar functions): as I understand it, these are basically the equivalent of what kselftest typically uses diagnostics for -- test-specific, human-readable messages. I don't think we need/want to parse these much.
- Kernel messages during test execution. If we get the results from scraping the kernel log (which is still the default for KUnit, though there is also a debugfs info), other kernel logs can be interleaved with the results. Sometimes these are irrelevant things happening on another thread, sometimes they're something directly related to the test which we'd like to capture (KASAN errors, for instance). I don't think we want these to be machine oriented, but we may want to be able to filter them out.
- Expectation failures: as Brendan mentioned, KUnit will print some diagnostic messages for individual assertion/expectation failures, including the expected and actual values. We'd ideally like to be able to identify and parse these, but keeping them human-readable is definitely also a goal.
Now, to be honest, I doubt that the distinction here would be of much use to kselftest, but it could be nice to not go out of our way to make parsing some diagnostic lines possible. That being said, personally I'm all for avoiding the yaml for diagnostic messages stuff and sticking to something simple and line-based, possibly standardising the format of a few common diagnostic measurements (e.g., assertions/expected values/etc) in a way that's both human-readable and parsable if possible.
I agree that there's a lot of analysis that is possible with just the yes/no data. There's probably some fancy correlation one could do even with unstructured diagnostic logs, so I don't think overstructuring things is a necessity by any means. Where we have different tests doing similar sorts of things, though, consistency in message formatting could help even if things are not explicitly parsed. Ensuring that helper functions that log and the like are spitting things out in the same format is probably a good starting step down that path.
Cheers, -- David
On Tue, Jun 16, 2020 at 9:06 PM David Gow davidgow@google.com wrote:
On Wed, Jun 17, 2020 at 11:36 AM Kees Cook keescook@chromium.org wrote:
On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many test), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now.
Yeah. I think the main place for this is performance analysis, but I think that's a separate system entirely. TAP is really strictly yes/no, where as performance analysis a whole other thing. The only other thing I can think of is some kind of feature analysis, but that would be built out of the standard yes/no output. i.e. if I create a test that checks for specific security mitigation features (*cough*LKDTM*cough*), having a dashboard that shows features down one axis and architectures and/or kernel versions on other axes, then I get a pretty picture. But it's still being built out of the yes/no info.
*shrug*
I think diagnostic should be expressly non-machine-oriented.
So from the KUnit side, we sort-of have three kinds of diagnostic lines:
- Lines printed directly from tests (typically using kunit_info() or
similar functions): as I understand it, these are basically the equivalent of what kselftest typically uses diagnostics for -- test-specific, human-readable messages. I don't think we need/want to parse these much.
- Kernel messages during test execution. If we get the results from
scraping the kernel log (which is still the default for KUnit, though there is also a debugfs info), other kernel logs can be interleaved with the results. Sometimes these are irrelevant things happening on another thread, sometimes they're something directly related to the test which we'd like to capture (KASAN errors, for instance). I don't think we want these to be machine oriented, but we may want to be able to filter them out.
- Expectation failures: as Brendan mentioned, KUnit will print some
diagnostic messages for individual assertion/expectation failures, including the expected and actual values. We'd ideally like to be able to identify and parse these, but keeping them human-readable is definitely also a goal.
Seems like a fair characterization to me.
Now, to be honest, I doubt that the distinction here would be of much use to kselftest, but it could be nice to not go out of our way to make parsing some diagnostic lines possible. That being said, personally I'm all for avoiding the yaml for diagnostic messages stuff and sticking to something simple and line-based, possibly standardising a the format of a few common diagnostic measurements (e.g., assertions/expected values/etc) in a way that's both human-readable and parsable if possible.
Agreed.
I agree that there's a lot of analysis that is possible with just the yes/no data. There's probably some fancy correlation one could do even with unstructured diagnostic logs, so I don't think overstructuring things is a necessity by any means. Where we have different tests doing similar sorts of things, though, consistency in message formatting could help even if things are not explicitly parsed. Ensuring that helper functions that log and the like are spitting things out in the same format is probably a good starting step down that path.
Super agreed. I am sure that there are people who can do some very clever things with what is being described here, but I would prefer not to overcomplicate the spec right out of the gate. It feels like we only have two TAP versions to reconcile here, and we are mostly in agreement on how to do that.
On 2020-06-16 23:05, David Gow wrote:
On Wed, Jun 17, 2020 at 11:36 AM Kees Cook keescook@chromium.org wrote:
On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many test), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now.
Yeah. I think the main place for this is performance analysis, but I think that's a separate system entirely. TAP is really strictly yes/no, where as performance analysis a whole other thing. The only other thing I can think of is some kind of feature analysis, but that would be built out of the standard yes/no output. i.e. if I create a test that checks for specific security mitigation features (*cough*LKDTM*cough*), having a dashboard that shows features down one axis and architectures and/or kernel versions on other axes, then I get a pretty picture. But it's still being built out of the yes/no info.
*shrug*
I think diagnostic should be expressly non-machine-oriented.
So from the KUnit side, we sort-of have three kinds of diagnostic lines:
- Lines printed directly from tests (typically using kunit_info() or
similar functions): as I understand it, these are basically the equivalent of what kselftest typically uses diagnostics for -- test-specific, human-readable messages. I don't think we need/want to parse these much.
- Kernel messages during test execution. If we get the results from
scraping the kernel log (which is still the default for KUnit, though there is also a debugfs info), other kernel logs can be interleaved with the results. Sometimes these are irrelevant things happening on another thread, sometimes they're something directly related to the test which we'd like to capture (KASAN errors, for instance). I don't think we want these to be machine oriented, but we may want to be able to filter them out.
This is an important conceptual difference between testing a user space program (which is the environment that TAP initially was created for) and testing kernel code. This difference should be addressed in the KTAP standard. As noted above, a kernel test case may call into other kernel code, where the other kernel code generates messages that get into the test output.
One issue with the kernel messages is that they may be warnings or errors, and to anyone other than the test creator it is probably hard to determine whether the warnings and errors are reporting bugs or whether they are expected results triggered by the test.
I created a solution to report what error(s) were expected for a test, and a tool to validate whether the error(s) occurred or not. This is currently in the devicetree unittests, but the exact implementation should be discussed in the KUnit context, and it should be included in the KTAP spec.
I can describe the current implementation and start a discussion of any issues in this thread or I can start a new thread. Whichever seems appropriate to everyone.
-Frank
- Expectation failures: as Brendan mentioned, KUnit will print some
diagnostic messages for individual assertion/expectation failures, including the expected and actual values. We'd ideally like to be able to identify and parse these, but keeping them human-readable is definitely also a goal.
Now, to be honest, I doubt that the distinction here would be of much use to kselftest, but it could be nice to not go out of our way to make parsing some diagnostic lines possible. That being said, personally I'm all for avoiding the yaml for diagnostic messages stuff and sticking to something simple and line-based, possibly standardising a the format of a few common diagnostic measurements (e.g., assertions/expected values/etc) in a way that's both human-readable and parsable if possible.
I agree that there's a lot of analysis that is possible with just the yes/no data. There's probably some fancy correlation one could do even with unstructured diagnostic logs, so I don't think overstructuring things is a necessity by any means. Where we have different tests doing similar sorts of things, though, consistency in message formatting could help even if things are not explicitly parsed. Ensuring that helper functions that log and the like are spitting things out in the same format is probably a good starting step down that path.
Cheers, -- David
Just a quick note that there's been a lot of good discussion.
I have an updated draft of the document, but I need to review the flurry of comments today, and I'm busy getting my slides ready for a conference. So I just wanted to give a heads up that I'll be working on this (responding to comments and hopefully posting an updated draft version) early next week.
Thanks for the feedback. -- Tim
-----Original Message----- From: Frank Rowand frowand.list@gmail.com
On 2020-06-16 23:05, David Gow wrote:
On Wed, Jun 17, 2020 at 11:36 AM Kees Cook keescook@chromium.org wrote:
On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
Agreed. You only need machine-parsable data if you expect the CI system to do something more with the data than just present it. What that would be, that would be common for all tests (or at least many test), is unclear. Maybe there are patterns in the diagnostic data that could lead to higher-level analysis, or even automated fixes, that don't become apparent if the data is unstructured. But it's hard to know until you have lots of data. I think just getting the other things consistent is a good priority right now.
Yeah. I think the main place for this is performance analysis, but I think that's a separate system entirely. TAP is really strictly yes/no, where as performance analysis a whole other thing. The only other thing I can think of is some kind of feature analysis, but that would be built out of the standard yes/no output. i.e. if I create a test that checks for specific security mitigation features (*cough*LKDTM*cough*), having a dashboard that shows features down one axis and architectures and/or kernel versions on other axes, then I get a pretty picture. But it's still being built out of the yes/no info.
*shrug*
I think diagnostic should be expressly non-machine-oriented.
So from the KUnit side, we sort-of have three kinds of diagnostic lines:
- Lines printed directly from tests (typically using kunit_info() or
similar functions): as I understand it, these are basically the equivalent of what kselftest typically uses diagnostics for -- test-specific, human-readable messages. I don't think we need/want to parse these much.
- Kernel messages during test execution. If we get the results from
scraping the kernel log (which is still the default for KUnit, though there is also a debugfs info), other kernel logs can be interleaved with the results. Sometimes these are irrelevant things happening on another thread, sometimes they're something directly related to the test which we'd like to capture (KASAN errors, for instance). I don't think we want these to be machine oriented, but we may want to be able to filter them out.
This is an important conceptual difference between testing a user space program (which is the environment that TAP initially was created for) and testing kernel code. This difference should be addressed in the KTAP standard. As noted above, a kernel test case may call into other kernel code, where the other kernel code generates messages that get into the test output.
One issue with the kernel issues is that they may be warnings or errors, and to anyone other than the test creator it is probably hard to determine whether the warnings and errors are reporting bugs or whether they are expected results triggered by the test.
I created a solution to report what error(s) were expected for a test, and a tool to validate whether the error(s) occurred or not. This is currently in the devicetree unittests, but the exact implementation should be discussed in the KUnit context, and it should be included in the KTAP spec.
I can describe the current implementation and start a discussion of any issues in this thread or I can start a new thread. Whichever seems appropriate to everyone.
-Frank
- Expectation failures: as Brendan mentioned, KUnit will print some
diagnostic messages for individual assertion/expectation failures, including the expected and actual values. We'd ideally like to be able to identify and parse these, but keeping them human-readable is definitely also a goal.
Now, to be honest, I doubt that the distinction here would be of much use to kselftest, but it could be nice to not go out of our way to make parsing some diagnostic lines possible. That being said, personally I'm all for avoiding the yaml for diagnostic messages stuff and sticking to something simple and line-based, possibly standardising a the format of a few common diagnostic measurements (e.g., assertions/expected values/etc) in a way that's both human-readable and parsable if possible.
I agree that there's a lot of analysis that is possible with just the yes/no data. There's probably some fancy correlation one could do even with unstructured diagnostic logs, so I don't think overstructuring things is a necessity by any means. Where we have different tests doing similar sorts of things, though, consistency in message formatting could help even if things are not explicitly parsed. Ensuring that helper functions that log and the like are spitting things out in the same format is probably a good starting step down that path.
Cheers, -- David
On Tue, Jun 16, 2020 at 2:16 PM Bird, Tim Tim.Bird@sony.com wrote:
-----Original Message----- From: Brendan Higgins
On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document include inline below
====== start of ktap-doc-rfc.txt ======
[...]
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
We talked about this before, but I would like some way to get failed expectation/assertion information in the test in a consistent machine parsible way. Currently we do the following:
    # Subtest: example
    1..1
    # example_simple_test: initializing
    # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
    Expected 1 + 1 == 3, but
        1 + 1 == 2
        3 == 3
    not ok 1 - example_simple_test
not ok 5 - example
Technically not TAP compliant, but no one seems to mind. I am okay with keeping it the way it is, but if we don't want it in the KTAP spec, we will need some kind of recourse.
So far, most of the CI systems don't parse out diagnostic data, so it doesn't really matter what the format is. If it's useful for humans, it's valuable as is. However, it would be nice if that could change. But without some formalization of the format of the diagnostic data, it's an intractable problem for CI systems to parse it. So it's really a chicken and egg problem. To solve it, we would have to determine what exactly needs to be provided on a consistent basis for diagnostic data across many tests. I think that it's too big a problem to handle right now. I'm not opposed to migrating to some structure with yaml in the future, but free form text output seems OK for now.
Well, as long as everyone is cool with it for now, we can put the problem off until later.
- reason: try to keep things line-based, since output from other things
may be interspersed with messages from the test itself
- TODO directive
Is this more of stating a fact or desire? We don't use TODO either, but it looks like it could be useful.
Just stating a fact. I didn't find TODO in either KUnit or selftest in November when I initially wrote this up. If TODO serves as a kind of XFAIL, it could be useful. I have nothing against it.
Fair enough.
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
- test identifier
- multiple parts, separated by ':'
Can you elabroate on this more? I am not sure what you mean.
An individual test case can have a name that is scoped by a containing test or test suite. For example:
selftests: cpufreq: main.sh
This test identifier consists of the test system (selftests), the test area (cpufreq), and the test case name (main.sh). This one's a bit weird because the test case name is just the name of the program in that test area. The program itself doesn't output data in TAP format, and the harness uses its exit code to detect PASS/FAIL. If main.sh had multiple test cases, it might produce test identifiers like this:
selftests: cpufreq: main: check_change_affinity_mask
selftests: cpufreq: main: check_permissions_for_mask_operation
(Or it might just produce the last part of these strings, the testcase names, and the testcase id might be something generated by the harness or CI system.)
+Alan Maguire
Aha, that is something we (Alan, David, Kees, and myself) were talking about on another thread:
https://lore.kernel.org/linux-kselftest/CABVgOSnjrzfFOMF0VE1-5Ks-e40fc0XZsNZ...
I think that makes a lot of sense if it isn't too hard in practice.
The value of having a single string to identify the testcase (like a uniform resource locator), is that it's easier to use the string to correlate results produced from different CI system that are executing the same test.
Makes sense.
- summary lines
- can be skipped by CI systems that do their own calculations
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always preceding the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
====== end of ktap-doc-rfc.txt ====== OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
- is it too long or too short?
- if the document is desired, where should it be placed?
I like it. I don't think we can rely on the TAP people updating their stuff based on my interactions with them. So having a spec which is actually maintained would be nice.
Maybe in Documentation/dev-tools/ ?
I'm leaning towards Documentation/dev-tools/test-results_format.rst
SGTM.
I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Yeah, I think it would be nice if all test frameworks/libraries for the kernel output tests in the same language.
Agreed.
Cheers
On 2020-06-10 13:11, Bird, Tim wrote:
Some months ago I started work on a document to formalize how kselftest implements the TAP specification. However, I didn't finish that work. Maybe it's time to do so now.
kselftest has developed a few differences from the original TAP specification, and some extensions that I believe are worth documenting.
Essentially, we have created our own KTAP (kernel TAP) format. I think it is worth documenting our conventions, in order to keep everyone on the same page.
Below is a partially completed document on my understanding of KTAP, based on examination of some of the kselftest test output. I have not reconciled this with the kunit output format, which I believe has some differences (which maybe we should resolve before we get too far into this).
I submit the document now, before it is finished, because a patch was recently introduced to alter one of the result conventions (from SKIP='not ok' to SKIP='ok').
See the document include inline below
====== start of ktap-doc-rfc.txt ====== Selftest preferred output format
The linux kernel selftest system uses TAP (Test Anything Protocol) output to make testing results consumable by automated systems. A number of Continuous Integration (CI) systems test the kernel every day. It is useful for these systems that output from selftest programs be consistent and machine-parsable.
At the same time, it is useful for test results to be human-readable as well.
The kernel test result format is based on a variation TAP TAP is a simple text-based format that is documented on the TAP home page (http://testanything.org/). There is an official TAP13 specification here: http://testanything.org/tap-version-13-specification.html
The kernel test result format consists of 5 major elements, 4 of which are line-based:
This document should use the terminology used by the TAP spec, if at all reasonable and possible. It is very hard to go between this document and the TAP specs (v13 and v14) when they use different names for the same thing. For example, the TAP spec calls these "test lines" instead of "test result lines".
-Frank
- the output version line
- the plan line
- one or more test result lines (also called test result lines)
- a possible "Bail out!" line
optional elements:
- diagnostic data
The 5th element is diagnostic information, which is used to describe items running in the test, and possibly to explain test results. A sample test result is show below:
Some other lines may be placed the test harness, and are not emitted by individual test programs:
- one or more test identification lines
- a possible results summary line
Here is an example:
TAP version 13 1..1 # selftests: cpufreq: main.sh # pid 8101's current affinity mask: fff # pid 8101's new affinity mask: 1 ok 1 selftests: cpufreq: main.sh
The output version line is: "TAP version 13"
The test plan is "1..1".
Element details
Output version line
The output version line is always "TAP version 13".
Although the kernel test result format has some additions to the TAP13 format, the version line reported by kselftest tests is (currently) always the exact string "TAP version 13"
This is always the first line of test output.
Test plan line
The test plan indicates the number of individual test cases intended to be executed by the test. It always starts with "1.." and is followed by the number of tests cases. In the example above, 1..1", indicates that this test reports only 1 test case.
The test plan line can be placed in two locations:
- the second line of test output, when the number of test cases is known in advance
- as the last line of test output, when the number of test cases is not known in advance.
Most often, the number of test cases is known in advance, and the test plan line appears as the second line of test output, immediately following the output version line. The number of test cases might not be known in advance if the number of tests is calculated from runtime data. In this case, the test plan line is emitted as the last line of test output.
Test result lines
The test output consists of one or more test result lines that indicate the actual results for the test. These have the format:
<result> <number> <description> [<directive>] [<diagnostic data>]
The ''result'' must appear at the start of a line (except for when a test is nested, see below), and must consist of one of the following two phrases:
- ok
- not ok
If the test passed, then the result is reported as "ok". If the test failed, then the result is reported as "not ok". These must be in lower case, exactly as shown.
The ''number'' in the test result line represents the number of the test case being performed by the test program. This is often used by test harnesses as a unique identifier for each test case. The test number is a base-10 number, starting with 1. It should increase by one for each new test result line emitted. If possible the number for a test case should be kept the same over the lifetime of the test.
The ''description'' is a short description of the test case. This can be any string of words, but should avoid using colons (':') except as part of a fully qualified test case name (see below).
Finally, it is possible to use a test directive to indicate another possible outcome for a test: that it was skipped. To report that a test case was skipped, the result line should start with the result "not ok", and the directive "# SKIP" should be placed after the test description. (Note that this deviates from the TAP13 specification).
A test may be skipped for a variety of reasons, ranging from insufficient privileges to missing features or resources required to execute that test case.
It is usually helpful if a diagnostic message is emitted to explain the reasons for the skip. If the message is a single line and is short, the diagnostic message may be placed after the '# SKIP' directive on the same line as the test result. However, if it is not on the test result line, it should precede the test line (see diagnostic data, next).
Diagnostic data
Diagnostic data is text that reports on test conditions or test operations, or that explains test results. In the kernel test result format, diagnostic data is placed on lines that start with a hash sign, followed by a space ('# ').
One special format of diagnostic data is a test identification line, that has the fully qualified name of a test case. Such a test identification line marks the start of test output for a test case.
In the example above, there are three lines that start with '#' which precede the test result line:
# selftests: cpufreq: main.sh
# pid 8101's current affinity mask: fff
# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case 'selftests: cpufreq: main.sh'
Material in comments between the identification line and the test result line are diagnostic data that can help to interpret the results of the test.
The TAP specification indicates that automated test harnesses may ignore any line that is not one of the mandatory prescribed lines (that is, the output format version line, the plan line, a test result line, or a "Bail out!" line.)
Bail out!
If a line in the test output starts with 'Bail out!', it indicates that the test was aborted for some reason. It indicates that the test is unable to proceed, and no additional tests will be performed.
This can be used at the very beginning of a test, or anywhere in the middle of the test, to indicate that the test can not continue.
--- from here on is not-yet-organized material
Tip:
- don't change the test plan based on skipped tests.
- it is better to report that a test case was skipped, than to not report it
- that is, don't adjust the number of test cases based on skipped tests
Other things to mention: TAP13 elements not used:
- yaml for diagnostic messages
- reason: try to keep things line-based, since output from other things may be interspersed with messages from the test itself
- TODO directive
KTAP Extensions beyond TAP13:
- nesting
- via indentation
- indentation makes it easier for humans to read
- test identifier
- multiple parts, separated by ':'
- summary lines
- can be skipped by CI systems that do their own calculations
Other notes:
- automatic assignment of result status based on exit code
Tips:
- do NOT describe the result in the test line
- the test case description should be the same whether the test succeeds or fails
- use diagnostic lines to describe or explain results, if this is desirable
- test numbers are considered harmful
- test harnesses should use the test description as the identifier
- test numbers change when testcases are added or removed
- which means that results can't be compared between different versions of the test
- recommendations for diagnostic messages:
- reason for failure
- reason for skip
- diagnostic data should always preceding the result line
- problem: harness may emit result before test can do assessment to determine reason for result
- this is what the kernel uses
Differences between kernel test result format and TAP13:
- in KTAP the "# SKIP" directive is placed after the description on the test result line
====== end of ktap-doc-rfc.txt ====== OK - that's the end of the RFC doc.
Here are a few questions:
- is this document desired or not?
- is it too long or too short?
- if the document is desired, where should it be placed? I assume somewhere under Documentation, and put into .rst format. Suggestions for a name and location are welcome.
- is this document accurate? I think KUNIT does a few things differently than this description.
- is the intent to have kunit and kselftest have the same output format? if so, then these should be rationalized.
Finally,
- Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html
Regards, -- Tim