We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based): - The version line - Plan lines - Test case result lines - Diagnostic lines - A bail out line
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line ----------------
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Plan lines ----------
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
Test case result lines ----------------------
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
The directive is used to indicate if a test was skipped. The format for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped. The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
Diagnostic lines ----------------
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format. Diagnostic lines can be anywhere in the test output after the first two lines. There are a few special diagnostic lines. Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
Bail out line -------------
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
Nested tests ------------
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
Lastly, indentation is also recommended for improved readability.
Major differences between TAP 14 and KTAP specification -------------------------------------------------------
Note that the major differences between TAP 14 and KTAP specification: - yaml and json are not allowed in diagnostic messages - TODO directive not allowed - KTAP allows for an arbitrary number of tests to be nested with specified nested test format
Example of KTAP ---------------
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds
not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
====
So what do people think about the above specification? How should we handle the differences with kselftests? If this specification is accepted, where should the specification be documented?
On Tue, Aug 10, 2021 at 4:25 PM Rae Moar rmoar@google.com wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
Thanks for putting this together, Rae! I would definitely like to see us moving forward with standardizing fully on the KTAP spec.
I am surprised to not yet see any comments here.
I was thinking that this might make a good BoF topic at LPC, but I guess that depends on how interested people are in this topic.
[+kernelci, +Guillaume]
Hi!
Please keep me in CC for these kinds of proposals. And thanks for looking at this again! Please understand that while I may be coming across as rather negative here, I would like to have a rational and documented specification for the kernel's test output, too. My main objection here is that we already _have_ a specification, and it's already being parsed by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
On Tue, Aug 10, 2021 at 04:25:03PM -0700, Rae Moar wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
The kernel's largest producer of TAP is kselftest, and the largest consumer is LAVA[1]. I would want buy-in from at least those responsible for either side of those two things. (And I'm one of the people working on both sides of it.)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
[1] https://github.com/Linaro/test-definitions/commit/8bd338bbcfa5a03efcf1d12e25... [2] https://git.kernel.org/linus/b0df366bbd701c45e93af0dcb87ce22398589d1d [3] https://git.kernel.org/linus/e80068be21824e4d2726eeea41cac6178d212415
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
I commented extensively on that thread. :)
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
These are required.
- Diagnostic lines
This is optional.
- A bail out line
Bail out should be optional, and possibly not included at all. (See below.)
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
I suggest:
<result> <number>[ [- ]<description>][# <directive> [<directive details>]]
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
Yes.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
Yes, though parsers should handle ordering failures and missed test results (this is the purpose of the "plan" line).
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
Yes, though the optional "- " is strictly part of the optional description.
The directive is used to indicate if a test was skipped. The format
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
(This could also be called "context" rather than "directive".)
for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped.
I would call this "directive details".
The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
This isn't valid. No "_" is recognized for "ok" vs "not ok".
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
Diagnostic lines can be anywhere in the test output after the first two lines.
I don't see a reason for this strictness. They can be anywhere.
There are a few special diagnostic lines.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
Nested tests
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
A plan line is required. No special cases are needed. :)
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
Whether this is "# " or " " I don't really care, as long as the change is coordinated. " " is easier for humans to read, but may make parsing of "unknown" lines more difficult for machines.
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
- TODO directive not allowed
I would just say "unrecognized".
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Yup.
Example of KTAP
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
For a good example, please include all the possible combinations (SKIP, not ok, diagnostics, etc etc)
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
We already use "# " and the nested TAP lines to denote subtests. Without a good reason to change it, we should avoid the churn with the existing parsers.
====
So what do people think about the above specification? How should we handle the differences with kselftests?
I'm probably a broken record by now, but kselftests _is_ the specification. ;) What about it needs changing, and why?
If this specification is accepted, where should the specification be documented?
I imagine some .rst file under Documentation/dev-tools/, linked to from kselftest.rst and kunit/...rst
On Sat, Aug 28, 2021 at 1:20 AM Kees Cook keescook@chromium.org wrote:
[+kernelci, +Guillaume]
Hi!
Please keep me in CC for these kinds of proposals. And thanks for looking
Oh sorry, we will make sure to do that in the future. I figured most people who were interested who weren't directly CC'ed were on the kselftest list.
at this again! Please understand that while I may be coming across as rather negative here, I would like to have a rational and documented specification for the kernel's test output, too. My main objection here is that we already _have_ a specification, and it's already being parsed
We do?! Where is it? Can you link it here?
by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
Of course, but if there is no specification or it is not being universally followed, then changes will need to be made, no?
On Tue, Aug 10, 2021 at 04:25:03PM -0700, Rae Moar wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
The kernel's largest producer of TAP is kselftest, and the largest consumer is LAVA[1]. I would want buy-in from at least those responsible for either side of those two things. (And I'm one of the people working on both sides of it.)
Of course, that's why we sent it to the kselftest list. In terms of LAVA as the biggest consumer, thank you for CCing them. My apologies for failing to do so.
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
Yep! And I believe there are still some inconsistencies - hence part of the motivation.
[1] https://github.com/Linaro/test-definitions/commit/8bd338bbcfa5a03efcf1d12e25... [2] https://git.kernel.org/linus/b0df366bbd701c45e93af0dcb87ce22398589d1d [3] https://git.kernel.org/linus/e80068be21824e4d2726eeea41cac6178d212415
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
I commented extensively on that thread. :)
Yep, and if I recall correctly, there was no resolution to that thread, so we are trying to pick that conversation back up here.
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
These are required.
- Diagnostic lines
This is optional.
- A bail out line
Bail out should be optional, and possibly not included at all. (See below.)
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
"TAP version Linux.1" would not break existing parsers? If so, that would be great, I just expect that it would break them too.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
Why not? I got the impression on the previous thread that everyone was amenable to this?
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
I suggest:
<result> <number>[ [- ]<description>][# <directive> [<directive details>]]
I think that is probably fine, but I think Rae would probably have a better idea.
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
Yes.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
Yes, though parsers should handle ordering failures and missed test results (this is the purpose of the "plan" line).
Right.
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
Yes, though the optional "- " is strictly part of the optional description.
The directive is used to indicate if a test was skipped. The format
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
Can you explain (or link to documentation) on what these other directives mean?
(This could also be called "context" rather than "directive".)
for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped.
I would call this "directive details".
The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
This isn't valid. No "_" is recognized for "ok" vs "not ok".
Oh yeah, I think that was a typo.
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
Interesting. I think David probably would have some thoughts here.
Diagnostic lines can be anywhere in the test output after the first two lines.
I don't see a reason for this strictness. They can be anywhere.
There are a few special diagnostic lines.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Cool, any ideas on what we should do then? I personally don't have strong feelings on it.
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
OK, that is definitely different. I think this will require some further discussion.
Nested tests
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
Well it doesn't have to, right? The parent test can include the directive. In anycase, something has to know enough to print a test plan.
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
What spec? I didn't see a final spec come from the previous discussion.
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
A plan line is required. No special cases are needed. :)
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
KUnit cannot support indentation being required. We print test results to dmesg. At anytime something could get printed to dmesg that could potentially mess up indentation.
Whether this is "# " or " " I don't really care, as long as the change is coordinated. " " is easier for humans to read, but may make parsing of "unknown" lines more difficult for machines.
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
- TODO directive not allowed
I would just say "unrecognized".
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Yup.
Example of KTAP
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
For a good example, please include all the possible combinations (SKIP, not ok, diagnostics, etc etc)
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
We already use "# " and the nested TAP lines to denote subtests. Without a good reason to change it, we should avoid the churn with the existing parsers.
====
So what do people think about the above specification? How should we handle the differences with kselftests?
I'm probably a broken record by now, but kselftests _is_ the specification. ;) What about it needs changing, and why?
Wait, what?! An implementation is not a specification. I thought Tim's attempt at standardizing the TAP that exists under kselftest, KUnit, and elsewhere was recognized as important or at least worthwhile.
The problem that was recognized, as I understand, was that there are multiple "interpretations" of TAP floating around the kernel and that goes against the original point of trying to standardize around TAP.
I know KUnit's usage is pretty minor in comparison to kselftest, but people do use it and there is no point in us, KUnit, purporting to use TAP and remain compatible with any particular version of it if it is incompatible with kselftest's TAP.
Additionally, there is no way that we are going to be able to stay on a compatible implementation of TAP unless we specify what TAP is separate from the implementation.
Maybe there is something I am missing here. If so, please explain.
If this specification is accepted, where should the specification be documented?
I imagine some .rst file under Documentation/dev-tools/, linked to from kselftest.rst and kunit/...rst
-- Kees Cook
-----Original Message----- From: Kees Cook keescook@chromium.org
[+kernelci, +Guillaume]
Hi!
Please keep me in CC for these kinds of proposals. And thanks for looking at this again! Please understand that while I may be coming across as rather negative here, I would like to have a rational and documented specification for the kernel's test output, too. My main objection here is that we already _have_ a specification, and it's already being parsed by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
On Tue, Aug 10, 2021 at 04:25:03PM -0700, Rae Moar wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
The kernel's largest producer of TAP is kselftest, and the largest consumer is LAVA[1]. I would want buy-in from at least those responsible for either side of those two things. (And I'm one of the people working on both sides of it.)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
[1] https://github.com/Linaro/test-definitions/commit/8bd338bbcfa5a03efcf1d12e25... [2] https://git.kernel.org/linus/b0df366bbd701c45e93af0dcb87ce22398589d1d [3] https://git.kernel.org/linus/e80068be21824e4d2726eeea41cac6178d212415
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-
kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@CY4PR13MB1175.namprd13.prod.outlook.com/T/).
However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
I commented extensively on that thread. :)
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
These are required.
- Diagnostic lines
This is optional.
- A bail out line
Bail out should be optional, and possibly not included at all. (See below.)
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
IMHO it's useful to keep this called 'diagnostic data'.
The directive is optional, and having this be diagnostic data means you can put explanatory text, or result details, that are independent of the description, on the same line as the result. This is convenient in some cases.
I think diagnostic data should always be prefixed by a hash sign. Or, in the case where diagnostic data appears on the result line, the hash serves as the delimiter. This means that the description text must not include a hash character.
For example: ok 5 - check return code # rcode=0 or not ok 5 - check return code # rcode=17
I suggest:
<result> <number>[ [- ]<description>][# <directive> [<directive details>]]
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
Yes.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
Yes, though parsers should handle ordering failures and missed test results (this is the purpose of the "plan" line).
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
The directive is used to indicate if a test was skipped. The format
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
ERROR usually means that there was some kind of failure in the testing process that makes the testcase result indeterminate. (like loss of connection to a server, or crash of the program executing the testcase). This can be due to a bug in the test or the test system. I'm, not sure if ERROR is used this way in kselftest. It is, IMHO, a useful distinction to make. TIMEOUT is a special case of ERROR, IMHO.
As an aside, I've never liked XFAIL or XPASS. Semantically, I believe they are quite confusing, and depend on the level of testcase execution. XPASS is also redundant. Isn't the default case always XPASS?
In the case of XFAIL, does it mean that an operation that was expected to fail all the time, failed, and so the test passed? Or does it mean that this is a known bug that a particular developer or site has decided to ignore (as in "we know this is a bug, but we expected it to fail, so just ignore it for now). The latter, I really don't think should be encoded in the tests themselves, but should be put into a layer above the test and parser, that is applied according to developer or testlab policy.
(This could also be called "context" rather than "directive".)
for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped.
I would call this "directive details".
I would not, since it is human readable text, and may be present whether the directive is there or not, I would still call it 'diagnostic data'.
The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
This isn't valid. No "_" is recognized for "ok" vs "not ok".
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I think this would be a very good distinction to make. You might have unknown lines that (inadvertently?) start with a hash, but trying to make sure that the test itself always outputs diagnostic data with a preceding hash gives a reader potentially more information about where the message came from, and could be useful.
Diagnostic lines can be anywhere in the test output after the first two lines.
I don't see a reason for this strictness. They can be anywhere.
There are a few special diagnostic lines.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
I agree it is handy to have the test name for sub-tests. However, I don't see any semantic difference whether we define sub-test declarations via diagnostic lines with special syntax or by creating a new syntax separate from diagnostic lines. Either way, we have a new syntax for parsers to understand.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
How consistent is it for selftests to produce test summary lines?
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
Agreed.
Nested tests
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
I agree that the sub-test should not know about this line. I thought the "parent" test wrote the line.
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
I prefer spaces, but whatever. Is the prefix on the 'TAP version 13' line guaranteed to be the prefix on all lines for the (sub-)test?
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
A plan line is required. No special cases are needed. :)
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
I strongly agree. You can't disambiguate some outputs, unless the indentation is present.
Whether this is "# " or " " I don't really care, as long as the change is coordinated. " " is easier for humans to read, but may make parsing of "unknown" lines more difficult for machines.
I prefer spaces for human readability.
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
I thought that some tests put the diagnostic data before the result line.
I don't care whether it is before or after (and indeed Fuego's parser can handle either case). But it can't handle it when some testcases put diagnostic data before the result line, and some testcases put it after, in the same test. There's no delimiter for the separation between the two different testcases diagnostic data.
Could we possibly have a rule about diagnostic lines positioning relative to the testcase result line? Maybe a rule like this: For a single test, diagnostic lines should either always preceded the testcase they are related to, or follow the testcase they are related to.
- TODO directive not allowed
I would just say "unrecognized".
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Yup.
Example of KTAP
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
For a good example, please include all the possible combinations (SKIP, not ok, diagnostics, etc etc)
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Please No! (Unless I'm misinterpreting that line).
If you remove the '#', that makes the 'exit=127' part of the test description. The test description should be invariant between success and failure cases. The exit=127 (if I'm reading that line right) is part of data explaining or describing the failure. It would be 'exit=0' if the test succeeded.
In order to correlate test cases (ie show results over time), the description should not change from one invocation of the test to the next. At least Fuego uses the test description this way. That is, the testcase description is used as a testcase identifier, and if it changes from run to run things get confusing.
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
We already use "# " and the nested TAP lines to denote subtests. Without a good reason to change it, we should avoid the churn with the existing parsers.
====
So what do people think about the above specification? How should we handle the differences with kselftests?
I'm probably a broken record by now, but kselftests _is_ the specification. ;) What about it needs changing, and why?
If this specification is accepted, where should the specification be documented?
I imagine some .rst file under Documentation/dev-tools/, linked to from kselftest.rst and kunit/...rst
-- Kees Cook
On Mon, Aug 30, 2021 at 10:34:14AM -0700, Brendan Higgins wrote:
On Sat, Aug 28, 2021 at 1:20 AM Kees Cook keescook@chromium.org wrote:
Please keep me in CC for these kinds of proposals. And thanks for looking
Oh sorry, we will make sure to do that in the future. I figured most people who were interested who weren't directly CC'ed were on the kselftest list.
I mainly use mailing list member as a way to have thread history available. I have too many subscriptions to read them all. :)
is that we already _have_ a specification, and it's already being parsed
We do?! Where is it? Can you link it here?
I am, perhaps, too glib. What I mean is that we have an _implementation_, and we cannot create a specification that doesn't match the existing implementation without very good reason.
by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
Of course, but if there is no specification or it is not being universally followed, then changes will need to be made, no?
Sure, but I'm being pragmatic: the implementation is already operating. We can't pretend it doesn't exist. :)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
Yep! And I believe there are still some inconsistencies - hence part of the motivation.
Agreed -- I'm happy to create a specification around what we _have_ (and fix any existing mistakes). But the proposed spec very much did not match what exists.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
"TAP version Linux.1" would not break existing parsers? If so, that would be great, I just expect that it would break them too.
https://metacpan.org/dist/Test-Harness/source/lib/TAP/Parser.pm and https://github.com/python-tap/tappy/blob/master/tap/parser.py won't like anything non-numeric.
But neither have any idea about "#" nor nested tests either. (Those considerations are entirely from the ability to construct automated parsing of test-separated logs.)
LAVA doesn't parse the TAP line at all, as it chunks by known test names, doing a simple mapping of name to outcome for the LAVA results: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kself...
So, to that end, I guess "KTAP version 1" is fine.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
Why not? I got the impression on the previous thread that everyone was amenable to this?
Because it breaks kselftests's agnosticism. More details on this below...
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
Can you explain (or link to documentation) on what these other directives mean?
I can link to code. :)
SKIP: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
TIMEOUT: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
XFAIL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
error: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
xpass (not actually a directive, and nothing uses it, I say we remove it): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
"exit=..." (improperly formed directive -- this should probably be changed into a preceding diagnostic line) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
Besides the two kselftest TAP harnesses, there are a few open-coded TAP producers:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... (This last one has a very confused idea about where to put the plan line.)
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
Interesting. I think David probably would have some thoughts here.
The intention with the current implementation is to be as robust as possible against other lines of output existing, since kselftest output ends up in weird places fairly regularly.
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Cool, any ideas on what we should do then? I personally don't have strong feelings on it.
For this? I say leave it diagnostic. These are mainly for humans to see the results of a specific (usually long high-output) test run.
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
OK, that is definitely different. I think this will require some further discussion.
This is mostly about making good parsers. Parsers should be able to understand enough context to gracefully recover from unexpected output.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
Well it doesn't have to, right? The parent test can include the directive. In anycase, something has to know enough to print a test plan.
What happens in kselftest is that the kselftest harness has no idea what is or isn't going to produce TAP output. That is an intentional design goal of kselftest: authors should be able to drop a tool into the tree that returns 0 for "success", and 1 for "failure". (The full set of return codes is actually: "0" for "success", "4" for "skip", "124" for "timeout", and anything else for failure.) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The runner can see how many binaries it expects to run in a given directory, so it can build TAP output (with a plan line), for a given directory. However, some of those tests have tons of output and effectively have subtests -- but those binaries are _standalone_, in the sense that they produce TAP output without any knowledge of their surroundings. As such, the kselftest runner must indent them to make the results parsable.
Having different output when in one harness or another has and will lead to pain in comparing the results. So, a test run outside of kselftest will be (should be) identical to running inside kselftest, but with indentation.
We can _call_ this a subtest, but it's just nested. A subtest should not need to know it is a subtest.
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
KUnit cannot support indentation being required. We print test results to dmesg. At anytime something could get printed to dmesg that could potentially mess up indentation.
Can you explain this? KUnit controls what it prints, yes? Anything outside of the TAP output is an "unknown line". This is why I used "#" for indentation: it is distinguishable from "unknown (dmesg) lines".
The problem that was recognized, as I understand, was that there are multiple "interpretations" of TAP floating around the kernel and that goes against the original point of trying to standardize around TAP.
Right -- this is the part I disagree with. There is 1 interpretation, and a couple mistakes in the various kernel implementations. Let's get that documented first.
I know KUnit's usage is pretty minor in comparison to kselftest, but people do use it and there is no point in us, KUnit, purporting to use TAP and remain compatible with any particular version of it if it is incompatible with kselftest's TAP.
I have tried to keep the kernel's TAP output parsable by TAP 13 consumers. I don't think we have much of a fork currently (the diagnostic lines are a simple addition, and the nesting is trivially extractable with no additional knowledge beyond recognizing the "TAP" line within diagnostic lines).
Additionally, there is no way that we are going to be able to stay on a compatible implementation of TAP unless we specify what TAP is separate from the implementation.
My goals are:
- have kernel test output (and nest test output) parsable by existing tools, with a strong desire to not break LAVA (though arguably LAVA isn't a TAP parser).
- have kernel test output distinguishable from unrelated dmesg output since kselftest is very frequently collected from the console.
- not needlessly break existing tooling without showing a meaningful benefit. (i.e. flag days in parsing under LAVA has been difficult, and I want to minimize those.)
- be able to map meaningful test execution state to a common set of states ("ok" and "not ok" are rarely sufficient), which includes timeouts, expected failures (e.g. missing CONFIG_*s, wrong arch, etc).
- be able to include diagnostic runtime output _in the test result_, because too frequently the binary yes/no of a test is way to far away from the actual test details, which make debugging extremely difficult.
On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
IMHO it's useful to keep this called 'diagnostic data'.
The directive is optional, and having this be diagnostic data means you can put explanatory text, or result details, that are independent of the description, on the same line as the result. This is convenient in some cases.
I think diagnostic data should always be prefixed by a hash sign. Or, in the case where diagnostic data appears on the result line, the hash serves as the delimiter. This means that the description text must not include a hash character.
For example: ok 5 - check return code # rcode=0 or not ok 5 - check return code # rcode=17
The directive is currently used to answer the question "why?" as in, "why is this test not ok?" ("a timeout occurred") or "why is this test ok?" ("a SKIP")
The "supporting evidence" for the "why?" answer are specific details about that: "the exit was was NNN" or "the timeout was YYY seconds".
Test diagnostics are all the lines preceding that (currently prefixed with a "#").
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
I have no strong opinion on the "-". I was surprised to encounter it as it's not part of the TAP 13 spec. I would prefer to drop it, if I had to choose.
ERROR usually means that there was some kind of failure in the testing process that makes the testcase result indeterminate. (like loss of connection to a server, or crash of the program executing the testcase). This can be due to a bug in the test or the test system. I'm, not sure if ERROR is used this way in kselftest. It is, IMHO, a useful distinction to make. TIMEOUT is a special case of ERROR, IMHO.
Well, I think TIMEOUT is a "why not ok?" answer. "not ok" seems to be the same as ERROR, but I guess it's really a catch-all, but must come with the "evidence" portion:
not ok 1 description goes here # TIMEOUT 45 seconds with no results not ok 1 description goes here # ERROR lost connection to server not ok 1 description goes here # ERROR non-zero exit code: NNN
I might go so far as to say the directive is _required_ for the "not ok" state.
As an aside, I've never liked XFAIL or XPASS. Semantically, I believe they are quite confusing, and depend on the level of testcase execution. XPASS is also redundant. Isn't the default case always XPASS?
Nothing currently uses XPASS, and we should remove it.
XFAIL capture the condition of the test logic in that a test for something arm64-specific on a non-arm64 machine is _not_ a "pass". It will fail. But it's an expected failure. And, as a directive, must come with the evidence bit:
not ok 1 PAN catches bad accessg # XFAIL PAN CPU feature is arm64 only
As a test author, I want to see this as distinct from failure and success.
In the case of XFAIL, does it mean that an operation that was expected to fail all the time, failed, and so the test passed? Or does it mean that this is a known bug that a particular developer or site has decided to ignore (as in "we know this is a bug, but we expected it to fail, so just ignore it for now). The latter, I really don't think should be encoded in the tests themselves, but should be put into a layer above the test and parser, that is applied according to developer or testlab policy.
I agree that "just ignore it for now" shouldn't be there.
Other arguments are that xfail tests shouldn't be run at all, but I don't think that's right because it makes it very hard to do large-scale test result comparisons because random stuff is missing, depending on various configs, archs, etc. It blocks a 1-to-1 comparison, and begs questions like "why is the test here in one case and not in another?" where as an XFAIL will encode the reason it is an XFAIL in its output, providing a self-documenting test result.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I think this would be a very good distinction to make. You might have unknown lines that (inadvertently?) start with a hash, but trying to make sure that the test itself always outputs diagnostic data with a preceding hash gives a reader potentially more information about where the message came from, and could be useful.
Right, so lines with a hash shouldn't be machine-parsed. (And this is a mistake I made in the current kselftest output for the LAVA parser.) The good news for the kernel, is that nothing else in dmesg leads with a "#" currently. :)
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
I agree it is handy to have the test name for sub-tests. However, I don't see any semantic difference whether we define sub-test declarations via diagnostic lines with special syntax or by creating a new syntax separate from diagnostic lines. Either way, we have a new syntax for parsers to understand.
Pragmatically, I agree, which is what lead me to my mistake. However, in reconsidering it, I realize this leads to a slippery slope of just dumping stuff into diagnostic lines and pretending nothing will read it.
I suspect what we need is an optional "starting test" line, like:
test N DESC # diag... # diag... # diag... ok N
The nesting looks like:
TAP version 13 1..3 test 1 seccomp # TAP version 13 # 1..27 # test 1 prctl # ok 1 # test 2 strict # ok 2 # ... # test 27 user_notif # # eek missing CONFIG_.... # not ok 27 user_notif # ERROR can't use ioctl not ok 1 seccomp # ERROR non-zero exit code 1 test 2 netfilter ok 2 netfilter test 3 bpf # exciting things happen ok 3 bpf
This is what we have now except basically just replacing "# $name" with "test $num $name"
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
How consistent is it for selftests to produce test summary lines?
Any of the tools built with kselftest.h will spit it out. (Which is to say several of the "subtests".) The bulk of kselftest output is from the runner, which doesn't produce this summary.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
I agree that the sub-test should not know about this line. I thought the "parent" test wrote the line.
The suggestion was that the subtest should _not_ emit the "TAP" line, which is what I strongly disagree with: it should be a self-contained TAP-emitting test.
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
I prefer spaces, but whatever. Is the prefix on the 'TAP version 13' line guaranteed to be the prefix on all lines for the (sub-)test?
That's the current implementation, yes.
As for tabs vs spaces, I think it's fine to swap "# " for " ". We just need to patch LAVA to deal with it.
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
I thought that some tests put the diagnostic data before the result line.
I don't care whether it is before or after (and indeed Fuego's parser can handle either case). But it can't handle it when some testcases put diagnostic data before the result line, and some testcases put it after, in the same test. There's no delimiter for the separation between the two different testcases diagnostic data.
Right. The current implementation puts it always before. TAP 13 (14?) has the YAML _after_ the results line, but that's totally unworkable for kselftest, which is designed to be first human readable and second machine readable. Waiting to see diagnostics from a slow test is terrible -- we can't collect it and spit it out later. Also, a parser would have no idea about when a test is finished if diagnostics follow the last test result of a plan.
Making it "always before" is fine by me, and the motivation for making the "summary" lines be non-diagnostics because they technically followed the last test result.
Could we possibly have a rule about diagnostic lines positioning relative to the testcase result line? Maybe a rule like this: For a single test, diagnostic lines should either always preceded the testcase they are related to, or follow the testcase they are related to.
For the kernel, I would rather avoid anything after because that just confuses things.
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Please No! (Unless I'm misinterpreting that line).
If you remove the '#', that makes the 'exit=127' part of the test description. The test description should be invariant between success and failure cases. The exit=127 (if I'm reading that line right) is part of data explaining or describing the failure. It would be 'exit=0' if the test succeeded.
In order to correlate test cases (ie show results over time), the description should not change from one invocation of the test to the next. At least Fuego uses the test description this way. That is, the testcase description is used as a testcase identifier, and if it changes from run to run things get confusing.
Agreed, I think it should be part of an ERROR directive:
not ok 14 awesome test # ERROR exit=127
-----Original Message----- From: Kees Cook keescook@chromium.org
On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
IMHO it's useful to keep this called 'diagnostic data'.
The directive is optional, and having this be diagnostic data means you can put explanatory text, or result details, that are independent of the description, on the same line as the result. This is convenient in some cases.
I think diagnostic data should always be prefixed by a hash sign. Or, in the case where diagnostic data appears on the result line, the hash serves as the delimiter. This means that the description text must not include a hash character.
For example: ok 5 - check return code # rcode=0 or not ok 5 - check return code # rcode=17
The directive is currently used to answer the question "why?" as in, "why is this test not ok?" ("a timeout occurred") or "why is this test ok?" ("a SKIP")
The "supporting evidence" for the "why?" answer are specific details about that: "the exit was was NNN" or "the timeout was YYY seconds".
Test diagnostics are all the lines preceding that (currently prefixed with a "#").
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
I have no strong opinion on the "-". I was surprised to encounter it as it's not part of the TAP 13 spec. I would prefer to drop it, if I had to choose.
ERROR usually means that there was some kind of failure in the testing process that makes the testcase result indeterminate. (like loss of connection to a server, or crash of the program executing the testcase). This can be due to a bug in the test or the test system. I'm, not sure if ERROR is used this way in kselftest. It is, IMHO, a useful distinction to make. TIMEOUT is a special case of ERROR, IMHO.
Well, I think TIMEOUT is a "why not ok?" answer. "not ok" seems to be the same as ERROR, but I guess it's really a catch-all, but must come with the "evidence" portion:
'not ok' should not mean the same thing as ERROR. 'not ok' means the testcase was run successfully, and the result is that the test determined that the testcase failed. ERROR means that the testcase was not run successfully, so no result can be assumed.
But ERROR is confusing. Maybe there's a better word for the concept that is less likely to be conflated with the result.
not ok 1 description goes here # TIMEOUT 45 seconds with no results not ok 1 description goes here # ERROR lost connection to server not ok 1 description goes here # ERROR non-zero exit code: NNN
I might go so far as to say the directive is _required_ for the "not ok" state.
As I said above, I don't think they are the same. I would reserve ERROR for the case where a test error occurred and the test case result could not be obtained.
If we go to the trouble of making our own spec, maybe it would be better to support a third result value: 'unknown'. This would break all the TAP parsers, but it's much clearer, IMHO.
'ok 1 description' = test case executed successfully, and the testcase condition passed 'not ok 1 description' = test case executed successfully, and the testcase condition failed 'unknown 1 description # SKIP # reason' = test case does not apply for some reason 'unknown 1 description # ERROR # reason' = test case could not be executed, for some reason
In the case of SKIP and ERROR, the result should be ignored as it is either inapplicable or wasn't obtained.
I remember we had a discussion about whether SKIP should have an "ok" or "not ok" result. IMHO it doesn't matter, because the result has no meaning when a SKIP directive is used. (Although it's useful for humans to see why something was skipped.)
As an aside, I've never liked XFAIL or XPASS. Semantically, I believe they are quite confusing, and depend on the level of testcase execution. XPASS is also redundant. Isn't the default case always XPASS?
Nothing currently uses XPASS, and we should remove it.
XFAIL capture the condition of the test logic in that a test for something arm64-specific on a non-arm64 machine is _not_ a "pass". It will fail. But it's an expected failure. And, as a directive, must come with the evidence bit:
not ok 1 PAN catches bad accessg # XFAIL PAN CPU feature is arm64 only
As a test author, I want to see this as distinct from failure and success.
I agree that this case is different from a normal failure.
In the example you provide of a feature that applies to only specific architectures, I would use a SKIP instead. It's not that the testcase failed. It's that it does not even make sense for the device under test.
Whether something is 1) a testcase failure or 2) a testcase that does not apply is a matter of perspective. Is it the case that the feature should exist on all architectures, and it's just not been written yet? I could call this a 'not ok' or an XFAIL. However, if it's a feature that is never intended to be written for arm32 or i386, I would report that as a SKIP.
But all of this differs from what I originally described, which is labs using XFAIL to annotate bugs that they knew about but didn't intend to fix.
In the case of XFAIL, does it mean that an operation that was expected to fail all the time, failed, and so the test passed? Or does it mean that this is a known bug that a particular developer or site has decided to ignore (as in "we know this is a bug, but we expected it to fail, so just ignore it for now). The latter, I really don't think should be encoded in the tests themselves, but should be put into a layer above the test and parser, that is applied according to developer or testlab policy.
I agree that "just ignore it for now" shouldn't be there.
Other arguments are that xfail tests shouldn't be run at all, but I don't think that's right because it makes it very hard to do large-scale test result comparisons because random stuff is missing, depending on various configs, archs, etc. It blocks a 1-to-1 comparison, and begs questions like "why is the test here in one case and not in another?"
I completely agree with this. It really messes up tables of results when the list of testcases varies depending on some board or configuration attribute. It's very hard for users to parse and compare results in that case.
where as an XFAIL will encode the reason it is an XFAIL in its output, providing a self-documenting test result.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I think this would be a very good distinction to make. You might have unknown lines that (inadvertently?) start with a hash, but trying to make sure that the test itself always outputs diagnostic data with a preceding hash gives a reader potentially more information about where the message came from, and could be useful.
Right, so lines with a hash shouldn't be machine-parsed. (And this is a mistake I made in the current kselftest output for the LAVA parser.) The good news for the kernel, is that nothing else in dmesg leads with a "#" currently. :)
Oh. That's actually nice to know. Interesting.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
I agree it is handy to have the test name for sub-tests. However, I don't see any semantic difference whether we define sub-test declarations via diagnostic lines with special syntax or by creating a new syntax separate from diagnostic lines. Either way, we have a new syntax for parsers to understand.
Pragmatically, I agree, which is what lead me to my mistake. However, in reconsidering it, I realize this leads to a slippery slope of just dumping stuff into diagnostic lines and pretending nothing will read it.
Good point. Whether it starts with a hash or not, TAP 13 parsers should ignore anything that is not a result line (starts with 'ok' or 'not ok'). So I guess we are free to make a new syntax without upsetting any existing parsers.
I suspect what we need is an optional "starting test" line, like:
test N DESC # diag... # diag... # diag... ok N
The nesting looks like:
TAP version 13 1..3 test 1 seccomp # TAP version 13 # 1..27 # test 1 prctl # ok 1 # test 2 strict # ok 2 # ... # test 27 user_notif # # eek missing CONFIG_.... # not ok 27 user_notif # ERROR can't use ioctl not ok 1 seccomp # ERROR non-zero exit code 1 test 2 netfilter ok 2 netfilter test 3 bpf # exciting things happen ok 3 bpf
This is what we have now except basically just replacing "# $name" with "test $num $name"
Actually, I like this idea. Including the number is a nice touch. I think that will help to distinguish such a line from other 'unknown line' output. I don't know if I'd leave the description off of the result line, though. A lot of parsers are looking for that.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
How consistent is it for selftests to produce test summary lines?
Any of the tools built with kselftest.h will spit it out. (Which is to say several of the "subtests".) The bulk of kselftest output is from the runner, which doesn't produce this summary.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
I agree that the sub-test should not know about this line. I thought the "parent" test wrote the line.
The suggestion was that the subtest should _not_ emit the "TAP" line, which is what I strongly disagree with: it should be a self-contained TAP-emitting test.
OK - agreed. Yes. The sub-test should not know whether they are running standalone or nested (with the exception of some line prefix opaquely conveyed to them)
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
I prefer spaces, but whatever. Is the prefix on the 'TAP version 13' line guaranteed to be the prefix on all lines for the (sub-)test?
That's the current implementation, yes.
As for tabs vs spaces, I think it's fine to swap "# " for " ". We just need to patch LAVA to deal with it.
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
I thought that some tests put the diagnostic data before the result line.
I don't care whether it is before or after (and indeed Fuego's parser can handle either case). But it can't handle it when some testcases put diagnostic data before the result line, and some testcases put it after, in the same test. There's no delimiter for the separation between the two different testcases diagnostic data.
Right. The current implementation puts it always before. TAP 13 (14?) has the YAML _after_ the results line, but that's totally unworkable for kselftest, which is designed to be first human readable and second machine readable. Waiting to see diagnostics from a slow test is terrible -- we can't collect it and spit it out later. Also, a parser would have no idea about when a test is finished if diagnostics follow the last test result of a plan.
Making it "always before" is fine by me, and the motivation for making the "summary" lines be non-diagnostics because they technically followed the last test result.
Could we possibly have a rule about diagnostic lines positioning relative to the testcase result line? Maybe a rule like this: For a single test, diagnostic lines should either always preceded the testcase they are related to, or follow the testcase they are related to.
For the kernel, I would rather avoid anything after because that just confuses things.
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Please No! (Unless I'm misinterpreting that line).
If you remove the '#', that makes the 'exit=127' part of the test description. The test description should be invariant between success and failure cases. The exit=127 (if I'm reading that line right) is part of data explaining or describing the failure. It would be 'exit=0' if the test succeeded.
In order to correlate test cases (ie show results over time), the description should not change from one invocation of the test to the next. At least Fuego uses the test description this way. That is, the testcase description is used as a testcase identifier, and if it changes from run to run things get confusing.
Agreed, I think it should be part of an ERROR directive:
not ok 14 awesome test # ERROR exit=127
As long as the line above means that testcase '14 awesome test' could not be run, and exit=127 helps diagnose that, then I agree. If 'exit=127' means that testcase '14 awesome test' was run and exit=127 was a failing result, I would say that the result line should just be: not ok 14 awesome test # exit=127
-- Tim Bird
-----Original Message----- From: Kees Cook keescook@chromium.org
On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
...
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
I have no strong opinion on the "-". I was surprised to encounter it as it's not part of the TAP 13 spec. I would prefer to drop it, if I had to choose.
The TAP 13 specification does not mention "-", but a dash on the result line is used in most of the examples shown in the specification here: http://testanything.org/tap-specification.html
In the top two examples on that page, the first one does not use dashes and the second one does. It's kind of irritating to have that kind of loosey-goosey syntax in a specification. IMHO the syntax should be more strictly specified than this. I don't have a strong opinion either on whether to use dashes or not. But it would be nice to make it consistent. -- Tim Bird
On Sat, Aug 28, 2021 at 4:20 PM Kees Cook keescook@chromium.org wrote:
[+kernelci, +Guillaume]
Hi!
Please keep me in CC for these kinds of proposals. And thanks for looking at this again! Please understand that while I may be coming across as rather negative here, I would like to have a rational and documented specification for the kernel's test output, too. My main objection here is that we already _have_ a specification, and it's already being parsed by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
On Tue, Aug 10, 2021 at 04:25:03PM -0700, Rae Moar wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
The kernel's largest producer of TAP is kselftest, and the largest consumer is LAVA[1]. I would want buy-in from at least those responsible for either side of those two things. (And I'm one of the people working on both sides of it.)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
[1] https://github.com/Linaro/test-definitions/commit/8bd338bbcfa5a03efcf1d12e25... [2] https://git.kernel.org/linus/b0df366bbd701c45e93af0dcb87ce22398589d1d [3] https://git.kernel.org/linus/e80068be21824e4d2726eeea41cac6178d212415
Thanks a lot for doing this, by the way!
I think many of the issues here stem from the original TAP spec having been insufficient for kernel stuff, and a bit of divergent evolution having occurred between kselftest, KUnit, and the dormant TAP 14 spec. This proposed spec does approach things more from the KUnit side, just because that's what we're more familiar with, but I agree that kselftest and LAVA are the bigger fish in this pond. KUnit's parser has also been a bit stricter in what it accepts, and the TAP producing code is shared between all of the KUnit tests, which makes prototyping changes a bit easier.
Fortunately, most of these differences seem pretty minor in the grand scheme of things, so I'm sure we can adapt this spec to fit what kselftest is doing better, while still leaving enough of the structure the KUnit tooling requires.
In any case, here are some of my initial thoughts:
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
I commented extensively on that thread. :)
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
These are required.
- Diagnostic lines
This is optional.
- A bail out line
Bail out should be optional, and possibly not included at all. (See below.)
Yeah, I'd happily jettison this totally if no-one's using it.
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
I'd be okay with this. I personally think "KTAP" better describes how much of a deviation this is turning out to be from the official TAP specs, but if this fixes more parsers than it breaks, it's worth doing.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
I don't think there's fundamentally a distinction worth preserving here. "Subtests" is the language used by the TAP 14 spec, but if we want to reframe it as nested tests, that's not a problem (and I agree 'subtests' is a little misleading here).
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
I suggest:
<result> <number>[ [- ]<description>][# <directive> [<directive details>]]
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
Yes.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
Yes, though parsers should handle ordering failures and missed test results (this is the purpose of the "plan" line).
For the record, KUnit's parser prints an error for each test with the "wrong number", but still reports its result.
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
Yes, though the optional "- " is strictly part of the optional description.
I think the goal of specifying the "- " explicitly is that it's very ugly to have test names/descriptions start with a "-" if one is used.
The directive is used to indicate if a test was skipped. The format
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
(This could also be called "context" rather than "directive".)
The name here matches the TAP spec, but I don't think we need to be too beholden to it.
Personally, I think it'd make more sense to have SKIP, XFAIL, etc be different statuses like "ok" and "not ok", rather than a directive after the name, but that's possibly too radical a departure from TAP.
for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped.
I would call this "directive details".
Agreed.
The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
This isn't valid. No "_" is recognized for "ok" vs "not ok".
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I'm in two minds about this. On one hand, I totally agree that there's a useful distinction here, and that a lot of dmesg output can get nondeterministically mixed in. On the other, there are some tests for which the dmesg output which gets mixed in is important output which shouldn't be stripped out (for example, the KUnit KASAN tests print KASAN reports, which we want to treat as "diagnostics" because they're useful to understand what's happening in the tests.
That being said, there's nothing actually stopping us from maintaining this distinction, and just having the KUnit parser treat "unknown" lines the same as "diagnostic" lines, while maintaining the distinction in theory.
Diagnostic lines can be anywhere in the test output after the first two lines.
I don't see a reason for this strictness. They can be anywhere.
I think this was just there because because we wanted to have room for the TAP version header (or the 'Subtest' line in this spec/TAP14) and the test plan, which needed to be the first two lines.
There are a few special diagnostic lines.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
Yeah: the problem here is that we do have a pretty random mix of lines starting with "#" which need parsing, and those which don't. Some of this is inherited from the TAP or TAP14 spec. Using "# " for nesting makes this a bit confusing as well.
Maybe this just needs reframing to something like "lines starting with '#' which are not parseable as any of the following are diagnostic lines", rather than the current implication that it's "anything that starts with '#'".
Or we could try to clean up the mess and get rid of the leading '#' everywhere else, but that's also a fair bit of work.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
KUnit has just added a similar line, though it's optional, and is mainly used as a way of providing sums of nested tests at different levels: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/co...
It's currently optional, though, so we might want to change that if we rely on it.
In general, though, I agree that "Bail out!" is pretty useless. I suspect it's here mostly because "it was in the TAP spec" rather than "it's something we actually want to use".
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
I do like this idea as a way of making it possible to just cat test results together and get a sensibly merged result. I think we want to count any tests without result lines marked as "error" (test execution didn't complete/crashed/etc), rather than "failed", but that's just a detail.
Nested tests
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
This originally came from the draft TAP 14 spec here: https://github.com/TestAnything/testanything.github.io/pull/36/files
On the KUnit side, we were labouring under the impression TAP 14 would be ratified and implemented, so a lot of TAP 14 things ended up in our implementation, including this line. Given that it stalled a long time ago and no-one seems to care, I don't think that compatibility with it needs to be an explicit goal any more, but it is still something that KUnit uses at the moment.
The other thing worth noting is that, in KUnit, while nested test support is limited, the framework does need to keep track of the nesting depth in order to print the indentation.
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
A plan line is required. No special cases are needed. :)
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
Whether this is "# " or " " I don't really care, as long as the change is coordinated. " " is easier for humans to read, but may make parsing of "unknown" lines more difficult for machines.
As one of the people who has an irrational distaste for syntactically significant indentation, I'd prefer using something like the TAP version header or the test plan as a way of indicating nesting. The TAP 14 spec says that they should be indented four spaces, but "# " is probably better, even if it doesn't gel perfectly with my mental model of "#" lines not needing parsing (which went out the window ages ago).
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
- TODO directive not allowed
I would just say "unrecognized".
I think this is actually supported in at least some versions of TAP...? So maybe this isn't a difference at all.
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Yup.
Again, I think this is actually a difference between TAP14/KTAP and KUnit's current logic. TAP 14 does support nested tests using this format, but KUnit only supported one level of nesting (Suite->Test).
Example of KTAP
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
For a good example, please include all the possible combinations (SKIP, not ok, diagnostics, etc etc)
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
We already use "# " and the nested TAP lines to denote subtests. Without a good reason to change it, we should avoid the churn with the existing parsers.
Like with the "Subtests" line, this came from the draft TAP 14 spec, via KUnit. Personally, I prefer the whitespace indent from a readability point of view, and because it gels a bit better with my idea of what the "#" was supposed to represent: a non-parseable additional diagnostic, though that's not consistently been the case for ages, so I don't care too much.
====
So what do people think about the above specification? How should we handle the differences with kselftests?
I'm probably a broken record by now, but kselftests _is_ the specification. ;) What about it needs changing, and why?
I think there's value in describing the spec more explicitly, particularly since there's also the TAP spec, TAP 14 draft spec, and KUnit's implementation, all of which are different in some ways. And while a lot of the implicit spec can be gleaned from reading example kselftest tests, that only tells you what is valid, not what isn't, which makes writing parsers a bit trickier.
If this specification is accepted, where should the specification be documented?
I imagine some .rst file under Documentation/dev-tools/, linked to from kselftest.rst and kunit/...rst
Agreed.
Hello everyone! Thank you for all of your comments! I am glad to see some discussion on this email. First of all, my primary goal with this email is to advocate for a documented specification for the kernel's test output. I presented my first email largely from the perspective of KUnit and thus, I am not surprised there are points of contention in the proposed specification. That being said my comments are as follows:
On Sat, Aug 28, 2021 at 4:20 AM Kees Cook keescook@chromium.org wrote:
[+kernelci, +Guillaume]
Hi!
Please keep me in CC for these kinds of proposals.
Very sorry for not including you as a CC. I will be sure to do this in the future.
And thanks for looking at this again! Please understand that while I may be coming across as rather negative here, I would like to have a rational and documented specification for the kernel's test output, too. My main objection here is that we already _have_ a specification,
I totally agree that the documented specification should generally follow the agreed upon formats for kernel test results. However, there seems to be some prominent differences in the formats of test results between KUnit and kselftests, as well as within kselftests (requiring test plan line, use of different directives). So I am primarily advocating for the specification to be agreed upon and documented.
and it's already being parsed by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
On Tue, Aug 10, 2021 at 04:25:03PM -0700, Rae Moar wrote:
We are looking to further standardise the output format used by kernel test frameworks like kselftest and KUnit. Thus far we have used the TAP (Test Anything Protocol) specification, but it has been extended in many different ways, so we would like to agree on a common "Kernel TAP" (KTAP) format to resolve these differences. Thus, below is a draft of a specification of KTAP. Note that this specification is largely based on the current format of test results for KUnit tests.
The kernel's largest producer of TAP is kselftest, and the largest consumer is LAVA[1]. I would want buy-in from at least those responsible for either side of those two things. (And I'm one of the people working on both sides of it.)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
[1] https://github.com/Linaro/test-definitions/commit/8bd338bbcfa5a03efcf1d12e25... [2] https://git.kernel.org/linus/b0df366bbd701c45e93af0dcb87ce22398589d1d [3] https://git.kernel.org/linus/e80068be21824e4d2726eeea41cac6178d212415
Additionally, this specification was heavily inspired by the KTAP specification draft by Tim Bird (https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD8...). However, there are some notable differences to his specification. One such difference is the format of nested tests is more fully specified in the following specification. However, they are specified in a way which may not be compatible with many kselftest nested tests.
I commented extensively on that thread. :)
===================== Specification of KTAP =====================
TAP, or the Test Anything Protocol is a format for specifying test results used by a number of projects. It's website and specification are found at: https://testanything.org/. The Linux Kernel uses TAP output for test results. However, KUnit (and other Kernel testing frameworks such as kselftest) have some special needs for test results which don't gel perfectly with the original TAP specification. Thus, a "Kernel TAP" (KTAP) format is specified to extend and alter TAP to support these use-cases.
KTAP Output consists of 5 major elements (all line-based):
- The version line
- Plan lines
- Test case result lines
These are required.
Agreed.
- Diagnostic lines
This is optional.
Agreed.
- A bail out line
Bail out should be optional, and possibly not included at all. (See below.)
I am fine with the Bail out line being removed if it is not in use.
An important component in this specification of KTAP is the specification of the format of nested tests. This can be found in the section on nested tests below.
The version line
The first line of KTAP output must be the version line. As this specification documents the first version of KTAP, the recommended version line is "KTAP version 1". However, since all kernel testing frameworks use TAP version lines, "TAP version 14" and "TAP version 13" are all acceptable version lines. Version lines with other versions of TAP or KTAP will not cause the parsing of the test results to fail but it will produce an error.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
I think it would be best to maintain a numerical version number. So I would prefer to either maintain 'TAP version #' or 'KTAP version 1', to indicate the test results follow the newly specified KTAP. However, I am flexible on this.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
I am fine changing this concept. It might be slightly clearer.
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
I should have defined the [<diagnostic data>] more fully. My intention was that the diagnostic data would accommodate including information in the test result that is not a directive or the description of the test. For example, this would accommodate the examples that Tim Bird mentioned: ok 5 - check return code # rcode=0
Note that both the directive and diagnostic data would be after a # symbol. So the more exact format I was intending would be as follows:
<result> <number> [-] [<description>] [# [<directive>][<diagnostic data>]]
I suggest:
<result> <number>[ [- ]<description>][# <directive> [<directive details>]]
The result can be either "ok", which indicates the test case passed, or "not ok", which indicates that the test case failed.
Yes.
The number represents the number of the test case or suite being performed. The first test case or suite must have the number 1 and the number must increase by 1 for each additional test case or result at the same level and within the same testing suite.
Yes, though parsers should handle ordering failures and missed test results (this is the purpose of the "plan" line).
The "-" character is optional.
The description is a description of the test, generally the name of the test, and can be any string of words (can't include #). The description is optional.
Yes, though the optional "- " is strictly part of the optional description.
Does this mean you would want the "-" parsed as part of the description? I think having "-" at the beginning of test descriptions might be confusing.
Additionally, I am flexible on whether the "-" is necessary. I personally think it looks a bit better with the "-" but that is preference.
The directive is used to indicate if a test was skipped. The format
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
(This could also be called "context" rather than "directive".)
for the directive is: "# SKIP [<skip_description>]". The skip_description is optional and can be any string of words to describe why the test was skipped.
I would call this "directive details".
The result of the test case result line can be either "ok" or "not ok" if the skip directive is used. Finally, note that TAP 14 specification includes TODO directives but these are not supported for KTAP.
Examples of test case result lines:
Test passed:
ok 1 - test_case_name
Test was skipped:
not ok 1 - test_case_name # SKIP test_case_name should be skipped
Test failed:
not_ok 1 - test_case_name
This isn't valid. No "_" is recognized for "ok" vs "not ok".
Oops. My typo here. I meant: not ok 1 - test_case_name
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I would be fine with maintaining that distinction.
Diagnostic lines can be anywhere in the test output after the first two lines.
I don't see a reason for this strictness. They can be anywhere.
This was an attempt to ensure that the first two lines of the KTAP results are the version line and the test plan. However, looking back on this, it was a bit too strict.
There are a few special diagnostic lines.
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
Nested tests
The new specification for KTAP will support an arbitrary number of nested subtests. Thus, tests can now have subtests and those subtests can have subtests. This can be useful to further categorize tests and organize test results.
The new required format for a test with subtests consists of: a subtest header line, a plan line, all subtests, and a final test result line.
The first line of the test must be the subtest header line with the format: "# Subtest: <test_name>".
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
To clarify if this was implemented in kselftests, I believe the parent test would print this subtest header. So the executing test would not have this knowledge about the execution depth and thus, if the test was not nested, it would not print this subtest header. This is my understanding.
Additionally, this is the current format for nested tests for KUnit. As David mentioned, KUnit adopted this format from the proposed TAP 14 draft spec (https://github.com/TestAnything/testanything.github.io/pull/36/files). Currently KUnit relies on this Subtest header line to parse nested tests. I hope this is taken into account with discussions.
However, I hope we can decide on a singular format for these nested tests. I am flexible with this specification. I would be fine with including the nested version lines.
The second line of the test must be the plan line, which is formatted as "1..N", where N is the number of subtests.
Following the plan line, all lines pertaining to the subtests will follow.
Finally, the last line of the test is a final test result line with the format: "(ok|not ok) <number> [-] [<description>] [<directive>] [<diagnostic data>]", which follows the same format as the general test result lines described in this section. The result line should indicate the result of the subtests. Thus, if one of the subtests fail, the test should fail. The description in the final test result line should match the name of the test in the subtest header.
An example format:
KTAP version 1 1..1 # Subtest: test_suite 1..2 ok 1 - test_1 ok 2 - test_2 ok 1 - test_suite
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
An example format with multiple levels of nested testing:
KTAP version 1 1..1 # Subtest: test_suite 1..2 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 2 - test ok 1 - test_suite
In the instance that the plan line is missing, the end of the test will be denoted by the final result line containing a description that matches the name of the test given in the subtest header. Note that thus, if the plan line is missing and one of the subtests have a matching name to the test suite this will cause errors.
A plan line is required. No special cases are needed. :)
I am good with this. This was an attempt to be flexible because it does seem there are tests that leave out the test plan or place the test plan at the end of the test.
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
I think this could be a little strict to require indentation. For example, in the instance of a nested test with a prefix, there could be difficulties with correctly parsing the indentation level:
TAP version 13 1..1 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime
(using a truncated version of an example Tim Bird used - note he used it for a different point)
This could mostly be corrected with the use of "# " as the indentation. But I personally prefer " " as an indentation because the use of "# " can be confusing with the format of diagnostic lines also starting with "# ". Thus, I think it is safer to use the version lines or a subtest header lines to parse nested tests.
Whether this is "# " or " " I don't really care, as long as the change is coordinated. " " is easier for humans to read, but may make parsing of "unknown" lines more difficult for machines.
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
- TODO directive not allowed
I would just say "unrecognized".
I am good with this change.
- KTAP allows for an arbitrary number of tests to be nested with
specified nested test format
Yup.
Example of KTAP
KTAP version 1 1..1 # Subtest: test_suite 1..1 # Subtest: sub_test_suite 1..2 ok 1 - test_1 ok 2 test_2 ok 1 - sub_test_suite ok 1 - test_suite
For a good example, please include all the possible combinations (SKIP, not ok, diagnostics, etc etc)
========================================= Note on incompatibilities with kselftests =========================================
To my knowledge, the above specification seems to generally accept the TAP format of many non-nested test results of kselftests.
An example of a common kselftests TAP format for non-nested test results that are accepted by the above specification:
TAP version 13 1..2 # selftests: vDSO: vdso_test_gettimeofday # The time is 1628024856.096879 ok 1 selftests: vDSO: vdso_test_gettimeofday # selftests: vDSO: vdso_test_getcpu # Could not find __vdso_getcpu ok 2 selftests: vDSO: vdso_test_getcpu # SKIP
However, one major difference noted with kselftests is the use of more directives than the "# SKIP" directive. kselftest also supports XPASS and XFAIL directives. Some additional examples found in kselftests:
not ok 5 selftests: netfilter: nft_concat_range.sh # TIMEOUT 45 seconds not ok 45 selftests: kvm: kvm_binary_stats_test # exit=127
Should the specification be expanded to include these directives?
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Sounds good. Are there specific directives that should or should not be included?
However, the general format for kselftests with nested test results seems to differ from the above specification. It seems that a general format for nested tests is as follows:
TAP version 13 1..2 # selftests: membarrier: membarrier_test_single_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 1 selftests: membarrier: membarrier_test_single_thread # selftests: membarrier: membarrier_test_multi_thread # TAP version 13 # 1..2 # ok 1 sys_membarrier available # ok 2 sys membarrier invalid command test: command = -1, flags = 0, errno = 22. Failed as expected ok 2 selftests: membarrier: membarrier_test_multi_thread
The major differences here, that do not match the above specification, are use of "# " as an indentation and using a TAP version line to denote a new test with subtests rather than the subtest header line described above. If these are widely utilized formats in kselftests, should we include both versions in the specification or should we attempt to agree on a single format for nested tests? I personally believe we should try to agree on a single format for nested tests. This would allow for a cleaner specification of KTAP and would reduce possible confusion.
We already use "# " and the nested TAP lines to denote subtests. Without a good reason to change it, we should avoid the churn with the existing parsers.
====
So what do people think about the above specification? How should we handle the differences with kselftests?
I'm probably a broken record by now, but kselftests _is_ the specification. ;) What about it needs changing, and why?
I think the change that needs to happen is that the specification is documented so it is accessible to everyone. Otherwise I hope this proposed specification helps to incite needed discussion so we hopefully can create this accepted specification. Again thank you to everyone for their comments.
If this specification is accepted, where should the specification be documented?
I imagine some .rst file under Documentation/dev-tools/, linked to from kselftest.rst and kunit/...rst
-- Kees Cook
On Tue, Aug 31, 2021 at 6:04 AM Tim.Bird@sony.com wrote:
-----Original Message----- From: Kees Cook keescook@chromium.org
On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
IMHO it's useful to keep this called 'diagnostic data'.
The directive is optional, and having this be diagnostic data means you can put explanatory text, or result details, that are independent of the description, on the same line as the result. This is convenient in some cases.
I think diagnostic data should always be prefixed by a hash sign. Or, in the case where diagnostic data appears on the result line, the hash serves as the delimiter. This means that the description text must not include a hash character.
For example: ok 5 - check return code # rcode=0 or not ok 5 - check return code # rcode=17
The directive is currently used to answer the question "why?" as in, "why is this test not ok?" ("a timeout occurred") or "why is this test ok?" ("a SKIP")
The "supporting evidence" for the "why?" answer are specific details about that: "the exit was was NNN" or "the timeout was YYY seconds".
Test diagnostics are all the lines preceding that (currently prefixed with a "#").
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
I have no strong opinion on the "-". I was surprised to encounter it as it's not part of the TAP 13 spec. I would prefer to drop it, if I had to choose.
ERROR usually means that there was some kind of failure in the testing process that makes the testcase result indeterminate. (like loss of connection to a server, or crash of the program executing the testcase). This can be due to a bug in the test or the test system. I'm, not sure if ERROR is used this way in kselftest. It is, IMHO, a useful distinction to make. TIMEOUT is a special case of ERROR, IMHO.
Well, I think TIMEOUT is a "why not ok?" answer. "not ok" seems to be the same as ERROR, but I guess it's really a catch-all, but must come with the "evidence" portion:
'not ok' should not mean the same thing as ERROR. 'not ok' means the testcase was run successfully, and the result is that the test determined that the testcase failed. ERROR means that the testcase was not run successfully, so no result can be assumed.
But ERROR is confusing. Maybe there's a better word for the concept that is less likely to be conflated with the result.
In KUnit, we've largely used ERROR to refer to test cases with malformed or corrupted output (e.g. the test code crashed, or there are missing results according to the test plan, etc). Otherwise, the test fails (or is skipped, in some circumstances). This means that it's not been something tests explicitly sets, it's something the parser outputs. That's obviously less of a sensible thing for kselftest, which doesn't provide its own TAP parser (though the bits of test harness code could maybe try to output something if a subprocess failed, or the like.)
not ok 1 description goes here # TIMEOUT 45 seconds with no results not ok 1 description goes here # ERROR lost connection to server not ok 1 description goes here # ERROR non-zero exit code: NNN
I might go so far as to say the directive is _required_ for the "not ok" state.
As I said above, I don't think they are the same. I would reserve ERROR for the case where a test error occurred and the test case result could not be obtained.
If we go to the trouble of making our own spec, maybe it would be better to support a third result value: 'unknown'. This would break all the TAP parsers, but it's much clearer, IMHO.
'ok 1 description' = test case executed successfully, and the testcase condition passed 'not ok 1 description' = test case executed successfully, and the testcase condition failed 'unknown 1 description # SKIP # reason' = test case does not apply for some reason 'unknown 1 description # ERROR # reason' = test case could not be executed, for some reason
In the case of SKIP and ERROR, the result should be ignored as it is either inapplicable or wasn't obtained.
I remember we had a discussion about whether SKIP should have an "ok" or "not ok" result. IMHO it doesn't matter, because the result has no meaning when a SKIP directive is used. (Although it's useful for humans to see why something was skipped.)
My feeling is that if we deviate from TAP enough to add another result value, let's just make "skip" (and maybe "error", "xfail", "xpass" if they make sense) it's own value rather than writing "unknown".
As an aside, I've never liked XFAIL or XPASS. Semantically, I believe they are quite confusing, and depend on the level of testcase execution. XPASS is also redundant. Isn't the default case always XPASS?
Nothing currently uses XPASS, and we should remove it.
XFAIL capture the condition of the test logic in that a test for something arm64-specific on a non-arm64 machine is _not_ a "pass". It will fail. But it's an expected failure. And, as a directive, must come with the evidence bit:
not ok 1 PAN catches bad accessg # XFAIL PAN CPU feature is arm64 only
As a test author, I want to see this as distinct from failure and success.
I agree that this case is different from a normal failure.
In the example you provide of a feature that applies to only specific architectures, I would use a SKIP instead. It's not that the testcase failed.
)> It's that it does not even make sense for the device under test.
Whether something is 1) a testcase failure or 2) a testcase that does not apply is a matter of perspective. Is it the case that the feature should exist on all architectures, and it's just not been written yet? I could call this a 'not ok' or an XFAIL. However, if it's a feature that is never intended to be written for arm32 or i386, I would report that as a SKIP.
But all of this differs from what I originally described, which is labs using XFAIL to annotate bugs that they knew about but didn't intend to fix.
IIRC, the TAP spec specifies a TODO which is basically the same thing as this. We haven't implemented either it or XPASS/XFAIL in KUnit, though. (I tend to agree that this is a nicer thing to have in a separate layer, rather than the tests themselves, but can see the argument for not wanting another list somewhere.)
In the case of XFAIL, does it mean that an operation that was expected to fail all the time, failed, and so the test passed? Or does it mean that this is a known bug that a particular developer or site has decided to ignore (as in "we know this is a bug, but we expected it to fail, so just ignore it for now). The latter, I really don't think should be encoded in the tests themselves, but should be put into a layer above the test and parser, that is applied according to developer or testlab policy.
I agree that "just ignore it for now" shouldn't be there.
Other arguments are that xfail tests shouldn't be run at all, but I don't think that's right because it makes it very hard to do large-scale test result comparisons because random stuff is missing, depending on various configs, archs, etc. It blocks a 1-to-1 comparison, and begs questions like "why is the test here in one case and not in another?"
I completely agree with this. It really messes up tables of results when the list of testcases varies depending on some board or configuration attribute. It's very hard for users to parse and compare results in that case.
where as an XFAIL will encode the reason it is an XFAIL in its output, providing a self-documenting test result.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I think this would be a very good distinction to make. You might have unknown lines that (inadvertently?) start with a hash, but trying to make sure that the test itself always outputs diagnostic data with a preceding hash gives a reader potentially more information about where the message came from, and could be useful.
Right, so lines with a hash shouldn't be machine-parsed. (And this is a mistake I made in the current kselftest output for the LAVA parser.) The good news for the kernel, is that nothing else in dmesg leads with a "#" currently. :)
Oh. That's actually nice to know. Interesting.
If this is the case, I'm tempted to want a "#" in front of all KTAP lines, just so that we have an easy way to isolate test output in general (including the machine-parsable bits).
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
I agree it is handy to have the test name for sub-tests. However, I don't see any semantic difference whether we define sub-test declarations via diagnostic lines with special syntax or by creating a new syntax separate from diagnostic lines. Either way, we have a new syntax for parsers to understand.
Pragmatically, I agree, which is what lead me to my mistake. However, in reconsidering it, I realize this leads to a slippery slope of just dumping stuff into diagnostic lines and pretending nothing will read it.
Good point. Whether it starts with a hash or not, TAP 13 parsers should ignore anything that is not a result line (starts with 'ok' or 'not ok'). So I guess we are free to make a new syntax without upsetting any existing parsers.
TAP 14 definitely parses a few things other than just the result line (including some which start with '#'), but it does look like no-one liked TAP 14 enough to finish the spec and use it (other than KUnit stealing some ideas from it).
I suspect what we need is an optional "starting test" line, like:
test N DESC # diag... # diag... # diag... ok N
The nesting looks like:
TAP version 13 1..3 test 1 seccomp # TAP version 13 # 1..27 # test 1 prctl # ok 1 # test 2 strict # ok 2 # ... # test 27 user_notif # # eek missing CONFIG_.... # not ok 27 user_notif # ERROR can't use ioctl not ok 1 seccomp # ERROR non-zero exit code 1 test 2 netfilter ok 2 netfilter test 3 bpf # exciting things happen ok 3 bpf
This is what we have now except basically just replacing "# $name" with "test $num $name"
Actually, I like this idea. Including the number is a nice touch. I think that will help to distinguish such a line from other 'unknown line' output. I don't know if I'd leave the description off of the result line, though. A lot of parsers are looking for that.
Yeah, this is something I'd quite like, personally. If we're prepared to deviate from TAP this much, having the name/description of the test before getting the results makes it easier to read. I'd agree with keeping the description on the result line, too: not only is it what existing parsers are looking for, it makes reading the results manually much nicer.
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
How consistent is it for selftests to produce test summary lines?
Any of the tools built with kselftest.h will spit it out. (Which is to say several of the "subtests".) The bulk of kselftest output is from the runner, which doesn't produce this summary.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
I agree that the sub-test should not know about this line. I thought the "parent" test wrote the line.
The suggestion was that the subtest should _not_ emit the "TAP" line, which is what I strongly disagree with: it should be a self-contained TAP-emitting test.
OK - agreed. Yes. The sub-test should not know whether they are running standalone or nested (with the exception of some line prefix opaquely conveyed to them)
Yeah, this all came from the TAP 14 spec. In KUnit it's not a problem, as the harness does all of the printing of these headers, so the test code itself never has to worry about it. It does make more sense for kselftest to make tests more independent, though, so let's leave the TAP header line in.
My only concern here is that this could complicate attempting to parse multiple test suites in the same output slightly: e.g., if we just grep the input naively for "TAP version blahdiblah", then parse whatever we get, there's a possibility we could double-parse nested results.
We don't hit this at the moment, though, and it could be worked around pretty easily, so I don't think it's worth holding back something which otherwise works.
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
I prefer spaces, but whatever. Is the prefix on the 'TAP version 13' line guaranteed to be the prefix on all lines for the (sub-)test?
That's the current implementation, yes.
As for tabs vs spaces, I think it's fine to swap "# " for " ". We just need to patch LAVA to deal with it.
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
I thought that some tests put the diagnostic data before the result line.
I don't care whether it is before or after (and indeed Fuego's parser can handle either case). But it can't handle it when some testcases put diagnostic data before the result line, and some testcases put it after, in the same test. There's no delimiter for the separation between the two different testcases diagnostic data.
Right. The current implementation puts it always before. TAP 13 (14?) has the YAML _after_ the results line, but that's totally unworkable for kselftest, which is designed to be first human readable and second machine readable. Waiting to see diagnostics from a slow test is terrible -- we can't collect it and spit it out later. Also, a parser would have no idea about when a test is finished if diagnostics follow the last test result of a plan.
Making it "always before" is fine by me, and the motivation for making the "summary" lines be non-diagnostics because they technically followed the last test result.
Could we possibly have a rule about diagnostic lines positioning relative to the testcase result line? Maybe a rule like this: For a single test, diagnostic lines should either always preceded the testcase they are related to, or follow the testcase they are related to.
For the kernel, I would rather avoid anything after because that just confuses things.
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Please No! (Unless I'm misinterpreting that line).
If you remove the '#', that makes the 'exit=127' part of the test description. The test description should be invariant between success and failure cases. The exit=127 (if I'm reading that line right) is part of data explaining or describing the failure. It would be 'exit=0' if the test succeeded.
In order to correlate test cases (ie show results over time), the description should not change from one invocation of the test to the next. At least Fuego uses the test description this way. That is, the testcase description is used as a testcase identifier, and if it changes from run to run things get confusing.
Agreed, I think it should be part of an ERROR directive:
not ok 14 awesome test # ERROR exit=127
As long as the line above means that testcase '14 awesome test' could not be run, and exit=127 helps diagnose that, then I agree. If 'exit=127' means that testcase '14 awesome test' was run and exit=127 was a failing result, I would say that the result line should just be: not ok 14 awesome test # exit=127
I think this is fine, though I don't think it gives us that much more over just making the last diagnostic line "# exit=127". If we make "skip" and the like first-class results, that makes this even simpler, of course.
-- David
On Mon, Aug 30, 2021 at 10:04:24PM +0000, Tim.Bird@sony.com wrote:
-----Original Message----- From: Kees Cook keescook@chromium.org On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
XFAIL capture the condition of the test logic in that a test for something arm64-specific on a non-arm64 machine is _not_ a "pass". It will fail. But it's an expected failure. And, as a directive, must come with the evidence bit:
not ok 1 PAN catches bad accessg # XFAIL PAN CPU feature is arm64 only
As a test author, I want to see this as distinct from failure and success.
I agree that this case is different from a normal failure.
In the example you provide of a feature that applies to only specific architectures, I would use a SKIP instead. It's not that the testcase failed. It's that it does not even make sense for the device under test.
That's what all the arm64 tests are doing at the minute - they're skipping tests on systems where the feature being tested isn't supported. We've determined that the test can't meaningfully be run so we're going to skip it. We also don't use XFAIL for tests that are checking that an error of some kind is generated, the test is formulated as "error is reported when we do X" meaning that PASS/FAIL makes sense directly.
There's a couple of XFAIL uses in other selftests but it's pretty infrequent.
This is what we have now except basically just replacing "# $name" with "test $num $name"
Actually, I like this idea. Including the number is a nice touch. I think that will help to distinguish such a line from other 'unknown line' output. I don't know if I'd leave the description off of the result line, though. A lot of parsers are looking for that.
It's also important that humans are able to understand the output and the descriptions do help with that.
On 8/30/21 2:58 PM, Kees Cook wrote:
On Mon, Aug 30, 2021 at 05:48:07PM +0000, Tim.Bird@sony.com wrote:
From: Kees Cook keescook@chromium.org
Test case result lines
Test case result lines must have the format:
<result> <number> [-] [<description>] [<directive>] [<diagnostic data>]
"[<diagnostic data>]" is not defined below. I think what you mean is "directive details".
IMHO it's useful to keep this called 'diagnostic data'.
The directive is optional, and having this be diagnostic data means you can put explanatory text, or result details, that are independent of the description, on the same line as the result. This is convenient in some cases.
I think diagnostic data should always be prefixed by a hash sign. Or, in the case where diagnostic data appears on the result line, the hash serves as the delimiter. This means that the description text must not include a hash character.
For example: ok 5 - check return code # rcode=0 or not ok 5 - check return code # rcode=17
The directive is currently used to answer the question "why?" as in, "why is this test not ok?" ("a timeout occurred") or "why is this test ok?" ("a SKIP")
The "supporting evidence" for the "why?" answer are specific details about that: "the exit was was NNN" or "the timeout was YYY seconds".
Test diagnostics are all the lines preceding that (currently prefixed with a "#").
Yes, though the optional "- " is strictly part of the optional description.
It's mildly annoying that "-" is optional. It's trivial to deal with in the parser to just ignore it if it's present. But it has no semantic meaning whatsoever. IMHO it would be nice to either mandate it or remove it, for consistency's sake. This could be based solely on the consensus for whether it added or detracted from human readability, since parsers don't care.
I have no strong opinion on the "-". I was surprised to encounter it as it's not part of the TAP 13 spec. I would prefer to drop it, if I had to choose.
ERROR usually means that there was some kind of failure in the testing process that makes the testcase result indeterminate. (like loss of connection to a server, or crash of the program executing the testcase). This can be due to a bug in the test or the test system. I'm, not sure if ERROR is used this way in kselftest. It is, IMHO, a useful distinction to make. TIMEOUT is a special case of ERROR, IMHO.
Well, I think TIMEOUT is a "why not ok?" answer. "not ok" seems to be the same as ERROR, but I guess it's really a catch-all, but must come with the "evidence" portion:
not ok 1 description goes here # TIMEOUT 45 seconds with no results not ok 1 description goes here # ERROR lost connection to server not ok 1 description goes here # ERROR non-zero exit code: NNN
I might go so far as to say the directive is _required_ for the "not ok" state.
As an aside, I've never liked XFAIL or XPASS. Semantically, I believe they are quite confusing, and depend on the level of testcase execution. XPASS is also redundant. Isn't the default case always XPASS?
Nothing currently uses XPASS, and we should remove it.
XFAIL capture the condition of the test logic in that a test for something arm64-specific on a non-arm64 machine is _not_ a "pass". It will fail. But it's an expected failure. And, as a directive, must come with the evidence bit:
not ok 1 PAN catches bad accessg # XFAIL PAN CPU feature is arm64 only
As a test author, I want to see this as distinct from failure and success.
In the case of XFAIL, does it mean that an operation that was expected to fail all the time, failed, and so the test passed? Or does it mean that this is a known bug that a particular developer or site has decided to ignore (as in "we know this is a bug, but we expected it to fail, so just ignore it for now). The latter, I really don't think should be encoded in the tests themselves, but should be put into a layer above the test and parser, that is applied according to developer or testlab policy.
I agree that "just ignore it for now" shouldn't be there.
Other arguments are that xfail tests shouldn't be run at all, but I don't think that's right because it makes it very hard to do large-scale test result comparisons because random stuff is missing, depending on various configs, archs, etc. It blocks a 1-to-1 comparison, and begs questions like "why is the test here in one case and not in another?" where as an XFAIL will encode the reason it is an XFAIL in its output, providing a self-documenting test result.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
I think this would be a very good distinction to make. You might have unknown lines that (inadvertently?) start with a hash, but trying to make sure that the test itself always outputs diagnostic data with a preceding hash gives a reader potentially more information about where the message came from, and could be useful.
Right, so lines with a hash shouldn't be machine-parsed. (And this is a mistake I made in the current kselftest output for the LAVA parser.) The good news for the kernel, is that nothing else in dmesg leads with a "#" currently. :)
Actually, not correct. Devicetree unittest (drivers/of/unittest.c) leads messages with "### dt-test ### ". But the good news is that this is easily changeable.
-Frank
No; diagnostic lines must have no meaning: they are for humans and nothing else. If other details are needed for machines, we should explicitly create new format lines. I made a mistake when I used a diagnostic line for embedding the test names. :( There is a need for parsers to discover the name of a given test, though, so we do likely need something for this.
I agree it is handy to have the test name for sub-tests. However, I don't see any semantic difference whether we define sub-test declarations via diagnostic lines with special syntax or by creating a new syntax separate from diagnostic lines. Either way, we have a new syntax for parsers to understand.
Pragmatically, I agree, which is what lead me to my mistake. However, in reconsidering it, I realize this leads to a slippery slope of just dumping stuff into diagnostic lines and pretending nothing will read it.
I suspect what we need is an optional "starting test" line, like:
test N DESC # diag... # diag... # diag... ok N
The nesting looks like:
TAP version 13 1..3 test 1 seccomp # TAP version 13 # 1..27 # test 1 prctl # ok 1 # test 2 strict # ok 2 # ... # test 27 user_notif # # eek missing CONFIG_.... # not ok 27 user_notif # ERROR can't use ioctl not ok 1 seccomp # ERROR non-zero exit code 1 test 2 netfilter ok 2 netfilter test 3 bpf # exciting things happen ok 3 bpf
This is what we have now except basically just replacing "# $name" with "test $num $name"
Diagnostic lines of the format "# Subtest: <test_name>" indicate the start of a test with subtests. Also, diagnostic lines of the format "# <test_name>: <description>" refer to a specific test and tend to occur before the test result line of that test but are optional.
I don't think the above should be included in the spec -- diag lines should have no parseable meaning.
Bail out line
A bail out line can occur anywhere in the KTAP output and will indicate that a test has crashed. The format of a bail out line is "Bail out! [<description>]", where the description can give information on why the bail out occurred and can be any string.
I'm not a fan of the Bail out line. It's not a problem, exactly, but I find it redundant. If we want an "end of test" line, let's make one. "Bail out" is a special case of exit. If we want to handle test exit, let's define it. E.g. make kselftest's test summary lines not diagnostic lines:
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
How consistent is it for selftests to produce test summary lines?
Any of the tools built with kselftest.h will spit it out. (Which is to say several of the "subtests".) The bulk of kselftest output is from the runner, which doesn't produce this summary.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
I agree that the sub-test should not know about this line. I thought the "parent" test wrote the line.
The suggestion was that the subtest should _not_ emit the "TAP" line, which is what I strongly disagree with: it should be a self-contained TAP-emitting test.
Again, I see only downsides to this. Nesting for the spec is simple indentation-based recursion. Let's just keep what we already have:
TAP version 13 1..1 # TAP version 13 # 1..2 # ok 1 - test_1 # ok 2 - test_2 ok 1 - test_suite
I prefer spaces, but whatever. Is the prefix on the 'TAP version 13' line guaranteed to be the prefix on all lines for the (sub-)test?
That's the current implementation, yes.
As for tabs vs spaces, I think it's fine to swap "# " for " ". We just need to patch LAVA to deal with it.
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
Major differences between TAP 14 and KTAP specification
Note that the major differences between TAP 14 and KTAP specification:
- yaml and json are not allowed in diagnostic messages
Agreed -- these are overkill (and very difficult to implement as they _follow_ the test result line: anything generating them has already finished running).
I thought that some tests put the diagnostic data before the result line.
I don't care whether it is before or after (and indeed Fuego's parser can handle either case). But it can't handle it when some testcases put diagnostic data before the result line, and some testcases put it after, in the same test. There's no delimiter for the separation between the two different testcases diagnostic data.
Right. The current implementation puts it always before. TAP 13 (14?) has the YAML _after_ the results line, but that's totally unworkable for kselftest, which is designed to be first human readable and second machine readable. Waiting to see diagnostics from a slow test is terrible -- we can't collect it and spit it out later. Also, a parser would have no idea about when a test is finished if diagnostics follow the last test result of a plan.
Making it "always before" is fine by me, and the motivation for making the "summary" lines be non-diagnostics because they technically followed the last test result.
Could we possibly have a rule about diagnostic lines positioning relative to the testcase result line? Maybe a rule like this: For a single test, diagnostic lines should either always preceded the testcase they are related to, or follow the testcase they are related to.
For the kernel, I would rather avoid anything after because that just confuses things.
Yes. (Though "exit=" is a mistake in runner.sh -- this should probably be reported without the '#')
Please No! (Unless I'm misinterpreting that line).
If you remove the '#', that makes the 'exit=127' part of the test description. The test description should be invariant between success and failure cases. The exit=127 (if I'm reading that line right) is part of data explaining or describing the failure. It would be 'exit=0' if the test succeeded.
In order to correlate test cases (ie show results over time), the description should not change from one invocation of the test to the next. At least Fuego uses the test description this way. That is, the testcase description is used as a testcase identifier, and if it changes from run to run things get confusing.
Agreed, I think it should be part of an ERROR directive:
not ok 14 awesome test # ERROR exit=127
linux-kselftest-mirror@lists.linaro.org