In the middle of the "RFC - kernel test result specification (KTAP)" thread, started in August 2021, Tim Bird made a suggestion to allow a prefix to the KTAP data format:
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
The thread was discussing many other items, but this is the one that I want to focus on in this new RFC thread.
Tim's original email was:
https://lore.kernel.org/r/BYAPR13MB2503A4B79074D8ED5579345DFDCB9@BYAPR13MB25...
There was one reply to this that commented on Tim's suggestion (and also many other items in the thread) at:
https://lore.kernel.org/r/202108301226.800F3D6D4@keescook
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
I found Tim's original suggestion to be useful, so I have come up with two possible ways to modify the KTAP specification to implement what Tim was thinking about. I would not be surprised if someone else has a better suggestion than mine, but I will reply to this email with my two alternatives to start a discussion. My alternatives are not in the form of patches, but if discussion leads to a good result then I will create a patch for review.
-Frank
On 5/12/22 00:59, Frank Rowand wrote:
In the middle of the "RFC - kernel test result specification (KTAP)" thread, started in August 2021, Tim Bird made a suggestion to allow a prefix to the KTAP data format:
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
The thread was discussing many other items, but this is the one that I want to focus on in this new RFC thread.
Tim's original email was:
https://lore.kernel.org/r/BYAPR13MB2503A4B79074D8ED5579345DFDCB9@BYAPR13MB25...
There was one reply to this that commented on Tim's suggestion (and also many other items in the thread) at:
https://lore.kernel.org/r/202108301226.800F3D6D4@keescook
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
I found Tim's original suggestion to be useful, so I have come up with two possible ways to modify the KTAP specification to implement what Tim was thinking about. I would not be surprised if someone else has a better suggestion than mine, but I will reply to this email with my two alternatives to start a discussion. My alternatives are not in the form of patches, but if discussion leads to a good result then I will create a patch for review.
-Frank
================================================================================ Alternative 1 - Add an optional <prefix string> to test output. ================================================================================ ## <prefix string>
If the optional [<prefix string>] is present for any line of the KTAP formatted output, then it must be present for all lines of the KTAP formatted output (except for "Diagnostic lines", are allowed to not be prefixed with <prefix string>, as described below).
The "Version lines" format is changed from:
KTAP version 1
to:
[<prefix string>] KTAP version 1
The "Plan lines" format is changed from:
"1..N"
to:
[<prefix string>] "1..N"
The "Test case result lines" format is changed from:
<result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
to:
[<prefix string>] <result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
<prefix content is a constant string>
All subsequent test lines are prefixed with the <prefix string>.
Indentation for "Nested tests" follows <prefix string>. The indentation does NOT precede <prefix string>.
"Diagnostic lines" are prefixed with the <prefix string>.
"Unknown lines" may optionally be prefixed with the <prefix string>, but are not required to be prefixed with the <prefix string>. It is allowed for some "Unknown lines" to not be prefixed with the <prefix string>, even if one or more other "Unknown lines" are prefixed with the <prefix string>.
================================================================================ #### discussion notes:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - PRO: minimally invasive to specification.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CON:
KTAP does not include any mechanism to describe the value of <prefix string> to test harnesses and test output processing programs. The test output processing programs must infer the value of <prefix string> by detecting the <prefix string> in the "Version lines".
The detection of a "Version lines" might be a match of the regex:
"^.*KTAP version 2$"
This risks falsely detecting a "Version lines", but the risk is small???
The impact of falsely detecting a "Version lines" may be large if the beginning of test detection is a state machine that moves into a "detect either Plan lines or Test case result lines" after falsely detecting a "Version lines". The impact may be small if the state maching instead moves into a "detect Version lines, Plan lines, or Test case result lines".
================================================================================
On Wed, May 11, 2022 at 11:01 PM Frank Rowand frowand.list@gmail.com wrote:
================================================================================ #### discussion notes:
PRO: minimally invasive to specification.
CON:
KTAP does not include any mechanism to describe the value of <prefix string> to test harnesses and test output processing programs. The test output processing programs must infer the value of <prefix string> by detecting the <prefix string> in the "Version lines".
The detection of a "Version lines" might be a match of the regex:
"^.*KTAP version 2$"
This risks falsely detecting a "Version lines", but the risk is small???
Agree this is a risk and also think it's probably small, but it's hard to say. I think the $ anchoring the regex is probably safe enough.
As noted earlier, this tracks with what kunit.py already does. That was necessitated by dynamic prefixes such as timestamps, etc. So I think this is probably a fine risk to take.
I imagine we could add constraints of prefix string, e.g. must have [] around it, etc. if we want to try and minimize this risk. But I don't know if it's necessarily worth it, given what we know right now.
Along those lines, I think I like this approach (Alternative 1) more than Alternative 2/2b. I'm not sure we need a structured way to specify metadata in KTAP yet? The prefix seems like a reasonable candidate, but do others have ideas of other bits of metadata we'd want to be able to declare?
Daniel
On 5/12/22 10:25, Daniel Latypov wrote:
On Wed, May 11, 2022 at 11:01 PM Frank Rowand frowand.list@gmail.com wrote:
================================================================================ #### discussion notes:
PRO: minimally invasive to specification.
CON:
KTAP does not include any mechanism to describe the value of <prefix string> to test harnesses and test output processing programs. The test output processing programs must infer the value of <prefix string> by detecting the <prefix string> in the "Version lines".
The detection of a "Version lines" might be a match of the regex:
"^.*KTAP version 2$"
This risks falsely detecting a "Version lines", but the risk is small???
Agree this is a risk and also think it's probably small, but it's hard to say. I think the $ anchoring the regex is probably safe enough.
As noted earlier, this tracks with what kunit.py already does. That was necessitated by dynamic prefixes such as timestamps, etc.f
That is a good observation. I nearly always have the prefix timestamp on my console output, and thus remove the timestamp with a regex when processing the data.
-Frank
So I think this is probably a fine risk to take.
I imagine we could add constraints of prefix string, e.g. must have [] around it, etc. if we want to try and minimize this risk. But I don't know if it's necessarily worth it, given what we know right now.
Along those lines, I think I like this approach (Alternative 1) more than Alternative 2/2b. I'm not sure we need a structured way to specify metadata in KTAP yet? The prefix seems like a reasonable candidate, but do others have ideas of other bits of metadata we'd want to be able to declare?
Daniel
On Fri, May 13, 2022 at 1:26 AM Frank Rowand frowand.list@gmail.com wrote:
On 5/12/22 10:25, Daniel Latypov wrote:
On Wed, May 11, 2022 at 11:01 PM Frank Rowand frowand.list@gmail.com wrote:
================================================================================ #### discussion notes:
PRO: minimally invasive to specification.
CON:
KTAP does not include any mechanism to describe the value of <prefix string> to test harnesses and test output processing programs. The test output processing programs must infer the value of <prefix string> by detecting the <prefix string> in the "Version lines".
The detection of a "Version lines" might be a match of the regex:
"^.*KTAP version 2$"
This risks falsely detecting a "Version lines", but the risk is small???
Agree this is a risk and also think it's probably small, but it's hard to say. I think the $ anchoring the regex is probably safe enough.
As noted earlier, this tracks with what kunit.py already does. That was necessitated by dynamic prefixes such as timestamps, etc.f
That is a good observation. I nearly always have the prefix timestamp on my console output, and thus remove the timestamp with a regex when processing the data.
Yeah: the use of a prefix length for timestamps (and similar 'dynamic prefixes') works really well as a way to automatically find KTAP output in the kernel log. It's definitely less useful as a way of disambiguating several (potentially interleaved) KTAP streams, though.
Personally, my conception of KTAP is that the prefix is not a part of the format, but a way (possibly one we should recommend in the spec) to isolate KTAP output from (e.g.) the kernel log. So, in theory, the processing pipeline for log -> result is: [dmesg] ---<remove prefix>--> [KTAP] ---<parse>---> [results]
I wouldn't object to having the prefix be a part of KTAP if it were particularly useful to anyone, though, and definitely agree we should note ways to do it.
Either way, we do end up with two (slightly conflicting) uses of prefixes: a) To isolate tests from a log. b) To allow multiple concurrent tests.
For a), Alternative 1 seems obviously correct to me, since whatever module is being used might not know what random prefixes are being added to its log lines. For b), either would work, but we do potentially get conflicts with a) if we're trying to do both. So maybe alternative 2 makes sense for that.
So I think this is probably a fine risk to take.
I imagine we could add constraints of prefix string, e.g. must have [] around it, etc. if we want to try and minimize this risk. But I don't know if it's necessarily worth it, given what we know right now.
Along those lines, I think I like this approach (Alternative 1) more than Alternative 2/2b.
I agree, though note that we might need both if we want two separate types of prefix as mentioned above.
I'm not sure we need a structured way to specify metadata in KTAP yet? The prefix seems like a reasonable candidate, but do others have ideas of other bits of metadata we'd want to be able to declare?
That's the important takeaway from Alternative 2: some generic way of having metadata could be quite handy. (Of them, Alternative 2b feels a little more like a hack than I'd prefer, albeit a clever one. I'd prefer a cleaner "configuration info line" personally, though if reusing the test results for test 0 makes parsing significantly easier for others, it may nevertheless be worth doing.)
As for other pieces of metadata, there are all sorts of useful information about the test which could be useful to store alongside the results (what tool generated it, kernel version, etc), though those might be more useful when saving results elsewhere than embedded into the kernel output.
Equally, if we extended these configuration/info lines to individual test/subtests, things like test size, originating module, etc could potentially be stored. Suite/test init functions could potentially add useful metadata, too (e.g., number of cores/threads for kcsan, etc).
None of those are, I think, quite compelling enough individually to add a config system, though. If we already had one for the prefix, and other things which are necessary for parsing, maybe...
-- David
On 5/12/22 00:59, Frank Rowand wrote:
In the middle of the "RFC - kernel test result specification (KTAP)" thread, started in August 2021, Tim Bird made a suggestion to allow a prefix to the KTAP data format:
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
The thread was discussing many other items, but this is the one that I want to focus on in this new RFC thread.
Tim's original email was:
https://lore.kernel.org/r/BYAPR13MB2503A4B79074D8ED5579345DFDCB9@BYAPR13MB25...
There was one reply to this that commented on Tim's suggestion (and also many other items in the thread) at:
https://lore.kernel.org/r/202108301226.800F3D6D4@keescook
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
I found Tim's original suggestion to be useful, so I have come up with two possible ways to modify the KTAP specification to implement what Tim was thinking about. I would not be surprised if someone else has a better suggestion than mine, but I will reply to this email with my two alternatives to start a discussion. My alternatives are not in the form of patches, but if discussion leads to a good result then I will create a patch for review.
-Frank
================================================================================ Alternative 2 - Add an optional <prefix string> to test output. - Add "Configuration info lines", which are used to provide information about the KTAP format to programs that interpret the KTAP data. The only type of "Configuration info lines" proposed provides the value of <prefix string> for programs that process the KTAP output. - Further types of "Configuration info lines" could be added.
Alternative 2b - Add an optional <prefix string> to test output. - Add a <prefix string> definition line of the form: ok 0 <prefix string> This line is essentially a phony "Test case result lines" for test 0.
================================================================================ ## Configuration info lines
Occur in zero or more test case result lines, where <number> is 0 (zero), following the "Plan line", before any other "Test case result" line.
If there is no "Plan line" at the beginning of the test, then the "Configuration info lines" follow the "Version line", before any other "Test case result" line.
format:
ok 0 <description> # [<directive>] [<diagnostic data>]]
<result> must be "ok".
The type of each "Configuration info line" is defined by the value of <description>.
Each value of <description> used for a "Configuration info line" must be listed in the specification.
Whether # <directive> is optional or required is defined for each type of "Configuration info line".
#### Should '[<diagnostic data>]' be included in the format?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # [output] prefix
ok 0 output_prefix #<prefix string>
<prefix content is a constant string> Note: there is no space between "#" and "<prefix string>". If there is one or more spaces after the "#", then there are part of the <prefix string>
## <diagnostic data> must not exist unless there is a way to determine the ## end of <prefix string>. ## ## Adding a way to determine the end of <prefix string> adds much complexity ## to the consumers of ktap output. For example, if the format was: ## ## ok 0 output_prefix #<prefix string>[#<diagnostic data>] ## ## then it is not possible for <prefix string> to contain '#', unless ## a method to escape '#' is provided. E.G.: ## ## ok 0 output_prefix #XXX test result ####<diagnostic data> ## ## would result in <prefix string> value of 'XXX test result ###' ## ## My recomendation: do not allow <diagnostic data> in output_prefix format.
All subsequent test lines are prefixed with the <prefix string>.
Indentation for "Nested tests" follows <prefix string>. The indentation does NOT precede <prefix string>.
"Diagnostic lines" are prefixed with the <prefix string>.
"Unknown lines" may optionally be prefixed with the <prefix string>, but are not required to be prefixed with the <prefix string>. It is allowed for some "Unknown lines" to not be prefixed with the <prefix string>, even if one or more other "Unknown lines" are prefixed with the <prefix string>.
#### Does prefixing begin immediately (even for a subsequent "Configuration #### info line" or does it begin with the test 1 "Test case result line" #### line? #### #### This question might be simplified if the output_prefix line is #### required to be the last "Configuration info line". (Or maybe even if #### required to be the first "Configuration info line".
================================================================================ #### discussion notes:
PRO: Test output processing programs can detect the value of <prefix string> more deterministically than Alternative 1.
CON: More complex than Alternative 1. (But alternative 2b is less complex than Alternative 2.)
================================================================================
-----Original Message----- From: Frank Rowand frowand.list@gmail.com
On 5/12/22 00:59, Frank Rowand wrote:
In the middle of the "RFC - kernel test result specification (KTAP)" thread, started in August 2021, Tim Bird made a suggestion to allow a prefix to the KTAP data format:
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
The thread was discussing many other items, but this is the one that I want to focus on in this new RFC thread.
Tim's original email was:
https://lore.kernel.org/r/BYAPR13MB2503A4B79074D8ED5579345DFDCB9@BYAPR13MB25...
There was one reply to this that commented on Tim's suggestion (and also many other items in the thread) at:
https://lore.kernel.org/r/202108301226.800F3D6D4@keescook
Oh, interesting. This would also allow parallel (unique) test execution to be parsable. That sounds workable. (Again, this needs LAVA patching again...)
I found Tim's original suggestion to be useful, so I have come up with two possible ways to modify the KTAP specification to implement what Tim was thinking about. I would not be surprised if someone else has a better suggestion than mine, but I will reply to this email with my two alternatives to start a discussion. My alternatives are not in the form of patches, but if discussion leads to a good result then I will create a patch for review.
-Frank
================================================================================ Alternative 2
- Add an optional <prefix string> to test output.
- Add "Configuration info lines", which are used to provide information about the KTAP format to programs that interpret the KTAP data. The only type of "Configuration info lines" proposed provides the value of <prefix string> for programs that process the KTAP output.
- Further types of "Configuration info lines" could be added.
Alternative 2b
- Add an optional <prefix string> to test output.
- Add a <prefix string> definition line of the form: ok 0 <prefix string> This line is essentially a phony "Test case result lines" for test 0.
================================================================================
For what it's worth, I much prefer Alternative 1, using the prefix on the KTAP line as the prefix for the lines associated with that test's output. I think it's much simpler.
Thanks for following up on this. -- Tim
## Configuration info lines
Occur in zero or more test case result lines, where <number> is 0 (zero), following the "Plan line", before any other "Test case result" line.
If there is no "Plan line" at the beginning of the test, then the "Configuration info lines" follow the "Version line", before any other "Test case result" line.
format:
ok 0 <description> # [<directive>] [<diagnostic data>]]
<result> must be "ok".
The type of each "Configuration info line" is defined by the value of <description>.
Each value of <description> used for a "Configuration info line" must be listed in the specification.
Whether # <directive> is optional or required is defined for each type of "Configuration info line".
#### Should '[<diagnostic data>]' be included in the format?
# [output] prefix
ok 0 output_prefix #<prefix string>
<prefix content is a constant string> Note: there is no space between "#" and "<prefix string>". If there is one or more spaces after the "#", then there are part of the <prefix string>
## <diagnostic data> must not exist unless there is a way to determine the ## end of <prefix string>. ## ## Adding a way to determine the end of <prefix string> adds much complexity ## to the consumers of ktap output. For example, if the format was: ## ## ok 0 output_prefix #<prefix string>[#<diagnostic data>] ## ## then it is not possible for <prefix string> to contain '#', unless ## a method to escape '#' is provided. E.G.: ## ## ok 0 output_prefix #XXX test result ####<diagnostic data> ## ## would result in <prefix string> value of 'XXX test result ###' ## ## My recomendation: do not allow <diagnostic data> in output_prefix format.
All subsequent test lines are prefixed with the <prefix string>.
Indentation for "Nested tests" follows <prefix string>. The indentation does NOT precede <prefix string>.
"Diagnostic lines" are prefixed with the <prefix string>.
"Unknown lines" may optionally be prefixed with the <prefix string>, but are not required to be prefixed with the <prefix string>. It is allowed for some "Unknown lines" to not be prefixed with the <prefix string>, even if one or more other "Unknown lines" are prefixed with the <prefix string>.
#### Does prefixing begin immediately (even for a subsequent "Configuration #### info line" or does it begin with the test 1 "Test case result line" #### line? #### #### This question might be simplified if the output_prefix line is #### required to be the last "Configuration info line". (Or maybe even if #### required to be the first "Configuration info line".
================================================================================ #### discussion notes:
PRO: Test output processing programs can detect the value of <prefix string> more deterministically than Alternative 1.
CON: More complex than Alternative 1. (But alternative 2b is less complex than Alternative 2.)
================================================================================
On Wed, May 11, 2022 at 10:59 PM Frank Rowand frowand.list@gmail.com wrote:
In the middle of the "RFC - kernel test result specification (KTAP)" thread, started in August 2021, Tim Bird made a suggestion to allow a prefix to the KTAP data format:
Just as a side note, in some Fuego tests, it was very useful to include an identifier in thethe prefix nested tests. The output looked like this:
TAP version 13 1..2 [batch_id 4] TAP version 13 [batch_id 4] 1..2 [batch_id 4] ok 1 - cyclictest with 1000 cycles [batch_id 4] # problem setting CLOCK_REALTIME [batch_id 4] not ok 2 - cyclictest with CLOCK_REALTIME not ok 1 - check realtime [batch_id 4] TAP version 13 [batch_id 4] 1..1 [batch_id 4] ok 1 - IOZone read/write 4k blocks ok 2 - check I/O performance
Can I propose that the prefix not be fixed by the spec, but that the spec indicates that whatever the prefix is on the TAP version line, that prefix must be used with the output for all lines from the test (with the exception of unknown lines)?
Just chiming in since I didn't see it mentioned after a quick skim of the original thread:
This is already basically the behavior of kunit.py's TAP parser since commit afc63da64f1e5e41875c98707020e85050f8a0c5 Author: Heidi Fahim heidifahim@google.com Date: Mon Mar 16 13:21:24 2020 -0700
kunit: kunit_parser: make parser more robust
Previously, kunit_parser did not properly handle kunit TAP output that - had any prefixes (generated from different configs e.g. CONFIG_PRINTK_TIME) ...
The notable difference is that only the prefix _length_ is fixed, not the contents of the string itself.
So ignoring a dynamic prefix is a practical necessity if we want to parse TAP from kernelspace/printk across a range of configs. But I don't know if this dynamic version is worth including in the spec. The static prefix makes more sense to me to formalize, and if we go down that route, at least kunit.py will already be compliant :)
Daniel
linux-kselftest-mirror@lists.linaro.org