On Mon, Aug 30, 2021 at 10:34:14AM -0700, Brendan Higgins wrote:
On Sat, Aug 28, 2021 at 1:20 AM Kees Cook keescook@chromium.org wrote:
Please keep me in CC for these kinds of proposals. And thanks for looking
Oh sorry, we will make sure to do that in the future. I figured most people who were interested who weren't directly CC'ed were on the kselftest list.
I mainly use mailing list member as a way to have thread history available. I have too many subscriptions to read them all. :)
is that we already _have_ a specification, and it's already being parsed
We do?! Where is it? Can you link it here?
I am, perhaps, too glib. What I mean is that we have an _implementation_, and we cannot create a specification that doesn't match the existing implementation without very good reason.
by machines, so making changes without strong justification is going to be met with resistance. :) So, with that in mind, on to my reply...
Of course, but if there is no specification or it is not being universally followed, then changes will need to be made, no?
Sure, but I'm being pragmatic: the implementation is already operating. We can't pretend it doesn't exist. :)
The fundamental purpose of the kernel's TAP is to have many independent tests runnable and parseable, specifically without any embedded framework knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses (kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well as several hand-rolled instances in Shell, Python, and C. There used to be more, but I have been steadily fixing their syntax[2] and merging[3] separate implementations for a while now.
Yep! And I believe there are still some inconsistencies - hence part of the motivation.
Agreed -- I'm happy to create a specification around what we _have_ (and fix any existing mistakes). But the proposed spec very much did not match what exists.
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing parsers.
"TAP version Linux.1" would not break existing parsers? If so, that would be great, I just expect that it would break them too.
https://metacpan.org/dist/Test-Harness/source/lib/TAP/Parser.pm and https://github.com/python-tap/tappy/blob/master/tap/parser.py won't like anything non-numeric.
But neither have any idea about "#" nor nested tests either. (Those considerations are entirely from the ability to construct automated parsing of test-separated logs.)
LAVA doesn't parse the TAP line at all, as it chunks by known test names, doing a simple mapping of name to outcome for the LAVA results: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kself...
So, to that end, I guess "KTAP version 1" is fine.
Plan lines
Plan lines must follow the format of "1..N" where N is the number of subtests. The second line of KTAP output must be a plan line, which indicates the number of tests at the highest level, such that the tests do not have a parent. Also, in the instance of a test having subtests, the second line of the test after the subtest header must be a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested tests".
Why not? I got the impression on the previous thread that everyone was amenable to this?
Because it breaks kselftests's agnosticism. More details on this below...
The directive is a single word -- currently only "SKIP" is recognized by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and error. (Though I would love to hear what "error" is intended to be used for, if different from regular "not ok".)
Can you explain (or link to documentation) on what these other directives mean?
I can link to code. :)
SKIP: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
TIMEOUT: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
XFAIL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
error: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
xpass (not actually a directive, and nothing uses it, I say we remove it): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
"exit=..." (improperly formed directive -- this should probably be changed into a preceding diagnostic line) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
Besides the two kselftest TAP harnesses, there are a few open-coded TAP producers:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool... (This last one has a very confused idea about where to put the plan line.)
Diagnostic lines
Diagnostic lines are used for description of testing operations. Diagnostic lines are generally formatted as "# <diagnostic_description>", where the description can be any string. However, in practice, diagnostic lines are all lines that don't follow the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines" and "unknown lines". For example, if kselftest is running on a console, the dmesg output may be intermixed, and possibly temporally offset. Such lines may not be useful, and may not be strictly correlated, where-as the output from kselftest is at least self-consistent.
Interesting. I think David probably would have some thoughts here.
The intention with the current implementation is to be as robust as possible against other lines of output existing, since kselftest output ends up in weird places fairly regularly.
# FAILED: 85 / 87 tests passed. # Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Cool, any ideas on what we should do then? I personally don't have strong feelings on it.
For this? I say leave it diagnostic. These are mainly for humans to see the results of a specific (usually long high-output) test run.
Also, parsers should treat a new "TAP version..." line at the same nesting level as indication that the prior test has ended (and any tests without result lines should be considered failed).
OK, that is definitely different. I think this will require some further discussion.
This is mostly about making good parsers. Parsers should be able to understand enough context to gracefully recover from unexpected output.
Please no. There is no reason to force a nested test to suddenly have to know about its test execution depth/environment. A subtest is just a nested TAP result. That it is nested is only meaningful to the parser, not the executing test, and it must have the same output, nested or not. (e.g. running the test alone or running the test under a full kselftest run, the only difference is the indentation level.)
Well it doesn't have to, right? The parent test can include the directive. In anycase, something has to know enough to print a test plan.
What happens in kselftest is that the kselftest harness has no idea what is or isn't going to produce TAP output. That is an intentional design goal of kselftest: authors should be able to drop a tool into the tree that returns 0 for "success", and 1 for "failure". (The full set of return codes is actually: "0" for "success", "4" for "skip", "124" for "timeout", and anything else for failure.) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The runner can see how many binaries it expects to run in a given directory, so it can build TAP output (with a plan line), for a given directory. However, some of those tests have tons of output and effectively have subtests -- but those binaries are _standalone_, in the sense that they produce TAP output without any knowledge of their surroundings. As such, the kselftest runner must indent them to make the results parsable.
Having different output when in one harness or another has and will lead to pain in comparing the results. So, a test run outside of kselftest will be (should be) identical to running inside kselftest, but with indentation.
We can _call_ this a subtest, but it's just nested. A subtest should not need to know it is a subtest.
If nesting level is lost, a parser will understand the nested test to have ended, but probably so did the test runner:
TAP version 13 1..1 TAP version 13 1..2 TAP version 13 1..3 ok 1 - test_1 ok 2 test_2 not ok 1 - sub test unfinished plan ok 2 - test not ok 1 - test_suite
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
KUnit cannot support indentation being required. We print test results to dmesg. At anytime something could get printed to dmesg that could potentially mess up indentation.
Can you explain this? KUnit controls what it prints, yes? Anything outside of the TAP output is an "unknown line". This is why I used "#" for indentation: it is distinguishable from "unknown (dmesg) lines".
The problem that was recognized, as I understand, was that there are multiple "interpretations" of TAP floating around the kernel and that goes against the original point of trying to standardize around TAP.
Right -- this is the part I disagree with. There is 1 interpretation, and a couple mistakes in the various kernel implementations. Let's get that documented first.
I know KUnit's usage is pretty minor in comparison to kselftest, but people do use it and there is no point in us, KUnit, purporting to use TAP and remain compatible with any particular version of it if it is incompatible with kselftest's TAP.
I have tried to keep the kernel's TAP output parsable by TAP 13 consumers. I don't think we have much of a fork currently (the diagnostic lines are a simple addition, and the nesting is trivially extractable with no additional knowledge beyond recognizing the "TAP" line within diagnostic lines).
Additionally, there is no way that we are going to be able to stay on a compatible implementation of TAP unless we specify what TAP is separate from the implementation.
My goals are:
- have kernel test output (and nest test output) parsable by existing tools, with a strong desire to not break LAVA (though arguably LAVA isn't a TAP parser).
- have kernel test output distinguishable from unrelated dmesg output since kselftest is very frequently collected from the console.
- not needlessly break existing tooling without showing a meaningful benefit. (i.e. flag days in parsing under LAVA has been difficult, and I want to minimize those.)
- be able to map meaningful test execution state to a common set of states ("ok" and "not ok" are rarely sufficient), which includes timeouts, expected failures (e.g. missing CONFIG_*s, wrong arch, etc).
- be able to include diagnostic runtime output _in the test result_, because too frequently the binary yes/no of a test is way to far away from the actual test details, which make debugging extremely difficult.