Re: RFC - kernel test result specification (KTAP)

30 Aug 2021


      On Mon, Aug 30, 2021 at 10:34:14AM -0700, Brendan Higgins wrote:
...
On Sat, Aug 28, 2021 at 1:20 AM Kees Cook keescook@chromium.org wrote:
...
Please keep me in CC for these kinds of proposals. And thanks for looking
Oh sorry, we will make sure to do that in the future. I figured most
people who were interested who weren't directly CC'ed were on the
kselftest list.
I mainly use mailing list member as a way to have thread history
available. I have too many subscriptions to read them all. :)
...
...
is that we already _have_ a specification, and it's already being parsed
We do?! Where is it? Can you link it here?
I am, perhaps, too glib. What I mean is that we have an
_implementation_, and we cannot create a specification that doesn't
match the existing implementation without very good reason.
...
...
by machines, so making changes without strong justification is going to
be met with resistance. :) So, with that in mind, on to my reply...
Of course, but if there is no specification or it is not being
universally followed, then changes will need to be made, no?
Sure, but I'm being pragmatic: the implementation is already operating.
We can't pretend it doesn't exist. :)
...
...
The fundamental purpose of the kernel's TAP is to have many independent
tests runnable and parseable, specifically without any embedded framework
knowledge (or, even, any knowledge of TAP).
The tests run by kselftest come from 2 different TAP-producing harnesses
(kselftest.h for C, kselftest/runner.sh for TAP-agnostic tests) as well
as several hand-rolled instances in Shell, Python, and C. There used to
be more, but I have been steadily fixing their syntax[2] and merging[3]
separate implementations for a while now.
Yep! And I believe there are still some inconsistencies - hence part
of the motivation.
Agreed -- I'm happy to create a specification around what we _have_ (and
fix any existing mistakes). But the proposed spec very much did not
match what exists.
...
...
Maybe "TAP version Linux.1" ? I don't want to needlessly break existing
parsers.
"TAP version Linux.1" would not break existing parsers? If so, that
would be great, I just expect that it would break them too.
https://metacpan.org/dist/Test-Harness/source/lib/TAP/Parser.pm
and
https://github.com/python-tap/tappy/blob/master/tap/parser.py
won't like anything non-numeric.
But neither have any idea about "#" nor nested tests either. (Those
considerations are entirely from the ability to construct automated
parsing of test-separated logs.)
LAVA doesn't parse the TAP line at all, as it chunks by known test
names, doing a simple mapping of name to outcome for the LAVA results:
https://github.com/Linaro/test-definitions/blob/master/automated/linux/kself...
So, to that end, I guess "KTAP version 1" is fine.
...
...
...
Plan lines
Plan lines must follow the format of "1..N" where N is the number of
subtests. The second line of KTAP output must be a plan line, which
indicates the number of tests at the highest level, such that the
tests do not have a parent. Also, in the instance of a test having
subtests, the second line of the test after the subtest header must be
a plan line which indicates the number of subtests within that test.
I do not want "subtests" as a specification concept, only "nested
tests".
Why not? I got the impression on the previous thread that everyone was
amenable to this?
Because it breaks kselftests's agnosticism. More details on this
below...
...
...
The directive is a single word -- currently only "SKIP" is recognized
by TAP 14(?), but kselftest uses also XFAIL, XPASS, TIMEOUT, and
error. (Though I would love to hear what "error" is intended to be used
for, if different from regular "not ok".)
Can you explain (or link to documentation) on what these other directives mean?
I can link to code. :)
SKIP:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
TIMEOUT:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
XFAIL:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
xpass (not actually a directive, and nothing uses it, I say we remove
it):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
"exit=..." (improperly formed directive -- this should probably be
changed into a preceding diagnostic line)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
Besides the two kselftest TAP harnesses, there are a few open-coded
TAP producers:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
(This last one has a very confused idea about where to put the plan line.)
...
...
...
Diagnostic lines
Diagnostic lines are used for description of testing operations.
Diagnostic lines are generally formatted as "#
<diagnostic_description>", where the description can be any string.
However, in practice, diagnostic lines are all lines that don't follow
the format of any other KTAP line format.
I still think there should be a distinction between "diagnostic lines"
and "unknown lines". For example, if kselftest is running on a console,
the dmesg output may be intermixed, and possibly temporally offset. Such
lines may not be useful, and may not be strictly correlated, where-as
the output from kselftest is at least self-consistent.
Interesting. I think David probably would have some thoughts here.
The intention with the current implementation is to be as robust as
possible against other lines of output existing, since kselftest output
ends up in weird places fairly regularly.
...
...
# FAILED: 85 / 87 tests passed.
# Totals: pass:83 fail:2 xfail:0 xpass:0 skip:2 error:0
Cool, any ideas on what we should do then? I personally don't have
strong feelings on it.
For this? I say leave it diagnostic. These are mainly for humans to see
the results of a specific (usually long high-output) test run.
...
...
Also, parsers should treat a new "TAP version..." line at the same
nesting level as indication that the prior test has ended (and any tests
without result lines should be considered failed).
OK, that is definitely different. I think this will require some
further discussion.
This is mostly about making good parsers. Parsers should be able to
understand enough context to gracefully recover from unexpected output.
...
...
Please no. There is no reason to force a nested test to suddenly have
to know about its test execution depth/environment. A subtest is just a
nested TAP result. That it is nested is only meaningful to the parser, not
the executing test, and it must have the same output, nested or not. (e.g.
running the test alone or running the test under a full kselftest run,
the only difference is the indentation level.)
Well it doesn't have to, right? The parent test can include the
directive. In anycase, something has to know enough to print a test
plan.
What happens in kselftest is that the kselftest harness has no idea
what is or isn't going to produce TAP output. That is an intentional
design goal of kselftest: authors should be able to drop a tool into
the tree that returns 0 for "success", and 1 for "failure". (The full
set of return codes is actually: "0" for "success", "4" for "skip",
"124" for "timeout", and anything else for failure.)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The runner can see how many binaries it expects to run in a given
directory, so it can build TAP output (with a plan line), for a
given directory. However, some of those tests have tons of output and
effectively have subtests -- but those binaries are _standalone_, in
the sense that they produce TAP output without any knowledge of their
surroundings. As such, the kselftest runner must indent them to make
the results parsable.
Having different output when in one harness or another has and will lead
to pain in comparing the results. So, a test run outside of kselftest
will be (should be) identical to running inside kselftest, but with
indentation.
We can _call_ this a subtest, but it's just nested. A subtest should not
need to know it is a subtest.
...
...
If nesting level is lost, a parser will understand the nested test to
have ended, but probably so did the test runner:
TAP version 13
1..1
    TAP version 13
    1..2
        TAP version 13
        1..3
        ok 1 - test_1
        ok 2 test_2
    not ok 1 - sub test unfinished plan
    ok 2 - test
not ok 1 - test_suite
...
Lastly, indentation is also recommended for improved readability.
Indentation is _required_. :)
KUnit cannot support indentation being required. We print test results
to dmesg. At anytime something could get printed to dmesg that could
potentially mess up indentation.
Can you explain this? KUnit controls what it prints, yes? Anything
outside of the TAP output is an "unknown line". This is why I used "#"
for indentation: it is distinguishable from "unknown (dmesg) lines".
...
The problem that was recognized, as I understand, was that there are
multiple "interpretations" of TAP floating around the kernel and that
goes against the original point of trying to standardize around TAP.
Right -- this is the part I disagree with. There is 1 interpretation,
and a couple mistakes in the various kernel implementations. Let's get
that documented first.
...
I know KUnit's usage is pretty minor in comparison to kselftest, but
people do use it and there is no point in us, KUnit, purporting to use
TAP and remain compatible with any particular version of it if it is
incompatible with kselftest's TAP.
I have tried to keep the kernel's TAP output parsable by TAP 13
consumers. I don't think we have much of a fork currently (the
diagnostic lines are a simple addition, and the nesting is trivially
extractable with no additional knowledge beyond recognizing the "TAP"
line within diagnostic lines).
...
Additionally, there is no way that we are going to be able to stay on
a compatible implementation of TAP unless we specify what TAP is
separate from the implementation.
My goals are:
- have kernel test output (and nest test output) parsable by existing
  tools, with a strong desire to not break LAVA (though arguably LAVA
  isn't a TAP parser).
- have kernel test output distinguishable from unrelated dmesg output
  since kselftest is very frequently collected from the console.
- not needlessly break existing tooling without showing a meaningful
  benefit. (i.e. flag days in parsing under LAVA has been difficult,
  and I want to minimize those.)
- be able to map meaningful test execution state to a common set of
  states ("ok" and "not ok" are rarely sufficient), which includes
  timeouts, expected failures (e.g. missing CONFIG_*s, wrong arch,
  etc).
- be able to include diagnostic runtime output _in the test result_,
  because too frequently the binary yes/no of a test is way to far away
  from the actual test details, which make debugging extremely
  difficult.
-- 
Kees Cook

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: RFC - kernel test result specification (KTAP)

Plan lines

Diagnostic lines