On 17 October 2017 at 16:08, Greg KH <gregkh@google.com> wrote:
> On Tue, Oct 17, 2017 at 09:02:18AM -0500, Dan Rue wrote:
> > What I would like to see, and I don't know if it is even possible, is
> > something that actually measures test coverage based on code paths in
> > the linux kernel so that we have a means to actually measure our
> > effectiveness. If we knew we were testing 43% (number pulled out of thin
> > air) of the linux kernel code paths, then we would know what areas to
> > focus on to bring that number up, and we would know which subsystems to
> > have some confidence in, and which are uncovered.
>
> Please read:
>         http://blog.ploeh.dk/2015/11/16/code-coverage-is-a-useless-target-measure/
>
> I worked with a team of developers over a decade ago trying to help with
> code-coverage analysis of the Linux kernel (many of those tests ended up
> in LTP).  I'm pretty sure the ability is still there, but it turned out,
> in the end, that it means nothing at all.
>
> Heck, even when you turn on fun things like "fail kmalloc() X% of the
> time to exercise error paths", you still don't really test the overall
> system.
>
> So please, never think in terms of code coverage, but feature coverage,
> like what LTP is trying to accomplish, is a great metric to strive for.
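
As an aside, the fault-injection option mentioned above is still in mainline. A rough sketch of driving it from a test harness - this assumes CONFIG_FAILSLAB and CONFIG_FAULT_INJECTION_DEBUG_FS are enabled, debugfs is mounted at /sys/kernel/debug, and ./run-tests.sh is a placeholder for whatever workload is being exercised:

import subprocess

FAILSLAB = "/sys/kernel/debug/failslab"

def write_knob(name, value):
    # Each fault-injection attribute is a plain debugfs file.
    with open(f"{FAILSLAB}/{name}", "w") as f:
        f.write(str(value))

# Fail roughly 10% of slab allocations and log each injection to dmesg.
write_knob("probability", 10)   # percentage of eligible allocations to fail
write_knob("interval", 1)       # consider every allocation
write_knob("times", -1)         # no cap on the number of injected failures
write_knob("verbose", 1)        # report each injection in the kernel log

try:
    # Placeholder workload: whatever test suite exercises the code under test.
    subprocess.run(["./run-tests.sh"], check=False)
finally:
    write_knob("probability", 0)  # switch injection back off

Even then, as pointed out above, it only shows that the error paths don't explode, not that the system as a whole behaves.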


It depends a lot on the focus of the upstream team, but that feature-based approach is reflected in the build-time and install-time tests of various userspace projects. However, package-based tests alone are not a good way to test the complete system. Equally, full system tests are not necessarily sufficient either - it is all too easy to generate a test suite producing hundreds of thousands of results, which becomes all but impossible to debug when something subtle goes wrong in an area the test suite doesn't explicitly check. A targeted package-based or feature-specific test would identify the problem much more quickly.

It needs to be a layered approach combining small and large tests, package-based and system-based, which comes back to Dan's original point: we need to iterate to get to a stable platform and then step up with wider tests. Userspace still has an effect on kernel support, especially at the level of init, as we found with the systemd getty race condition issue. A wider range of devices and a wider range of userspace software (including an extra distribution at some point in the future) helps with the triage. The systemd issue was first spotted on about 3% of jobs on x86_64, but it wasn't until it was reproducible on 50% of test jobs on the X15 that it became clear that this wasn't a kernel or hardware issue.

Reproducible bugs can be easy - intermittent bugs need wider and more repetitive testing, and rapidly become rabbit holes which devour engineering time. I would like to see the term "coverage" include this wider, more varied support, which becomes essential with the more difficult bugs.
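
To put a rough number on that: if a race only shows up in a fraction p of identical jobs, the number of repeats needed before you can expect to have seen it at least once follows from 1 - (1 - p)^n. A quick sketch, using the rates from the systemd getty case above:

import math

def runs_needed(failure_rate, confidence=0.95):
    # Smallest n such that 1 - (1 - failure_rate)**n >= confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

print(runs_needed(0.03))  # 99 runs at the ~3% rate first seen on x86_64
print(runs_needed(0.50))  # 5 runs at the ~50% rate seen on the X15

So a failure at the 3% level needs on the order of a hundred identical runs before its absence means much - which is exactly why these bugs devour engineering time.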

What we will also need is a map of which tests stress which features - so a sane metric for feature coverage, inside and outside the kernel, would be needed here.
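
To make that concrete, a sketch of the sort of map I have in mind - the feature and test-suite names here are purely illustrative, not an existing schema:

# Which test suites stress which features; empty lists flag the gaps.
FEATURE_MAP = {
    "mm/hugepages": ["ltp-mm", "libhugetlbfs"],
    "net/tcp":      ["ltp-net", "netperf"],
    "fs/ext4":      ["xfstests-ext4"],
    "init/getty":   [],   # uncovered - nothing currently stresses this
}

def feature_coverage(feature_map):
    # Fraction of tracked features stressed by at least one test suite.
    covered = sum(1 for tests in feature_map.values() if tests)
    return covered / len(feature_map)

print("feature coverage: {:.0%}".format(feature_coverage(FEATURE_MAP)))

The metric is only as good as the list of features, of course, but even a crude map like this would show where the gaps are.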

--